
Showing papers on "Probability mass function published in 2020"


Proceedings Article
30 Apr 2020
TL;DR: By sampling text from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, the resulting text better matches the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.
Abstract: Despite considerable advances in neural language modeling, it remains an open question what the best decoding strategy is for text generation from a language model (e.g. to generate a story). The counter-intuitive empirical observation is that even though the use of likelihood as training objective leads to high quality models for a broad range of language understanding tasks, maximization-based decoding methods such as beam search lead to degeneration — output text that is bland, incoherent, or gets stuck in repetitive loops. To address this we propose Nucleus Sampling, a simple but effective method to draw considerably higher quality text out of neural language models. Our approach avoids text degeneration by truncating the unreliable tail of the probability distribution, sampling from the dynamic nucleus of tokens containing the vast majority of the probability mass. To properly examine current maximization-based and stochastic decoding methods, we compare generations from each of these methods to the distribution of human text along several axes such as likelihood, diversity, and repetition. Our results show that (1) maximization is an inappropriate decoding objective for open-ended text generation, (2) the probability distributions of the best current language models have an unreliable tail which needs to be truncated during generation and (3) Nucleus Sampling is the best decoding strategy for generating long-form text that is both high-quality — as measured by human evaluation — and as diverse as human-written text.
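
As a reading aid, here is a minimal sketch of the top-p (nucleus) sampling step the abstract describes, applied to a toy next-token distribution; the vocabulary size and threshold are illustrative, not taken from the paper.

```python
import numpy as np

def nucleus_sample(probs, p=0.95, rng=None):
    """Draw one token index from the smallest set of tokens whose
    cumulative probability mass exceeds p (the 'nucleus')."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]                # tokens by descending probability
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1    # size of the nucleus
    nucleus = order[:cutoff]
    renormalized = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=renormalized)

# toy next-token distribution over a 6-token vocabulary
probs = np.array([0.42, 0.30, 0.15, 0.08, 0.04, 0.01])
print(nucleus_sample(probs, p=0.9))
```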

682 citations


Journal ArticleDOI
TL;DR: An extension of Hamiltonian Monte Carlo that can efficiently explore target distributions with discontinuous densities and enables efficient sampling from ordinal parameters through embedding of probability mass functions into continuous spaces is presented.
Abstract: Hamiltonian Monte Carlo has emerged as a standard tool for posterior computation. In this article, we present an extension that can efficiently explore target distributions with discontinuous densities. Our extension in particular enables efficient sampling from ordinal parameters through embedding of probability mass functions into continuous spaces. We motivate our approach through a theory of discontinuous Hamiltonian dynamics and develop a corresponding numerical solver. The proposed solver is the first of its kind, with a remarkable ability to exactly preserve the Hamiltonian. We apply our algorithm to challenging posterior inference problems to demonstrate its wide applicability and competitive performance.
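
A hedged illustration of the embedding idea mentioned above: a pmf on {0, ..., K-1} can be turned into a piecewise-constant density on [0, K) so that flooring a continuous draw recovers the discrete distribution. The authors' actual embedding and the discontinuous Hamiltonian dynamics are more involved; the helper `embed_pmf` below is only a sketch of the basic device.

```python
import numpy as np

def embed_pmf(pmf):
    """Embed a pmf on {0, ..., K-1} into a piecewise-constant density on [0, K):
    the interval [k, k+1) carries density pmf[k], so flooring a continuous draw
    recovers the original discrete distribution."""
    pmf = np.asarray(pmf, dtype=float)

    def density(x):
        k = np.floor(x).astype(int)
        inside = (x >= 0) & (x < len(pmf))
        return np.where(inside, pmf[np.clip(k, 0, len(pmf) - 1)], 0.0)

    return density

density = embed_pmf([0.5, 0.3, 0.2])
xs = np.array([0.25, 1.9, 2.5, 3.1])
print(density(xs))    # [0.5, 0.3, 0.2, 0.0]
```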

69 citations


Posted Content
TL;DR: The GEBM samples on image-generation tasks are of much better quality than those from the learned generator alone, indicating that all else being equal, the GEBM will outperform a GAN of the same complexity.
Abstract: We introduce the Generalized Energy Based Model (GEBM) for generative modelling. These models combine two trained components: a base distribution (generally an implicit model), which can learn the support of data with low intrinsic dimension in a high dimensional space; and an energy function, to refine the probability mass on the learned support. Both the energy function and base jointly constitute the final model, unlike GANs, which retain only the base distribution (the "generator"). GEBMs are trained by alternating between learning the energy and the base. We show that both training stages are well-defined: the energy is learned by maximising a generalized likelihood, and the resulting energy-based loss provides informative gradients for learning the base. Samples from the posterior on the latent space of the trained model can be obtained via MCMC, thus finding regions in this space that produce better quality samples. Empirically, the GEBM samples on image-generation tasks are of much better quality than those from the learned generator alone, indicating that all else being equal, the GEBM will outperform a GAN of the same complexity. When using normalizing flows as base measures, GEBMs succeed on density modelling tasks, returning comparable performance to direct maximum likelihood of the same networks.

44 citations


Journal ArticleDOI
TL;DR: In this paper, a new two-parameter exponentiated discrete Lindley distribution is introduced and a wide range of its structural properties are investigated, including the shape of the probability mass function.
Abstract: This paper introduces a new two-parameter exponentiated discrete Lindley distribution. A wide range of its structural properties are investigated. This includes the shape of the probability mass function...

42 citations


Journal ArticleDOI
TL;DR: The proposed distributionally robust framework for solving the OPF problem benefits from the observed correlation amongst the uncertain parameters to mitigate branch flow limit violations while maintaining an acceptable worst-case expected operational cost as compared to RO and SO solutions.
Abstract: Recent research on optimal power flow (OPF) in networks with renewable power involves optimizing both first and second stage variables that adjust the decision once the uncertainty is revealed. In general, only partial information on the underlying probability distribution of renewable power production is available. This paper considers a distributionally robust framework for solving the OPF problem. The formulation stipulates a probability mass function of wind power production, whose probabilities and scenario locations vary in a box of ambiguity with bounds that can be tuned based on historical data. Distributionally robust optimization (DRO) is used to derive new conditional value-at-risk (CVaR) constraints that limit the frequency and severity of branch flow limit violations whenever the renewable power generation deviates from its forecast. Numerical results are reported on networks with up to 2736 nodes and contrasted with classical robust optimization (RO) and stochastic optimization (SO) solutions. The results show an advantage to adopting the proposed DRO for load flow control. In particular, the solution benefits from the observed correlation amongst the uncertain parameters to mitigate branch flow limit violations, and yet maintain an acceptable worst-case expected operational cost as compared to RO and SO solutions.
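
For readers unfamiliar with the CVaR constraints mentioned above, the sketch below computes the conditional value-at-risk of a discrete loss pmf (e.g., branch-flow violation costs under wind scenarios); the scenario values are hypothetical and the DRO formulation itself is not reproduced.

```python
import numpy as np

def cvar(losses, probs, alpha=0.95):
    """Conditional value-at-risk of a discrete loss distribution:
    the expected loss in the worst (1 - alpha) tail of the pmf."""
    order = np.argsort(losses)                         # ascending losses
    losses, probs = np.asarray(losses)[order], np.asarray(probs)[order]
    tail = 1.0 - alpha                                 # probability mass in the tail
    cum_from_top = np.cumsum(probs[::-1])[::-1]        # mass at or above each loss
    # weight each scenario by how much of its mass falls inside the tail
    w = np.clip(tail - (cum_from_top - probs), 0.0, probs)
    return float(np.dot(w, losses) / tail)

# hypothetical branch-flow violation costs under five wind scenarios
losses = [0.0, 2.0, 5.0, 9.0, 20.0]
probs = [0.4, 0.3, 0.15, 0.1, 0.05]
print(cvar(losses, probs, alpha=0.9))   # expected cost in the worst 10% of outcomes
```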

40 citations


Journal ArticleDOI
TL;DR: A histogram-based normalization method is put forward that provides a unified, probabilistic interpretation of any measure over data sets with different class distributions and shows a direct connection between class ratio changes and certain types of concept drift, which could be influential in designing new types of classifiers and drift detectors for imbalanced data streams.
Abstract: As each imbalanced classification problem comes with its own set of challenges, the measure used to evaluate classifiers must be individually selected. To help researchers make this decision in an informed manner, experimental and theoretical investigations compare general properties of measures. However, existing studies do not analyze changes in measure behavior imposed by different imbalance ratios. Moreover, several characteristics of imbalanced data streams, such as the effect of dynamically changing class proportions, have not been thoroughly investigated from the perspective of different metrics. In this paper, we study measure dynamics by analyzing changes of measure values, distributions, and gradients with diverging class proportions. For this purpose, we visualize measure probability mass functions and gradients. In addition, we put forward a histogram-based normalization method that provides a unified, probabilistic interpretation of any measure over data sets with different class distributions. The results of analyzing eight popular classification measures show that the effect class proportions have on each measure is different and should be taken into account when evaluating classifiers. Apart from highlighting imbalance-related properties of each measure, our study shows a direct connection between class ratio changes and certain types of concept drift, which could be influential in designing new types of classifiers and drift detectors for imbalanced data streams.

37 citations


Journal ArticleDOI
TL;DR: The results reveal the significance of considering the inhomogeneity of S-MJLSs and the importance of the constructed Lyapunov function, which depends on both the current mode and the elapsed time in the current mode.

35 citations


Journal ArticleDOI
28 May 2020-Entropy
TL;DR: A new probability mass function is proposed by creating a natural discrete analog to the continuous Lindley distribution as a mixture of geometric and negative binomial distributions; the new distribution has many interesting properties.
Abstract: In this paper, we propose and study a new probability mass function by creating a natural discrete analog to the continuous Lindley distribution as a mixture of geometric and negative binomial distributions. The new distribution has many interesting properties that make it superior to many other discrete distributions, particularly in analyzing over-dispersed count data. Several statistical properties of the introduced distribution have been established, including moments and the moment generating function, residual moments, characterization, entropy, and estimation of the parameter by the maximum likelihood method. A bias reduction method is applied to the derived estimator; its existence and uniqueness are discussed. The goodness of fit of the proposed distribution has been examined and compared with other discrete distributions using three real data sets from the biological sciences.
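
As a rough illustration of building a discrete analog of a continuous law, the sketch below applies the generic survival-discretization P(X = k) = S(k) - S(k + 1) to the continuous Lindley distribution. Note that the paper's own construction is a mixture of geometric and negative binomial distributions and may differ from this generic device.

```python
import numpy as np

def lindley_survival(x, theta):
    """Survival function of the continuous Lindley(theta) distribution."""
    return (1.0 + theta + theta * x) / (1.0 + theta) * np.exp(-theta * x)

def discrete_lindley_pmf(k, theta):
    """Discrete analog via survival discretization: P(X = k) = S(k) - S(k + 1)."""
    k = np.asarray(k, dtype=float)
    return lindley_survival(k, theta) - lindley_survival(k + 1, theta)

k = np.arange(10)
pmf = discrete_lindley_pmf(k, theta=0.7)
print(pmf.round(4), pmf.sum().round(4))   # mass on {0,...,9}; tail mass remains beyond 9
```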

29 citations


Journal ArticleDOI
TL;DR: The probability mass functions of the system's lifetime, the time spent by the system in a perfectly functioning state, and the total time spent in partially working states are derived for the proposed model.

28 citations


Journal ArticleDOI
28 Feb 2020
TL;DR: The results show that the performance indicators considered in the literature such as average or peak AoI may give misleading insights into the real AoI performance.
Abstract: Age-of-information (AoI) is a metric quantifying information freshness at the receiver. It captures the delay together with packet loss and packet generation rate. However, the existing literature focuses on average or peak AoI and neglects the complete distribution. In this letter, we consider an $N$-hop network with time-invariant packet loss probabilities on each link. We derive closed-form equations for the probability mass function of AoI. We verify our findings with simulations. Our results show that the performance indicators considered in the literature such as average or peak AoI may give misleading insights into the real AoI performance.
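
A hedged Monte Carlo sketch of an AoI probability mass function for a heavily simplified slotted multi-hop link (one fresh update per slot, fixed per-hop loss, N-slot delivery delay); it is not the letter's model or its closed-form result, only an illustration of what an AoI pmf looks like.

```python
import numpy as np
from collections import Counter

def aoi_pmf_sim(loss_probs, horizon=200_000, seed=0):
    """Empirical AoI pmf for a simplified slotted N-hop link: a fresh update is
    injected every slot, reaches the monitor after N slots if it survives every
    hop, and is lost otherwise."""
    rng = np.random.default_rng(seed)
    n_hops = len(loss_probs)
    q = np.prod(1.0 - np.asarray(loss_probs))   # end-to-end success probability
    age, counts = n_hops, Counter()
    for _ in range(horizon):
        age = n_hops if rng.random() < q else age + 1
        counts[age] += 1
    return {a: c / horizon for a, c in sorted(counts.items())}

pmf = aoi_pmf_sim([0.1, 0.2])    # 2-hop link with per-hop loss 0.1 and 0.2
print({a: round(p, 4) for a, p in list(pmf.items())[:8]})
```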

24 citations


Journal ArticleDOI
TL;DR: To demonstrate the importance of the proposed distribution, three data sets on coronavirus, length of stay at psychiatric ward and monthly counts of larceny calls are analyzed and the maximum likelihood approach is used to estimate the model parameters.
Abstract: The aim of this article is to propose a new three-parameter discrete Lindley distribution. A wide range of its structural properties are investigated. This includes the shape of the probability mass function, hazard rate function, moments, skewness, kurtosis, index of dispersion, mean residual life, mean past life and stress-strength reliability. These properties are expressed in explicit forms. The maximum likelihood approach is used to estimate the model parameters. A detailed simulation study is carried out to examine the bias and mean square error of the estimators. Using the proposed distribution, a new first-order integer-valued autoregressive process is introduced for the over-dispersed, equi-dispersed and under-dispersed time series of counts. To demonstrate the importance of the proposed distribution, three data sets on coronavirus, length of stay at psychiatric ward and monthly counts of larceny calls are analyzed.
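
The abstract introduces a new first-order integer-valued autoregressive (INAR(1)) process; the sketch below simulates a generic INAR(1) series with binomial thinning. The `innovation_sampler` argument and the Poisson innovations are hypothetical stand-ins for the paper's discrete Lindley innovations.

```python
import numpy as np

def simulate_inar1(n, alpha, innovation_sampler, seed=0):
    """Simulate an INAR(1) count series X_t = alpha o X_{t-1} + eps_t, where
    'o' is binomial thinning and eps_t are i.i.d. counts drawn from
    innovation_sampler (any discrete innovation pmf can be plugged in)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n, dtype=int)
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)      # binomial thinning
        x[t] = survivors + innovation_sampler(rng)
    return x

# hypothetical innovations: Poisson(2) stands in for the paper's discrete Lindley counts
series = simulate_inar1(200, alpha=0.6, innovation_sampler=lambda rng: rng.poisson(2.0))
print(series[:20])
```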

Posted Content
TL;DR: The definition of anti-concentration is that the expected collision probability, that is, the probability that two independently drawn outcomes will agree, is only a constant factor larger than if the distribution were uniform, and it is shown that when the 2-local gates are each drawn from the Haar measure, at least $\Omega(n \log(n))$ gates are needed for this condition to be met on an $n$ qudit circuit.
Abstract: We consider quantum circuits consisting of randomly chosen two-local gates and study the number of gates needed for the distribution over measurement outcomes for typical circuit instances to be anti-concentrated, roughly meaning that the probability mass is not too concentrated on a small number of measurement outcomes. Understanding the conditions for anti-concentration is important for determining which quantum circuits are difficult to simulate classically, as anti-concentration has been in some cases an ingredient of mathematical arguments that simulation is hard and in other cases a necessary condition for easy simulation. Our definition of anti-concentration is that the expected collision probability, that is, the probability that two independently drawn outcomes will agree, is only a constant factor larger than if the distribution were uniform. We show that when the 2-local gates are each drawn from the Haar measure (or any two-design), at least $\Omega(n \log(n))$ gates (and thus $\Omega(\log(n))$ circuit depth) are needed for this condition to be met on an $n$ qudit circuit. In both the case where the gates are nearest-neighbor on a 1D ring and the case where gates are long-range, we show $O(n \log(n))$ gates are also sufficient, and we precisely compute the optimal constant prefactor for the $n \log(n)$. The technique we employ relies upon a mapping from the expected collision probability to the partition function of an Ising-like classical statistical mechanical model, which we manage to bound using stochastic and combinatorial techniques.
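
The anti-concentration condition defined above reduces to a simple computation on a pmf; the sketch below evaluates the collision probability and its ratio to the uniform value on toy distributions (the circuit-sampling context itself is not modelled).

```python
import numpy as np

def collision_probability(probs):
    """Probability that two independent draws from the pmf agree: sum_x p(x)^2."""
    probs = np.asarray(probs, dtype=float)
    return float(np.sum(probs ** 2))

def anticoncentration_ratio(probs):
    """Collision probability relative to the uniform distribution on the same
    support; a ratio close to 1 means the mass is well spread out."""
    return collision_probability(probs) * len(probs)

uniform = np.full(16, 1 / 16)
peaked = np.array([0.85] + [0.01] * 15)
print(anticoncentration_ratio(uniform))   # 1.0
print(anticoncentration_ratio(peaked))    # much larger than 1
```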

Posted Content
TL;DR: The results show that subcube conditioning, as a natural model for accessing high-dimensional distributions, enables significant savings in learning and testing junta distributions compared to the standard sampling model.
Abstract: We study the problems of learning and testing junta distributions on $\{-1,1\}^n$ with respect to the uniform distribution, where a distribution $p$ is a $k$-junta if its probability mass function $p(x)$ depends on a subset of at most $k$ variables. The main contribution is an algorithm for finding relevant coordinates in a $k$-junta distribution with subcube conditioning [BC18, CCKLW20]. We give two applications: 1. An algorithm for learning $k$-junta distributions with $\tilde{O}(k/\epsilon^2) \log n + O(2^k/\epsilon^2)$ subcube conditioning queries, and 2. An algorithm for testing $k$-junta distributions with $\tilde{O}((k + \sqrt{n})/\epsilon^2)$ subcube conditioning queries. All our algorithms are optimal up to poly-logarithmic factors. Our results show that subcube conditioning, as a natural model for accessing high-dimensional distributions, enables significant savings in learning and testing junta distributions compared to the standard sampling model. This addresses an open question posed by Aliakbarpour, Blais, and Rubinfeld [ABR17].
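
A small sketch of what a k-junta pmf on {-1,1}^n looks like under the definition above: the mass of x depends only on a set of at most k coordinates, with the remaining coordinates uniform and independent. The learning and testing algorithms themselves are not reproduced, and the coordinates and marginal values below are illustrative.

```python
import itertools
import numpy as np

def junta_pmf(n, relevant, marginal):
    """Build a k-junta pmf on {-1,1}^n: the probability of x depends only on the
    coordinates in `relevant` (with joint pmf `marginal` over their 2^k patterns);
    all other coordinates are uniform and independent."""
    k = len(relevant)
    patterns = {pat: marginal[i] for i, pat in
                enumerate(itertools.product((-1, 1), repeat=k))}
    pmf = {}
    for x in itertools.product((-1, 1), repeat=n):
        key = tuple(x[j] for j in relevant)
        pmf[x] = patterns[key] / 2 ** (n - k)   # uniform over irrelevant coordinates
    return pmf

# a 2-junta on {-1,1}^4 whose mass depends only on coordinates 0 and 3
p = junta_pmf(4, relevant=(0, 3), marginal=[0.4, 0.3, 0.2, 0.1])
print(sum(p.values()))   # 1.0
```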

Journal ArticleDOI
TL;DR: Tweedie's compound Poisson model is a popular method to model insurance claims with probability mass at zero and a nonnegative, highly right-skewed distribution, as discussed by the authors.
Abstract: Tweedie’s compound Poisson model is a popular method to model insurance claims with probability mass at zero and nonnegative, highly right-skewed distribution. In particular, it is not uncommon to ...
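
A minimal simulation of the compound Poisson-gamma structure described above, showing the point mass at zero (no claims) together with the right-skewed continuous part; parameter values are illustrative only.

```python
import numpy as np

def tweedie_compound_poisson(n, lam, shape, scale, seed=0):
    """Simulate a compound Poisson-gamma (Tweedie) claim amount:
    N ~ Poisson(lam) claims, each Gamma(shape, scale); the outcome is exactly
    zero whenever N = 0, giving a point mass at zero plus a right-skewed
    continuous part."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(lam, size=n)
    totals = np.array([rng.gamma(shape, scale, size=c).sum() for c in counts])
    return totals

claims = tweedie_compound_poisson(100_000, lam=0.8, shape=2.0, scale=500.0)
print("P(claim = 0) ~", np.mean(claims == 0).round(3))   # close to exp(-0.8) = 0.449
```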

Journal ArticleDOI
TL;DR: The closed-form pairwise error probability is derived for the first time, which also validates the superiority of the proposed PS scheme in comparison with the uniform schemes with elaborate rotations.
Abstract: Probabilistic shaping (PS) combined with bipolar amplitude shift keying modulation is a promising transmission scheme. However, such a 1-dimensional PS scheme inevitably suffers shaping loss in Rayleigh fading channels due to the lack of signal diversity. In this paper, we design a rotated quadrature amplitude modulation based PS strategy. With the discrete inputs, the ergodic constellation constrained (CC) capacity is derived based on the mutual information conditioned on full channel state information available at the receiver, which is found to be jointly influenced by the probability mass function (PMF) and the rotation angle (RA) of the input signals. Furthermore, the closed-form cutoff rate, which is a lower bound on the mutual information with finite decoding complexity, is also derived to simplify the analysis. Correspondingly, the optimizations of the RA and the PMF are both considered for maximizing the ergodic CC capacity and the cutoff rate, respectively: the single-parameter RA is first obtained by exhaustive search, and then the optimal PMFs of the subproblem for fixed RAs and a fixed constellation scaling factor are obtained. Finally, for this non-equiprobable transmission scheme, the closed-form pairwise error probability is derived for the first time, which also validates the superiority of the proposed PS scheme in comparison with uniform schemes with elaborate rotations.
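
As background for the shaping discussion above, the sketch below builds the Maxwell-Boltzmann family of pmfs commonly used in probabilistic shaping; the paper's joint optimization of the PMF and the rotation angle is not reproduced, and the amplitude set and shaping parameter are assumptions.

```python
import numpy as np

def maxwell_boltzmann_pmf(amplitudes, nu):
    """Maxwell-Boltzmann pmf commonly used in probabilistic shaping:
    p(a) proportional to exp(-nu * a^2), putting more mass on low-energy points."""
    a = np.asarray(amplitudes, dtype=float)
    w = np.exp(-nu * a ** 2)
    return w / w.sum()

amps = np.array([-3, -1, 1, 3])          # 4-ASK amplitudes
for nu in (0.0, 0.1, 0.3):
    print(nu, maxwell_boltzmann_pmf(amps, nu).round(3))
```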

Journal ArticleDOI
TL;DR: This paper proposes a novel evidential reasoning approach that combines the virtual linguistic trust degree with a basic unit-interval monotonic function and presents the whole framework of an extended evidential reasoning algorithm.

Posted Content
TL;DR: An asynchronous constrained batch-parallel Bayesian optimization method is proposed to efficiently solve the computationally-expensive simulation-based optimization problems on the HPC platform, with a budgeted computational resource, where the maximum number of simulations is a constant.
Abstract: High-fidelity complex engineering simulations are highly predictive, but also computationally expensive and often require substantial computational efforts. The mitigation of computational burden is usually enabled through parallelism in high-performance cluster (HPC) architecture. In this paper, an asynchronous constrained batch-parallel Bayesian optimization method is proposed to efficiently solve the computationally expensive simulation-based optimization problems on the HPC platform, with a budgeted computational resource, where the maximum number of simulations is a constant. The advantages of this method are three-fold. First, the efficiency of the Bayesian optimization is improved, where multiple input locations are evaluated in a massively parallel, asynchronous manner to accelerate the optimization convergence with respect to physical runtime. This efficiency feature is further improved so that when each of the inputs is finished, another input is queried without waiting for the whole batch to complete. Second, the method can handle both known and unknown constraints. Third, the proposed method considers several acquisition functions at the same time and samples based on an evolving probability mass function using a modified GP-Hedge scheme, whose parameters correspond to the performance of each acquisition function. The proposed framework is termed aphBO-2GP-3B, which corresponds to asynchronous parallel hedge Bayesian optimization with two Gaussian processes and three batches. The aphBO-2GP-3B framework is demonstrated using two high-fidelity expensive industrial applications, where the first one is based on finite element analysis (FEA) and the second one is based on computational fluid dynamics (CFD) simulations.
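
A hedged sketch of the Hedge-style idea described above: an evolving probability mass function over several acquisition functions, updated from their observed rewards. It is a generic GP-Hedge-type portfolio, not the aphBO-2GP-3B implementation, and the `HedgePortfolio` class, acquisition choices, and rewards are placeholders.

```python
import numpy as np

class HedgePortfolio:
    """Hedge-style selection among several acquisition functions: an evolving
    probability mass function over acquisitions, built from their cumulative
    rewards (a simplified sketch of a GP-Hedge-type scheme)."""

    def __init__(self, n_acq, eta=1.0, seed=0):
        self.gains = np.zeros(n_acq)     # cumulative reward of each acquisition
        self.eta = eta                   # learning rate
        self.rng = np.random.default_rng(seed)

    def pmf(self):
        z = self.eta * (self.gains - self.gains.max())   # stabilised softmax
        w = np.exp(z)
        return w / w.sum()

    def select(self):
        return self.rng.choice(len(self.gains), p=self.pmf())

    def update(self, chosen, reward):
        self.gains[chosen] += reward     # e.g. objective improvement it produced

portfolio = HedgePortfolio(n_acq=3)      # e.g. EI, PI, UCB
for step in range(5):
    a = portfolio.select()
    portfolio.update(a, reward=np.random.default_rng(step).random())
print(portfolio.pmf().round(3))
```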

Journal ArticleDOI
TL;DR: In this paper, a new two-parameter discrete distribution is introduced which may be useful for modeling count data; its probability mass function is very simple.
Abstract: In this paper we introduce a new two-parameter discrete distribution which may be useful for modeling count data. Additionally, the probability mass function is very simple and it may have a zero v...

Journal ArticleDOI
TL;DR: Computational means are used to demonstrate the fitness of each of these MTPT methods for simulating 1-D advective-dispersive transport with uniform coefficients and reveal that increased accuracy is not always justified relative to increased computational complexity.

Journal ArticleDOI
TL;DR: In this article, a methodology is developed for data analysis based on empirically constructed geodesic metric spaces, which reveal properties of the data based on geometry, such as those that are difficult to see from the raw Euclidean distances.
Abstract: A methodology is developed for data analysis based on empirically constructed geodesic metric spaces. For a probability distribution, the length along a path between two points can be defined as the amount of probability mass accumulated along the path. The geodesic, then, is the shortest such path and defines a geodesic metric. Such metrics are transformed in a number of ways to produce parametrised families of geodesic metric spaces, empirical versions of which allow computation of intrinsic means and associated measures of dispersion. These reveal properties of the data, based on geometry, such as those that are difficult to see from the raw Euclidean distances. Examples of application include clustering and classification. For certain parameter ranges, the spaces become CAT(0) spaces and the intrinsic means are unique. In one case, a minimal spanning tree of a graph based on the data becomes CAT(0). In another, a so-called “metric cone” construction allows extension to CAT(k) spaces. It is shown how to empirically tune the parameters of the metrics, making it possible to apply them to a number of real cases.

Journal ArticleDOI
TL;DR: The theoretical bounds on optimal bed demand prediction accuracy are defined, and a flexible statistical model is developed to approximate the probability mass function of future bed demand.
Abstract: Failing to match the supply of resources to the demand for resources in a hospital can cause non-clinical transfers, diversions, safety risks, and expensive under-utilized resource capacity. Forecasting bed demand helps achieve appropriate safety standards and cost management by proactively adjusting staffing levels and patient flow protocols. This paper defines the theoretical bounds on optimal bed demand prediction accuracy and develops a flexible statistical model to approximate the probability mass function of future bed demand. A case study validates the model using blinded data from a mid-sized Massachusetts community hospital. This approach expands upon similar work by forecasting multiple days in advance instead of a single day, providing a probability mass function of demand instead of a point estimate, using the exact surgery schedule instead of assuming a cyclic schedule, and using patient-level duration-varying length-of-stay distributions instead of assuming patient homogeneity and exponential length of stay distributions. The primary results of this work are an accurate and lengthy forecast, which provides managers better information and more time to optimize short-term staffing adaptations to stochastic bed demand, and a derivation of the minimum mean absolute error of an ideal forecast.
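
A simplified sketch in the spirit of the model above: if patient i is still hospitalized on the forecast day with probability p_i, the number of occupied beds follows a Poisson-binomial pmf obtained by sequential convolution. The per-patient probabilities below are hypothetical, and scheduled surgeries and new arrivals (which the paper incorporates) are ignored.

```python
import numpy as np

def occupancy_pmf(stay_probs):
    """Probability mass function of the number of occupied beds when patient i
    is still present on the forecast day with probability stay_probs[i]
    (a Poisson-binomial pmf built by sequential convolution)."""
    pmf = np.array([1.0])                      # zero patients -> demand 0 w.p. 1
    for p in stay_probs:
        new = np.zeros(len(pmf) + 1)
        new[:-1] += pmf * (1.0 - p)            # patient has left
        new[1:] += pmf * p                     # patient still occupies a bed
        pmf = new
    return pmf

# hypothetical per-patient probabilities of still being in hospital in 3 days
stay_probs = [0.9, 0.8, 0.75, 0.4, 0.3, 0.1]
pmf = occupancy_pmf(stay_probs)
print(pmf.round(4), "expected beds:", np.dot(np.arange(len(pmf)), pmf).round(2))
```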

Posted Content
10 Mar 2020
TL;DR: This work uses Legendre duality to provide a variational lower bound for the Kullback-Leibler divergence, shows that this estimator, the KL Approximate Lower-bound Estimate (KALE), provides a maximum likelihood estimate (MLE), and extends this procedure to adversarial training.

Posted Content
09 Feb 2020
TL;DR: An operational interpretation of the measure is given based on the decisions that an agent should take if given only the shared information, which lends itself for example to local learning in artificial neural networks.
Abstract: Partial information decomposition (PID) of the multivariate mutual information describes the distinct ways in which a set of source variables contains information about a target variable. The groundbreaking work of Williams and Beer has shown that this decomposition can not be determined from classic information theory without making additional assumptions, and several candidate measures have been proposed, often drawing on principles from related fields such as decision theory. None of these measures is differentiable with respect to the underlying probability mass function. We here present a novel measure that draws only on the principle linking the local mutual information to exclusion of probability mass. This principle is foundational to the original definition of the mutual information by Fano. We reuse this principle to define a measure of shared information based on the shared exclusion of probability mass by the realizations of source variables. Our measure is differentiable and well-defined for individual realizations of the random variables. Thus, it lends itself for example to local learning in artificial neural networks. We show that the measure can be interpreted as local mutual information with the help of an auxiliary variable. We also show that it has a meaningful Moebius inversion on a redundancy lattice and obeys a target chain rule. We give an operational interpretation of the measure based on the decisions that an agent should take if given only the shared information.

Journal ArticleDOI
TL;DR: The methods based on the Kummer confluent hypergeometric function are demonstrated to be computationally most efficient, and a new computationally efficient formula is derived for the probability mass function of the number of renewals by a given time.
Abstract: Convolutions of independent gamma variables are encountered in many applications such as insurance, reliability, and network engineering. Accurate and fast evaluations of their density and distribution functions are critical for such applications, but no open source, user-friendly software implementation has been available. We review several numerical evaluations of the density and distribution of convolutions of independent gamma variables and compare them with respect to their accuracy and speed. The methods that are based on the Kummer confluent hypergeometric function are computationally most efficient. The benefit of employing the confluent hypergeometric function is further demonstrated by a renewal process application, where the gamma variables in the convolution are Erlang variables. We derive a new computationally efficient formula for the probability mass function of the number of renewals by a given time. An R package, coga, provides efficient C++ based implementations of the discussed methods and is available on CRAN.
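
For the special case of i.i.d. Erlang inter-renewal times mentioned above, the pmf of the number of renewals by time t follows from gamma cdfs; the sketch below implements that textbook identity (the paper's Kummer-function formula for general gamma convolutions is not reproduced, and the parameter values are illustrative).

```python
import numpy as np
from scipy.stats import gamma

def renewal_count_pmf(t, k, rate, n_max):
    """pmf of the number of renewals by time t when inter-renewal times are
    i.i.d. Erlang(k, rate): P(N(t) = n) = P(S_n <= t) - P(S_{n+1} <= t),
    where S_n ~ Gamma(n * k, rate) is the time of the n-th renewal."""
    def cdf_sn(n):
        return 1.0 if n == 0 else gamma.cdf(t, a=n * k, scale=1.0 / rate)
    return np.array([cdf_sn(n) - cdf_sn(n + 1) for n in range(n_max + 1)])

pmf = renewal_count_pmf(t=10.0, k=2, rate=1.0, n_max=15)
print(pmf.round(4), pmf.sum().round(4))   # sums to ~1 once n_max is large enough
```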

Journal ArticleDOI
TL;DR: In this paper, the class of completely monotone probability mass functions (pmfs) is considered from a statistical perspective, and it is shown that the completely monotone least squares estimator exists, is strongly consistent, and converges weakly to the truth at the n-rate.

Journal ArticleDOI
TL;DR: The system is modelled as a discrete-time Markov chain, and a matrix-geometric method is used to obtain the probability mass function of the AoI.

Journal ArticleDOI
TL;DR: A novel probabilistic approach to dynamic image sampling (PADIS) is presented, based on a data-driven probability mass function that uses a local variance map; its runtime is faster than that of the other methods.
Abstract: Incremental sampling can be applied in scientific imaging techniques whenever the measurements are taken incrementally, i.e., one pixel position is measured at a time. It can be used to reduce the measurement time as well as the dose impinging onto a specimen. For incremental sampling, the choice of the sampling pattern plays a major role in order to achieve a high reconstruction quality. Besides using static incremental sampling patterns, it is also possible to dynamically adapt the sampling pattern based on the already measured data. This is called dynamic sampling and allows for a higher reconstruction quality, as the inhomogeneity of the sampled image content can be taken into account. Several approaches for dynamic sampling have been published in the literature. However, they share the common drawback that homogeneous regions are sampled too late. This reduces the reconstruction quality as fine details can be missed. We overcome this drawback using a novel probabilistic approach to dynamic image sampling (PADIS). It is based on a data driven probability mass function which uses a local variance map. In our experiments, we evaluate the reconstruction quality for scanning electron microscopy images as well as for natural image content. For scanning electron microscopy images with a sampling density of 35% and frequency selective reconstruction, our approach achieves a PSNR gain of +0.92 dB compared to other dynamic sampling approaches and +1.42 dB compared to the best static patterns. For natural images, even higher gains are achieved. Experiments with additional measurement noise show that for our method the sampling patterns are more stable. Moreover, the runtime is faster than for the other methods.
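
A hedged sketch of the variance-driven idea described above: a data-driven probability mass function over unmeasured pixel positions, proportional to a local variance map of the current reconstruction plus a small uniform floor so homogeneous regions are not starved. This illustrates the general mechanism, not the authors' PADIS algorithm; the `sampling_pmf` helper, window size, and floor are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sampling_pmf(reconstruction, measured_mask, floor=0.05, window=5):
    """Data-driven pmf over unmeasured pixel positions: probability mass is
    proportional to a local variance map of the current reconstruction, with a
    small uniform floor so homogeneous regions still receive samples."""
    mean = uniform_filter(reconstruction, window)
    mean_sq = uniform_filter(reconstruction ** 2, window)
    local_var = np.clip(mean_sq - mean ** 2, 0.0, None)
    weights = (local_var + floor * local_var.mean()) * (~measured_mask)
    return weights / weights.sum()

def draw_next_positions(pmf, n_new, rng):
    flat = rng.choice(pmf.size, size=n_new, replace=False, p=pmf.ravel())
    return np.unravel_index(flat, pmf.shape)

rng = np.random.default_rng(0)
image = rng.random((64, 64))                       # stand-in reconstruction
mask = np.zeros_like(image, dtype=bool)
mask[::4, ::4] = True                              # pixels measured so far
rows, cols = draw_next_positions(sampling_pmf(image, mask), 50, rng)
```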

Journal ArticleDOI
31 Dec 2020
TL;DR: An approximation is developed for the probability mass function of the binomial distribution $\mathrm{Bin}(\textcircled{1}, p)$ with $p = c/\textcircled{1}^{\alpha}$ and $1/2 < \alpha \le 1$.
Abstract: We study properties of two probability distributions defined on the infinite set $\{0,1,2,\ldots\}$ and generalizing the ordinary discrete uniform and binomial distributions. Both extensions use the grossone model of infinity. The first of the two distributions we study is uniform and assigns mass $1/\textcircled{1}$ to each point in the set $\{0,1,\ldots,\textcircled{1}-1\}$, where $\textcircled{1}$ denotes the grossone. For this distribution, we study the problem of decomposing a random variable $\xi$ with this distribution as a sum $\xi \stackrel{d}{=} \xi_1 + \cdots + \xi_m$, where $\xi_1,\ldots,\xi_m$ are independent non-degenerate random variables. Then, we develop an approximation for the probability mass function of the binomial distribution $\mathrm{Bin}(\textcircled{1}, p)$ with $p = c/\textcircled{1}^{\alpha}$ and $1/2 < \alpha \le 1$. The accuracy of this approximation is assessed using a numerical study.

Journal ArticleDOI
TL;DR: Methods are presented to analyze the time-series probability mass function produced by the IMM using the Viterbi and BCJR algorithms, which are in common use in the digital-communications discipline, to identify the “best path” through the space of available models over time, based on the likelihoods produced by the IMM and the Markovian transition probabilities.
Abstract: Lithium-ion battery modeling for use in battery management systems requires models that can adapt to the changing behavior of a cell as it ages. One method to enable this adaptivity is to select the most representative model from a set of “pre-aged” models that represent the characteristics of the cell as different cyclic and calendar aging processes occur. By modeling the aging of a cell as a Markovian process, an interacting multiple-model Kalman filter (IMM) can be utilized to determine a time-varying probability mass function that specifies the probability that each of the models under consideration is the best representation of the cell under observation. While the output of the IMM is useful by itself, its predictions can be improved by post-processing. In this paper, we present methods to analyze the time-series probability mass function produced by the IMM using the Viterbi and BCJR algorithms in common use in the digital-communications discipline. These algorithms seek to identify the “best path” through the space of available models over time, based on the likelihoods produced by the IMM and the Markovian transition probabilities. Through the use of these post-processing algorithms, confidence in the best-fitting model can be improved.
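
A minimal sketch of the Viterbi post-processing step described above: given per-step model log-likelihoods (as an IMM would produce) and Markovian switching probabilities, recover the most likely sequence of pre-aged models. The likelihoods and transition matrix here are toy values, not taken from the paper.

```python
import numpy as np

def viterbi(log_likelihoods, log_transitions, log_prior):
    """Most likely sequence of models given per-step log-likelihoods from a bank
    of model-matched filters (e.g. an IMM) and Markovian model-switching
    log-probabilities."""
    T, M = log_likelihoods.shape
    delta = log_prior + log_likelihoods[0]          # best log-score ending in each model
    back = np.zeros((T, M), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_transitions   # scores[i, j]: transition i -> j
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(M)] + log_likelihoods[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# toy example: 3 pre-aged cell models, 6 time steps of IMM likelihoods
rng = np.random.default_rng(1)
loglik = np.log(rng.dirichlet(np.ones(3), size=6))
trans = np.log(np.array([[0.90, 0.08, 0.02],
                         [0.05, 0.90, 0.05],
                         [0.02, 0.08, 0.90]]))
print(viterbi(loglik, trans, log_prior=np.log(np.ones(3) / 3)))
```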

Posted Content
TL;DR: The framework of three new classes of exponential dispersion models of discrete probability distributions, defined by specifying their variance functions in their mean value parameterization, is developed; the classes are proved to be overdispersed and zero inflated, making them competitive with the statistical models currently in use for statistical modeling.
Abstract: We consider three new classes of exponential dispersion models of discrete probability distributions which are defined by specifying their variance functions in their mean value parameterization. In a previous paper (Bar-Lev and Ridder, 2020a), we have developed the framework of these classes and proved that they have some desirable properties. Each of these classes was shown to be overdispersed and zero inflated in ascending order, making them competitive alternatives to the statistical models currently in use for statistical modeling. In this paper we elaborate on the computational aspects of their probability mass functions. Furthermore, we apply these classes for fitting real data sets having overdispersed and zero-inflated statistics. Classic models based on Poisson or negative binomial distributions show poor fits, and therefore many alternatives have already been proposed in recent years. We execute an extensive comparison with these other proposals, from which we may conclude that our framework is a flexible tool that gives excellent results in all cases. Moreover, in most cases our model gives the best fit.