Posted Content

A family of statistical symmetric divergences based on Jensen's inequality

TL;DR: A novel parametric family of symmetric information-theoretic distances based on Jensen's inequality for a convex functional generator unifies the celebrated Jeffreys divergence with the Jensen-Shannon divergence when the Shannon entropy generator is chosen.
Abstract: We introduce a novel parametric family of symmetric information-theoretic distances based on Jensen's inequality for a convex functional generator. In particular, this family unifies the celebrated Jeffreys divergence with the Jensen-Shannon divergence when the Shannon entropy generator is chosen. We then design a generic algorithm to compute the unique centroid defined as the minimizer of the average divergence. This yields a smooth family of centroids linking the Jeffreys centroid to the Jensen-Shannon centroid. Finally, we report on our experimental results.
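For reference, the two endpoints that the abstract says the family unifies are the Jeffreys divergence J(p,q) = KL(p:q) + KL(q:p) and the Jensen-Shannon divergence JS(p,q) = (1/2)KL(p:m) + (1/2)KL(q:m) with m = (p+q)/2. A minimal numerical sketch for discrete distributions follows; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence KL(p:q) between discrete distributions (natural log)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0  # terms with p_i = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jeffreys(p, q):
    """Jeffreys divergence: the symmetrized Kullback-Leibler divergence."""
    return kl(p, q) + kl(q, p)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence: average KL divergence to the mixture m = (p+q)/2."""
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.5, 0.3])
print(jeffreys(p, q), jensen_shannon(p, q))
```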
Citations
Journal ArticleDOI
11 May 2019-Entropy
TL;DR: This work presents a generalization of the Jensen–Shannon (JS) divergence using abstract means which yields closed-form expressions when the mean is chosen according to the parametric family of distributions.
Abstract: The Jensen–Shannon divergence is a renowned bounded symmetrization of the unbounded Kullback–Leibler divergence which measures the total Kullback–Leibler divergence to the average mixture distribution. However, the Jensen–Shannon divergence between Gaussian distributions is not available in closed form. To bypass this problem, we present a generalization of the Jensen–Shannon (JS) divergence using abstract means which yields closed-form expressions when the mean is chosen according to the parametric family of distributions. More generally, we define the JS-symmetrizations of any distance using parameter mixtures derived from abstract means. In particular, we first show that the geometric mean is well-suited for exponential families, and report two closed-form formulas for (i) the geometric Jensen–Shannon divergence between probability densities of the same exponential family; and (ii) the geometric JS-symmetrization of the reverse Kullback–Leibler divergence between probability densities of the same exponential family. As a second illustrative example, we show that the harmonic mean is well-suited for the scale Cauchy distributions, and report a closed-form formula for the harmonic Jensen–Shannon divergence between scale Cauchy distributions. Applications to clustering with respect to these novel Jensen–Shannon divergences are touched upon.
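To illustrate the closed-form claim, here is a minimal sketch for univariate Gaussians. It assumes the symmetric (equal-weight) convention in which the geometric Jensen–Shannon divergence averages the two Kullback–Leibler divergences to the normalized geometric mean of the densities, which is itself a Gaussian; the cited paper's exact weighting conventions may differ, and the function names are illustrative.

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL(N(m1, s1^2) : N(m2, s2^2)) between univariate Gaussians."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def geometric_js_gauss(m1, s1, m2, s2):
    """Symmetric geometric Jensen-Shannon divergence between two univariate Gaussians.

    The normalized geometric mean sqrt(p*q)/Z of two Gaussian densities is itself a
    Gaussian whose precision is the average of the two precisions, so the divergence
    has a closed form (unlike the ordinary Jensen-Shannon divergence).
    """
    tau = 0.5 / s1**2 + 0.5 / s2**2                    # precision of the geometric mixture
    sg = np.sqrt(1.0 / tau)                            # its standard deviation
    mg = (0.5 * m1 / s1**2 + 0.5 * m2 / s2**2) / tau   # its mean
    return 0.5 * kl_gauss(m1, s1, mg, sg) + 0.5 * kl_gauss(m2, s2, mg, sg)

print(geometric_js_gauss(0.0, 1.0, 2.0, 1.5))
```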

109 citations


Cites background from "A family of statistical symmetric d..."

  • ...Then the Jensen–Shannon divergence and the Jeffreys divergence can be rewritten [24] as...

    [...]

  • ...A family of symmetric distances unifying the Jeffreys divergence with the Jensen–Shannon divergence was proposed in [24]....

    [...]

Journal ArticleDOI
TL;DR: This paper systematically surveys the state of research on similarity measurement, analyzes the advantages and disadvantages of current methods, develops a more comprehensive classification scheme for text similarity measurement algorithms, and summarizes future research directions.
Abstract: Text similarity measurement is the basis of natural language processing tasks, which play an important role in information retrieval, automatic question answering, machine translation, dialogue systems, and document matching. This paper systematically surveys the state of research on similarity measurement, analyzes the advantages and disadvantages of current methods, develops a more comprehensive classification scheme for text similarity measurement algorithms, and summarizes future research directions. With the aim of providing a reference for related research and applications, text similarity measurement methods are described from two aspects: text distance and text representation. Text distance can be divided into length distance, distribution distance, and semantic distance; text representation is divided into string-based, corpus-based, single-semantic-text, multi-semantic-text, and graph-structure-based representations. Finally, the development of text similarity is also summarized in the discussion section.

76 citations


Cites background from "A family of statistical symmetric d..."

  • ...The smaller the Jensen–Shannon distance, the more similar the distribution of the two documents [11]....

    [...]
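The excerpt above uses the Jensen–Shannon distance as a document-similarity score. Below is a minimal sketch of that usage for bag-of-words distributions, assuming the common convention of base-2 logarithms and taking the square root of the divergence as the distance; the tokenization and function names are illustrative only.

```python
import numpy as np
from collections import Counter

def js_distance(doc_a, doc_b):
    """Jensen-Shannon distance between the word distributions of two documents.

    The distance is the square root of the Jensen-Shannon divergence (base-2 logs),
    so it lies in [0, 1]; smaller values mean more similar word distributions.
    """
    counts_a, counts_b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    vocab = sorted(set(counts_a) | set(counts_b))
    p = np.array([counts_a[w] for w in vocab], float); p /= p.sum()
    q = np.array([counts_b[w] for w in vocab], float); q /= q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return float(np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m)))

print(js_distance("the cat sat on the mat", "the dog sat on the log"))
```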

Proceedings ArticleDOI
Yi Hao, Nannan Wang, Xinbo Gao, Jie Li, Xiaoyu Wang
15 Oct 2019
TL;DR: A dual-alignment feature embedding method that extracts modality-invariant features with robust and rich identity embeddings for cross-modality person re-identification, achieving competitive results with state-of-the-art methods on two datasets.
Abstract: Person re-identification aims at searching for pedestrians across different cameras, which is a key problem in video surveillance. With requirements in night environments, RGB-infrared person re-identification, which can be regarded as a cross-modality matching problem, has gained increasing attention in recent years. Aside from the cross-modality discrepancy, RGB-infrared person re-identification also suffers from human pose and viewpoint differences. We design a dual-alignment feature embedding method to extract discriminative modality-invariant features. The concept of dual alignment is twofold: spatial and modality alignments. We adopt part-level features to extract fine-grained camera-invariant information. We introduce a distribution loss function and a correlation loss function to align the embedding features across the visible and infrared modalities. Finally, we can extract modality-invariant features with robust and rich identity embeddings for cross-modality person re-identification. Experiments confirm that the proposed baseline and improvements achieve competitive results with the state-of-the-art methods on two datasets. For instance, we achieve (57.5+12.6)% rank-1 accuracy and (57.3+11.8)% mAP on the RegDB dataset.
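The distribution loss mentioned above aligns the feature distributions of the two modalities. The sketch below is only a schematic illustration of that idea, not the authors' implementation: it builds histograms of (synthetic) visible and infrared embedding activations and penalizes their Jensen–Shannon divergence; all names and parameters are hypothetical.

```python
import numpy as np

def distribution_alignment_loss(feat_visible, feat_infrared, bins=32):
    """Schematic distribution-alignment loss: Jensen-Shannon divergence between
    histograms of visible and infrared embedding activations (illustrative only)."""
    lo = min(feat_visible.min(), feat_infrared.min())
    hi = max(feat_visible.max(), feat_infrared.max())
    p, _ = np.histogram(feat_visible, bins=bins, range=(lo, hi))
    q, _ = np.histogram(feat_infrared, bins=bins, range=(lo, hi))
    p = (p + 1e-8) / (p + 1e-8).sum()   # smooth to avoid log(0), then normalize
    q = (q + 1e-8) / (q + 1e-8).sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = np.random.default_rng(0)
loss = distribution_alignment_loss(rng.normal(0.0, 1.0, (64, 256)),
                                   rng.normal(0.3, 1.2, (64, 256)))
print(loss)
```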

64 citations


Cites methods from "A family of statistical symmetric d..."

  • ...Because the partitioned features fV and fI are constrained by the identity loss, the feature space is compact and we can use JS1(NV, NI) [16] to measure the similarity....

    [...]

Journal ArticleDOI
TL;DR: In this paper, a vector-skew generalization of the scalar α-Jensen-Bregman divergences is introduced, from which vector-skew α-Jensen-Shannon divergences are derived, together with an iterative algorithm for computing Jensen-Shannon-type centroids of categorical distributions or normalized histograms.
Abstract: The Jensen-Shannon divergence is a renowned bounded symmetrization of the Kullback-Leibler divergence which does not require probability densities to have matching supports. In this paper, we introduce a vector-skew generalization of the scalar $\alpha$-Jensen-Bregman divergences and derive from them the vector-skew $\alpha$-Jensen-Shannon divergences. We study the properties of these novel divergences and show how to build parametric families of symmetric Jensen-Shannon-type divergences. Finally, we report an iterative algorithm to numerically compute the Jensen-Shannon-type centroids for a set of probability densities belonging to a mixture family: this includes the case of the Jensen-Shannon centroid of a set of categorical distributions or normalized histograms.
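A minimal sketch of a vector-skew Jensen-Shannon-type divergence for categorical distributions, assuming the form JS^{α,w}(p:q) = Σ_i w_i KL((1−α_i)p + α_i q : (1−ᾱ)p + ᾱ q) with ᾱ = Σ_i w_i α_i; the cited paper's exact conventions may differ, and the function names are illustrative. Choosing w = (1/2, 1/2) and α = (0, 1) recovers the ordinary Jensen-Shannon divergence.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence between discrete distributions (natural log)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def vector_skew_js(p, q, alphas, weights):
    """Vector-skew Jensen-Shannon-type divergence between categorical distributions.

    Each skew parameter alpha_i mixes p and q as (1 - alpha_i)*p + alpha_i*q; the
    divergence is the weighted average of the KL divergences from these mixtures to
    the mixture obtained with the mean skew alpha_bar.
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    alphas, weights = np.asarray(alphas, float), np.asarray(weights, float)
    alpha_bar = float(np.dot(weights, alphas))
    m_bar = (1 - alpha_bar) * p + alpha_bar * q
    return sum(w * kl((1 - a) * p + a * q, m_bar) for a, w in zip(alphas, weights))

p, q = np.array([0.6, 0.3, 0.1]), np.array([0.2, 0.5, 0.3])
# w = (1/2, 1/2) and alpha = (0, 1) recover the ordinary Jensen-Shannon divergence.
print(vector_skew_js(p, q, [0.0, 1.0], [0.5, 0.5]))
```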

56 citations

References
Book
01 Jan 1991
TL;DR: The authors examine the role of entropy, inequalities, and randomness in the design and construction of codes.
Abstract: Preface to the Second Edition. Preface to the First Edition. Acknowledgments for the Second Edition. Acknowledgments for the First Edition. 1. Introduction and Preview. 1.1 Preview of the Book. 2. Entropy, Relative Entropy, and Mutual Information. 2.1 Entropy. 2.2 Joint Entropy and Conditional Entropy. 2.3 Relative Entropy and Mutual Information. 2.4 Relationship Between Entropy and Mutual Information. 2.5 Chain Rules for Entropy, Relative Entropy, and Mutual Information. 2.6 Jensen's Inequality and Its Consequences. 2.7 Log Sum Inequality and Its Applications. 2.8 Data-Processing Inequality. 2.9 Sufficient Statistics. 2.10 Fano's Inequality. Summary. Problems. Historical Notes. 3. Asymptotic Equipartition Property. 3.1 Asymptotic Equipartition Property Theorem. 3.2 Consequences of the AEP: Data Compression. 3.3 High-Probability Sets and the Typical Set. Summary. Problems. Historical Notes. 4. Entropy Rates of a Stochastic Process. 4.1 Markov Chains. 4.2 Entropy Rate. 4.3 Example: Entropy Rate of a Random Walk on a Weighted Graph. 4.4 Second Law of Thermodynamics. 4.5 Functions of Markov Chains. Summary. Problems. Historical Notes. 5. Data Compression. 5.1 Examples of Codes. 5.2 Kraft Inequality. 5.3 Optimal Codes. 5.4 Bounds on the Optimal Code Length. 5.5 Kraft Inequality for Uniquely Decodable Codes. 5.6 Huffman Codes. 5.7 Some Comments on Huffman Codes. 5.8 Optimality of Huffman Codes. 5.9 Shannon-Fano-Elias Coding. 5.10 Competitive Optimality of the Shannon Code. 5.11 Generation of Discrete Distributions from Fair Coins. Summary. Problems. Historical Notes. 6. Gambling and Data Compression. 6.1 The Horse Race. 6.2 Gambling and Side Information. 6.3 Dependent Horse Races and Entropy Rate. 6.4 The Entropy of English. 6.5 Data Compression and Gambling. 6.6 Gambling Estimate of the Entropy of English. Summary. Problems. Historical Notes. 7. Channel Capacity. 7.1 Examples of Channel Capacity. 7.2 Symmetric Channels. 7.3 Properties of Channel Capacity. 7.4 Preview of the Channel Coding Theorem. 7.5 Definitions. 7.6 Jointly Typical Sequences. 7.7 Channel Coding Theorem. 7.8 Zero-Error Codes. 7.9 Fano's Inequality and the Converse to the Coding Theorem. 7.10 Equality in the Converse to the Channel Coding Theorem. 7.11 Hamming Codes. 7.12 Feedback Capacity. 7.13 Source-Channel Separation Theorem. Summary. Problems. Historical Notes. 8. Differential Entropy. 8.1 Definitions. 8.2 AEP for Continuous Random Variables. 8.3 Relation of Differential Entropy to Discrete Entropy. 8.4 Joint and Conditional Differential Entropy. 8.5 Relative Entropy and Mutual Information. 8.6 Properties of Differential Entropy, Relative Entropy, and Mutual Information. Summary. Problems. Historical Notes. 9. Gaussian Channel. 9.1 Gaussian Channel: Definitions. 9.2 Converse to the Coding Theorem for Gaussian Channels. 9.3 Bandlimited Channels. 9.4 Parallel Gaussian Channels. 9.5 Channels with Colored Gaussian Noise. 9.6 Gaussian Channels with Feedback. Summary. Problems. Historical Notes. 10. Rate Distortion Theory. 10.1 Quantization. 10.2 Definitions. 10.3 Calculation of the Rate Distortion Function. 10.4 Converse to the Rate Distortion Theorem. 10.5 Achievability of the Rate Distortion Function. 10.6 Strongly Typical Sequences and Rate Distortion. 10.7 Characterization of the Rate Distortion Function. 10.8 Computation of Channel Capacity and the Rate Distortion Function. Summary. Problems. Historical Notes. 11. Information Theory and Statistics. 11.1 Method of Types. 11.2 Law of Large Numbers. 
11.3 Universal Source Coding. 11.4 Large Deviation Theory. 11.5 Examples of Sanov's Theorem. 11.6 Conditional Limit Theorem. 11.7 Hypothesis Testing. 11.8 Chernoff-Stein Lemma. 11.9 Chernoff Information. 11.10 Fisher Information and the Cramér-Rao Inequality. Summary. Problems. Historical Notes. 12. Maximum Entropy. 12.1 Maximum Entropy Distributions. 12.2 Examples. 12.3 Anomalous Maximum Entropy Problem. 12.4 Spectrum Estimation. 12.5 Entropy Rates of a Gaussian Process. 12.6 Burg's Maximum Entropy Theorem. Summary. Problems. Historical Notes. 13. Universal Source Coding. 13.1 Universal Codes and Channel Capacity. 13.2 Universal Coding for Binary Sequences. 13.3 Arithmetic Coding. 13.4 Lempel-Ziv Coding. 13.5 Optimality of Lempel-Ziv Algorithms. Summary. Problems. Historical Notes. 14. Kolmogorov Complexity. 14.1 Models of Computation. 14.2 Kolmogorov Complexity: Definitions and Examples. 14.3 Kolmogorov Complexity and Entropy. 14.4 Kolmogorov Complexity of Integers. 14.5 Algorithmically Random and Incompressible Sequences. 14.6 Universal Probability. 14.7 Kolmogorov Complexity. 14.9 Universal Gambling. 14.10 Occam's Razor. 14.11 Kolmogorov Complexity and Universal Probability. 14.12 Kolmogorov Sufficient Statistic. 14.13 Minimum Description Length Principle. Summary. Problems. Historical Notes. 15. Network Information Theory. 15.1 Gaussian Multiple-User Channels. 15.2 Jointly Typical Sequences. 15.3 Multiple-Access Channel. 15.4 Encoding of Correlated Sources. 15.5 Duality Between Slepian-Wolf Encoding and Multiple-Access Channels. 15.6 Broadcast Channel. 15.7 Relay Channel. 15.8 Source Coding with Side Information. 15.9 Rate Distortion with Side Information. 15.10 General Multiterminal Networks. Summary. Problems. Historical Notes. 16. Information Theory and Portfolio Theory. 16.1 The Stock Market: Some Definitions. 16.2 Kuhn-Tucker Characterization of the Log-Optimal Portfolio. 16.3 Asymptotic Optimality of the Log-Optimal Portfolio. 16.4 Side Information and the Growth Rate. 16.5 Investment in Stationary Markets. 16.6 Competitive Optimality of the Log-Optimal Portfolio. 16.7 Universal Portfolios. 16.8 Shannon-McMillan-Breiman Theorem (General AEP). Summary. Problems. Historical Notes. 17. Inequalities in Information Theory. 17.1 Basic Inequalities of Information Theory. 17.2 Differential Entropy. 17.3 Bounds on Entropy and Relative Entropy. 17.4 Inequalities for Types. 17.5 Combinatorial Bounds on Entropy. 17.6 Entropy Rates of Subsets. 17.7 Entropy and Fisher Information. 17.8 Entropy Power Inequality and Brunn-Minkowski Inequality. 17.9 Inequalities for Determinants. 17.10 Inequalities for Ratios of Determinants. Summary. Problems. Historical Notes. Bibliography. List of Symbols. Index.

45,034 citations

01 Feb 1977

5,933 citations


Additional excerpts

  • ...That is, $c_{t+1} = (\nabla F)^{-1}\left(\sum_{i=1}^{n} w_i\left((1-\alpha)\nabla F(\alpha p_i + (1-\alpha) c_t) + \alpha\,\nabla F(\alpha c_t + (1-\alpha) p_i)\right)\right)$ (58). (Observe that since $F$ is strictly convex, its Hessian $\nabla^2 F$ is positive-definite, so that the reciprocal gradient $(\nabla F)^{-1}$ is well-defined; see [Rockafeller(1969)].)...

    [...]
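A minimal numerical sketch of the fixed-point update (58) quoted above, assuming the Shannon negentropy generator F(x) = Σ_j x_j log x_j, for which ∇F(x) = 1 + log x and (∇F)^{-1}(y) = exp(y − 1); the per-step renormalization onto the probability simplex is a simplifying choice of this sketch, not part of the quoted formula, and all function names are illustrative.

```python
import numpy as np

def grad_F(x):
    """Gradient of the Shannon negentropy generator F(x) = sum_j x_j log x_j."""
    return 1.0 + np.log(x)

def grad_F_inv(y):
    """Inverse (reciprocal) gradient of the Shannon negentropy generator."""
    return np.exp(y - 1.0)

def skew_jensen_centroid(points, weights, alpha=0.5, iters=100):
    """Fixed-point iteration of update (58) for the skew Jensen divergence centroid.

    `points` is an (n, d) array of positive histograms and `weights` sums to one.
    The update is applied verbatim; renormalizing onto the probability simplex at
    each step is a simplifying choice made for this sketch.
    """
    c = np.average(points, axis=0, weights=weights)   # initialize at the weighted mean
    for _ in range(iters):
        s = sum(w * ((1 - alpha) * grad_F(alpha * p + (1 - alpha) * c)
                     + alpha * grad_F(alpha * c + (1 - alpha) * p))
                for p, w in zip(points, weights))
        c = grad_F_inv(s)
        c = c / c.sum()
    return c

points = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]])
print(skew_jensen_centroid(points, weights=[1/3, 1/3, 1/3], alpha=0.5))
```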

Posted Content
TL;DR: The information deviation between any two finite measures cannot be increased by any statistical operations (Markov morphisms), and it is invariant if and only if the morphism is sufficient for these two measures, as mentioned in this paper.
Abstract: The information deviation between any two finite measures cannot be increased by any statistical operations (Markov morphisms). It is invariant if and only if the morphism is sufficient for these two measures.
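A small numerical check of the monotonicity stated above, taking the Kullback-Leibler divergence as the information deviation and a row-stochastic matrix as the Markov morphism: pushing both measures through the same channel never increases the divergence. The variable names are illustrative.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence between discrete distributions (natural log)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(1)
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.3, 0.6])

# A Markov morphism: a row-stochastic matrix mapping distributions on 3 states to 4 states.
M = rng.random((3, 4))
M /= M.sum(axis=1, keepdims=True)

# Data-processing inequality: KL(p:q) >= KL(pM:qM).
print(kl(p, q), ">=", kl(p @ M, q @ M))
```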

5,228 citations