Book

Elements of information theory

01 Jan 1991
TL;DR: The authors examine the roles of entropy, information inequalities, and randomness in the design and construction of codes.
Abstract: Preface to the Second Edition. Preface to the First Edition. Acknowledgments for the Second Edition. Acknowledgments for the First Edition.
1. Introduction and Preview. 1.1 Preview of the Book.
2. Entropy, Relative Entropy, and Mutual Information. 2.1 Entropy. 2.2 Joint Entropy and Conditional Entropy. 2.3 Relative Entropy and Mutual Information. 2.4 Relationship Between Entropy and Mutual Information. 2.5 Chain Rules for Entropy, Relative Entropy, and Mutual Information. 2.6 Jensen's Inequality and Its Consequences. 2.7 Log Sum Inequality and Its Applications. 2.8 Data-Processing Inequality. 2.9 Sufficient Statistics. 2.10 Fano's Inequality. Summary. Problems. Historical Notes.
3. Asymptotic Equipartition Property. 3.1 Asymptotic Equipartition Property Theorem. 3.2 Consequences of the AEP: Data Compression. 3.3 High-Probability Sets and the Typical Set. Summary. Problems. Historical Notes.
4. Entropy Rates of a Stochastic Process. 4.1 Markov Chains. 4.2 Entropy Rate. 4.3 Example: Entropy Rate of a Random Walk on a Weighted Graph. 4.4 Second Law of Thermodynamics. 4.5 Functions of Markov Chains. Summary. Problems. Historical Notes.
5. Data Compression. 5.1 Examples of Codes. 5.2 Kraft Inequality. 5.3 Optimal Codes. 5.4 Bounds on the Optimal Code Length. 5.5 Kraft Inequality for Uniquely Decodable Codes. 5.6 Huffman Codes. 5.7 Some Comments on Huffman Codes. 5.8 Optimality of Huffman Codes. 5.9 Shannon-Fano-Elias Coding. 5.10 Competitive Optimality of the Shannon Code. 5.11 Generation of Discrete Distributions from Fair Coins. Summary. Problems. Historical Notes.
6. Gambling and Data Compression. 6.1 The Horse Race. 6.2 Gambling and Side Information. 6.3 Dependent Horse Races and Entropy Rate. 6.4 The Entropy of English. 6.5 Data Compression and Gambling. 6.6 Gambling Estimate of the Entropy of English. Summary. Problems. Historical Notes.
7. Channel Capacity. 7.1 Examples of Channel Capacity. 7.2 Symmetric Channels. 7.3 Properties of Channel Capacity. 7.4 Preview of the Channel Coding Theorem. 7.5 Definitions. 7.6 Jointly Typical Sequences. 7.7 Channel Coding Theorem. 7.8 Zero-Error Codes. 7.9 Fano's Inequality and the Converse to the Coding Theorem. 7.10 Equality in the Converse to the Channel Coding Theorem. 7.11 Hamming Codes. 7.12 Feedback Capacity. 7.13 Source-Channel Separation Theorem. Summary. Problems. Historical Notes.
8. Differential Entropy. 8.1 Definitions. 8.2 AEP for Continuous Random Variables. 8.3 Relation of Differential Entropy to Discrete Entropy. 8.4 Joint and Conditional Differential Entropy. 8.5 Relative Entropy and Mutual Information. 8.6 Properties of Differential Entropy, Relative Entropy, and Mutual Information. Summary. Problems. Historical Notes.
9. Gaussian Channel. 9.1 Gaussian Channel: Definitions. 9.2 Converse to the Coding Theorem for Gaussian Channels. 9.3 Bandlimited Channels. 9.4 Parallel Gaussian Channels. 9.5 Channels with Colored Gaussian Noise. 9.6 Gaussian Channels with Feedback. Summary. Problems. Historical Notes.
10. Rate Distortion Theory. 10.1 Quantization. 10.2 Definitions. 10.3 Calculation of the Rate Distortion Function. 10.4 Converse to the Rate Distortion Theorem. 10.5 Achievability of the Rate Distortion Function. 10.6 Strongly Typical Sequences and Rate Distortion. 10.7 Characterization of the Rate Distortion Function. 10.8 Computation of Channel Capacity and the Rate Distortion Function. Summary. Problems. Historical Notes.
11. Information Theory and Statistics. 11.1 Method of Types. 11.2 Law of Large Numbers. 11.3 Universal Source Coding. 11.4 Large Deviation Theory. 11.5 Examples of Sanov's Theorem. 11.6 Conditional Limit Theorem. 11.7 Hypothesis Testing. 11.8 Chernoff-Stein Lemma. 11.9 Chernoff Information. 11.10 Fisher Information and the Cramér-Rao Inequality. Summary. Problems. Historical Notes.
12. Maximum Entropy. 12.1 Maximum Entropy Distributions. 12.2 Examples. 12.3 Anomalous Maximum Entropy Problem. 12.4 Spectrum Estimation. 12.5 Entropy Rates of a Gaussian Process. 12.6 Burg's Maximum Entropy Theorem. Summary. Problems. Historical Notes.
13. Universal Source Coding. 13.1 Universal Codes and Channel Capacity. 13.2 Universal Coding for Binary Sequences. 13.3 Arithmetic Coding. 13.4 Lempel-Ziv Coding. 13.5 Optimality of Lempel-Ziv Algorithms. Summary. Problems. Historical Notes.
14. Kolmogorov Complexity. 14.1 Models of Computation. 14.2 Kolmogorov Complexity: Definitions and Examples. 14.3 Kolmogorov Complexity and Entropy. 14.4 Kolmogorov Complexity of Integers. 14.5 Algorithmically Random and Incompressible Sequences. 14.6 Universal Probability. 14.7 Kolmogorov Complexity. 14.9 Universal Gambling. 14.10 Occam's Razor. 14.11 Kolmogorov Complexity and Universal Probability. 14.12 Kolmogorov Sufficient Statistic. 14.13 Minimum Description Length Principle. Summary. Problems. Historical Notes.
15. Network Information Theory. 15.1 Gaussian Multiple-User Channels. 15.2 Jointly Typical Sequences. 15.3 Multiple-Access Channel. 15.4 Encoding of Correlated Sources. 15.5 Duality Between Slepian-Wolf Encoding and Multiple-Access Channels. 15.6 Broadcast Channel. 15.7 Relay Channel. 15.8 Source Coding with Side Information. 15.9 Rate Distortion with Side Information. 15.10 General Multiterminal Networks. Summary. Problems. Historical Notes.
16. Information Theory and Portfolio Theory. 16.1 The Stock Market: Some Definitions. 16.2 Kuhn-Tucker Characterization of the Log-Optimal Portfolio. 16.3 Asymptotic Optimality of the Log-Optimal Portfolio. 16.4 Side Information and the Growth Rate. 16.5 Investment in Stationary Markets. 16.6 Competitive Optimality of the Log-Optimal Portfolio. 16.7 Universal Portfolios. 16.8 Shannon-McMillan-Breiman Theorem (General AEP). Summary. Problems. Historical Notes.
17. Inequalities in Information Theory. 17.1 Basic Inequalities of Information Theory. 17.2 Differential Entropy. 17.3 Bounds on Entropy and Relative Entropy. 17.4 Inequalities for Types. 17.5 Combinatorial Bounds on Entropy. 17.6 Entropy Rates of Subsets. 17.7 Entropy and Fisher Information. 17.8 Entropy Power Inequality and Brunn-Minkowski Inequality. 17.9 Inequalities for Determinants. 17.10 Inequalities for Ratios of Determinants. Summary. Problems. Historical Notes.
Bibliography. List of Symbols. Index.
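
To make Chapter 2's central quantities concrete, here is a minimal Python sketch (illustrative, not taken from the book): it computes entropy, relative entropy (Kullback-Leibler divergence), and mutual information for a small, made-up joint distribution of two binary variables.

    import numpy as np

    def entropy(p):
        # Shannon entropy H(p) in bits; terms with zero probability contribute nothing.
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def relative_entropy(p, q):
        # Relative entropy (KL divergence) D(p||q) in bits; assumes q > 0 wherever p > 0.
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        mask = p > 0
        return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

    # Made-up joint distribution p(x, y) of two binary random variables.
    p_xy = np.array([[0.4, 0.1],
                     [0.1, 0.4]])
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

    # Mutual information I(X;Y) = D(p(x,y) || p(x)p(y)) = H(X) + H(Y) - H(X,Y).
    mi = relative_entropy(p_xy.ravel(), np.outer(p_x, p_y).ravel())
    print(entropy(p_x), entropy(p_y), entropy(p_xy.ravel()), mi)   # 1.0 1.0 ~1.72 ~0.28

Both identities used in the final comment are developed in Sections 2.3 and 2.4 of the book.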


Citations
Book
02 Jan 2013
TL;DR: In this book, the author provides a detailed description of the basic properties of optimal transport, including cyclical monotonicity and Kantorovich duality, and three examples of coupling techniques.
Abstract: Couplings and changes of variables.- Three examples of coupling techniques.- The founding fathers of optimal transport.- Qualitative description of optimal transport.- Basic properties.- Cyclical monotonicity and Kantorovich duality.- The Wasserstein distances.- Displacement interpolation.- The Monge-Mather shortening principle.- Solution of the Monge problem I: global approach.- Solution of the Monge problem II: Local approach.- The Jacobian equation.- Smoothness.- Qualitative picture.- Optimal transport and Riemannian geometry.- Ricci curvature.- Otto calculus.- Displacement convexity I.- Displacement convexity II.- Volume control.- Density control and local regularity.- Infinitesimal displacement convexity.- Isoperimetric-type inequalities.- Concentration inequalities.- Gradient flows I.- Gradient flows II: Qualitative properties.- Gradient flows III: Functional inequalities.- Synthetic treatment of Ricci curvature.- Analytic and synthetic points of view.- Convergence of metric-measure spaces.- Stability of optimal transport.- Weak Ricci curvature bounds I: Definition and Stability.- Weak Ricci curvature bounds II: Geometric and analytic properties.
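
As a small numerical companion to the Wasserstein-distance material listed above, the sketch below (illustrative, not from the book) estimates the 1-Wasserstein distance between two one-dimensional samples with scipy; the sample data are made up.

    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(0)
    # Two made-up 1-D samples: a standard normal and the same distribution shifted by 1.
    u = rng.normal(loc=0.0, scale=1.0, size=5000)
    v = rng.normal(loc=1.0, scale=1.0, size=5000)

    # In one dimension the 1-Wasserstein (earth mover's) distance reduces to the
    # L1 distance between quantile functions; scipy estimates it from the samples.
    print(wasserstein_distance(u, v))   # close to the true value of 1.0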

5,524 citations

Journal ArticleDOI
TL;DR: This paper presents a method for tuning Maxent's parameters using presence-only data, introduces several concepts that improve the predictive accuracy and running time of Maxent, and describes a new logistic output format that gives an estimate of probability of presence.
Abstract: Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time-consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with the abundance of good quality data, users interested in the application of species models need not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use "default settings", tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence-only data. We evaluate our method on independently collected high-quality presence-absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce "hinge features" that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore "background sampling" strategies that cope with sample selection bias and decrease model-building time. Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence-only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model performance; 3) logistic output improves model calibration, so that large differences in output values correspond better to large differences in suitability; 4) "target-group" background sampling can give much better predictive performance than random background sampling; 5) random background sampling results in a dramatic decrease in running time, with no decrease in model performance.
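
Two of the ideas described above, hinge features and the logistic output format, can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the Maxent code: the knot placement is arbitrary, and the calibration constant in the logistic transform is assumed to be exp(H), with H the entropy of the raw distribution.

    import numpy as np

    def hinge_features(x, knots):
        # Illustrative hinge basis for one covariate: max(0, x - k) for each knot k,
        # rescaled to [0, 1]. Uniform knot placement here is an arbitrary choice.
        x = np.asarray(x, dtype=float)
        feats = np.maximum(0.0, x[:, None] - np.asarray(knots, dtype=float)[None, :])
        scale = feats.max(axis=0)
        scale[scale == 0] = 1.0
        return feats / scale

    def logistic_output(raw, H):
        # Schematic logistic transform of a raw (relative suitability) value into a
        # probability-of-presence scale: p = e^H * raw / (1 + e^H * raw), where H is
        # the entropy of the raw distribution (treated here as an assumption).
        c = np.exp(H) * np.asarray(raw, dtype=float)
        return c / (1.0 + c)

    # Toy usage with a made-up covariate (e.g., temperature at 200 sample sites).
    temp = np.random.default_rng(1).uniform(0.0, 30.0, size=200)
    F = hinge_features(temp, knots=np.linspace(0.0, 30.0, 10))
    print(F.shape, logistic_output(0.01, H=4.0))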

5,314 citations

01 Jan 2009
TL;DR: This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.
Abstract: The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer training labels if it is allowed to choose the data from which it learns. An active learner may pose queries, usually in the form of unlabeled data instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant or easily obtained, but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for successful active learning, a summary of problem setting variants and practical issues, and a discussion of related topics in machine learning research are also presented.
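
The query loop the report describes can be illustrated with pool-based uncertainty sampling, one of the simplest strategy frameworks it surveys. The sketch below is a minimal example: the dataset, model, seed-set size, and query budget are all placeholders.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Made-up binary classification pool.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    # Seed the labeled set with a few examples of each class; the rest form the pool.
    labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
    pool = [i for i in range(len(y)) if i not in labeled]

    model = LogisticRegression(max_iter=1000)
    for _ in range(20):                                      # 20 queries to the "oracle"
        model.fit(X[labeled], y[labeled])
        proba = model.predict_proba(X[pool])[:, 1]
        query = pool[int(np.argmin(np.abs(proba - 0.5)))]    # least-confident instance
        labeled.append(query)                                # the oracle reveals y[query]
        pool.remove(query)

    print(model.score(X, y))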

5,227 citations

Journal ArticleDOI
TL;DR: A new approach to target representation and localization, the central component in visual tracking of nonrigid objects, is proposed; it employs a metric derived from the Bhattacharyya coefficient as the similarity measure and uses the mean shift procedure to perform the optimization.
Abstract: A new approach toward target representation and localization, the central component in visual tracking of nonrigid objects, is proposed. The feature histogram-based target representations are regularized by spatial masking with an isotropic kernel. The masking induces spatially-smooth similarity functions suitable for gradient-based optimization, hence, the target localization problem can be formulated using the basin of attraction of the local maxima. We employ a metric derived from the Bhattacharyya coefficient as similarity measure, and use the mean shift procedure to perform the optimization. In the presented tracking examples, the new method successfully coped with camera motion, partial occlusions, clutter, and target scale variations. Integration with motion filters and data association techniques is also discussed. We describe only a few of the potential applications: exploitation of background information, Kalman tracking using motion models, and face tracking.
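
The target-representation half of this approach can be sketched compactly: build a kernel-weighted intensity histogram and compare candidates with the Bhattacharyya coefficient. The Python below is an illustrative sketch only (grayscale patches, made-up data, and the mean-shift optimization step omitted).

    import numpy as np

    def bhattacharyya(p, q):
        # Bhattacharyya coefficient between two normalized histograms:
        # 1.0 for identical histograms, 0.0 for no overlap.
        return float(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))

    def isotropic_kernel(h, w):
        # Epanechnikov-style isotropic kernel over an h-by-w patch; pixels far from
        # the center get smaller weights, which smooths the similarity function.
        ys, xs = np.mgrid[0:h, 0:w]
        cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
        r2 = ((ys - cy) / (h / 2.0)) ** 2 + ((xs - cx) / (w / 2.0)) ** 2
        return np.maximum(0.0, 1.0 - r2)

    def masked_histogram(patch, weights, bins=16):
        # Kernel-weighted intensity histogram of a grayscale patch, normalized to sum to 1.
        hist, _ = np.histogram(patch, bins=bins, range=(0, 256), weights=weights)
        total = hist.sum()
        return hist / total if total > 0 else hist

    # Toy usage: compare a made-up target patch with a slightly perturbed candidate.
    rng = np.random.default_rng(0)
    target = rng.integers(0, 256, size=(32, 32))
    candidate = np.clip(target + rng.integers(-20, 20, size=(32, 32)), 0, 255)
    w = isotropic_kernel(32, 32)
    print(bhattacharyya(masked_histogram(target, w), masked_histogram(candidate, w)))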

4,996 citations

Journal ArticleDOI
TL;DR: The paper singles out Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes yet maintains the computational simplicity and robustness that characterize naive Bayes.
Abstract: Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we evaluate approaches for inducing classifiers from data, based on the theory of learning Bayesian networks. These networks are factored representations of probability distributions that generalize the naive Bayesian classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness that characterize naive Bayes. We experimentally tested these approaches, using problems from the University of California at Irvine repository, and compared them to C4.5, naive Bayes, and wrapper methods for feature selection.
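
The quantity that drives TAN's structure search is the conditional mutual information I(X_i; X_j | C) between pairs of attributes given the class; the attributes are then connected by a maximum-weight spanning tree over these values (a class-conditional variant of the Chow-Liu construction). The sketch below (illustrative, with the tree-building step omitted) estimates this quantity from discrete data.

    import numpy as np

    def cond_mutual_info(x, y, c):
        # Empirical conditional mutual information I(X; Y | C) in bits, estimated
        # from three equal-length arrays of discrete values.
        x, y, c = np.asarray(x), np.asarray(y), np.asarray(c)
        total = 0.0
        for cv in np.unique(c):
            mask = c == cv
            pc = mask.mean()
            xs, ys = x[mask], y[mask]
            for xv in np.unique(xs):
                for yv in np.unique(ys):
                    pxy = np.mean((xs == xv) & (ys == yv))
                    px, py = np.mean(xs == xv), np.mean(ys == yv)
                    if pxy > 0:
                        total += pc * pxy * np.log2(pxy / (px * py))
        return total

    # Made-up data: X1 depends on the class C, and X2 is strongly tied to X1,
    # so I(X1; X2 | C) should be clearly positive (TAN would tend to add the edge X1-X2).
    rng = np.random.default_rng(0)
    C = rng.integers(0, 2, size=1000)
    X1 = (C + (rng.random(1000) < 0.2).astype(int)) % 2
    X2 = (X1 + (rng.random(1000) < 0.1).astype(int)) % 2
    print(cond_mutual_info(X1, X2, C))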

4,775 citations


Cites methods from "Elements of information theory"

  • ...We will concentrate on the essentials and refer the interested reader to Cover and Thomas (1991) for a comprehensive treatment of these notions....


  • ...We simplify this term by using the identity known as the chain law for mutual information (Cover & Thomas, 1991): I_P(X; Y, Z) = I_P(X; Z) + I_P(X; Y | Z).... (this identity is checked numerically in the sketch below)

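The identity quoted above is the chain rule for mutual information from the book (Section 2.5); the short Python sketch below, which is illustrative rather than code from either source, checks it numerically for a random joint distribution p(x, y, z).

    import numpy as np

    def H(p):
        # Shannon entropy (bits) of a probability vector.
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # Random joint distribution p(x, y, z) over alphabets of sizes 3, 4, 5.
    rng = np.random.default_rng(0)
    p = rng.random((3, 4, 5))
    p /= p.sum()

    px, pz = p.sum(axis=(1, 2)), p.sum(axis=(0, 1))
    pxz, pyz = p.sum(axis=1), p.sum(axis=0)

    I_X_YZ = H(px) + H(pyz.ravel()) - H(p.ravel())                          # I(X; Y,Z)
    I_X_Z = H(px) + H(pz) - H(pxz.ravel())                                  # I(X; Z)
    I_X_Y_given_Z = H(pxz.ravel()) + H(pyz.ravel()) - H(pz) - H(p.ravel())  # I(X; Y|Z)

    # Chain rule: I(X; Y,Z) = I(X; Z) + I(X; Y|Z)
    print(np.isclose(I_X_YZ, I_X_Z + I_X_Y_given_Z))   # True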

References
Journal ArticleDOI

2,415 citations

Book
Richard E. Blahut
01 Jan 1987

1,048 citations

Journal ArticleDOI
G. Longo
01 Oct 1979

40 citations