Posted Content

On Information and Sufficiency

01 Feb 1997 - Research Papers in Economics (Santa Fe Institute)
TL;DR: The information deviation between any two finite measures cannot be increased by any statistical operations (Markov morphisms) and is invariant if and only if the morphism is sufficient for these two measures, as mentioned in this paper.
Abstract: The information deviation between any two finite measures cannot be increased by any statistical operations (Markov morphisms). It is invariant if and only if the morphism is sufficient for these two measures.
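
The theorem can be illustrated numerically in the discrete case, where a Markov morphism is a row-stochastic matrix. A minimal Python sketch (the distributions and kernel below are arbitrary examples, not from the paper):

```python
import numpy as np

def kl(p, q):
    # Kullback-Leibler divergence D(p || q) for discrete distributions.
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(4))            # two finite measures on 4 points
q = rng.dirichlet(np.ones(4))
K = rng.dirichlet(np.ones(3), size=4)    # row-stochastic kernel: 4 inputs -> 3 outputs

# Passing both measures through the same channel cannot increase divergence:
# D(pK || qK) <= D(p || q), with equality iff K is sufficient for {p, q}.
print(kl(p, q), kl(p @ K, q @ K))
assert kl(p @ K, q @ K) <= kl(p, q) + 1e-12
```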
Citations
01 Dec 2010
TL;DR: This book discusses quantum information theory, public-key cryptography and the RSA cryptosystem, and the proof of Lieb's theorem.
Abstract: Part I. Fundamental Concepts: 1. Introduction and overview 2. Introduction to quantum mechanics 3. Introduction to computer science Part II. Quantum Computation: 4. Quantum circuits 5. The quantum Fourier transform and its application 6. Quantum search algorithms 7. Quantum computers: physical realization Part III. Quantum Information: 8. Quantum noise and quantum operations 9. Distance measures for quantum information 10. Quantum error-correction 11. Entropy and information 12. Quantum information theory Appendices References Index.

14,825 citations

Journal ArticleDOI
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium; it reviews deep supervised learning, unsupervised learning, reinforcement learning and evolutionary computation, and indirect search for short programs encoding deep and large networks.

14,635 citations


Cites methods from "On Information and Sufficiency"

  • ...Many UL methods are designed to maximize entropy-related, information-theoretic (Boltzmann, 1909; Kullback & Leibler, 1951; Shannon, 1948) objectives (e.g., Amari, Cichocki, & Yang, 1996; Barlow et al., 1989; Dayan & Zemel, 1995; Deco & Parra, 1997; Field, 1994; Hinton, Dayan, Frey, & Neal, 1995; Linsker, …...


  • ...Many UL methods are designed to maximize entropy-related, information-theoretic (Boltzmann, 1909; Shannon, 1948; Kullback and Leibler, 1951) objectives (e....


Christopher M. Bishop1
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are given in this article, along with a discussion of combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
TL;DR: A thorough exposition of community structure, or clustering, is attempted, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists.
Abstract: The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i.e. the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Such clusters, or communities, can be considered as fairly independent compartments of a graph, playing a role similar to that of, e.g., the tissues or the organs in the human body. Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. This problem is very hard and not yet satisfactorily solved, despite the huge effort of a large interdisciplinary community of scientists working on it over the past few years. We will attempt a thorough exposition of the topic, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists, from the discussion of crucial issues like the significance of clustering and how methods should be tested and compared against each other, to the description of applications to real networks.
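
The defining property in the abstract (many edges inside clusters, few between them) can be made concrete with a toy graph; a minimal sketch, with an assumed example of two triangles joined by a single bridge edge:

```python
# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
labels = [0, 0, 0, 1, 1, 1]   # assumed community assignment per vertex

intra = sum(labels[u] == labels[v] for u, v in edges)
inter = len(edges) - intra
print(f"intra-community edges: {intra}, inter-community edges: {inter}")  # 6 vs 1
```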

9,057 citations


Cites methods from "On Information and Sufficiency"

  • ...Here the snapshot cost is the Kullback-Leibler (KL) divergence [389] between the adjacency/similarity matrix at time t and the matrix describing the community structure of the graph at time t; the historical cost is the KL divergence between the matrices describing the community structure of the graph at times t − 1 and t....

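The exact cost functions belong to the cited evolutionary-clustering work; purely as a generic sketch of a KL divergence between two nonnegative matrices (each normalized to a joint distribution, with hypothetical example matrices):

```python
import numpy as np

def matrix_kl(A, B, eps=1e-12):
    # KL divergence after normalizing each nonnegative matrix to sum to 1.
    P, Q = A / A.sum(), B / B.sum()
    m = P > 0
    return float(np.sum(P[m] * np.log(P[m] / (Q[m] + eps))))

# Hypothetical similarity matrix at time t and a block matrix encoding
# its two communities {0, 1} and {2}.
A_t = np.array([[1.0, 0.9, 0.1],
                [0.9, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
Z_t = np.array([[1.0, 1.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
print(matrix_kl(A_t, Z_t))   # a snapshot-style cost in the spirit of the passage
```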

Journal ArticleDOI
TL;DR: Various facets of such multimodel inference are presented here, particularly methods of model averaging, which can be derived as a non-Bayesian result.
Abstract: The model selection literature has been generally poor at reflecting the deep foundations of the Akaike information criterion (AIC) and at making appropriate comparisons to the Bayesian information criterion (BIC)...
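
One concrete instance of such model averaging is the standard Akaike-weight construction from this literature, w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2) with Δ_i = AIC_i − min_j AIC_j; a minimal sketch with hypothetical AIC scores:

```python
import numpy as np

def akaike_weights(aic):
    # w_i = exp(-Delta_i / 2) / sum_j exp(-Delta_j / 2), Delta_i = AIC_i - min AIC.
    delta = np.asarray(aic, float) - min(aic)
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Hypothetical AIC values for three candidate models; the weights measure
# relative support for each model and serve as model-averaging weights.
print(akaike_weights([100.0, 102.1, 110.4]))
```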

8,933 citations


Cites background from "On Information and Sufficiency"

  • ...In 1951 S. Kullback and R. A. Leibler published a now-famous paper (Kullback and Leibler 1951) that quantified the meaning of “information” as related to R. A. Fisher's concept of sufficient statistics....


References
01 Jan 1936

6,325 citations


"On Information and Sufficiency" refers background in this paper

  • ...A special case of this divergence is Mahalanobis' generalized distance [13]....


Journal ArticleDOI
TL;DR: In this paper, the authors define the center of location as the abscissa of a frequency curve for which the sampling errors of optimum location are uncorrelated with those of optimum scaling.
Abstract: Centre of Location. That abscissa of a frequency curve for which the sampling errors of optimum location are uncorrelated with those of optimum scaling.

3,392 citations

Journal ArticleDOI
01 Jul 1925
TL;DR: It has been pointed out to me that some of the statistical ideas employed in the following investigation have never received a strictly logical definition and analysis, and it is desirable to set out for criticism the manner in which the logical foundations of these ideas may be established.
Abstract: It has been pointed out to me that some of the statistical ideas employed in the following investigation have never received a strictly logical definition and analysis. The idea of a frequency curve, for example, evidently implies an infinite hypothetical population distributed in a definite manner; but equally evidently the idea of an infinite hypothetical population requires a more precise logical specification than is contained in that phrase. The same may be said of the intimately connected idea of random sampling. These ideas have grown up in the minds of practical statisticians and lie at the basis especially of recent work; there can be no question of their pragmatic value. It was no part of my original intention to deal with the logical bases of these ideas, but some comments which Dr Burnside has kindly made have convinced me that it may be desirable to set out for criticism the manner in which I believe the logical foundations of these ideas may be established.

2,464 citations

Journal ArticleDOI
TL;DR: It is shown that a certain differential form depending on the values of the parameters in a law of chance is invariant for all transformations of the parameter when the law is differentiable with regard to all parameters.
Abstract: It is shown that a certain differential form depending on the values of the parameters in a law of chance is invariant for all transformations of the parameters when the law is differentiable with regard to all parameters. For laws containing a location and a scale parameter a form with a somewhat restricted type of invariance is found even when the law is not everywhere differentiable with regard to the parameters. This form has the properties required to give a general rule for stating the prior probability in a large class of estimation problems.
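
The invariant form described here is the one now associated with the Jeffreys prior, built from the Fisher information (a standard identification, stated for context):

```latex
I(\theta)_{jk} = \mathbb{E}_\theta\!\left[
  \frac{\partial \log f(X;\theta)}{\partial \theta_j}\,
  \frac{\partial \log f(X;\theta)}{\partial \theta_k}\right],
\qquad
\pi(\theta)\,d\theta \propto \sqrt{\det I(\theta)}\,d\theta .
```

Under a smooth reparametrization \phi = g(\theta), the Fisher information transforms with the square of the Jacobian, so \sqrt{\det I} picks up exactly the factor needed to leave the form unchanged.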

2,292 citations


"On Information and Sufficiency" refers methods in this paper

  • ...Jeffreys (par....


  • ...The particular measure of divergence we use has been considered by Jeffreys ([10], [11]) in another connection....

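For context, the divergence Jeffreys considered is the symmetrized form of the information deviation (a standard identity):

```latex
J(P, Q) = D(P \Vert Q) + D(Q \Vert P)
        = \int \bigl(p(x) - q(x)\bigr)\,\log\frac{p(x)}{q(x)}\,dx .
```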

01 Jan 1943

2,183 citations


"On Information and Sufficiency" refers background in this paper

  • ...We are also concerned with the statistical problem of discrimination ([3], [17]), by considering a measure of the "distance" or "divergence" between statistical populations ([1], [2], [13]) in terms of our measure of information....
