
Showing papers on "Entropy (information theory) published in 1996"


Book
01 Jan 1996
TL;DR: The Bayes error, Vapnik-Chervonenkis theory, and the maximum likelihood principle are developed as guides for empirical classifier selection and error estimation.
Abstract: Preface * Introduction * The Bayes Error * Inequalities and alternate distance measures * Linear discrimination * Nearest neighbor rules * Consistency * Slow rates of convergence * Error estimation * The regular histogram rule * Kernel rules * Consistency of the k-nearest neighbor rule * Vapnik-Chervonenkis theory * Combinatorial aspects of Vapnik-Chervonenkis theory * Lower bounds for empirical classifier selection * The maximum likelihood principle * Parametric classification * Generalized linear discrimination * Complexity regularization * Condensed and edited nearest neighbor rules * Tree classifiers * Data-dependent partitioning * Splitting the data * The resubstitution estimate * Deleted estimates of the error probability * Automatic kernel rules * Automatic nearest neighbor rules * Hypercubes and discrete spaces * Epsilon entropy and totally bounded sets * Uniform laws of large numbers * Neural networks * Other error estimates * Feature extraction * Appendix * Notation * References * Index

3,598 citations


Journal ArticleDOI
TL;DR: In this article, an entropy criterion is proposed to estimate the number of clusters arising from a mixture model, which is derived from a relation linking the likelihood and the classification likelihood of a mixture.
Abstract: In this paper, we consider an entropy criterion to estimate the number of clusters arising from a mixture model. This criterion is derived from a relation linking the likelihood and the classification likelihood of a mixture. Its performance is investigated through Monte Carlo experiments, and it shows favorable results compared to other classical criteria.

1,689 citations
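As a rough illustration of the idea summarized above, the sketch below fits Gaussian mixtures for several numbers of components and scores each fit by the entropy of its posterior classification probabilities relative to the log-likelihood gain over a single component. The function name, the normalization by L(K) − L(1), and the use of scikit-learn's GaussianMixture are choices made here for illustration and may differ from the published criterion.

```python
# Illustrative entropy-based criterion for the number of mixture components,
# in the spirit of the abstract above (not the paper's exact definition).
import numpy as np
from sklearn.mixture import GaussianMixture

def entropy_criterion(X, k_max=6):
    n = X.shape[0]
    loglik, entropy = {}, {}
    for k in range(1, k_max + 1):
        gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
        tau = gm.predict_proba(X)                        # posterior responsibilities
        loglik[k] = n * gm.score(X)                      # total log-likelihood L(K)
        entropy[k] = -np.sum(tau * np.log(tau + 1e-12))  # classification entropy E(K)
    scores = {k: entropy[k] / (loglik[k] - loglik[1])
              for k in range(2, k_max + 1) if loglik[k] > loglik[1]}
    return min(scores, key=scores.get), scores           # smaller score is better

# Example with two well-separated Gaussian clusters:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
best_k, scores = entropy_criterion(X)
print(best_k, scores)
```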


Journal ArticleDOI
TL;DR: An adaptive statistical language model is described, which successfully integrates long distance linguistic information with other knowledge sources, and shows the feasibility of incorporating many diverse knowledge sources in a single, unified statistical framework.

771 citations


Journal Article
TL;DR: An extended EM algorithm is used to minimize the information divergence (maximize the relative entropy) in the density approximation case and fits to Weibull, log normal, and Erlang distributions are used as illustrations of the latter.
Abstract: Estimation from sample data and density approximation with phase-type distribu- tions are considered. Maximum likelihood estimation via the EM algorithm is discussed and performed for some data sets. An extended EM algorithm is used to minimize the information divergence (maximize the relative entropy) in the density approximation case. Fits to Weibull, log normal, and Erlang distributions are used as illustrations of the latter.

690 citations


Book
28 Aug 1996
TL;DR: The abstract background of function spaces is developed and used to obtain entropy and approximation numbers of embeddings, with applications to weighted function spaces and elliptic operators.
Abstract: 1. The abstract background 2. Function spaces 3. Entropy and approximation numbers of embeddings 4. Weighted function spaces and entropy numbers 5. Elliptic operators Bibliography.

428 citations


Book
09 Jul 1996
TL;DR: This book develops the basic concepts and entropy-related properties of discrete sample paths, treats entropy for restricted classes of processes, and studies B-processes.
Abstract: Basic concepts Entropy-related properties Entropy for restricted classes B-processes Bibliography Index.

397 citations


Journal ArticleDOI
TL;DR: This note first points out a certain conceptual problem and then proposes two algorithms that are free from it; the algorithms are tested on several data sets and the results are encouraging.

315 citations


Journal ArticleDOI
01 Sep 1996-Chaos
TL;DR: Algorithms for estimating the Shannon entropy h of finite symbol sequences with long range correlations are considered, and a scaling law is proposed for extrapolation from finite sample lengths.
Abstract: We discuss algorithms for estimating the Shannon entropy h of finite symbol sequences with long range correlations. In particular, we consider algorithms which estimate h from the code lengths produced by some compression algorithm. Our interest is in describing their convergence with sequence length, assuming no limits for the space and time complexities of the compression algorithms. A scaling law is proposed for extrapolation from finite sample lengths. This is applied to sequences of dynamical systems in non‐trivial chaotic regimes, a 1‐D cellular automaton, and to written English texts.

304 citations
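The abstract above estimates h from the code lengths produced by compression algorithms; the minimal sketch below shows the basic code-length estimator using off-the-shelf compressors (zlib and bz2). It is not the authors' algorithm and omits their finite-sample scaling law and extrapolation.

```python
# Code-length estimate (an upper bound) of the entropy per symbol of a sequence.
import bz2, zlib

def entropy_rate_estimate(text: str) -> dict:
    data = text.encode("utf-8")
    n = len(data)
    return {
        "zlib_bits_per_char": 8 * len(zlib.compress(data, 9)) / n,
        "bz2_bits_per_char": 8 * len(bz2.compress(data, 9)) / n,
    }

sample = "the quick brown fox jumps over the lazy dog " * 200
print(entropy_rate_estimate(sample))   # repetitive text -> well below 8 bits/char
```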


Proceedings Article
02 Aug 1996
TL;DR: The study includes both an extensive empirical comparison and an analysis of scenarios where error minimization may be an inappropriate discretization criterion, and it analyzes the shortcomings of error-based approaches relative to entropy-based methods.
Abstract: We present a comparison of error-based and entropy-based methods for discretization of continuous features. Our study includes both an extensive empirical comparison as well as an analysis of scenarios where error minimization may be an inappropriate discretization criterion. We present a discretization method based on the C4.5 decision tree algorithm and compare it to an existing entropy-based discretization algorithm, which employs the Minimum Description Length Principle, and a recently proposed error-based technique. We evaluate these discretization methods with respect to C4.5 and Naive-Bayesian classifiers on datasets from the UCI repository and analyze the computational complexity of each method. Our results indicate that the entropy-based MDL heuristic outperforms error minimization on average. We then analyze the shortcomings of error-based approaches in comparison to entropy-based methods.

303 citations
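As a small illustration of the entropy-based side of the comparison above, the sketch below picks a single cut point on a continuous feature by minimizing the weighted class entropy of the two resulting intervals (equivalently, maximizing information gain). The MDL stopping criterion and the error-based alternative studied in the paper are omitted; function names are chosen here for illustration.

```python
# One step of entropy-based discretization: find the best binary cut point.
import numpy as np

def class_entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_cut(x, y):
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = (None, np.inf)
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                       # only cut between distinct values
        cut = (x[i] + x[i - 1]) / 2
        left, right = y[:i], y[i:]
        h = (len(left) * class_entropy(left) +
             len(right) * class_entropy(right)) / len(y)
        if h < best[1]:
            best = (cut, h)
    return best                            # (threshold, weighted class entropy)

x = np.array([1.0, 1.2, 1.4, 3.0, 3.2, 3.4])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_cut(x, y))                      # cuts at 2.2 with entropy 0.0
```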


Journal ArticleDOI
TL;DR: This article defines the concept of an information measure and shows how common information measures such as entropy, Shannon information, and algorithmic information content can be combined to solve problems of characterization, inference, and learning for complex systems.
Abstract: This article defines the concept of an information measure and shows how common information measures such as entropy, Shannon information, and algorithmic information content can be combined to solve problems of characterization, inference, and learning for complex systems. Particularly useful quantities are the effective complexity, which is roughly the length of a compact description of the identified regularities of an entity, and total information, which is effective complexity plus an entropy term that measures the information required to describe the random aspects of the entity. Mathematical definitions are given for both quantities and some applications are discussed. In particular, it is pointed out that if one compares different sets of identified regularities of an entity, the ‘best’ set minimizes the total information, and then, subject to that constraint, minimizes the effective complexity; the resulting effective complexity is then in many respects independent of the observer. © 1996 John Wiley & Sons, Inc.

300 citations
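A compact restatement of the relation described in the abstract, with symbols chosen here for illustration (they are not the authors' notation):

```latex
% Symbols introduced here for illustration only:
%   E      = effective complexity (length of a compact description of the
%            identified regularities of the entity)
%   S      = entropy term measuring the information needed to describe the
%            random aspects of the entity
%   \Sigma = total information
\Sigma = E + S
% The "best" set of regularities first minimizes \Sigma and then, subject to
% that constraint, minimizes E.
```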


Journal ArticleDOI
TL;DR: It is shown that this measure, the cross-entropy, is related to other commonly used measures of distance or similarity under special conditions, although it is in some senses more general.

Journal ArticleDOI
TL;DR: As the authors discuss, the idea of recognizing special metrics in terms of this invariant looks at first glance very optimistic: the entropy is sensitive to changes of scale, which makes it a bad invariant; however, this is circumvented by looking at the behaviour of the entropy functional on the space of metrics with fixed volume.
Abstract: Let $(Y, g)$ be a compact connected n-dimensional Riemannian manifold and let $(\tilde{Y}, \tilde{g})$ be its universal cover endowed with the pulled-back metric. If $y \in \tilde{Y}$, we define $h(g) = \lim_{R \to \infty} \frac{1}{R} \log \operatorname{Vol} B(y, R)$, where $B(y, R)$ denotes the ball of radius $R$ around $y$ in $\tilde{Y}$. It is a well known fact that this limit exists and does not depend on $y$ ([Man]). The invariant $h(g)$ is called the volume entropy of the metric $g$ but, for the sake of simplicity, we shall use the term entropy. The idea of recognizing special metrics in terms of this invariant looks at first glance very optimistic. First, the entropy, which behaves like the inverse of a distance, is sensitive to changes of scale, which makes it a bad invariant; however, this is a minor drawback that can be circumvented by looking at the behaviour of the entropy functional on the space of metrics with fixed volume (equal to one, for example). Nevertheless, it seems very unlikely that two numbers, the entropy and the volume, might characterize any metric. The very first person to consider such a possibility was Katok ([Kat1]). In that article the entropy is thought of as a dynamical invariant, as its name suggests. More precisely, this dynamical invariant, called the topological entropy, is defined for a flow $\psi_t$ on a compact metric space $(M, d)$.

Journal ArticleDOI
TL;DR: In the case of Ornstein, Prohorov and other distances of the Kantorovich-Vasershtein type, it is shown that the finite-precision resolvability is equal to the rate-distortion function with a fidelity criterion derived from the accuracy measure, which leads to new results on nonstationary rate-distortion theory.
Abstract: We study the randomness necessary for the simulation of a random process with given distributions, in terms of the finite-precision resolvability of the process. Finite-precision resolvability is defined as the minimal random-bit rate required by the simulator as a function of the accuracy with which the distributions are replicated. The accuracy is quantified by means of various measures: variational distance, divergence, Ornstein (1973), Prohorov (1956), and related measures of distance between the distributions of random processes. In the case of Ornstein, Prohorov, and other distances of the Kantorovich-Vasershtein type, we show that the finite-precision resolvability is equal to the rate-distortion function with a fidelity criterion derived from the accuracy measure. This connection leads to new results on nonstationary rate-distortion theory. In the case of variational distance, the resolvability of stationary ergodic processes is shown to equal the entropy rate regardless of the allowed accuracy. In the case of normalized divergence, explicit expressions for finite-precision resolvability are obtained in many cases of interest, and connections with data compression with minimum probability of block error are shown.

Journal ArticleDOI
TL;DR: The gambler algorithm, an algorithm based on the Chou-Fasman rules for protein structure, gives significantly lower entropies than the k-tuplet analysis, and the number of most probable protein sequences can be calculated.

Journal ArticleDOI
TL;DR: It is found that most speech signals in the form of phoneme articulations are low dimensional, and the second-order dynamical entropy (a lower bound of the metric entropy) of speech time series is estimated.
Abstract: This paper reports results of the estimation of dynamical invariants, namely Lyapunov exponents, dimension, and metric entropy for speech signals. Two optimality criteria from dynamical systems literature, namely singular value decomposition method and the redundancy method, are used to reconstruct state space trajectories of speech and make observations. The positive values of the largest Lyapunov exponent of speech signals in the form of phoneme articulations show the average exponential divergence of nearby trajectories in the reconstructed state space. The dimension of a time series is a measure of its complexity and gives bounds on the number of state space variables needed to model it. It is found that most speech signals in the form of phoneme articulations are low dimensional. For comparison, a statistical model of a speech time series is also used to estimate the correlation dimension. The second‐order dynamical entropy (which is a lower bound of metric entropy) of speech time series is found to ...
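The sketch below illustrates the general kind of analysis described above: a scalar time series is embedded by time delays and a correlation-dimension estimate is read off from the slope of log C(r) versus log r (Grassberger-Procaccia). The embedding parameters, the SVD and redundancy criteria, and the Lyapunov-exponent and entropy estimators used by the authors are not reproduced here.

```python
# Delay embedding and a crude correlation-dimension estimate for a time series.
import numpy as np

def delay_embed(x, dim, tau):
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

def correlation_sum(points, r):
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    iu = np.triu_indices(len(points), k=1)
    return np.mean(d[iu] < r)              # fraction of point pairs within r

# Example: a slightly noisy sine wave embedded in 3 dimensions (a limit cycle).
t = np.linspace(0, 40 * np.pi, 2000)
x = np.sin(t) + 0.01 * np.random.default_rng(0).normal(size=t.size)
pts = delay_embed(x, dim=3, tau=10)[::4]   # subsample to keep the O(n^2) sums small
radii = np.logspace(-1.5, 0, 8)
C = np.array([correlation_sum(pts, r) for r in radii])
slope = np.polyfit(np.log(radii), np.log(C + 1e-12), 1)[0]
print(f"estimated correlation dimension ~ {slope:.2f}")  # close to 1 for a limit cycle
```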

Journal ArticleDOI
TL;DR: In this article, the molecular pair correlation function in water is separated into the radial distribution function (RDF), a function of distance only, and an orientational distribution function (ODF), a function of the five angles at each distance between the molecules, and two approaches for obtaining an approximation to the ODF are introduced.
Abstract: The molecular pair correlation function in water is a function of a distance and five angles. It is here separated into the radial distribution function (RDF), which is only a function of distance, and an orientational distribution function (ODF), which is a function of the five angles for each distance between the molecules. While the RDF can be obtained from computer simulations, this is not practical for the ODF due to its high dimensionality. Two approaches for obtaining an approximation to the ODF are introduced. The first uses a product of one‐ and two‐dimensional marginal distributions from computer simulations. The second uses the gas‐phase low‐density limit as a reference and applies corrections based on (a) the orientationally averaged interactions in the liquid calculated by simulations, and (b) the observed differences in the one‐ and two‐dimensional marginal distributions in the gas and in the liquid. The site superposition approximation was also tested and found to be inadequate for reproducing the orientationally averaged interaction energy and the angular distributions obtained from the simulations. The two approximations to the pair correlation function are employed to estimate the contribution of two‐particle correlations to the excess entropy of TIP4P water. The calculated value is comparable to the excess entropy of TIP4P water estimated by other methods and to the experimental excess entropy of liquid water. More than 90% of the orientational part of the excess entropy is due to correlations between first neighbors. The change in excess entropy with temperature gives a value for the heat capacity that agrees within statistical uncertainty with that obtained from the change in energy with temperature and is reasonably close to the experimental value for water. The effect of pressure on the entropy was examined and it was found that increase in the pressure (density) causes a reduction of orientational correlations, in agreement with the idea of pressure as a ‘‘structure breaker’’ in water. The approach described here provides insight concerning the nature of the contributions to the excess entropy of water and should be applicable to other simple molecular fluids.

Journal ArticleDOI
TL;DR: This paper investigates the fundamental operation of the block sorting algorithm, presents some improvements based on that analysis, and develops a simple model which relates the compression to the proportion of zeros after the MTF stage.
Abstract: A recent development in text compression is a 'block sorting' algorithm which permutes the input text according to a special sort procedure and then processes the permuted text with Move-To-Front (MTF) and a final statistical compressor. The technique combines good speed with excellent compression performance. This paper investigates the fundamental operation of the algorithm and presents some improvements based on that analysis. Although block sorting is clearly related to previous compression techniques, it appears that it is best described by techniques derived from work by Shannon on the prediction and entropy of English text. A simple model is developed which relates the compression to the proportion of zeros after the MTF stage.
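A quick sketch of the Move-To-Front stage mentioned above: after the block-sorting permutation, runs of identical symbols become small MTF codes, and the proportion of zeros in the MTF output is a simple indicator of compressibility in the spirit of the model described in the abstract. The encoder below is a generic MTF over a byte alphabet, not the paper's implementation.

```python
# Generic Move-To-Front encoder over the byte alphabet 0..255.
def mtf_encode(data: bytes) -> list[int]:
    alphabet = list(range(256))
    out = []
    for b in data:
        i = alphabet.index(b)                 # position of the symbol in the list
        out.append(i)
        alphabet.insert(0, alphabet.pop(i))   # move the symbol to the front
    return out

codes = mtf_encode(b"aaaabbbbaaaacccc")
print(codes)                                  # [97, 0, 0, 0, 98, 0, 0, 0, 1, 0, 0, 0, 99, 0, 0, 0]
print(sum(c == 0 for c in codes) / len(codes))  # proportion of zeros: 0.75
```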

Journal Article
TL;DR: In this paper, the authors consider a truncated version of the entropy estimator and prove the mean square √n-consistency of this estimator for a class of densities with unbounded support, including the Gaussian density.
Abstract: We consider a truncated version of the entropy estimator and prove the mean square √n-consistency of this estimator for a class of densities with unbounded support, including the Gaussian density.

Proceedings Article
01 Aug 1996
TL;DR: The sample complexity of MDL based learning procedures for Bayesian networks is examined, and the number of samples needed to learn an ε-close approximation with confidence δ is shown to be a low-order polynomial in the error threshold and sub-linear in the confidence bound.
Abstract: In recent years there has been an increasing interest in learning Bayesian networks from data. One of the most effective methods for learning such networks is based on the minimum description length (MDL) principle. Previous work has shown that this learning procedure is asymptotically successful: with probability one, it will converge to the target distribution, given a sufficient number of samples. However, the rate of this convergence has been hitherto unknown. In this work we examine the sample complexity of MDL based learning procedures for Bayesian networks. We show that the number of samples needed to learn an ε-close approximation (in terms of entropy distance) with confidence δ is O((1/ε)^{4/3} log(1/ε) log(1/δ) log log(1/δ)). This means that the sample complexity is a low-order polynomial in the error threshold and sub-linear in the confidence bound. We also discuss how the constants in this term depend on the complexity of the target distribution. Finally, we address questions of asymptotic minimality and propose a method for using the sample complexity results to speed up the learning process.

Journal ArticleDOI
TL;DR: A method for estimating coarse-grained entropy rates (CERs) from time series, based on information-theoretic functionals (redundancies), is presented, and a potential application of the CERs to the analysis of electrophysiological signals or other complex time series is shown.

Journal ArticleDOI
TL;DR: The theoretical basis for the annealing method is derived, a novel design algorithm is developed on that basis, and its effectiveness and superior performance in the design of practical classifiers for some of the most popular structures currently in use are demonstrated.
Abstract: A global optimization method is introduced that minimizes the rate of misclassification. We first derive the theoretical basis for the method, on which we base the development of a novel design algorithm and demonstrate its effectiveness and superior performance in the design of practical classifiers for some of the most popular structures currently in use. The method, grounded in ideas from statistical physics and information theory, extends the deterministic annealing approach for optimization, both to incorporate structural constraints on data assignments to classes and to minimize the probability of error as the cost objective. During the design, data are assigned to classes in probability so as to minimize the expected classification error given a specified level of randomness, as measured by Shannon's entropy. The constrained optimization is equivalent to a free-energy minimization, motivating a deterministic annealing approach in which the entropy and expected misclassification cost are reduced with the temperature while enforcing the classifier's structure. In the limit, a hard classifier is obtained. This approach is applicable to a variety of classifier structures, including the widely used prototype-based, radial basis function, and multilayer perceptron classifiers. The method is compared with learning vector quantization, back propagation (BP), several radial basis function design techniques, as well as with paradigms for more directly optimizing all these structures to minimize probability of error. The annealing method achieves significant performance gains over other design methods on a number of benchmark examples from the literature, while often retaining design complexity comparable with or only moderately greater than that of strict descent methods. Substantial gains, both inside and outside the training set, are achieved for complicated examples involving high-dimensional data and large class overlap.
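The sketch below illustrates only the probabilistic assignment rule at the heart of the approach described above: class membership probabilities follow a Gibbs distribution over per-class costs, the temperature controls the Shannon entropy of the assignment, and lowering the temperature hardens it. The full design algorithm (misclassification-cost objective, structural constraints, re-estimation steps) is not reproduced; the function names and example costs are illustrative.

```python
# Annealed soft assignment of a data point to classes via a Gibbs distribution.
import numpy as np

def gibbs_assignment(costs, T):
    """costs: per-class costs d_j(x); returns p(j | x) proportional to exp(-d_j / T)."""
    z = np.exp(-(costs - costs.min()) / T)      # subtract the min for numerical stability
    return z / z.sum()

def shannon_entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

costs = np.array([1.0, 1.3, 2.5])               # e.g. squared distances to class prototypes
for T in [5.0, 1.0, 0.2, 0.01]:
    p = gibbs_assignment(costs, T)
    print(f"T={T:5.2f}  p={np.round(p, 3)}  entropy={shannon_entropy(p):.3f} bits")
# At high T the assignment is nearly uniform (high entropy); as T is lowered the
# probability mass concentrates on the lowest-cost class, i.e. a hard classifier.
```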

Journal ArticleDOI
TL;DR: An unsupervised neural network which exhibits competition between units via inhibitory feedback is presented, and it is shown how the assignment of prior probabilities to network outputs can help to reduce entropy.
Abstract: In this paper we present an unsupervised neural network which exhibits competition between units via inhibitory feedback. The operation is such as to minimize reconstruction error, both for individual patterns, and over the entire training set. A key difference from networks which perform principal components analysis, or one of its variants, is the ability to converge to non-orthogonal weight values. We discuss the network's operation in relation to the twin goals of maximizing information transfer and minimizing code entropy, and show how the assignment of prior probabilities to network outputs can help to reduce entropy. We present results from two binary coding problems, and from experiments with image coding.

Proceedings ArticleDOI
18 Jun 1996
TL;DR: Experiments demonstrate that many textures previously considered as different categories can be modeled and synthesized in a common framework, and the proposed theory interprets and clarifies many previous concepts and methods for texture analysis and synthesis from a unified point of view.
Abstract: In this paper, a minimax entropy principle is studied, based on which a novel theory, called FRAME (Filters, Random fields And Minimax Entropy) is proposed for texture modeling. FRAME combines attractive aspects of two important themes in texture modeling: multi-channel filtering and Markov random field (MRF) modeling. It incorporates the responses of a set of well selected filters into the distribution over a random field and hence has a much stronger descriptive ability than the traditional MRF models. Furthermore, it interprets and clarifies many previous concepts and methods for texture analysis and synthesis from a unified point of view. Algorithms are proposed for probability inference, stochastic simulation and filter selection. Experiments on a variety of textures are described to illustrate our theory and to show the performance of our algorithms. These experiments demonstrate that many textures previously considered as different categories can be modeled and synthesized in a common framework.

Journal ArticleDOI
TL;DR: The generalized maximum entropy (GME) model discussed by the authors includes noise terms in the multinomial information constraints; each noise term is modeled as the mean of a finite set of a priori known points in the interval [−1,1] with unknown probabilities, and no parametric assumptions about the error distribution are made.
Abstract: The classical maximum entropy (ME) approach to estimating the unknown parameters of a multinomial discrete choice problem, which is equivalent to the maximum likelihood multinomial logit (ML) estimator, is generalized. The generalized maximum entropy (GME) model includes noise terms in the multinomial information constraints. Each noise term is modeled as the mean of a finite set of a priori known points in the interval [−1,1] with unknown probabilities where no parametric assumptions about the error distribution are made. A GME model for the multinomial probabilities and for the distributions associated with the noise terms is derived by maximizing the joint entropy of multinomial and noise distributions, under the assumption of independence. The GME formulation reduces to the ME in the limit as the sample grows large or when no noise is included in the entropy maximization. Further, even though the GME and the logit estimators are conceptually different, the dual GME model is related to a gener...
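A compact formalization of the noise modeling described in the abstract, with symbols chosen here for illustration (not the authors' exact notation):

```latex
% For observation i and alternative j, the observed choice indicator is
% written as a probability plus a noise term, and the noise term is the mean
% of M known support points v_1 < ... < v_M in [-1, 1] with unknown weights:
y_{ij} = p_{ij} + e_{ij},
\qquad
e_{ij} = \sum_{m=1}^{M} w_{ijm}\, v_m ,
\qquad
\sum_{m=1}^{M} w_{ijm} = 1 .
% The GME estimator maximizes the joint entropy of the choice probabilities
% and the noise weights subject to the data (moment) constraints:
\max_{p,\,w}\ \Big\{ -\sum_{i,j} p_{ij}\ln p_{ij} \;-\; \sum_{i,j,m} w_{ijm}\ln w_{ijm} \Big\}.
```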

Journal ArticleDOI
01 Jan 1996
TL;DR: It is demonstrated how Shannon information theory (Shannon and Weaver 1964) can be used to compute an evaluation index of a map, i.e., a parameter which measures the efficiency of the map.
Abstract: The automation of map design is a challenging task for both researchers and designers of spatial information systems. A main problem in automation is the quantification and formalization of the properties of the process to be automated. This article contributes to the formalization of some steps in the processes involved in map design and demonstrates how the Shannon information theory (Shannon and Weaver 1964) can be used to compute an evaluation index of a map, i.e., a parameter which measures the efficiency of the map. Throughout this article, the term "information" is mostly used in a narrow sense and the application of information theory is restricted to the syntactic level of cartographic communication. Information sources for map entropy computations are identified and elaborated on. A special class of map information sources are defined and termed "orthogonal map information sources". Further, a strategy to consider spatial properties of a map in entropy computations is presented. At the end of th...
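As a minimal illustration of the syntactic entropy computation described above, the sketch below treats the symbols drawn on a map as an information source and computes the Shannon entropy of their relative frequencies. The article's specific map information sources, the "orthogonal" sources, and the treatment of spatial properties are not reproduced; the symbol inventory is hypothetical.

```python
# Shannon entropy of the relative frequencies of map symbols.
import math
from collections import Counter

def map_symbol_entropy(symbols):
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical inventory of symbols appearing on a map sheet:
symbols = ["road"] * 120 + ["river"] * 30 + ["building"] * 200 + ["contour"] * 650
h = map_symbol_entropy(symbols)
print(f"entropy = {h:.2f} bits per symbol "
      f"(maximum for 4 symbol types = {math.log2(4):.2f} bits)")
```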

Journal ArticleDOI
TL;DR: An estimate of the probability density function of a random vector is obtained by maximizing the output entropy of a feedforward network of sigmoidal units with respect to the input weights.
Abstract: An estimate of the probability density function of a random vector is obtained by maximizing the output entropy of a feedforward network of sigmoidal units with respect to the input weights. Classification problems can be solved by selecting the class associated with the maximal estimated density. Newton's optimization method, applied to the estimated density, yields a recursive estimator for a random variable or a random sequence. A constrained connectivity structure yields a linear estimator, which is particularly suitable for "real time" prediction. A Gaussian nonlinearity yields a closed-form solution for the network's parameters, which may also be used for initializing the optimization algorithm when other nonlinearities are employed. A triangular connectivity between the neurons and the input, which is naturally suggested by the statistical setting, reduces the number of parameters. Applications to classification and forecasting problems are demonstrated.
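A one-dimensional, single-unit sketch of the idea described above: a sigmoid y = σ(wx + b) is fitted by gradient ascent on the sample output entropy H(y) = H(x) + E[log|dy/dx|]; driving the output toward uniformity makes the sigmoid approximate the CDF of x, so |dy/dx| serves as a density estimate. The paper's multivariate network, connectivity constraints, Gaussian-nonlinearity closed form, and Newton-based recursive estimator are not reproduced here.

```python
# Density estimation with one sigmoid unit by maximizing the output entropy.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.7, size=2000)     # samples from an "unknown" density

# Standardize for a well-conditioned fit; undo the scaling in the estimate.
mu, sd = x.mean(), x.std()
z = (x - mu) / sd

w, b, lr = 1.0, 0.0, 0.05
for _ in range(2000):
    y = 1.0 / (1.0 + np.exp(-(w * z + b)))
    gw = 1.0 / w + np.mean((1.0 - 2.0 * y) * z)   # d/dw of log|w| + mean log sigma'(wz+b)
    gb = np.mean(1.0 - 2.0 * y)                   # d/db of the same objective
    w, b = w + lr * gw, b + lr * gb

def density_estimate(t):
    zt = (t - mu) / sd
    y = 1.0 / (1.0 + np.exp(-(w * zt + b)))
    return np.abs(w) * y * (1.0 - y) / sd          # |dy/dx|, chain rule for the scaling

print(density_estimate(np.array([1.0, 2.0, 3.0])))  # largest value near the mode at 2.0
```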

Journal ArticleDOI
TL;DR: A new result limiting the amount of accessible information in a quantum channel is proved, which generalizes Kholevo's theorem and implies it as a simple corollary.
Abstract: We prove a new result limiting the amount of accessible information in a quantum channel. This generalizes Kholevo's theorem and implies it as a simple corollary. Our proof uses the strong subadditivity of the von Neumann entropy functional $S(\rho)$ and a specific physical analysis of the measurement process. The result presented here has application in information obtained from "weak" measurements, such as those sometimes considered in quantum cryptography.
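For reference, the bound of Kholevo's theorem that the result generalizes, in standard notation: a sender prepares state ρ_x with probability p_x, the receiver measures and obtains outcome Y, and the accessible information is bounded by

```latex
I(X;Y) \;\le\; S(\rho) \;-\; \sum_x p_x\, S(\rho_x),
\qquad
\rho = \sum_x p_x\, \rho_x ,
```

where $S$ is the von Neumann entropy appearing in the abstract; the strengthened inequality proved in the paper is not reproduced here.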

Book
26 Nov 1996
TL;DR: This book is intended to introduce coding theory and information theory to undergraduate students of mathematics and computer science.
Abstract: This book is intended to introduce coding theory and information theory to undergraduate students of mathematics and computer science. It begins with a review of probability theory as applied to finite sample spaces and a general introduction to the nature and types of codes. The two subsequent chapters discuss information theory: efficiency of codes, the entropy of information sources, and Shannon's Noiseless Coding Theorem. The remaining three chapters deal with coding theory: communication channels, decoding in the presence of errors, the general theory of linear codes, and such specific codes as Hamming codes, the simplex codes, and many others.

Book ChapterDOI
01 Jan 1996
TL;DR: This is a mathematically oriented survey about the method of maximum entropy or minimum I-divergence, with a critical treatment of its various justifications and relation to Bayesian statistics.
Abstract: This is a mathematically oriented survey about the method of maximum entropy or minimum I-divergence, with a critical treatment of its various justifications and relation to Bayesian statistics. Information theoretic ideas are given substantial attention, including “information geometry”. The axiomatic approach is considered as the best justification of maxent, as well as of alternate methods of minimizing some Bregman distance or f-divergence other than I-divergence. The possible interpretation of such alternate methods within the original maxent paradigm is also considered.
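For orientation, the basic problem surveyed above in standard form, with notation chosen here for illustration: among distributions satisfying given moment constraints, select the one of minimum I-divergence from a reference distribution q (maximum entropy when q is uniform).

```latex
\min_{p}\; D(p\,\|\,q) = \sum_x p(x)\,\ln\frac{p(x)}{q(x)}
\quad\text{subject to}\quad
\sum_x p(x)\,f_i(x) = a_i,\ \ i = 1,\dots,k,
\qquad
\sum_x p(x) = 1 .

% When a solution exists it has the exponential form
p^{*}(x) = q(x)\,\exp\!\Big(\lambda_0 + \sum_{i=1}^{k}\lambda_i f_i(x)\Big),
% with the multipliers \lambda_i determined by the constraints; the classical
% maximum-entropy method is the special case of a uniform reference q.
```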

Journal ArticleDOI
Eric Pantin, Jean-Luc Starck
TL;DR: This work uses the wavelet transform, a mathematical tool to decompose a signal into different frequency bands, to introduce the concept of multi-scale entropy of an image, leading to a better restoration at all spatial frequencies.
Abstract: Following the ideas of Bontekoe et al., who noticed that the classical Maximum Entropy Method (MEM) had difficulties to efficiently restore high and low spatial frequency structure in an image at the same time, we use the wavelet transform, a mathematical tool to decompose a signal into different frequency bands. We introduce the concept of multi-scale entropy of an image, leading to a better restoration at all spatial frequencies. This deconvolution method is flux conservative and the use of a multiresolution support solves the problem of MEM to choose the parameter, i.e. the relative weight between the goodness-of-fit and the entropy. We show that our algorithm is efficient too for filtering astronomical images. A range of practical examples illustrate this approach.