
Showing papers on "Entropy (information theory)" published in 2004


Journal ArticleDOI
TL;DR: Two classes of improved estimators for mutual information M(X,Y), from samples of random points distributed according to some joint probability density mu(x,y), based on entropy estimates from k-nearest neighbor distances are presented.
Abstract: We present two classes of improved estimators for mutual information M(X,Y), from samples of random points distributed according to some joint probability density mu(x,y). In contrast to conventional estimators based on binnings, they are based on entropy estimates from k-nearest neighbor distances. This means that they are data efficient (with k=1 we resolve structures down to the smallest possible scales), adaptive (the resolution is higher where data are more numerous), and have minimal bias. Indeed, the bias of the underlying entropy estimates is mainly due to nonuniformity of the density at the smallest resolved scale, giving typically systematic errors which scale as functions of k/N for N points. Numerically, we find that both families become exact for independent distributions, i.e. the estimator M(X,Y) vanishes (up to statistical fluctuations) if mu(x,y)=mu(x)mu(y). This holds for all tested marginal distributions and for all dimensions of x and y. In addition, we give estimators for redundancies between more than two random variables. We compare our algorithms in detail with existing algorithms. Finally, we demonstrate the usefulness of our estimators for assessing the actual independence of components obtained from independent component analysis (ICA), for improving ICA, and for estimating the reliability of blind source separation.

3,224 citations
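
A minimal sketch of the first of the two k-nearest-neighbor estimators described above (the Kraskov–Stögbauer–Grassberger form), assuming continuous-valued samples without duplicate points; the function name and the use of SciPy's cKDTree are illustrative choices, not from the paper:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mutual_information(x, y, k=3):
    """k-nearest-neighbor estimate of M(X,Y), Kraskov-style (first variant)."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    n = len(x)
    joint = np.hstack([x, y])

    # eps[i] = max-norm distance from point i to its k-th neighbor in the joint space
    # (k + 1 because the query returns the point itself as the nearest hit).
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]

    tree_x, tree_y = cKDTree(x), cKDTree(y)
    # n_x[i], n_y[i] = number of neighbors strictly within eps[i] in each marginal space.
    n_x = np.array([len(tree_x.query_ball_point(xi, max(e - 1e-12, 0.0), p=np.inf)) - 1
                    for xi, e in zip(x, eps)])
    n_y = np.array([len(tree_y.query_ball_point(yi, max(e - 1e-12, 0.0), p=np.inf)) - 1
                    for yi, e in zip(y, eps)])

    return digamma(k) + digamma(n) - np.mean(digamma(n_x + 1) + digamma(n_y + 1))
```

For independent inputs the returned value fluctuates around zero, which matches the behaviour reported for the estimator.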


Journal ArticleDOI
TL;DR: By inductive arguments employing the entropy power inequality of information theory, and a new quantizer error bound, an explicit expression for the infimum stabilizing data rate is derived, under very mild conditions on the initial state and noise probability distributions.
Abstract: Feedback control with limited data rates is an emerging area which incorporates ideas from both control and information theory. A fundamental question it poses is how low the closed-loop data rate can be made before a given dynamical system is impossible to stabilize by any coding and control law. Analogously to source coding, this defines the smallest error-free data rate sufficient to achieve "reliable" control, and explicit expressions for it have been derived for linear time-invariant systems without disturbances. In this paper, the more general case of finite-dimensional linear systems with process and observation noise is considered, the object being mean square state stability. By inductive arguments employing the entropy power inequality of information theory, and a new quantizer error bound, an explicit expression for the infimum stabilizing data rate is derived, under very mild conditions on the initial state and noise probability distributions.

740 citations
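
The infimum rate in this line of work is governed by the unstable open-loop eigenvalues. A tiny worked instance of the standard data-rate formula (the plant matrix is an arbitrary example, and this is the classical noiseless/LTI statement rather than anything specific to the noisy setting of the paper):

```python
import numpy as np

# Data-rate theorem: stabilization over a digital channel requires a rate exceeding
# the sum of log2|lambda_i| over the unstable eigenvalues of the system matrix A.
A = np.array([[2.0, 1.0],
              [0.0, 0.5]])                      # example plant
unstable = [l for l in np.linalg.eigvals(A) if abs(l) >= 1.0]
min_rate = sum(np.log2(abs(l)) for l in unstable)
print(f"infimum stabilizing rate ≈ {min_rate:.2f} bits per sample")  # 1.00 here
```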


Journal ArticleDOI
TL;DR: An alternative measure of uncertainty, called cumulative residual entropy (CRE), extends Shannon entropy to random variables with continuous distributions and is more general than the Shannon entropy in that its definition is valid in both the continuous and discrete domains.
Abstract: In this paper, we use the cumulative distribution of a random variable to define its information content and thereby develop an alternative measure of uncertainty that extends Shannon entropy to random variables with continuous distributions. We call this measure cumulative residual entropy (CRE). The salient features of CRE are as follows: 1) it is more general than the Shannon entropy in that its definition is valid in the continuous and discrete domains, 2) it possesses more general mathematical properties than the Shannon entropy, and 3) it can be easily computed from sample data and these computations asymptotically converge to the true values. The properties of CRE and a precise formula relating CRE and Shannon entropy are given in the paper. Finally, we present some applications of CRE to reliability engineering and computer vision.

515 citations
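
A small sketch of how CRE can be estimated from sample data for a non-negative scalar random variable, using the empirical survival function (the paper's definition covers |X| in several dimensions; this 1-D version is only for illustration):

```python
import numpy as np

def cumulative_residual_entropy(samples):
    """Empirical CRE(X) = -∫ P(X > x) log P(X > x) dx for non-negative samples."""
    x = np.sort(np.asarray(samples, float))
    n = len(x)
    surv = 1.0 - np.arange(1, n + 1) / n        # empirical survival function values
    gaps = np.diff(x)                           # widths between order statistics
    s = surv[:-1]                               # survival value on each gap
    integrand = np.where(s > 0, -s * np.log(s), 0.0)
    return float(np.sum(integrand * gaps))

rng = np.random.default_rng(0)
print(cumulative_residual_entropy(rng.exponential(size=100_000)))  # ≈ 1 for Exp(1)
```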


Journal ArticleDOI
TL;DR: In this article, the authors show that the problem of maximizing entropy and minimizing a related discrepancy or divergence between distributions can be viewed as dual problems, with the solution to each providing that to the other.
Abstract: We describe and develop a close relationship between two problems that have customarily been regarded as distinct: that of maximizing entropy, and that of minimizing worst-case expected loss. Using a formulation grounded in the equilibrium theory of zero-sum games between Decision Maker and Nature, these two problems are shown to be dual to each other, the solution to each providing that to the other. Although Topsoe described this connection for the Shannon entropy over 20 years ago, it does not appear to be widely known even in that important special case. We here generalize this theory to apply to arbitrary decision problems and loss functions. We indicate how an appropriate generalized definition of entropy can be associated with such a problem, and we show that, subject to certain regularity conditions, the above-mentioned duality continues to apply in this extended context. This simultaneously provides a possible rationale for maximizing entropy and a tool for finding robust Bayes acts. We also describe the essential identity between the problem of maximizing entropy and that of minimizing a related discrepancy or divergence between distributions. This leads to an extension, to arbitrary discrepancies, of a well-known minimax theorem for the case of Kullback–Leibler divergence (the “redundancy-capacity theorem” of information theory). For the important case of families of distributions having certain mean values specified, we develop simple sufficient conditions and methods for identifying the desired solutions. We use this theory to introduce a new concept of “generalized exponential family” linked to the specific decision problem under consideration, and we demonstrate that this shares many of the properties of standard exponential families. Finally, we show that the existence of an equilibrium in our game can be rephrased in terms of a “Pythagorean property” of the related divergence, thus generalizing previously announced results for Kullback–Leibler and Bregman divergences.

502 citations
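
For the mean-constrained case mentioned above, the maximum-entropy solution takes a (generalized) exponential-family form. A classic worked instance, the "Brandeis dice" problem, shows the mechanics; it is a standard textbook example, not one taken from the paper:

```python
import numpy as np
from scipy.optimize import brentq

# Maximum-entropy pmf on die faces {1,...,6} with prescribed mean 4.5:
# p_i ∝ exp(lam * x_i), with lam chosen to satisfy the mean constraint.
x = np.arange(1, 7)
target_mean = 4.5

def mean_gap(lam):
    w = np.exp(lam * x)
    return np.dot(x, w) / w.sum() - target_mean

lam = brentq(mean_gap, -10, 10)                 # solve the moment condition
p = np.exp(lam * x)
p /= p.sum()
print(p, -np.sum(p * np.log(p)))                # max-entropy pmf and its entropy
```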


Proceedings ArticleDOI
26 Apr 2004
TL;DR: Analytical modeling and simulations reveal that while the nature of optimal routing with compression does depend on the correlation level, surprisingly, there exists a practical static clustering scheme which can provide near-optimal performance for a wide range of spatial correlations.
Abstract: The efficacy of data aggregation in sensor networks is a function of the degree of spatial correlation in the sensed phenomenon. While several data aggregation (i.e., routing with data compression) techniques have been proposed in the literature, an understanding of the performance of various data aggregation schemes across the range of spatial correlations is lacking. We analyze the performance of routing with compression in wireless sensor networks using an application-independent measure of data compression (an empirically obtained approximation for the joint entropy of sources as a function of the distance between them) to quantify the size of compressed information, and a bit-hop metric to quantify the total cost of joint routing with compression. Analytical modelling and simulations reveal that while the nature of optimal routing with compression does depend on the correlation level, surprisingly, there exists a practical static clustering scheme which can provide near-optimal performance for a wide range of spatial correlations. This result is of great practical significance as it shows that a simple cluster-based system design can perform as well as sophisticated adaptive schemes for joint routing and compression.

326 citations


Journal ArticleDOI
TL;DR: The concepts of information entropy, rough entropy and knowledge granulation in rough set theory are introduced, and the relationships among those concepts are established.
Abstract: Rough set theory is a relatively new mathematical tool for use in computer applications in circumstances which are characterized by vagueness and uncertainty. In this paper, we introduce the concepts of information entropy, rough entropy and knowledge granulation in rough set theory, and establish the relationships among those concepts. These results will be very helpful for understanding the essence of concept approximation and establishing granular computing in rough set theory.

320 citations
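
A sketch of the three quantities for the partition induced by an equivalence relation, using definitions commonly seen in this literature (treat the exact formulas as assumptions rather than quotations from the paper); note how information entropy and rough entropy add up to log2 |U|:

```python
import numpy as np

def partition_measures(block_sizes):
    """Information entropy H(R), rough entropy E_r(R) and knowledge granulation GK(R)
    of a partition U/R with the given equivalence-class sizes (assumed definitions)."""
    sizes = np.asarray(block_sizes, float)
    u = sizes.sum()
    p = sizes / u
    info_entropy = -np.sum(p * np.log2(p))      # H(R)
    rough_entropy = np.sum(p * np.log2(sizes))  # E_r(R)
    granulation = np.sum(sizes ** 2) / u ** 2   # GK(R)
    return info_entropy, rough_entropy, granulation

h, e, gk = partition_measures([3, 2, 5])
print(h, e, gk, h + e, np.log2(10))             # the last two values coincide
```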


Book ChapterDOI
TL;DR: It is shown that real networks are clustered in a well-defined domain of the entropy-noise space and that optimally heterogeneous nets actually cluster around the same narrow domain, suggesting that strong constraints actually operate on the possible universe of complex networks.
Abstract: Complex networks are characterized by highly heterogeneous distributions of links, often pervading the presence of key properties such as robustness under node removal. Several correlation measures have been defined in order to characterize the structure of these nets. Here we show that mutual information, noise and joint entropies can be properly defined on a static graph. These measures are computed for a number of real networks and analytically estimated for some simple standard models. It is shown that real networks are clustered in a well-defined domain of the entropy-noise space. By using simulated annealing optimization, it is shown that optimally heterogeneous nets actually cluster around the same narrow domain, suggesting that strong constraints actually operate on the possible universe of complex networks. The evolutionary implications are discussed.

311 citations
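
A rough sketch of how measures of this kind can be computed from an undirected edge list via the remaining-degree distribution (names and details are illustrative; the chapter should be consulted for the exact definitions it uses):

```python
import numpy as np
from collections import Counter

def degree_information_measures(edges):
    """Entropy, noise and information of the remaining-degree distribution of a graph."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    # Remaining degrees seen from both ends of every edge (both directions counted).
    pairs = np.array([(deg[a] - 1, deg[b] - 1) for a, b in edges] +
                     [(deg[b] - 1, deg[a] - 1) for a, b in edges])
    kmax = pairs.max() + 1
    joint = np.zeros((kmax, kmax))
    for k1, k2 in pairs:
        joint[k1, k2] += 1
    joint /= joint.sum()
    q = joint.sum(axis=1)                                    # marginal q(k)
    h_q = -np.sum(q[q > 0] * np.log2(q[q > 0]))              # entropy H(q)
    h_joint = -np.sum(joint[joint > 0] * np.log2(joint[joint > 0]))
    noise = h_joint - h_q                                    # conditional entropy H(q|q')
    information = h_q - noise                                # mutual information
    return h_q, noise, information

print(degree_information_measures([(0, 1), (0, 2), (0, 3), (3, 4)]))  # toy graph
```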


Book
01 Jan 2004
TL;DR: In this article, the authors provide a comprehensive description of a new method of proving the central limit theorem, through the use of apparently unrelated results from information theory, including entropy and Fisher information.
Abstract: This book provides a comprehensive description of a new method of proving the central limit theorem, through the use of apparently unrelated results from information theory. It gives a basic introduction to the concepts of entropy and Fisher information, and collects together standard results concerning their behaviour. It brings together results from a number of research papers as well as unpublished material, showing how the techniques can give a unified view of limit theorems.

300 citations


Proceedings ArticleDOI
26 Apr 2004
TL;DR: In this article, an entropy-based sensor selection heuristic for localization is proposed, which selects an informative sensor such that the fusion of the selected sensor observation with the prior target location distribution would yield on average the greatest or nearly the greatest reduction in the entropy of the target location distribution.
Abstract: We propose an entropy-based sensor selection heuristic for localization. Given 1) a prior probability distribution of the target location, and 2) the locations and the sensing models of a set of candidate sensors for selection, the heuristic selects an informative sensor such that the fusion of the selected sensor observation with the prior target location distribution would yield on average the greatest or nearly the greatest reduction in the entropy of the target location distribution. The heuristic greedily selects one sensor in each step without retrieving any actual sensor observations. The heuristic is also computationally much simpler than the mutual-information-based approaches. The effectiveness of the heuristic is evaluated using localization simulations in which Gaussian sensing models are assumed for simplicity. The heuristic is more effective when the optimal candidate sensor is more informative.

297 citations
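
A grid-based sketch of the selection objective: choose the sensor whose (simulated) observation is expected to shrink the entropy of the target-location distribution the most. This illustrates the quantity being approximated, not the paper's cheaper entropy-based heuristic; the Gaussian range-sensing model and the discretization are assumptions:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def expected_posterior_entropy(prior, grid, sensor, noise_std=1.0, n_obs=41):
    """Expected entropy of the target-location pmf (over `grid`) after fusing one
    observation from a Gaussian range sensor at `sensor`, by grid approximation."""
    prior = prior / prior.sum()
    dists = np.linalg.norm(grid - np.asarray(sensor), axis=1)
    zs = np.linspace(dists.min() - 3 * noise_std, dists.max() + 3 * noise_std, n_obs)
    h_exp, pz_sum = 0.0, 0.0
    for z in zs:                                   # average over simulated observations
        like = np.exp(-0.5 * ((z - dists) / noise_std) ** 2)
        joint = prior * like
        pz = joint.sum()
        if pz > 0:
            h_exp += pz * entropy(joint / pz)      # weight posterior entropy by p(z)
            pz_sum += pz
    return h_exp / pz_sum

def select_sensor(prior, grid, sensors, noise_std=1.0):
    """Greedily pick the candidate with the smallest expected posterior entropy."""
    return int(np.argmin([expected_posterior_entropy(prior, grid, s, noise_std)
                          for s in sensors]))
```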


Journal ArticleDOI
TL;DR: A recently introduced Bayesian entropy estimator is applied to synthetic data inspired by experiments, and to real experimental spike trains, and performs admirably even very deep in the undersampled regime, where other techniques fail.
Abstract: The major problem in information theoretic analysis of neural responses and other biological data is the reliable estimation of entropy-like quantities from small samples. We apply a recently introduced Bayesian entropy estimator to synthetic data inspired by experiments, and to real experimental spike trains. The estimator performs admirably even very deep in the undersampled regime, where other techniques fail. This opens new possibilities for the information theoretic analysis of experiments, and may be of general interest as an example of learning from limited data.

291 citations


Journal ArticleDOI
TL;DR: In this article, a link between maximizing entropy and the construction of polygonal interpolants is established, and the maximum entropy formulation leads to a feasible solution for ϕi in any convex or non-convex polygon.
Abstract: In this paper, we establish a link between maximizing (information-theoretic) entropy and the construction of polygonal interpolants. The determination of shape functions on n-gons (n>3) leads to a non-unique under-determined system of linear equations. The barycentric co-ordinates ϕi, which form a partition of unity, are associated with discrete probability measures, and the linear reproducing conditions are the counterpart of the expectations of a linear function. The ϕi are computed by maximizing the uncertainty H(ϕ1,ϕ2,…,ϕn)=−∑ ϕi logϕi, subject to the above constraints. The description is expository in nature, and the numerical results via the maximum entropy (MAXENT) formulation are compared to those obtained from a few distinct polygonal interpolants. The maximum entropy formulation leads to a feasible solution for ϕi in any convex or non-convex polygon. This study is an instance of the application of the maximum entropy principle, wherein least-biased inference is made on the basis of incomplete information. Copyright © 2004 John Wiley & Sons, Ltd.
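
A compact sketch of the construction through its convex dual: the shape functions take the form ϕi ∝ exp(λ·(vi − p)), with λ found by minimizing the log-partition function (the choice of SciPy's BFGS solver is arbitrary; the paper is expository about the formulation, not about any particular code):

```python
import numpy as np
from scipy.optimize import minimize

def maxent_shape_functions(vertices, p):
    """Max-entropy barycentric coordinates on a polygon:
    maximize -sum phi_i log phi_i  s.t.  sum phi_i = 1, sum phi_i v_i = p."""
    v = np.asarray(vertices, float)
    d = v - np.asarray(p, float)                  # shifted vertices v_i - p

    def log_partition(lam):                       # convex dual objective log Z(lam)
        return np.log(np.sum(np.exp(d @ lam)))

    lam = minimize(log_partition, np.zeros(v.shape[1]), method="BFGS").x
    w = np.exp(d @ lam)
    return w / w.sum()

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(maxent_shape_functions(square, (0.5, 0.5)))  # ≈ [0.25, 0.25, 0.25, 0.25]
```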

Journal ArticleDOI
TL;DR: A new thresholding technique based on two-dimensional Renyi's entropy is presented, which extends a method due to Sahoo et al. (1997) and includes a previously proposed global thresholding method due to Abutaleb (Pattern Recognition 47 (1989) 22).
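
For orientation, here is a sketch of the simpler one-dimensional Rényi-entropy thresholding idea that the two-dimensional method generalizes (the 2-D version works on the joint histogram of gray level and local average, which is not reproduced here):

```python
import numpy as np

def renyi_threshold(image, alpha=2.0, bins=256):
    """Pick the gray level t maximizing H_alpha(background) + H_alpha(foreground)
    for an 8-bit image, with H_alpha = log(sum p^alpha) / (1 - alpha), alpha != 1."""
    hist, _ = np.histogram(image, bins=bins, range=(0, bins))
    p = hist / hist.sum()
    best_t, best_score = 0, -np.inf
    for t in range(1, bins - 1):
        p0, p1 = p[:t].sum(), p[t:].sum()
        if p0 <= 0 or p1 <= 0:
            continue
        h0 = np.log(np.sum((p[:t] / p0) ** alpha)) / (1.0 - alpha)
        h1 = np.log(np.sum((p[t:] / p1) ** alpha)) / (1.0 - alpha)
        if h0 + h1 > best_score:
            best_t, best_score = t, h0 + h1
    return best_t
```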

Journal ArticleDOI
TL;DR: The physics of information in the context of molecular biology and genomics is introduced and applications to molecular sequence and structure analysis are reviewed, and new tools in the characterization of resistance mutations, and in drug design are introduced.
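
One of the most common concrete uses of these ideas in sequence analysis is the per-site information content behind sequence logos; a tiny, self-contained illustration (not code from the review):

```python
import numpy as np
from collections import Counter

def per_site_information(alignment, alphabet="ACGT"):
    """Per-site information content (bits) of aligned DNA sequences:
    R_i = log2 |alphabet| - H_i, where H_i is the Shannon entropy of column i."""
    info = []
    for col in zip(*alignment):
        counts = Counter(c for c in col if c in alphabet)
        p = np.array(list(counts.values()), float)
        p /= p.sum()
        h = -np.sum(p * np.log2(p))
        info.append(float(np.log2(len(alphabet)) - h))
    return info

print(per_site_information(["ACGT", "ACGA", "ACTT"]))  # conserved sites carry ~2 bits
```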

Journal ArticleDOI
TL;DR: A new correlation measure, the product of the Shannon entropy power and the Fisher information of the electron density, is introduced by analyzing the Fisher-Shannon information plane of some two-electron systems.
Abstract: A new correlation measure, the product of the Shannon entropy power and the Fisher information of the electron density, is introduced by analyzing the Fisher–Shannon information plane of some two-electron systems (He-like ions, Hooke’s atoms). The uncertainty and scaling properties of this information product are pointed out. In addition, the Fisher and Shannon measures of a finite many-electron system are shown to be bounded by the corresponding single-electron measures and the number of electrons of the system.
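
A one-dimensional numerical sketch of the product of entropy power and Fisher information on a grid (the paper works with three-dimensional electron densities; in 1-D the analogous bound is N·J ≥ 1, saturated by the Gaussian):

```python
import numpy as np

def fisher_shannon_product(density, x):
    """Entropy power N = exp(2H)/(2*pi*e) times Fisher information J = ∫ (ρ')²/ρ dx
    for a 1-D density sampled on the grid x."""
    rho = density / np.trapz(density, x)                              # normalize
    h = -np.trapz(np.where(rho > 0, rho * np.log(rho), 0.0), x)       # Shannon entropy
    drho = np.gradient(rho, x)
    j = np.trapz(np.where(rho > 0, drho ** 2 / rho, 0.0), x)          # Fisher information
    return np.exp(2.0 * h) / (2.0 * np.pi * np.e) * j

x = np.linspace(-10, 10, 4001)
print(fisher_shannon_product(np.exp(-x ** 2 / 2), x))                 # ≈ 1, the Gaussian bound
```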

Journal ArticleDOI
TL;DR: An information-theoretic framework for analyzing control systems based on the close relationship of controllers to communication channels is proposed, providing new derivations of the advantage afforded by closed-loop control and proposing an information-based optimality criterion for control systems.
Abstract: We propose an information-theoretic framework for analyzing control systems based on the close relationship of controllers to communication channels. A communication channel takes an input state and transforms it into an output state. A controller, similarly, takes the initial state of a system to be controlled and transforms it into a target state. In this sense, a controller can be thought of as an actuation channel that acts on inputs to produce desired outputs. In this transformation process, two different control strategies can be adopted: (i) the controller applies an actuation dynamics that is independent of the state of the system to be controlled (open-loop control); or (ii) the controller enacts an actuation dynamics that is based on some information about the state of the controlled system (closed-loop control). Using this communication channel model of control, we provide necessary and sufficient conditions for a system to be perfectly controllable and perfectly observable in terms of information and entropy. In addition, we derive a quantitative trade-off between the amount of information gathered by a closed-loop controller and its relative performance advantage over an open-loop controller in stabilizing a system. This work supplements earlier results (Phys. Rev. Lett. 84 (2000) 1156) by providing new derivations of the advantage afforded by closed-loop control and by proposing an information-based optimality criterion for control systems. New applications of this approach pertaining to proportional controllers, and the control of chaotic maps are also presented.

Posted Content
TL;DR: A transformed metric entropy measure of dependence is studied which satisfies many desirable properties, including being a proper measure of distance and capable of good performance in identifying dependence even in possibly nonlinear time series.
Abstract: A transformed metric entropy measure of dependence is studied which satisfies many desirable properties, including being a proper measure of 'distance'. It is capable of good performance in identifying dependence even in possibly nonlinear time series, and is applicable for both continuous and discrete variables. A nonparametric kernel density implementation is considered here for many stylized models including linear and nonlinear MA, AR, GARCH, integrated series and chaotic dynamics. A related permutation test of independence is proposed and compared with several alternatives.
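
A kernel-density sketch of the underlying Hellinger-type quantity 0.5·∫∫(√f_xy − √(f_x·f_y))² dx dy evaluated on a grid; the published statistic applies a further normalization/transformation, so treat this as the general idea rather than the exact measure:

```python
import numpy as np
from scipy.stats import gaussian_kde

def metric_entropy_dependence(x, y, grid_size=60):
    """Hellinger-type dependence between two scalar samples via Gaussian KDEs."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    gx = np.linspace(x.min(), x.max(), grid_size)
    gy = np.linspace(y.min(), y.max(), grid_size)
    X, Y = np.meshgrid(gx, gy, indexing="ij")
    f_xy = gaussian_kde(np.vstack([x, y]))(
        np.vstack([X.ravel(), Y.ravel()])).reshape(X.shape)
    f_x = gaussian_kde(x)(gx)[:, None]
    f_y = gaussian_kde(y)(gy)[None, :]
    integrand = 0.5 * (np.sqrt(f_xy) - np.sqrt(f_x * f_y)) ** 2
    return float(integrand.sum() * (gx[1] - gx[0]) * (gy[1] - gy[0]))

rng = np.random.default_rng(0)
u = rng.normal(size=500)
print(metric_entropy_dependence(u, u + 0.5 * rng.normal(size=500)))   # clearly above the independent case
```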

Journal ArticleDOI
TL;DR: It is shown that if X1, X2, . . . are independent and identically distributed square-integrable random variables then the entropy of the normalized sum Ent((X1 + · · · + Xn)/√n) is an increasing function of n.
Abstract: It is shown that if X1, X2, . . . are independent and identically distributed square-integrable random variables then the entropy of the normalized sum Ent((X1 + · · · + Xn)/√n) is an increasing function of n. This resolves an old problem which goes back to [6, 7, 5]. The result also has a version for non-identically distributed random variables or random vectors.
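
A quick numerical illustration of the statement, using SciPy's sample-spacing differential entropy estimator (available in recent SciPy versions); the numbers are estimates, but the upward trend toward the Gaussian value 0.5·log(2πe) ≈ 1.42 is visible:

```python
import numpy as np
from scipy.stats import differential_entropy

rng = np.random.default_rng(0)
raw = rng.exponential(size=(100_000, 16)) - 1.0   # centered, square-integrable X_i
for n in (1, 2, 4, 8, 16):
    s = raw[:, :n].sum(axis=1) / np.sqrt(n)       # normalized sum (X1 + ... + Xn)/sqrt(n)
    print(n, round(float(differential_entropy(s)), 3))
```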

Journal ArticleDOI
01 Aug 2004
TL;DR: In this paper, a new technique is developed for phase adjustment in ISAR imaging, where the adjustment phase is found by iteratively solving an equation, which is derived by minimising the entropy of the image.
Abstract: A new technique is developed for phase adjustment in ISAR imaging. The adjustment phase is found by iteratively solving an equation, which is derived by minimising the entropy of the image. This technique can be used to estimate adjustment phases of any form. Moreover, the optimisation method used in this technique is computationally more efficient than trial-and-error methods.
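
A generic minimum-entropy autofocus sketch that makes the objective explicit: a direct numerical search over per-pulse phase corrections minimizing the image entropy. The paper's contribution is a much more efficient iterative equation for this minimization, which is not reproduced here:

```python
import numpy as np
from scipy.optimize import minimize

def image_entropy(img_power):
    p = img_power / img_power.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def minimum_entropy_phase_adjust(data):
    """Search for per-pulse phases minimizing the entropy of the image formed by an
    FFT across pulses. `data` is complex, pulses x range bins; practical only for a
    modest number of pulses since a general-purpose optimizer is used here."""
    m = data.shape[0]

    def cost(phi):
        corrected = data * np.exp(1j * phi)[:, None]
        img = np.abs(np.fft.fft(corrected, axis=0)) ** 2
        return image_entropy(img)

    return minimize(cost, np.zeros(m), method="Powell").x
```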

Proceedings ArticleDOI
04 Jul 2004
TL;DR: It is shown that the entropy-based criterion can be derived in the formal framework of probabilistic clustering models and the connection between the criterion and the approach based on dissimilarity coefficients is established.
Abstract: Entropy-type measures for the heterogeneity of clusters have been used for a long time. This paper studies the entropy-based criterion in clustering categorical data. It first shows that the entropy-based criterion can be derived in the formal framework of probabilistic clustering models and establishes the connection between the criterion and the approach based on dissimilarity coefficients. An iterative Monte-Carlo procedure is then presented to search for the partitions minimizing the criterion. Experiments are conducted to show the effectiveness of the proposed procedure.
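
A sketch of the expected-entropy criterion itself, i.e. the quantity a Monte-Carlo search over partitions would try to minimize (the size-weighted, per-attribute form used here is the usual one and is stated as an assumption):

```python
import numpy as np
from collections import Counter

def expected_entropy(data, labels):
    """Size-weighted sum over clusters of the per-attribute Shannon entropies
    of categorical data; lower values mean more homogeneous clusters."""
    data = np.asarray(data, dtype=object)
    labels = np.asarray(labels)
    n = len(data)
    total = 0.0
    for k in set(labels.tolist()):
        block = data[labels == k]
        weight = len(block) / n
        for j in range(data.shape[1]):
            counts = np.array(list(Counter(block[:, j]).values()), float)
            p = counts / counts.sum()
            total += weight * -np.sum(p * np.log(p))
    return total

X = [["red", "small"], ["red", "large"], ["blue", "small"], ["blue", "small"]]
print(expected_entropy(X, [0, 0, 1, 1]))   # 0.5 * ln 2 for this toy split
```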

Journal ArticleDOI
TL;DR: In this paper, a transformed metric entropy measure of dependence is studied which satisfies many desirable properties, including being a proper measure of distance, and is capable of good performance in identifying dependence even in possibly nonlinear time series.
Abstract: A transformed metric entropy measure of dependence is studied which satisfies many desirable properties, including being a proper measure of distance. It is capable of good performance in identifying dependence even in possibly nonlinear time series, and is applicable for both continuous and discrete variables. A nonparametric kernel density implementation is considered here for many stylized models including linear and nonlinear MA, AR, GARCH, integrated series and chaotic dynamics. A related permutation test of independence is proposed and compared with several alternatives.

Journal ArticleDOI
TL;DR: A non-parametric method for calibrating jump-diffusion models to a set of observed option prices is presented and it is shown via simulation tests that using the entropy penalty resolves the numerical instability of the calibration problem.
Abstract: We present a non-parametric method for calibrating jump-diffusion models to a set of observed option prices. We show that the usual formulations of the inverse problem via nonlinear least squares are ill-posed. In the realistic case where the set of observed prices is discrete and finite, we propose a regularization method based on relative entropy: we reformulate our calibration problem into a problem of finding a risk neutral jump-diffusion model that reproduces the observed option prices and has the smallest possible relative entropy with respect to a chosen prior model. We discuss the numerical implementation of our method using a gradient based optimization and show via simulation tests on various examples that using the entropy penalty resolves the numerical instability of the calibration problem. Finally, we apply our method to empirical data sets of index options and discuss the empirical results obtained.

Journal ArticleDOI
TL;DR: A general recursive filter decoding algorithm based on a point process model of individual neuron spiking activity and a linear stochastic state-space model of the biological signal is presented and an integrated approach to dynamically reading neural codes, measuring their properties, and quantifying the accuracy with which encoded information is extracted is suggested.
Abstract: Neural spike train decoding algorithms and techniques to compute Shannon mutual information are important methods for analyzing how neural systems represent biological signals. Decoding algorithms are also one of several strategies being used to design controls for brain-machine interfaces. Developing optimal strategies to design decoding algorithms and compute mutual information are therefore important problems in computational neuroscience. We present a general recursive filter decoding algorithm based on a point process model of individual neuron spiking activity and a linear stochastic state-space model of the biological signal. We derive from the algorithm new instantaneous estimates of the entropy, entropy rate, and the mutual information between the signal and the ensemble spiking activity. We assess the accuracy of the algorithm by computing, along with the decoding error, the true coverage probability of the approximate 0.95 confidence regions for the individual signal estimates. We illustrate the new algorithm by reanalyzing the position and ensemble neural spiking activity of CA1 hippocampal neurons from two rats foraging in an open circular environment. We compare the performance of this algorithm with a linear filter constructed by the widely used reverse correlation method. The median decoding error for Animal 1 (2) during 10 minutes of open foraging was 5.9 (5.5) cm, the median entropy was 6.9 (7.0) bits, the median information was 9.4 (9.4) bits, and the true coverage probability for 0.95 confidence regions was 0.67 (0.75) using 34 (32) neurons. These findings improve significantly on our previous results and suggest an integrated approach to dynamically reading neural codes, measuring their properties, and quantifying the accuracy with which encoded information is extracted.

Posted Content
TL;DR: The basic notions of both theories are discussed and related: Shannon entropy versus Kolmogorov complexity, the relation of both to universal coding, Shannon mutual information versus Kolmogorov (`algorithmic') mutual information, and probabilistic sufficient statistics versus algorithmic sufficient statistics (related to lossy compression in the Shannon theory versus meaningful information in the Kolmogorov theory).
Abstract: We compare the elementary theories of Shannon information and Kolmogorov complexity, the extent to which they have a common purpose, and where they are fundamentally different. We discuss and relate the basic notions of both theories: Shannon entropy versus Kolmogorov complexity, the relation of both to universal coding, Shannon mutual information versus Kolmogorov (`algorithmic') mutual information, probabilistic sufficient statistic versus algorithmic sufficient statistic (related to lossy compression in the Shannon theory versus meaningful information in the Kolmogorov theory), and rate distortion theory versus Kolmogorov's structure function. Part of the material has appeared in print before, scattered through various publications, but this is the first comprehensive systematic comparison. The last mentioned relations are new.

Journal ArticleDOI
TL;DR: Comparisons of the new method against the standard method based on word frequencies are presented, providing evidence that this new approach is an alternative entropy estimator for binned spike trains.
Abstract: Normalized Lempel-Ziv complexity, which measures the generation rate of new patterns along a digital sequence, is closely related to such important source properties as entropy and compression ratio, but, in contrast to these, it is a property of individual sequences. In this article, we propose to exploit this concept to estimate (or, at least, to bound from below) the entropy of neural discharges (spike trains). The main advantages of this method include fast convergence of the estimator (as supported by numerical simulation) and the fact that there is no need to know the probability law of the process generating the signal. Furthermore, we present numerical and experimental comparisons of the new method against the standard method based on word frequencies, providing evidence that this new approach is an alternative entropy estimator for binned spike trains.
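
A compact sketch of the estimator for a binned (0/1) spike train: count phrases in the Lempel–Ziv (1976) parsing and normalize by n / log2(n). Function names are illustrative; the parsing below is the standard exhaustive-history variant:

```python
import numpy as np

def lz76_phrase_count(s):
    """Number of phrases in the Lempel–Ziv (1976) exhaustive parsing of a string."""
    i, c, n = 0, 0, len(s)
    while i < n:
        l = 1
        # grow the phrase while it still occurs somewhere in the prefix seen so far
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def lz_entropy_rate(binary_spikes):
    """Normalized LZ complexity c(n) * log2(n) / n, in bits per bin."""
    s = "".join(str(int(b)) for b in binary_spikes)
    n = len(s)
    return lz76_phrase_count(s) * np.log2(n) / n

rng = np.random.default_rng(1)
print(lz_entropy_rate(rng.integers(0, 2, 20_000)))  # ≈ 1 bit/bin for a fair-coin train
```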

Journal ArticleDOI
TL;DR: It is shown that the trade-off between efficiency and power present in multiple-trial-type experiments is identical in form to that observed for single-trial-type experiments.

Journal ArticleDOI
TL;DR: In this paper, the entanglement entropy for the ground state of a spin chain is related to the corner transfer matrices of the triangular Ising model and expressed in closed form.
Abstract: The entanglement entropy for the ground state of an XY spin chain is related to the corner transfer matrices of the triangular Ising model and expressed in closed form.

Proceedings ArticleDOI
17 May 2004
TL;DR: The idea of a multiband/multiresolution entropy feature where the spectrum is divided into equal size subbands and entropy is computed in each subband is introduced.
Abstract: In general, entropy gives us a measure of the number of bits required to represent some information. When applied to the probability mass function (PMF), entropy can also be used to measure the "peakiness" of a distribution. We propose using the entropy of a short time Fourier transform spectrum, normalised as PMF, as an additional feature for automatic speech recognition (ASR). It is indeed expected that a peaky spectrum, representation of clear formant structure in the case of voiced sounds, will have low entropy, while a flatter spectrum, corresponding to nonspeech or noisy regions, will have higher entropy. Extending this reasoning further, we introduce the idea of a multiband/multiresolution entropy feature where we divide the spectrum into equal size subbands and compute entropy in each subband. The results show that multiband entropy features used in conjunction with normal cepstral features improve the performance of an ASR system.
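
A sketch of the multiband spectral-entropy feature computation (frame length, overlap and the number of bands below are illustrative choices, not the paper's exact settings):

```python
import numpy as np
from scipy.signal import stft

def multiband_spectral_entropy(signal, fs, n_bands=4, nperseg=400, noverlap=240):
    """Per-frame entropy of the short-time power spectrum, normalized as a PMF
    within each of n_bands equal-width subbands; returns (frames, n_bands)."""
    _, _, Z = stft(signal, fs=fs, nperseg=nperseg, noverlap=noverlap)
    power = np.abs(Z) ** 2                                    # (freq bins, frames)
    feats = []
    for band in np.array_split(power, n_bands, axis=0):
        p = band / (band.sum(axis=0, keepdims=True) + 1e-12)  # PMF over bins in the band
        feats.append(-np.sum(np.where(p > 0, p * np.log(p), 0.0), axis=0))
    return np.vstack(feats).T
```

Peaky (voiced) frames give low entropy in the bands containing formants; flat, noise-like frames give values near the maximum of log of the number of bins per band.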

Proceedings ArticleDOI
17 Oct 2004
TL;DR: It is shown that certain cryptographic tasks like bit commitment, encryption, secret sharing, zero-knowledge, non-interactive zero-knowledge, and secure two-party computation for any non-trivial function are impossible to realize if parties have access to entropy sources with slightly less-than-perfect entropy, i.e., sources with imperfect randomness.
Abstract: We investigate the feasibility of a variety of cryptographic tasks with imperfect randomness. The kind of imperfect randomness we consider are entropy sources, such as those considered by Santha and Vazirani, Chor and Goldreich, and Zuckerman. We show the following: (1) certain cryptographic tasks like bit commitment, encryption, secret sharing, zero-knowledge, non-interactive zero-knowledge, and secure two-party computation for any non-trivial function are impossible to realize if parties have access to entropy sources with slightly less-than-perfect entropy, i.e., sources with imperfect randomness. These results are unconditional and do not rely on any unproven assumption. (2) On the other hand, based on stronger variants of standard assumptions, secure signature schemes are possible with imperfect entropy sources. As another positive result, we show (without any unproven assumption) that interactive proofs can be made sound with respect to imperfect entropy sources.

Journal ArticleDOI
TL;DR: An efficient method is proposed to select an optimum set of test points for dictionary techniques in analog fault diagnosis, by searching for the minimum of an entropy index over the available test points using an integer-coded dictionary.
Abstract: An efficient method to select an optimum set of test points for dictionary techniques in analog fault diagnosis is proposed. This is done by searching for the minimum of the entropy index based on the available test points. First, the two-dimensional integer-coded dictionary is constructed whose entries are measurements associated with faults and test points. The problem of optimum test point selection is thus transformed into the selection of the columns that isolate the rows of the dictionary. Then, the likelihood for a column to be chosen based on the size of its ambiguity set is evaluated using the minimum entropy index of test points. Finally, the test point with the minimum entropy index is selected to construct the optimum set of test points. The proposed entropy-based method to select a locally minimum set of test points is polynomially bounded in computational cost. The comparison between the proposed method and other reported test point selection methods is carried out by statistical experiments. The results indicate that the proposed method more efficiently and more accurately finds the locally optimum set of test points and is practical for large-scale analog systems.
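
A greedy sketch built around an entropy index of the ambiguity sets induced by the chosen columns of the integer-coded dictionary. The index used below (the conditional entropy of the fault label given the observed codes, assuming equally likely faults) and the greedy loop are assumptions meant to convey the idea, not the paper's algorithm verbatim:

```python
import numpy as np

def entropy_index(pattern):
    """sum_k (n_k / N) * log(n_k) over the ambiguity sets of a fault-code pattern;
    it is zero exactly when every fault is uniquely isolated."""
    _, counts = np.unique(pattern, return_counts=True)
    return float(np.sum((counts / counts.sum()) * np.log(counts)))

def select_test_points(dictionary):
    """Greedily add the test point (column) that minimizes the entropy index of the
    combined codes until all faults (rows) are isolated or no columns remain."""
    d = np.asarray(dictionary)
    n_faults, n_tests = d.shape

    def pattern_ids(cols):
        if not cols:
            return np.zeros(n_faults, dtype=int)
        return np.unique(d[:, cols], axis=0, return_inverse=True)[1]

    chosen = []
    while len(np.unique(pattern_ids(chosen))) < n_faults and len(chosen) < n_tests:
        remaining = [j for j in range(n_tests) if j not in chosen]
        chosen.append(min(remaining,
                          key=lambda j: entropy_index(pattern_ids(chosen + [j]))))
    return chosen
```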

Journal ArticleDOI
TL;DR: This paper provides an intuitive interpretation of the theoretical results and examines the practical implications of these results for the optimal design of fMRI experiments with multiple trial types.