
Showing papers on "Entropy (information theory)" published in 1987


Journal ArticleDOI
TL;DR: In this paper, it is shown that for simple cellular automata, pattern formation can be clearly separated from a mere reduction of the source entropy and different types of automata can be distinguished.
Abstract: We demonstrate by means of several examples that an easily calculable measure of algorithmic complexity c which has been introduced by Lempel and Ziv [IEEE Trans. Inf. Theory IT-22, 25 (1976)] is extremely useful for characterizing spatiotemporal patterns in high-dimensionality nonlinear systems. It is shown that, for time series, c can be a finer measure for order than the Liapunov exponent. We find that, for simple cellular automata, pattern formation can be clearly separated from a mere reduction of the source entropy and different types of automata can be distinguished. For a chain of coupled logistic maps, c signals pattern formation which cannot be seen in the spatial correlation function alone.
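
As a rough illustration of the measure discussed above (not the authors' code), the complexity c can be approximated in a few lines of Python. The sketch below uses an LZ78-style incremental parsing rather than the exact 1976 production-history definition, and assumes a binary symbol string; it still shows the qualitative behaviour of c for ordered versus random sequences.

```python
import random

def lz_complexity(seq):
    """Count distinct phrases in a left-to-right incremental (LZ78-style)
    parsing of seq.  A crude stand-in for the Lempel-Ziv (1976) complexity c:
    low for ordered strings, high for random ones."""
    seen = set()
    phrase = ""
    count = 0
    for symbol in seq:
        phrase += symbol
        if phrase not in seen:   # a new phrase ends here
            seen.add(phrase)
            count += 1
            phrase = ""
    if phrase:                   # trailing phrase that matched an earlier one
        count += 1
    return count

periodic = "01" * 50                                      # ordered pattern
random.seed(0)
noisy = "".join(random.choice("01") for _ in range(100))  # disordered string
print(lz_complexity(periodic), lz_complexity(noisy))      # small vs. large
```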

491 citations


Journal ArticleDOI
01 Jan 1987

418 citations


Journal Article
TL;DR: A method is described for reconstructing the deterministic portion of the equations of motion directly from a data series, representing a vast reduction of a chaotic data set’s observed complexity to a compact, algorithmic specification.
Abstract: Temporal pattern learning, control and prediction, and chaotic data analysis share a common problem: deducing optimal equations of motion from observations of time-dependent behavior. Each desires to obtain models of the physical world from limited information. We describe a method to reconstruct the deterministic portion of the equations of motion directly from a data series. These equations of motion represent a vast reduction of a chaotic data set’s observed complexity to a compact, algorithmic specification. This approach employs an informational measure of model optimality to guide searching through the space of dynamical systems. As corollary results, we indicate how to estimate the minimum embedding dimension, extrinsic noise level, metric entropy, and Lyapunov spectrum. Numerical and experimental applications demonstrate the method’s feasibility and limitations. Extensions to estimating parametrized families of dynamical systems from bifurcation data and to spatial pattern evolution are presented. Applications to predicting chaotic data and the design of forecasting, learning, and control systems, are discussed.

332 citations


Journal ArticleDOI
01 May 1987
TL;DR: For measuring the degree of association or correlation between two nominal variables, a measure based on informational entropy is presented as being preferable to that proposed recently by Horibe.
Abstract: For measuring the degree of association or correlation between two nominal variables, a measure based on informational entropy is presented as being preferable to that proposed recently by Horibe [1]. Asymptotic developments are also presented that may be used for making approximate statistical inferences about the population measure when the sample size is reasonably large. The use of this methodology is illustrated using a numerical example.
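
A common entropy-based association measure for a two-way contingency table is the symmetric uncertainty coefficient 2*I(X;Y)/(H(X)+H(Y)); the sketch below computes it as a stand-in for the measure proposed in the paper, whose exact form is not reproduced here.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (nats) of a probability vector, ignoring zero cells."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def entropy_association(table):
    """Symmetric entropy-based association for a two-way contingency table:
    2*I(X;Y) / (H(X) + H(Y)); 0 under independence, 1 when the two variables
    determine each other."""
    p = np.asarray(table, dtype=float)
    p /= p.sum()
    hx, hy, hxy = entropy(p.sum(axis=1)), entropy(p.sum(axis=0)), entropy(p.ravel())
    return 2.0 * (hx + hy - hxy) / (hx + hy)

counts = np.array([[30, 10,  5],      # toy counts for two nominal variables
                   [ 8, 25,  7],
                   [ 2,  6, 27]])
print(round(entropy_association(counts), 3))
```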

214 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose an event-covering approach which covers a subset of statistically relevant outcomes in the outcome space of variable-pairs; once the covered event patterns are acquired, subsequent analysis tasks such as probabilistic inference, cluster analysis, and detection of event patterns for each cluster based on the incomplete probability scheme can be performed.
Abstract: The difficulties in analyzing and clustering (synthesizing) multivariate data of the mixed type (discrete and continuous) are largely due to: 1) nonuniform scaling in different coordinates, 2) the lack of order in nominal data, and 3) the lack of a suitable similarity measure. This paper presents a new approach which bypasses these difficulties and can acquire statistical knowledge from incomplete mixed-mode data. The proposed method adopts an event-covering approach which covers a subset of statistically relevant outcomes in the outcome space of variable-pairs. And once the covered event patterns are acquired, subsequent analysis tasks such as probabilistic inference, cluster analysis, and detection of event patterns for each cluster based on the incomplete probability scheme can be performed. There are four phases in our method: 1) the discretization of the continuous components based on a maximum entropy criterion so that the data can be treated as n-tuples of discrete-valued features; 2) the estimation of the missing values using our newly developed inference procedure; 3) the initial formation of clusters by analyzing the nearest-neighbor distance on subsets of selected samples; and 4) the reclassification of the n-tuples into more reliable clusters based on the detected interdependence relationships. For performance evaluation, experiments have been conducted using both simulated and real life data.
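
Phase 1 above (maximum-entropy discretization) is often realized as equal-frequency binning, since a near-uniform marginal distribution maximizes the entropy of the discretized feature. A minimal sketch under that assumption, not the paper's exact procedure:

```python
import numpy as np

def max_entropy_discretize(x, n_bins):
    """Equal-frequency (quantile) binning: making the bin counts as even as
    possible approximately maximizes the entropy of the discretized feature.
    Returns integer labels in 0..n_bins-1."""
    x = np.asarray(x, dtype=float)
    cuts = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])  # interior cut points
    return np.searchsorted(cuts, x, side="right")

rng = np.random.default_rng(1)
values = rng.exponential(scale=2.0, size=1000)   # a skewed continuous feature
labels = max_entropy_discretize(values, n_bins=4)
print(np.bincount(labels))                       # roughly equal bin counts
```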

200 citations


Journal ArticleDOI
01 Jul 1987
TL;DR: This paper concludes that the entropy-maximizing density is identical to the conditional density of the complete data given the incomplete data, and derives a recursive algorithm for the generation of Toeplitz-constrained maximum-likelihood estimators which at each iteration evaluates conditional mean estimates of the lag products based on the previous estimate of the covariance.

Abstract: The principle of maximum entropy has played an important role in the solution of problems in which the measurements correspond to moment constraints on some many-to-one mapping h(x). In this paper we explore its role in estimation problems in which the measured data are statistical observations and moment constraints on the observation function h(x) do not exist. We conclude that: 1) For the class of likelihood problems arising in a complete-incomplete data context, in which the complete data x are nonuniquely determined by the measured incomplete data y via the many-to-one mapping y = h(x), the entropy-maximizing density is identical to the conditional density of the complete data given the incomplete data. This equivalence results from viewing the measurements as specifying the domain over which the density is defined, rather than as a moment constraint on h(x). 2) The identity between the maximum-entropy density and the conditional density implies that maximum-likelihood estimates may be obtained via a joint maximization (minimization) of the entropy function (Kullback-Leibler divergence). This provides the basis for the iterative algorithm of Dempster, Laird, and Rubin [1] for the maximization of likelihood functions. 3) This iterative method is used for maximum-likelihood estimation of image parameters in emission tomography and gamma-ray astronomy. We demonstrate that unconstrained likelihood estimation of image intensities from finite data sets yields unstable estimates. We show how Grenander's method of sieves can be used with the iterative algorithm to remove the instability. A bandwidth sieve is introduced, resulting in an estimator which is smoothed via exponential splines. 4) We also derive a recursive algorithm for the generation of Toeplitz-constrained maximum-likelihood estimators which at each iteration evaluates conditional mean estimates of the lag products based on the previous estimate of the covariance, from which the updated Toeplitz covariance is generated. We prove that the sequence of Toeplitz estimators increases in likelihood, remains in the set of positive-definite Toeplitz covariances, and has all of its limit points stable and satisfying the necessary conditions for maximizing the likelihood.
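
The iterative method referred to in point 3 is the EM algorithm applied to Poisson emission data. A minimal sketch of that multiplicative (Richardson-Lucy type) iteration is given below; the system matrix and data are toy stand-ins, and the smoothing sieve described in the abstract is not included.

```python
import numpy as np

def ml_em(A, y, n_iter=50):
    """EM (Richardson-Lucy type) iteration for Poisson data y ~ Poisson(A @ lam):
    multiply the current intensity estimate by a back-projected data/prediction
    ratio.  No sieve/smoothing is applied in this bare sketch."""
    lam = np.ones(A.shape[1])
    sens = A.sum(axis=0)                      # A^T 1, pixel sensitivities
    for _ in range(n_iter):
        pred = A @ lam
        ratio = y / np.maximum(pred, 1e-12)
        lam *= (A.T @ ratio) / np.maximum(sens, 1e-12)
    return lam

rng = np.random.default_rng(0)
A = rng.uniform(size=(40, 10))                # toy detector-by-pixel system matrix
true_lam = rng.uniform(1.0, 5.0, size=10)
y = rng.poisson(A @ true_lam)                 # simulated counts
print(np.round(ml_em(A, y), 2))
```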

148 citations


Journal ArticleDOI
Daniel Kersten
TL;DR: An experiment was devised in which human observers interactively restored missing gray levels from 128 × 128 pixel pictures with 16 gray levels; it was found that for almost-complete pictures, but not for noisy pictures, this performance can be matched by a nearest-neighbor predictor.
Abstract: One aspect of human image understanding is the ability to estimate missing parts of a natural image. This ability depends on the redundancy of the representation used to describe the class of images. In 1951, Shannon [Bell Syst. Tech. J. 30, 50 (1951)] showed how to estimate bounds on the entropy and redundancy of an information source from predictability data. The entropy, in turn, gives a measure of the limits to error-free information compaction. An experiment was devised in which human observers interactively restored missing gray levels from 128 × 128 pixel pictures with 16 gray levels. For eight images, the redundancy ranged from 46%, for a complicated picture of foliage, to 74%, for a picture of a face. For almost-complete pictures, but not for noisy pictures, this performance can be matched by a nearest-neighbor predictor.
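
Shannon's bound uses the redundancy R = 1 - H/H_max, with H_max = log2(16) = 4 bits for 16 gray levels. The sketch below computes a much cruder first-order (single-pixel histogram) version of this quantity on synthetic images; the human-predictability estimate in the paper captures far more of the redundancy than this.

```python
import numpy as np

def first_order_redundancy(image, n_levels=16):
    """Redundancy R = 1 - H / log2(n_levels) using only the single-pixel
    gray-level histogram (a much weaker estimate than predictability data)."""
    counts = np.bincount(image.ravel(), minlength=n_levels).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]
    h = -np.sum(p * np.log2(p))               # first-order entropy in bits/pixel
    return 1.0 - h / np.log2(n_levels)

rng = np.random.default_rng(2)
flat_noise = rng.integers(0, 16, size=(128, 128))           # ~0 redundancy
peaked = np.clip(rng.poisson(3, size=(128, 128)), 0, 15)    # peaked histogram
print(round(first_order_redundancy(flat_noise), 3),
      round(first_order_redundancy(peaked), 3))
```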

147 citations


01 Mar 1987
TL;DR: The Boltzmann learning algorithm is generalized to higher-order Boltzmann machines; learning of internal representations should be much faster in a higher-order Boltzmann machine than in a second-order Boltzmann machine based on pairwise interactions.
Abstract: The Boltzmann machine is a nonlinear network of stochastic binary processing units that interact pairwise through symmetric connection strengths. In a third-order Boltzmann machine, triples of units interact through symmetric conjunctive interactions. The Boltzmann learning algorithm is generalized to higher-order interactions. The rate of learning for internal representations in a higher-order Boltzmann machine should be much faster than for a second-order Boltzmann machine based on pairwise interactions.
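
The generalized learning rule changes each conjunctive weight in proportion to the difference between clamped and free triple correlations, Δw_ijk ∝ ⟨s_i s_j s_k⟩_clamped − ⟨s_i s_j s_k⟩_free. A sketch of that update is given below, assuming ±1 unit states and samples produced elsewhere (e.g., by Gibbs sampling); the sample matrices here are random placeholders.

```python
import numpy as np

def third_order_update(clamped, free, lr=0.1):
    """dW[i,j,k] = lr * (<s_i s_j s_k>_clamped - <s_i s_j s_k>_free),
    with triple correlations estimated from +/-1 sample matrices of shape
    (n_samples, n_units).  Sampling the two phases is assumed done elsewhere."""
    triple = lambda s: np.einsum('ni,nj,nk->ijk', s, s, s) / s.shape[0]
    return lr * (triple(clamped) - triple(free))

rng = np.random.default_rng(0)
clamped_samples = rng.choice([-1, 1], size=(500, 6))   # placeholder samples
free_samples = rng.choice([-1, 1], size=(500, 6))
dW = third_order_update(clamped_samples, free_samples)
print(dW.shape)                                        # (6, 6, 6) conjunctive updates
```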

122 citations


Proceedings ArticleDOI
01 Apr 1987
TL;DR: An algorithm for calculating a noise-to-mask ratio is presented which helps to identify where quantization noise produced by the OCF coder could be audible.
Abstract: Optimum Coding in the Frequency domain (OCF) uses entropy coding of quantized spectral coefficients to efficiently code high-quality sound signals with 3 bits/sample. In an iterative algorithm, psychoacoustic weighting is used to get the quantization noise to be masked in every critical band. The coder itself uses iterative quantizer control to get each data block to be coded with a fixed number of bits. Details about the OCF coder are presented together with information about the codebook needed and the training for the entropy coder. An algorithm for calculating a noise-to-mask ratio is presented which helps to identify where quantization noise could be audible.
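
The outer rate-control loop can be pictured as enlarging a global quantizer step until the entropy-coded block fits the fixed bit budget. The sketch below is only a schematic stand-in: the bit count is estimated from the empirical entropy of the quantized coefficients rather than from the OCF codebook, and the psychoacoustic inner loop is omitted.

```python
import numpy as np

def estimate_bits(q):
    """Rough entropy-code size: empirical entropy of the quantized values
    times the number of coefficients (stand-in for a trained codebook)."""
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return len(q) * -np.sum(p * np.log2(p))

def rate_control(coeffs, bit_budget, step=1.0, grow=1.25, max_iter=100):
    """Enlarge the quantizer step until the entropy-coded block fits the
    fixed bit budget (psychoacoustic inner loop omitted)."""
    for _ in range(max_iter):
        q = np.round(coeffs / step).astype(int)
        if estimate_bits(q) <= bit_budget:
            break
        step *= grow
    return q, step

rng = np.random.default_rng(3)
spectrum = rng.laplace(scale=4.0, size=512)              # toy spectral coefficients
q, step = rate_control(spectrum, bit_budget=3 * 512)     # target about 3 bits/sample
print(round(step, 3), int(estimate_bits(q)))
```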

95 citations


Journal ArticleDOI
TL;DR: In this article, a supersystem of the Gibbsian ensemble consisting of N replicas of the system is considered, on which thought experiments compatible with the data $a_i$ can be performed.

80 citations


Journal ArticleDOI
TL;DR: The paper explores the use of the Shannon (informational) entropy measure and Jaynes's maximum entropy formalism in the solution of constrained non-linear programming problems and extends the method into an entropy augmented Lagrangean formulation.
Abstract: The paper explores the use of the Shannon (informational) entropy measure and Jaynes's maximum entropy formalism in the solution of constrained non-linear programming problems. Through a surrogate constraint approach an entropy based update formula for the surrogate multipliers is derived. A numerical example of the method is presented. Some information-theoretic interpretations of mathematical programming are explored. Finally, through the use or surrogate duals the method is extended into an entropy augmented Lagrangean formulation.
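
One plausible shape for an entropy-based multiplier update, offered purely as an illustration and not as the paper's derived formula, is a multiplicative (exponential) reweighting of the surrogate multipliers by the constraint violations, renormalized onto the simplex:

```python
import numpy as np

def multiplier_update(lmbda, violations, beta=2.0):
    """Multiplicative, maximum-entropy flavoured reweighting of surrogate
    multipliers by constraint violations (illustrative form only)."""
    w = lmbda * np.exp(beta * violations)
    return w / w.sum()

lmbda = np.full(3, 1 / 3)                   # start at the uniform (max entropy) point
violations = np.array([0.8, -0.2, 0.1])     # g_i(x) values: constraint 1 is violated
for _ in range(5):
    lmbda = multiplier_update(lmbda, violations)
print(np.round(lmbda, 3))                   # weight shifts toward the violated constraint
```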

Journal ArticleDOI
TL;DR: In this article, the authors presented an algorithm for Markov sources that is easy to implement and bound the loss of efficiency as a function of the code complexity and the mismatch between the source and the code.
Abstract: Petry's efficient and optimal variable to fixed-length source code for discrete memoryless sources was described by Schalkwijk. By extending this coding technique we are able to give an algorithm for Markov sources that is easy to implement. We can bound the loss of efficiency as a function of the code complexity and the mismatch between the source and the code. Rates arbitrarily close to the source entropy are shown to be achievable. In this sense the codes introduced are optimal.

Journal ArticleDOI
TL;DR: In this paper, the authors apply the Principle of Maximum Entropy (PME) to estimate the measurement uncertainty, which describes the state of knowledge about the true value of a measured quantity.
Abstract: Systematic deviations influence the measurement uncertainty, which describes the state of knowledge about the true value of a measured quantity. As an interval estimation, the measurement uncertainty should always be given in terms of a probability statement. To establish it, a probability assignment for the systematic deviation is needed. Where an estimate of this location parameter is not part of the experimental data, we apply the Principle of Maximum Entropy (PME), which yields unique, impersonal, and unbiased assignments based on nonstatistical information. As will be shown, the results are reasonable. The final measurement uncertainty, which is constructed to make use of both the experimental data and other prior knowledge, will correctly represent the state of knowledge with regard to the true value.
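
A standard PME example: if a systematic deviation is known only to lie within ±a, the maximum-entropy assignment is the uniform density, whose standard deviation is a/√3, and this contribution can be combined in quadrature with the statistical standard uncertainty. The sketch below illustrates just this simple case, not the paper's full treatment.

```python
import math

def u_systematic(a):
    """Maximum-entropy assignment for a deviation known only to lie in [-a, a]:
    the uniform density, whose standard deviation is a / sqrt(3)."""
    return a / math.sqrt(3)

def u_combined(s_statistical, a_systematic):
    """Quadrature combination of the statistical standard uncertainty and the
    maximum-entropy systematic contribution (illustrative only)."""
    return math.hypot(s_statistical, u_systematic(a_systematic))

print(round(u_combined(s_statistical=0.12, a_systematic=0.30), 4))
```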

Journal ArticleDOI
TL;DR: A special-purpose iterative algorithm, of the row-action type, for solving the problem of maximizing the “$\log x$” entropy functional over linear equality constraints, employing “projections” onto hyperplanes which are called entropy projections.
Abstract: In this paper we develop a special-purpose iterative algorithm, of the row-action type, for solving the problem of maximizing the “$\log x$” entropy functional over linear equality constraints. The algorithm employs “projections” onto hyperplanes which we call “$\log x$” entropy projections. A complete proof of convergence is given.
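
To give a feel for what a multiplicative row-action entropy method looks like, the sketch below implements the classical MART iteration for the Shannon ("x log x") entropy over A x = b with nonnegative data; it is an illustrative relative of, not the same as, the "$\log x$" entropy projections developed in the paper.

```python
import numpy as np

def mart(A, b, n_sweeps=200):
    """Multiplicative row-action (MART) iteration for the Shannon entropy
    subject to A x = b, with A entries in [0, 1] and b > 0.  Each inner step
    is a multiplicative 'projection' onto one hyperplane."""
    x = np.ones(A.shape[1])
    for _ in range(n_sweeps):
        for ai, bi in zip(A, b):
            ratio = bi / max(ai @ x, 1e-12)
            x *= ratio ** ai                # entry-wise multiplicative correction
    return x

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 0.0, 0.5]])
b = np.array([1.0, 0.6])
x = mart(A, b)
print(np.round(x, 4), np.round(A @ x, 4))   # A @ x should be close to b
```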

Journal ArticleDOI
Eitan Tadmor
TL;DR: In this paper, it is shown that symmetric systems of conservation laws are equipped with a one-parameter family of entropy functions, and a simple symmetrizability criterion is used.

Journal ArticleDOI
TL;DR: Existing methods of hydrologic network design are reviewed and a formulation based on Shannon’s information theory is presented, which involves joint entropy terms that can be computed by discretizing hydrologic time series data collected at station locations.
Abstract: Existing methods of hydrologic network design are reviewed and a formulation based on Shannon’s information theory is presented. This type of formulation involves the computation of joint entropy terms, which can be computed by discretizing hydrologic time series data collected at station locations. The computation of discrete entropy terms is straightforward, but handling large numbers of stations requires enormous computation time and storage. In order to minimize these problems, bivariate and multivariate continuous distributions are used to derive entropy terms. The information transmission at the bivariate level is derived for normal, lognormal, gamma, exponential, and extreme value distributions. At the multivariate level, the multivariate forms of the normal and lognormal probability density functions are used. In order to illustrate the applicability of the derived information relationship for various bivariate and multivariate probability distributions, daily precipitation data for a period of two years co...
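
The bivariate computation reduces to the information transmission T(X;Y) = H(X) + H(Y) - H(X,Y), which for a bivariate normal pair has the closed form -0.5 ln(1 - ρ²). The sketch below compares a discretized (histogram) estimate with that formula on synthetic correlated series; the station data and distribution fitting of the paper are not reproduced.

```python
import numpy as np

def info_transmission(x, y, bins=10):
    """T(X;Y) = H(X) + H(Y) - H(X,Y) in nats, estimated by discretizing the
    two series into a joint histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    h = lambda q: -np.sum(q[q > 0] * np.log(q[q > 0]))
    return h(p.sum(axis=1)) + h(p.sum(axis=0)) - h(p.ravel())

rng = np.random.default_rng(4)
n, rho = 5000, 0.7
x = rng.normal(size=n)                                   # "station 1" series
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)   # correlated "station 2"
print(round(info_transmission(x, y), 3),                 # histogram estimate
      round(-0.5 * np.log(1 - rho**2), 3))               # bivariate-normal formula
```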

Journal ArticleDOI
TL;DR: In this article, necessary and sufficient conditions are found for the weak convergence of the row sums of an infinitesimal row-independent triangular array of stochastic processes, indexed by a set S, to a sample-continuous Gaussian process, when the array satisfies a "random entropy" condition, analogous to one used by Giné and Zinn (1984) for empirical processes.
Abstract: Necessary and sufficient conditions are found for the weak convergence of the row sums of an infinitesimal row-independent triangular array $(\phi_{nj})$ of stochastic processes, indexed by a set S, to a sample-continuous Gaussian process, when the array satisfies a "random entropy" condition, analogous to one used by Giné and Zinn (1984) for empirical processes. This entropy condition is satisfied when S is a class of sets or functions with the Vapnik-Červonenkis property and each $\phi_{nj}$ is of the form $\phi_{nj}(f) = \int f \, d\nu_{nj}$ for some reasonable random finite signed measure $\nu_{nj}$. As a result we obtain necessary and sufficient conditions for the weak convergence of (possibly non-i.i.d.) partial-sum processes, and new sufficient conditions for empirical processes, indexed by Vapnik-Červonenkis classes. Special cases include Prokhorov's (1956) central limit theorem for empirical processes, and Shorack's (1979) theorems on weighted empirical processes.

Proceedings ArticleDOI
John Daugman
06 Jun 1987
TL;DR: Any effort to develop efficient schemes for image representation must begin by pondering the nature of image structure and image information, and the statistical complexity of images does not correspond to their resolution if they contain nonrandom structure, coherence, or local auto-correlation.
Abstract: Any effort to develop efficient schemes for image representation must begin by pondering the nature of image structure and image information. The fundamental insight which makes compact coding possible is that the statistical complexity of images does not correspond to their resolution (number of resolvable states) if they contain nonrandom structure, coherence, or local auto-correlation. These are respects in which real images differ from random noise: they are optical projections of 3-D objects whose physical constitution and material unity ensure locally homogeneous image structure, whether such local correlations are as simple as luminance value, or a more subtle textural signature captured by some higher-order statistic. Except in the case of synthetic white noise, it is not true that each pixel in an image is statistically independent from its neighbors and from every other pixel; yet that is the default assumption in the standard image representations employed in video transmission channels or the data structures of storage devices. This statistical fact - that the entropy of the channel vastly exceeds the entropy of the signal - has long been recognized, but it has proven difficult to reduce channel bandwidth without loss of resolution. In practical terms, the consequence is that the video data rates (typically 8 bits for each one of several hundred thousand pixels in an image mosaic, resulting in information bandwidths in the tens of millions of bits per second) are far more costly informationally than they need to be, and moreover, no image structure more complex than a single pixel at a time is explicitly extracted or encoded.

Journal ArticleDOI
TL;DR: In this paper, the principle of maximum entropy was used to derive the Pearson type (PT) III distribution by maximizing the entropy subject to two appropriate constraints which were the mean and the mean of the logarithm of real values about a constant > 0.

Journal ArticleDOI
TL;DR: This work proposes a maximum entropy method to reconstruct the object from either the Fourier domain data or directly from the original diffracted field measurements; the resulting objective function is minimized using variational techniques and a conjugate-gradient iterative method.
Abstract: In diffraction tomography, the generalized Radon theorem relates the Fourier transform (FT) of the diffracted field to the two-dimensional FT of the diffracting object. The relationship stands on algebraic contours, which are semicircles in the case of Born or Rytov first-order linear approximations. But the corresponding data are not sufficient to determine the solution uniquely. We propose a maximum entropy method to reconstruct the object from either the Fourier domain data or directly from the original diffracted field measurements. To do this, we give a new definition for the entropy of an object considered as a function from $R^2$ to $C$. To take into account the presence of noise, a chi-squared statistic is added to the entropy measure. The objective function thus obtained is minimized using variational techniques and a conjugate-gradient iterative method. The computational cost and practical implementation of the algorithm are discussed. Some simulated results are given which compare this new method with the classical ones.
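
The objective described above combines a chi-squared data term with an entropy term and is minimized by conjugate gradients. The sketch below sets up such an objective for a real, nonnegative object with a toy linear forward operator (the paper's entropy for complex-valued objects and its Fourier-domain operator are not reproduced) and hands it to a generic CG minimizer.

```python
import numpy as np
from scipy.optimize import minimize

def maxent_objective(f, H, d, sigma, alpha):
    """J(f) = 0.5 * chi^2 - alpha * S(f) with S(f) = -sum f log f, for a real,
    nonnegative object f (negativity handled crudely by clipping)."""
    f = np.maximum(f, 1e-10)
    chi2 = np.sum(((H @ f - d) / sigma) ** 2)
    entropy = -np.sum(f * np.log(f))
    return 0.5 * chi2 - alpha * entropy

rng = np.random.default_rng(5)
n_data, n_pix = 30, 16
H = rng.normal(size=(n_data, n_pix))          # toy linear forward operator
f_true = rng.uniform(0.1, 1.0, size=n_pix)
d = H @ f_true + 0.05 * rng.normal(size=n_data)

res = minimize(maxent_objective, x0=np.full(n_pix, 0.5),
               args=(H, d, 0.05, 0.1), method="CG")
print(round(float(np.linalg.norm(res.x - f_true)), 3))   # reconstruction error
```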

Journal ArticleDOI
TL;DR: Notions of information theory and measurement theory that underlie the description of a system by a reduced set of variables, and the production of various types of entropies in collisions between complex systems such as heavy ions, are reviewed.

Journal ArticleDOI
TL;DR: A new algorithm is developed for solving the maximum entropy (ME) image reconstruction problem by solving a system of ordinary differential equations with appropriate initial values and it is shown how initial values are determined.
Abstract: A new algorithm is developed for solving the maximum entropy (ME) image reconstruction problem. The problem is reduced to solving a system of ordinary differential equations with appropriate initial values. The choice of initial values closely relates to the satisfaction of constraints, and we show how initial values are determined. The algorithm does not involve any optimization method. Instead of searching in the (n + 1)-dimensional space as required for most ME algorithms, our approach relies on solving a one-dimensional search along a well-defined and easily mastered path. Moreover, an efficient algorithm is developed to handle the search. The computer reconstruction verifies the theory.

Journal ArticleDOI
TL;DR: As discussed in this paper, the term "entropy" is now widely used in social science, although its origin is in physical science; there are three main ways in which the term may be used.
Abstract: The term “entropy” is now widely used in social science, although its origin is in physical science. There are three main ways in which the term may be used. The first invokes the original meaning,...

Proceedings Article
01 Jan 1987
TL;DR: For a network that learns a problem from examples using a local learning rule, it is proved that the entropy of the problem becomes a lower bound for the connectivity of the network.
Abstract: How does the connectivity of a neural network (number of synapses per neuron) relate to the complexity of the problems it can handle (measured by the entropy)? Switching theory would suggest no relation at all, since all Boolean functions can be implemented using a circuit with very low connectivity (e.g., using two-input NAND gates). However, for a network that learns a problem from examples using a local learning rule, we prove that the entropy of the problem becomes a lower bound for the connectivity of the network.

Journal ArticleDOI
TL;DR: The approach allows one to deduce the order parameters and dominant spatial patterns of a system which undergoes a non-equilibrium phase transition by means of an algorithm rather than by guessing.
Abstract: This paper is concerned with processes of self-organization which can take place both in the inanimate and animate world. In particular we study the question of what physics can contribute to the understanding of these processes. Its traditional disciplines, namely thermodynamics and statistical mechanics, which are concerned with the behavior of multi-component systems, require new ideas and concepts in order to cope with self-organizing systems. These concepts were elaborated in the new field of synergetics from the microscopic point of view. The present paper is mainly concerned with a macroscopic approach. In Section 1 we briefly remind the reader of various concepts of entropy and information and we give brief definitions of structure and self-organization. At present there seems to be no satisfactory definition of complexity available. Section 2 provides two examples of self-organizing systems, namely the laser and slime mold. In Section 3 we briefly remind the reader of the microscopic approach used in synergetics. As a new result it is shown that the information of the total system close to instability points is essentially contained in the information in the order parameters, for which the specific example of a single order parameter is then treated explicitly. Finally we show how adequate constraints can be found to formulate the maximum information entropy principle for self-organizing systems. Our approach allows one to deduce the order parameters and dominant spatial patterns of a system which undergoes a non-equilibrium phase transition by means of an algorithm rather than by guessing.

Proceedings ArticleDOI
10 Jun 1987
TL;DR: Observability of a Markovian linear discrete-time system is shown to be related to entropies of the initial state and the output observation, and stability is found to pertain to the capacity of the channel which represents the system.
Abstract: Many dynamical models which have been analyzed in the context of system theory can also be viewed as communication channels with memory. In this interpretation, the system's input is a transmitted message and the observation, or output, is the received message. Information-theoretic measures like entropy, mutual information and capacity can therefore be employed, and key concepts in system theory, such as observability, controllability and stability, can be expressed in information-theoretic terms. In this paper we study certain linear Markovian models from this viewpoint. Observability of a Markovian linear discrete-time system is shown to be related to entropies of the initial state and the output observation. Stability is found to pertain to the capacity of the channel which represents the system. The derived relations expose the role of information flow in dynamical system behavior and suggest applications for other linear and nonlinear models.

Journal ArticleDOI
TL;DR: A Minkowskian theory of observation is derived which holds when the observable is a pair and involves a parameter directly related to the subjectivity of the observer; this provides a new approach to fuzzy numbers.

Book ChapterDOI
01 Jan 1987
TL;DR: This paper presents the notion of random graphs and their associated probability distributions, and introduces a distance measure together with a hierarchical clustering algorithm to synthesize an ensemble of attributed graphs into a probability distribution (or a set of distributions) of a random graph.
Abstract: This paper presents the notion of random graphs and their associated probability distributions. It addresses both the structural and probabilistic aspects of structural pattern recognition. A structural pattern can be explicitly represented in the form of attributed graphs and an ensemble of such representations can be considered as outcomes of a mapping, called random graph mapping. To account for the variation of structural patterns in the ensemble, a lower order probability distribution is used to approximate the high order joint probability. To synthesize an ensemble of attributed graphs into a probability distribution (or a set of distributions) of a random graph, we introduce a distance measure together with a hierarchical clustering algorithm. The distance measure is defined as the minimum change of a specially defined Shannon’s entropy before and after the merging of the distributions. With this new formulation, both supervised and unsupervised classification of structural patterns can be achieved.

Book ChapterDOI
01 Oct 1987
TL;DR: This work deals with the definition of Hierarchically Intelligent Control and the Principle of Decreasing Precision with Increasing Intelligence; a three-level structure representing Organization, Coordination and Execution will be developed as a probabilistic model of such a system.
Abstract: Intelligent Machines capable of performing autonomously in uncertain environments, have imposed new design requirements for modern engineers. New concepts, drawn from areas like Artificial Intelligence, Operations Research and Control Theory, are required in order to implement anthropomorphic tasks with minimum intervention of an operator. This work deals with the definition of Hierarchically Intelligent Control and the Principle of Decreasing Precision with Increasing Intelligence. A three level structure representing Organization, Coordination and Execution will be developed as a probabilistic model of such a system and the approaches necessary to implement each one of them will be discussed. Finally, Entropy will be proposed as a common measure of all three levels and the problem of Intelligent Control will be cast as the mathematical programming solution that minimizes the total Entropy.

Journal ArticleDOI
TL;DR: A simple algorithm is introduced that reduces artifacts that often appear in image restoration techniques such as Wiener filtering, by using an entropy gradient and an analytically calculated step size per iteration.
Abstract: We introduce a simple algorithm to reduce artifacts that often appear in image restoration techniques such as Wiener filtering. The algorithm starts with the inverse filter solution and iteratively calculates the correction term. At each iteration we use an entropy gradient and an analytically calculated step size. The algorithm uses two Fourier transforms per iteration. We show both 1-D and 2-D examples to illustrate the algorithm.
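
The structure of the algorithm (inverse-filter start plus an entropy-gradient correction at each iteration) can be sketched in a few lines. The 1-D toy below uses a fixed step size instead of the analytically calculated one and a hypothetical moving-average blur, so it only illustrates the flow, not the paper's exact method.

```python
import numpy as np

def entropy_gradient_restore(blurred, psf_freq, n_iter=20, step=0.05):
    """Start from the inverse-filter estimate, then add a correction along the
    entropy gradient d(-sum f log f)/df = -(log f + 1) each iteration.  The
    step size here is a fixed constant, not the analytic one of the paper."""
    f = np.real(np.fft.ifft(np.fft.fft(blurred) / psf_freq))   # inverse filter
    for _ in range(n_iter):
        fp = np.maximum(f, 1e-8)
        f = f + step * (-(np.log(fp) + 1.0))                   # entropy-gradient term
    return f

rng = np.random.default_rng(6)
signal = np.abs(np.sin(np.linspace(0, 3 * np.pi, 128))) + 0.1  # nonnegative 1-D scene
kernel = np.zeros(128)
kernel[:5] = 1 / 5                                             # moving-average blur
psf_freq = np.fft.fft(kernel)
blurred = np.real(np.fft.ifft(np.fft.fft(signal) * psf_freq)) + 0.01 * rng.normal(size=128)
print(entropy_gradient_restore(blurred, psf_freq)[:4].round(3))
```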