
Showing papers on "Entropy (information theory) published in 1997"


Journal ArticleDOI
TL;DR: The authors outline a new scheme for parameterizing polarimetric scattering problems that relies on an eigenvalue analysis of the coherency matrix and employs a three-level Bernoulli statistical model to generate estimates of the average target scattering matrix parameters from the data.
Abstract: The authors outline a new scheme for parameterizing polarimetric scattering problems, which has application in the quantitative analysis of polarimetric SAR data. The method relies on an eigenvalue analysis of the coherency matrix and employs a three-level Bernoulli statistical model to generate estimates of the average target scattering matrix parameters from the data. The scattering entropy is a key parameter in determining the randomness in this model and is seen as a fundamental parameter in assessing the importance of polarimetry in remote sensing problems. The authors show application of the method to some important classical random media scattering problems and apply it to POLSAR data from the NASA/JPL AIRSAR database.
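
A minimal sketch of the entropy computation implied by the eigenvalue analysis above: the coherency-matrix eigenvalues are normalized to pseudo-probabilities and the entropy uses base-3 logarithms so it spans [0, 1]. The function name and the synthetic example matrix are illustrative, not AIRSAR data or the authors' code.

```python
import numpy as np

def scattering_entropy(T):
    """Entropy of a 3x3 polarimetric coherency matrix T (Hermitian, PSD).

    Eigenvalues are normalized to pseudo-probabilities p_i; base-3 logs make
    H range from 0 (deterministic scattering) to 1 (fully random scattering).
    """
    eigvals = np.linalg.eigvalsh(T)          # real, ascending
    eigvals = np.clip(eigvals, 0.0, None)    # guard against round-off
    p = eigvals / eigvals.sum()
    p = p[p > 0]                             # convention: 0 * log 0 = 0
    return float(-(p * (np.log(p) / np.log(3))).sum())

# Example: a synthetic coherency matrix averaged over many scattering vectors
rng = np.random.default_rng(0)
k = rng.normal(size=(1000, 3)) + 1j * rng.normal(size=(1000, 3))
T = np.einsum('ni,nj->ij', k, k.conj()) / len(k)
print(scattering_entropy(T))   # close to 1 for this fully random example
```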

2,262 citations


Journal ArticleDOI
TL;DR: The minimax entropy principle is applied to texture modeling, where a novel Markov random field model, called FRAME, is derived, and encouraging results are obtained in experiments on a variety of texture images.
Abstract: This article proposes a general theory and methodology, called the minimax entropy principle, for building statistical models for images (or signals) in a variety of applications. This principle consists of two parts. The first is the maximum entropy principle for feature binding (or fusion): for a given set of observed feature statistics, a distribution can be built to bind these feature statistics together by maximizing the entropy over all distributions that reproduce them. The second part is the minimum entropy principle for feature selection: among all plausible sets of feature statistics, we choose the set whose maximum entropy distribution has the minimum entropy. Computational and inferential issues in both parts are addressed; in particular, a feature pursuit procedure is proposed for approximately selecting the optimal set of features. The minimax entropy principle is then corrected by considering the sample variation in the observed feature statistics, and an information criterion for feature pursuit is derived. The minimax entropy principle is applied to texture modeling, where a novel Markov random field (MRF) model, called FRAME (filter, random field, and minimax entropy), is derived, and encouraging results are obtained in experiments on a variety of texture images. The relationship between our theory and the mechanisms of neural computation is also discussed.
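
A compact restatement of the two parts of the principle, written with the usual exponential-family form of the maximum entropy solution; here the φ_k denote feature statistics, λ_k the associated Lagrange multipliers, and S a candidate feature set (the notation is illustrative, not taken from the paper).

```latex
% Feature binding: maximum entropy subject to reproducing the observed
% feature statistics; feature selection: pick the set whose maximum
% entropy distribution has minimum entropy.
\begin{align}
  p^{*}_{S} &= \arg\max_{p}\; H(p)
    \quad\text{s.t.}\quad \mathbb{E}_{p}\!\left[\phi_{k}(I)\right]=\bar{\phi}_{k},
    \;\; k\in S, \\
  p^{*}_{S}(I) &\propto
    \exp\Bigl\{-\textstyle\sum_{k\in S}\lambda_{k}\,\phi_{k}(I)\Bigr\}, \\
  S^{*} &= \arg\min_{S}\; H\!\left(p^{*}_{S}\right).
\end{align}
```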

477 citations


Proceedings Article
01 Dec 1997
TL;DR: A first-order approximation of the density of maximum entropy for a continuous 1-D random variable is derived, which results in a density expansion which is somewhat similar to the classical polynomial density expansions by Gram-Charlier and Edgeworth.
Abstract: We derive a first-order approximation of the density of maximum entropy for a continuous 1-D random variable, given a number of simple constraints. This results in a density expansion which is somewhat similar to the classical polynomial density expansions by Gram-Charlier and Edgeworth. Using this approximation of density, an approximation of 1-D differential entropy is derived. The approximation of entropy is both more exact and more robust against outliers than the classical approximation based on the polynomial density expansions, without being computationally more expensive. The approximation has applications, for example, in independent component analysis and projection pursuit.
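
A rough numerical sketch of the idea, assuming a single nonpolynomial contrast function G(u) = log cosh u and an illustrative scale constant c = 1; the paper's derived constants and multi-function expansion are not reproduced, and the Gaussian reference term is estimated by Monte Carlo rather than in closed form.

```python
import numpy as np

def approx_entropy(x, G=lambda u: np.log(np.cosh(u)), n_mc=200_000, seed=0):
    """Rough approximation of 1-D differential entropy:
        H(x) ~= H(gauss) - c * (E[G(x)] - E[G(nu)])^2,
    where nu is standard normal and x is standardized first.
    The constant c is illustrative, not the paper's derived value.
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    rng = np.random.default_rng(seed)
    nu = rng.standard_normal(n_mc)
    h_gauss = 0.5 * np.log(2 * np.pi * np.e)   # entropy of N(0, 1), ~1.419 nats
    c = 1.0                                    # illustrative constant
    return h_gauss - c * (G(z).mean() - G(nu).mean()) ** 2

rng = np.random.default_rng(1)
print(approx_entropy(rng.standard_normal(10_000)))  # ~1.419: Gaussian maximum
print(approx_entropy(rng.laplace(size=10_000)))     # slightly below: super-Gaussian
```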

420 citations


Journal ArticleDOI
TL;DR: In this paper, the authors studied the spectral measure of Gaussian Wigner's matrices and proved that it satisfies a large deviation principle, and showed that the good rate function which governs this principle achieves its minimum value at Wigner's semicircular law.
Abstract: We study the spectral measure of Gaussian Wigner's matrices and prove that it satisfies a large deviation principle. We show that the good rate function which governs this principle achieves its minimum value at Wigner's semicircular law, which entails the convergence of the spectral measure to the semicircular law. In conclusion, we give some further examples of random matrices with spectral measure satisfying a large deviation principle and argue about Voiculescu's non-commutative entropy.
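
For reference, the semicircular law at which the rate function is minimized, in the normalization for which the limiting support is [-2, 2] (other scalings shift the support):

```latex
\begin{equation}
  \rho_{\mathrm{sc}}(x) \;=\; \frac{1}{2\pi}\sqrt{4 - x^{2}}\,,
  \qquad x \in [-2, 2].
\end{equation}
```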

372 citations


Journal ArticleDOI
TL;DR: The authors show that replacing the class other, which includes all tissue not modeled explicitly by Gaussians with small variance, by a uniform probability density, and amending the expectation-maximization (EM) algorithm appropriately, gives significantly better results.
Abstract: The authors propose a modification of Wells et al. (ibid., vol. 15, no. 4, p. 429-42, 1996) technique for bias field estimation and segmentation of magnetic resonance (MR) images. They show that replacing the class other, which includes all tissue not modeled explicitly by Gaussians with small variance, by a uniform probability density, and amending the expectation-maximization (EM) algorithm appropriately, gives significantly better results. The authors next consider the estimation and filtering of high-frequency information in MR images, comprising noise, intertissue boundaries, and within tissue microstructures. The authors conclude that post-filtering is preferable to the prefiltering that has been proposed previously. The authors observe that the performance of any segmentation algorithm, in particular that of Wells et al. (and the authors' refinements of it) is affected substantially by the number and selection of the tissue classes that are modeled explicitly, the corresponding defining parameters and, critically, the spatial distribution of tissues in the image. The authors present an initial exploration to choose automatically the number of classes and the associated parameters that give the best output. This requires the authors to define what is meant by "best output" and for this they propose the application of minimum entropy. The methods developed have been implemented and are illustrated throughout on simulated and real data (brain and breast MR).

350 citations


Journal ArticleDOI
TL;DR: This paper reviews recent contributions on entropy applications in hydrology and water resources, discusses the usefulness and versatility of the entropy concept, and reflects on the strengths and limitations of this concept.
Abstract: Since the development of the entropy theory by Shannon in the late 1940s and of the principle of maximum entropy (POME) by Jaynes in the late 1950s there has been a proliferation of applications of entropy in a wide spectrum of areas, including hydrological and environmental sciences. The real impetus to entropy-based hydrological modelling was provided by Amorocho and Espildora in 1972. A great variety of entropy applications in hydrology and water resources have since been reported, and new applications continue to unfold. This paper reviews the recent contributions on entropy applications in hydrology and water resources, discusses the usefulness and versatility of the entropy concept, and reflects on the strengths and limitations of this concept. The paper concludes with comments on its implications in developing countries. © 1997 by John Wiley & Sons, Ltd.

310 citations


Journal ArticleDOI
TL;DR: It is shown by computer simulation that the convergence of the stochastic descent algorithms is improved by using the natural gradient and the adaptively estimated cumulants.
Abstract: There are two major approaches for blind separation: maximum entropy (ME) and minimum mutual information (MMI). Both can be implemented by the stochastic gradient descent method for obtaining the demixing matrix. The mutual information (MI) is the contrast function for blind separation; the entropy is not. To justify the ME, the relation between ME and MMI is first elucidated by calculating the first derivative of the entropy and proving that mean subtraction is necessary in applying the ME and that, at the solution points determined by the MI, the ME will not update the demixing matrix in the directions of increasing the cross-talking. Second, the natural gradient instead of the ordinary gradient is introduced to obtain efficient algorithms, because the parameter space is a Riemannian space consisting of matrices. The mutual information is calculated by applying the Gram-Charlier expansion to approximate probability density functions of the outputs. Finally, we propose an efficient learning algorithm that incorporates an adaptive method of estimating the unknown cumulants. It is shown by computer simulation that the convergence of the stochastic descent algorithms is improved by using the natural gradient and the adaptively estimated cumulants.
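
A toy sketch of a natural-gradient update of the kind discussed, with a fixed tanh nonlinearity standing in for the paper's adaptively estimated cumulants; the function name, learning rate, epoch count, and two-source demo are all illustrative.

```python
import numpy as np

def natural_gradient_separation(x, lr=0.05, n_epochs=300, seed=0):
    """Blind separation via the natural-gradient rule
        W <- W + lr * (I - phi(y) y^T) W,   y = W x,
    with phi = tanh (a common choice for super-Gaussian, zero-mean sources).
    The adaptive cumulant estimation of the paper is not reproduced here.
    """
    n, T = x.shape
    rng = np.random.default_rng(seed)
    W = np.eye(n) + 0.01 * rng.standard_normal((n, n))
    for _ in range(n_epochs):
        y = W @ x                                    # current source estimates
        phi = np.tanh(y)
        W += lr * (np.eye(n) - (phi @ y.T) / T) @ W  # natural-gradient step
    return W

# Toy demo: two zero-mean Laplacian sources mixed by a random matrix
rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 5000))
A = rng.standard_normal((2, 2))
W = natural_gradient_separation(A @ s)
print(W @ A)   # ideally close to a scaled permutation matrix once converged
```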

306 citations


Journal ArticleDOI
TL;DR: A flux splitting scheme (AUSMDV) has been constructed with the aim of removing the numerical dissipation of Van Leer-type flux vector splittings on a contact discontinuity; a shock fix is presented to cure the numerical shock instability associated with the "carbuncle phenomenon", and an entropy fix removes the expansion shock or glitch at the sonic point.
Abstract: A flux splitting scheme (AUSMDV) has been constructed with the aim of removing the numerical dissipation of Van Leer-type flux vector splittings on a contact discontinuity. The obtained scheme is also recognized as an improved advection upstream splitting method (AUSM) by Liou and Steffen. The proposed scheme has the following favorable properties: accurate and robust resolution for shock and contact (steady and moving) discontinuities; conservation of enthalpy for steady flows; algorithmic simplicity; and easy extension to general conservation laws such as that for chemically reacting flows. A simple shock fix is presented to cure the numerical shock instability associated with the "carbuncle phenomenon" and an entropy fix to remove an expansion shock or glitch at the sonic point. Extensive numerical experiments were conducted to validate the proposed scheme for a wide range of problems, and the results are compiled for comparison with several recent upwind methods.

291 citations


Book
12 Dec 1997
TL;DR: This chapter tries to achieve two purposes: its main aim is to present the principles of compressing different types of data, such as text, images, and sound, and its secondary goal is to outline the principles of the most important compression algorithms.
Abstract: The exponential growth of computer applications in the last three decades of the 20th century has resulted in an explosive growth in the amounts of data moved between computers, collected, and stored by computer users. This, in turn, has created the field of data compression. Practically unknown in the 1960s, this discipline has now come of age. It is based on information theory, and has proved its value by providing us with fast, sophisticated methods capable of high compression ratios. This chapter tries to achieve two purposes. Its main aim is to present the principles of compressing different types of data, such as text, images, and sound. Its secondary goal is to outline the principles of the most important compression algorithms. The main sections discuss statistical compression methods, dictionary-based methods, methods for the compression of still images, of video, and of audio data. In addition, there is a short section devoted to wavelet methods, since these seem to hold much promise for the future.

287 citations


Journal ArticleDOI
TL;DR: In this article, it was shown that the probability of any given local configuration in a random tiling of the plane with dominos can be computed explicitly for the measure of maximal entropy μ on the space of tilings.
Abstract: We show how to compute the probability of any given local configuration in a random tiling of the plane with dominos. That is, we explicitly compute the measures of cylinder sets for the measure of maximal entropy μ on the space of tilings of the plane with dominos. We construct a measure ν on the set of lozenge tilings of the plane, show that its entropy is the topological entropy, and compute explicitly the ν-measures of cylinder sets. As applications of these results, we prove that the translation action is strongly mixing for μ and ν, and compute the rate of convergence to mixing (the correlation between distant events). For the measure ν we compute the variance of the height function.

275 citations


Patent
19 Mar 1997
TL;DR: In this paper, a method and apparatus for adaptive bit allocation and hybrid lossless entropy encoding is presented, which includes three components: (1) a transform stage, (2) a quantization stage, and (3) a lossless entropy coder stage.
Abstract: A method and apparatus for adaptive bit allocation and hybrid lossless entropy encoding. The system includes three components: (1) a transform stage, (2) a quantization stage, and (3) a lossless entropy coder stage. The transform stage (1) uses a wavelet transform algorithm. The quantization stage (2) adaptively estimates values for parameters defining an approximation between quantization size and the logarithm of quantization error, and recursively calculates the optimal quantization size for each band to achieve a desired bit rate. The baseband and subbands are transformed into quantization matrices using the corresponding quantization sizes. The lossless entropy coder stage (3) uses the observation that the entropy property of run lengths of zero index values in the subband quantization matrices is different from the entropy property of non-zero indices. Each quantization matrix is parsed so that each non-zero index is extracted into a separate stream, and the remaining position information is parsed into an odd stream of run length values for "0" and an even stream of run length values for "1". These three streams are Huffman coded separately in conventional fashion.
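
A minimal sketch of the stream-parsing step described above, under a simplification: instead of splitting the binary position map into odd and even run-length streams, the sketch separates zero runs from non-zero runs directly. The function name is hypothetical, and the Huffman coding of each stream is omitted.

```python
from itertools import groupby

def split_streams(indices):
    """Parse a row of quantized indices into three streams, in the spirit of
    the hybrid entropy coder described above: the non-zero values themselves,
    the run lengths of zero positions, and the run lengths of non-zero
    positions.  Each stream would then be Huffman coded separately.
    """
    nonzero_values = [v for v in indices if v != 0]
    zero_runs, nonzero_runs = [], []
    for is_nonzero, run in groupby(indices, key=lambda v: v != 0):
        (nonzero_runs if is_nonzero else zero_runs).append(len(list(run)))
    return nonzero_values, zero_runs, nonzero_runs

row = [0, 0, 0, 5, -2, 0, 0, 0, 0, 1, 0, 3, 3, 0]
print(split_streams(row))
# ([5, -2, 1, 3, 3], [3, 4, 1, 1], [2, 1, 2])
```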

Journal ArticleDOI
TL;DR: A small glossary of measure-preserving ergodic theory can be found in this article, which surveys rank-one systems, their generalizations, and the rank properties of classical systems such as irrational rotations, interval exchanges, and substitutions.
Abstract: Contents: 0.1. Measure-theoretic dynamical systems. 0.2. A small glossary of measure-preserving ergodic theory. 1. Rank one. 1.1. The lecturer's nightmare: how to define a rank one system. 1.2. First properties and the reduced geometric definition. 1.3. First examples, and the last definition. 1.4. The famous rank one systems: a guided tour of the zoo. 1.5. Metric properties of rank one systems. 1.6. Spectral properties of rank one systems. 2. Generalizations of rank one and related notions. 2.1. Finite rank. 2.2. Local rank and covering number. 2.3. Metric properties. 2.4. Spectral properties. 2.5. Dictionary of other related notions. 2.6. Funny rank. 2.7. Other constructions by cutting and stacking. 3. Rank properties of classical systems. 3.1. Irrational rotations. 3.2. Interval exchanges. 3.3. Substitutions. References.

Journal ArticleDOI
TL;DR: In this paper, the evolution of the entropy of the central intra-cluster gas is explicitly taken into account, and a theoretical framework is developed within which steadily improving measurements of the X-ray luminosities and temperatures of distant galaxy clusters can be interpreted.
Abstract: Observations of the evolution of the galaxy cluster X-ray luminosity function suggest that the entropy of the intra-cluster medium plays a significant role in determining the development of cluster X-ray properties. I present a theoretical framework in which the evolution of the entropy of the central intra-cluster gas is explicitly taken into account. The aim of this work is to develop a theoretical context within which steadily improving measurements of the X-ray luminosities and temperatures of distant galaxy clusters can be interpreted. I discuss the possible range of entropy evolution parameters and relate these to the physical processes heating and cooling the intra-cluster medium. The practical application of this work is demonstrated by combining currently available evolutionary constraints on the X-ray luminosity function and the luminosity--temperature correlation to determine the best-fitting model parameters.

Journal ArticleDOI
TL;DR: In this paper, a unified description of all BPS states of M-theory compactified on T5 in terms of the five-brane is given, and a new explanation of its correspondence with heterotic string theory is provided by exhibiting its dual equivalence to M-theory on K3 × S1.

Journal ArticleDOI
25 Mar 1997
TL;DR: A new statistical model (compression algorithm) for naturally occurring DNA sequences, the strongest reported to date, is introduced; its parameters are learned using expectation maximization (EM), and the model may lead to better performance in microbiological pattern recognition applications.
Abstract: If DNA were a random string over its alphabet {A,C,G,T}, an optimal code would assign 2 bits to each nucleotide. We imagine DNA to be a highly ordered, purposeful molecule, and might therefore reasonably expect statistical models of its string representation to produce much lower entropy estimates. Surprisingly this has not been the case for many natural DNA sequences, including portions of the human genome. We introduce a new statistical model (compression algorithm), the strongest reported to date, for naturally occurring DNA sequences. Conventional techniques code a nucleotide using only slightly fewer bits (1.90) than one obtains by relying only on the frequency statistics of individual nucleotides (1.95). Our method in some cases increases this gap by more than five-fold (1.66) and may lead to better performance in microbiological pattern recognition applications. One of our main contributions, and the principal source of these improvements, is the formal inclusion of inexact match information in the model. The existence of matches at various distances forms a panel of experts whose predictions are then combined into a single prediction. The structure of this combination is novel and its parameters are learned using expectation maximization (EM).
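
A minimal sketch of the expert-combination step, with made-up experts and fixed mixing weights standing in for the EM-learned parameters; bits_for simply shows how a blended prediction translates into a code length below the 2-bit random baseline.

```python
import math

def combine_experts(predictions, weights):
    """Blend several experts' next-nucleotide distributions into one mixture.

    predictions: list of dicts mapping 'A','C','G','T' -> probability
    weights:     mixing weights (fixed here; the paper learns them with EM)
    """
    total = sum(weights)
    return {b: sum(w * p[b] for w, p in zip(weights, predictions)) / total
            for b in "ACGT"}

def bits_for(symbol, dist):
    """Ideal code length, in bits, of one symbol under a distribution."""
    return -math.log2(dist[symbol])

# One expert backs off to base frequencies; another trusts an inexact repeat
# seen earlier in the sequence that strongly predicts 'G'.
freq_expert   = {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}
repeat_expert = {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05}
mix = combine_experts([freq_expert, repeat_expert], weights=[0.4, 0.6])
print(mix, bits_for("G", mix))   # ~0.76 bits, well under the 2-bit baseline
```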

Book
08 May 1997
TL;DR: This book discusses concepts, definitions, and analogies linking information and entropy to the Second Law of Thermodynamics, and traces evolution from physical, geological, and biological systems through social systems to the economy viewed as a self-organizing information-processing system.
Abstract: Partial Contents: Concepts, Definitions, and Analogies. Entropy & the Second Law of Thermodynamics. Irreversibility in Economics. Information. Theoretical Framework: What is Information? Joint & Conditional Probabilities. Information & Entropy. Physical evolution: From the Universe to the Earth. The Evolution of Matter. The First Picosecond & the Next Three Minutes. From Three Minutes to a Million Years. Geological & Biochemical Evolution. Geological Evolution of the Earth. Chemical Precursors. Digression: Other Unsolved Mysteries. Biological Evolution. Primitive Organisms. The Invention of Sexual Reproduction. Evolutionary Mechanisms & Discontinuities. Evolution in Human Social Systems. The Evolution of Cooperative Behavior. Games & Rational Choice in Social Systems. Evolution in Economic Systems. Evolution & Growth. The Problem of Economic Growth Revisited. Schumpeter's Contribution: Radical Innovation. The Economy as a Self- Organizing Information Processing System. The Analogy with Living Systems. Information Transformation and Value Added. Information Added by Materials Processing. Energy Conversion. Cost of Refining. Morphological Information. Labor as an Information Process. Ergonomic

Journal ArticleDOI
TL;DR: In this paper, the authors define invariants for measure-preserving actions of discrete amenable groups which characterize various subexponential rates of growth for the number of essential orbits similarly to the way entropy of the action characterizes the exponential growth rate.
Abstract: We define invariants for measure-preserving actions of discrete amenable groups which characterize various subexponential rates of growth for the number of "essential" orbits similarly to the way entropy of the action characterizes the exponential growth rate. We obtain upper estimates for these invariants for actions by diffeomorphisms of a compact manifold (with a Borel invariant measure) and, more generally, by Lipschitz homeomorphisms of a compact metric space of finite box dimension. We show that natural cutting and stacking constructions alternating independent and periodic concatenation of names produce ℤ² actions with zero one-dimensional entropies in all (including irrational) directions which do not allow either of the above realizations.

Journal ArticleDOI
TL;DR: In this article, the authors evaluate the signification philosophique du critere d'information d'Akaike applique aux problemes de tracage de courbe, and mesure la pertinence du principe statistique de maximisation entropique pour choisir entre des hypotheses simples and des hypotheses compliquees.
Abstract: Evaluation de la signification philosophique du critere d'information d'Akaike applique aux problemes de tracage de courbe. Examinant des exemples ou le theoreme d'Akaike est valide, et des contre-exemples ou celui-ci est invalide, l'A. mesure la pertinence du principe statistique de maximisation entropique pour choisir entre des hypotheses simples et des hypotheses compliquees
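
For reference, Akaike's criterion in its usual form, where L̂ is the maximized likelihood of a candidate curve and k the number of its adjustable parameters; lower values favor the hypothesis. (The notation is the standard one, not taken from the paper.)

```latex
% Trade-off between goodness of fit and the number of adjustable parameters.
\begin{equation}
  \mathrm{AIC} \;=\; -2\,\ln \hat{L} \;+\; 2k .
\end{equation}
```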

Journal ArticleDOI
TL;DR: It is demonstrated that for unifilar or Markov sources, the redundancy of encoding the first n letters of the source output with the Lempel-Ziv incremental parsing rule, the Welch modification, or a new variant is O(1/ln n), and the exact form of convergence is upper-bounded.
Abstract: The Lempel-Ziv codes are universal variable-to-fixed length codes that have become virtually standard in practical lossless data compression. For any given source output string from a Markov or unifilar source, we upper-bound the difference between the number of binary digits needed to encode the string and the self-information of the string. We use this result to demonstrate that for unifilar or Markov sources, the redundancy of encoding the first n letters of the source output with the Lempel-Ziv incremental parsing rule (LZ'78), the Welch modification (LZW), or a new variant is O(1/ln n), and we upper-bound the exact form of convergence. We conclude by considering the relationship between the code length and the empirical entropy associated with a string.
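
A minimal sketch of the LZ'78 incremental parsing rule referred to above, together with a rough phrase-based code-length bound; the bound is a simplification for illustration, not the paper's redundancy analysis, and the function names are hypothetical.

```python
import math

def lz78_phrases(s):
    """Incremental (LZ'78) parsing: each new phrase is the shortest prefix of
    the remaining input that has not occurred as a phrase before.
    """
    dictionary, phrases, current = {""}, [], ""
    for ch in s:
        current += ch
        if current not in dictionary:
            dictionary.add(current)
            phrases.append(current)
            current = ""
    if current:                      # trailing partial phrase, if any
        phrases.append(current)
    return phrases

def lz78_code_length_bits(s, alphabet_size=2):
    """Rough upper bound: each of the c phrases costs about
    log2(c) bits for a pointer plus log2|alphabet| bits for the new symbol."""
    c = len(lz78_phrases(s))
    return c * (math.log2(max(c, 2)) + math.log2(alphabet_size))

s = "01" * 500 + "0011" * 250            # a very regular binary string
print(lz78_code_length_bits(s) / len(s))  # bits per symbol, well below 1
```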

Journal Article
TL;DR: In this paper, the uncertainty associated with the deviation from the prototype definitions can be estimated using a membership exaggeration measure, which is used to identify that the high elevation areas were mapped with high accuracy and that error reduction efforts are needed in mapping the soil resource in the low elevation areas.
Abstract: There are two kinds of uncertainty associated with assigning a geographic entity to a class in the classification process. The first is related to the fuzzy belonging of the entity to the prescribed set of classes and the second is associated with the deviation of the entity from the prototype of the class to which the entity is assigned. This paper argues that these two kinds of uncertainty can be estimated if a similarity model is employed in spatial data representation. Under this similarity model, the uncertainty of fuzzy belonging can be approximated by an entropy measure of membership distribution or by a measure of membership residual. The uncertainty associated with the deviation from the prototype definitions can be estimated using a membership exaggeration measure. A case study using a soil map shows that high entropy values occur in areas where soils seem to be transitional and that areas which are misclassified have higher entropy values. The membership exaggeration is high for areas where soil experts have low confidence in identifying soil types and predicting their spatial distribution. These measures helped in identifying that the high elevation areas were mapped with high accuracy and that error reduction efforts are needed in mapping the soil resource in the low elevation areas.
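
A minimal sketch of the entropy measure of a membership distribution mentioned above; the membership-residual and exaggeration measures are not shown, and the membership vectors are invented for illustration.

```python
import numpy as np

def membership_entropy(memberships):
    """Entropy of a fuzzy membership vector for one geographic entity.

    A peaked vector (clear class assignment) gives low entropy; a flat
    vector (transitional soil) gives high entropy.  Values are normalized
    to sum to one first.
    """
    m = np.asarray(memberships, dtype=float)
    p = m / m.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

print(membership_entropy([0.9, 0.05, 0.03, 0.02]))  # low: confident assignment
print(membership_entropy([0.3, 0.3, 0.2, 0.2]))     # high: transitional area
```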

Book
01 Feb 1997
TL;DR: This monograph develops empirical modeling of natural laws from measured data, building on probability density estimation, information entropy, and maximum entropy principles, and applying the resulting self-organizing and neural-network models to regression, forecasting of chaotic processes, and intelligent control.
Abstract: 1. Introduction.- 1.1 Goal.- 1.2 Relation to Other Scientific Fields.- 1.3 Plan of the Monograph.- 2. A Quantitative Description of Nature.- 2.1 Synergetics of Natural Phenomena.- 2.2 A Description of Nature.- 2.3 Fundamentals of Quantitative Description.- 2.4 Fundamentals of Physical Laws.- 2.5 The Random Character of Physical Variables.- 2.6 Expression of Natural Laws by Differential Equations.- 2.7 Methods of Empirical Modeling.- 2.7.1 The Role of Models.- 2.7.2 Piecewise Linear Models of Empirical Natural Laws.- 2.8 Introduction to Modeling by Neural Networks.- 2.8.1 Functional Properties of a Neuron.- 2.8.2 Empirical Modeling by a Perceptron.- 3. Transducers.- 3.1 The Role of Sensors and Actuators.- 3.2 Sensors and Actuators of Biological Systems.- 3.2.1 Performance Characteristics of Biological Sensors.- 3.2.2 Structure of Biological Sensors.- 3.2.3 Transduction Characteristics of Biological Sensors.- 3.3 Operational Characteristics of Transducers.- 3.3.1 Transducer Classification.- 3.3.2 Transduction Characteristics.- 3.3.3 Sensor Loading Effects.- 3.3.4 Transducer Field Characteristics.- 3.4 Fabricated Transducers.- 3.4.1 Microsensors and Integrated Sensors.- 3.4.2 Synthetic Bio-sensors and Neurobiology.- 3.5 Transducers in Intelligent Measurement Systems.- 3.6 Future Directions in Transducer Evolution.- 4. Probability Densities.- 4.1 Estimation of Probability Density.- 4.1.1 Parzen Window Approach.- 4.1.2 An Optimal Selection of the Window Function.- 4.1.3 Nearest Neighbor and Maximal Self-Consistency Approach.- 4.1.4 The Self-Consistent Method in the Multivariate Case.- 4.1.5 Numerical Examples.- 4.1.6 Conclusions About Filtering of the Empirical PDF.- 5. Information.- 5.1 Some Basic Ideas.- 5.2 Entropy of Information.- 5.3 Properties of Information Entropy.- 5.4 Relative Information.- 5.4.1 Information of Continuous Distributions.- 5.4.2 Information Gain from Experiments.- 5.5 Information Measure of Distance Between Distributions.- 6. Maximum Entropy Principles.- 6.1 Gibbs Maximum Entropy Principle.- 6.2 The Absolute Maximum Entropy Principle.- 6.3 Quantization of Continuous Probability Distributions.- 6.3.1 Quadratic Measure of Discrepancy Between Distributions.- 6.3.2 Information Divergence as a Measure of Discrepancy.- 6.3.3 Vector Quantization and Reconstruction Measure of Discrepancy.- 7. Adaptive Modeling of Natural Laws.- 7.1 Probabilistic Modeler of Natural Laws.- 7.2 Optimization of Adaptive Modeler Performance.- 7.3 Stochastic Approach to Adaptation Laws.- 7.4 Stochastic Adaptation of a Vector Quantizer.- 7.5 Perturbation Method of Adaptation.- 7.6 Evolution of an Optimal Modeler and Perturbation Method.- 7.7 Parametric Versus Non-Parametric Modeling.- 8. Self-Organization and Formal Neurons.- 8.1 Optimal Storage of Empirical Information in Discrete Systems.- 8.2 Adaptive Vector Quantization and Topological Mappings.- 8.3 Self-Organization Based on the Absolute Maximum-Entropy Principle.- 8.4 Derivation of a Generalized Self-Organization Rule.- 8.5 Numerical Examples of Self-Organized Adaptation.- 8.6 Formal Neurons and the Self-Organization Process.- 9. Modeling by Non-Parametric Regression.- 9.1 The Problem of an Optimal Prediction.- 9.2 Parzen's Window Approach to General Regression.- 9.3 General Regression Modeler, Feedback and Recognition.- 9.4 Application of the General Regression Modeler.- 9.4.1 Empirical Modeling of Acoustic Phenomena.- 9.4.2 Prediction of the Seismic Capacity of Walls.- 9.4.3 Modeling of a Periodontal Disease Healing Process.- 10. Linear Modeling and Invariances.- 10.1 Relation Between Parametric Modeling and Invariances.- 10.2 Generalized Linear Regression Model.- 10.2.1 An Example of Iterative Determination of a Linear Regression Model.- 10.3 Sequential Adaptation of Linear Regression Model.- 10.4 Transition from the Cross- to Auto-Associator.- 10.4.1 Application of the Auto-Associator to Analysis of Ultrasonic Signals.- 11. Modeling and Forecasting of Chaotic Processes.- 11.1 Modeling of Chaotic Processes.- 11.2 Examples of Chaotic Process Forecasting.- 11.3 Forecasting of Chaotic Acoustic Emission Signals.- 11.4 Empirical Modeling of Non-Autonomous Chaotic Systems.- 11.4.1 Example of Economic Time-Series Forecasting.- 11.5 Cascade Modeling of Chaos Generators.- 11.5.1 Numerical Experiments.- 11.5.2 Concluding Remarks.- 12. Modeling by Neural Networks.- 12.1 From Biological to Artificial Neural Networks.- 12.1.1 Basic Blocks of Neural Networks and Their Dynamics.- 12.2 A Linear Associator.- 12.3 Multi-layer Perceptrons and Back-Propagation Learning.- 12.4 Radial Basis Function Neural Networks.- 12.5 Equivalence of a Radial Basis Function NN and Perceptrons.- 13. Fundamentals of Intelligent Control.- 13.1 Introduction.- 13.2 Basic Tasks of Intelligent Control.- 13.2.1 Empirical Description of a Controlled System.- 13.2.2 General Identification by Non-Parametric Modeling.- 13.3 The Tracking Problem.- 13.4 Cloning.- 13.5 An Empirical Approach to Optimal Control.- 13.5.1 The Theoretical Problem of Optimal Control.- 13.5.2 Experimental Description of Plant Performance and Optimal Control.- 13.5.3 Design of an Intelligent Optimal Controller.- 13.5.4 The Influence of the Environment on Optimal Control.- 13.5.5 The Problem of Phase Space Exploration.- 13.5.6 Numerical Simulations of Optimal Control.- 13.5.7 Summary and Conclusions.- 14. Self-Control and Biological Evolution.- 14.1 Modeling of Natural Phenomena by Biological Systems.- 14.2 Joint Modeling of Organism and Environment.- 14.3 An Operational Description of Consciousness.- 14.4 The Fundamental Problem of Evolution.- A. Fundamentals of Probability and Statistics.- A.1 Sample Points, Sample Space, Events and Relations.- A.2 Probability.- A.3 Random Variables and Probability Distributions.- A.4 Averages and Moments.- A.5 Random Processes.- A.6 Sampling, Estimation and Statistics.- B. Fundamentals of Deterministic Chaos.- B.1 Instability of Chaotic Systems.- B.2 Characterization of Strange Attractors.- B.3 Experimental Characterization of Chaotic Phenomena.- References.

Journal ArticleDOI
TL;DR: This paper shows that the LOGIT type stochastic assignment/stochastic user equilibrium assignment can be represented as an optimization problem with only link variables and the equivalence of the decomposed formulation to LOGIT assignment is proved by using the Markov properties that underlie Dial's algorithm.
Abstract: This paper shows that the LOGIT type stochastic assignment/stochastic user equilibrium assignment can be represented as an optimization problem with only link variables. The conventional entropy function defined by path flows in the objective can be decomposed into a function consisting only of link flows. The idea of the decomposed formulation is derived from a consideration of the most likely link flow patterns over a network. Then the equivalence of the decomposed formulation to LOGIT assignment is proved by using the Markov properties that underlie Dial's algorithm. Through the analyses, some useful properties of the entropy function and its conjugate dual function (expected minimum cost function) are derived. Finally, it is argued that the derived results have a potential impact on the development of efficient algorithms for the stochastic user equilibrium assignment.

Journal ArticleDOI
TL;DR: A novel approach, based on an enhanced Laplacian pyramid, is proposed for the compression, either lossless or lossy, of gray-scale images, and shows improvements over reversible JPEG (Joint Photographic Experts Group) and the reduced-difference pyramid schemes.
Abstract: In this paper, the effects of quantization noise feedback on the entropy of Laplacian pyramids are investigated. This technique makes it possible for the maximum absolute reconstruction error to be easily and strongly upper-bounded (near-lossless coding), and therefore, allows reversible compression. The entropy-minimizing optimum quantizer is obtained by modeling the first-order distributions of the differential signals as Laplacian densities, and by deriving a model for the equivalent memoryless entropy. A novel approach, based on an enhanced Laplacian pyramid, is proposed for the compression, either lossless or lossy, of gray-scale images. Major details are prioritized through a content-driven decision rule embedded in a uniform threshold quantizer with noise feedback. Lossless coding shows improvements over reversible JPEG (Joint Photographic Experts Group) and the reduced-difference pyramid schemes, while lossy coding outperforms JPEG, with a significant peak signal-to-noise ratio (PSNR) gain. Also, subjective quality is higher even at very low bit rates, due to the absence of the annoying impairments typical of JPEG. Moreover, image versions having resolution and SNR that are both progressively increasing are made available at the receiving end from the earliest retrieval stage on, as intermediate steps of the decoding procedure, without any additional cost.
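
A toy one-dimensional DPCM sketch illustrating why the quantizer is kept inside the prediction loop (quantization-noise feedback); it is not the paper's enhanced Laplacian pyramid, and the signal, step size, and function name are illustrative.

```python
import numpy as np

def dpcm(signal, step, feedback=True):
    """1-D DPCM with a uniform quantizer, with or without quantization-noise
    feedback.  With feedback, the predictor uses previously *reconstructed*
    samples, so the reconstruction error stays bounded by step/2
    (near-lossless); without it, quantization errors accumulate.
    """
    recon = np.empty_like(signal)
    prev_orig, prev_recon = 0.0, 0.0
    for i, x in enumerate(signal):
        pred = prev_recon if feedback else prev_orig
        q = np.rint((x - pred) / step)      # quantized differential index
        recon[i] = prev_recon + q * step    # decoder-side reconstruction
        prev_orig, prev_recon = x, recon[i]
    return recon

rng = np.random.default_rng(0)
x = rng.normal(size=4096).cumsum()          # toy 1-D signal
for fb in (True, False):
    err = np.abs(x - dpcm(x, step=0.5, feedback=fb)).max()
    print(f"feedback={fb}: max |error| = {err:.3f}")
# With feedback the max error is <= 0.25 (= step/2); without it, it drifts.
```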

Journal ArticleDOI
TL;DR: It is shown that MER is a viable strategy for building topographic maps that maximize the average mutual information of the output responses to noiseless input signals when only input noise and noise-added input signals are available.
Abstract: This article introduces an extremely simple and local learning rule for topographic map formation. The rule, called the maximum entropy learning rule (MER), maximizes the unconditional entropy of the map's output for any type of input distribution. The aim of this article is to show that MER is a viable strategy for building topographic maps that maximize the average mutual information of the output responses to noiseless input signals when only input noise and noise-added input signals are available.

Proceedings ArticleDOI
10 Dec 1997
TL;DR: The paper solves problems of worst-case robust performance analysis and output feedback minimax optimal controller synthesis in a general nonlinear setting; specializing these results to the linear case leads to a minimax LQG optimal controller.
Abstract: Considers a class of discrete time stochastic uncertain systems in which the uncertainty is described by a constraint on the relative entropy between a nominal noise distribution and the perturbed noise distribution. The paper solves problems of worst case robust performance analysis and output feedback minimax optimal controller synthesis in a general nonlinear setting. Specializing these results to the linear case leads to a minimax LQG optimal controller.

Journal ArticleDOI
TL;DR: The proposed gray-level threshold selection method is similar to the maximum entropy method proposed by Kapur et al. (1985); however, the new method provides a good threshold value in many instances where the previous method does not.
Abstract: A new gray-level threshold selection method for image segmentation is presented. It is based on minimizing the difference between the entropies of the object and the background distributions of the gray-level histogram. The proposed method is similar to the maximum entropy method proposed by Kapur et al. (1985); however, the new method provided a good threshold value in many instances where the previous method did not. The effectiveness of our method is demonstrated by its performance on videomicroscopic images of the rat lung. Extension of the method to higher order probability density functions is described.
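
A minimal sketch of the selection rule described above (minimize the absolute difference between the object and background entropies over candidate thresholds), assuming 8-bit gray levels; the function name and the synthetic bimodal image are illustrative, not the authors' implementation or data.

```python
import numpy as np

def entropy_difference_threshold(image, levels=256):
    """Pick the gray-level threshold minimizing |H_object - H_background|,
    each entropy computed from the normalized histogram of the pixels on
    one side of the candidate threshold.
    """
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = hist / hist.sum()

    def region_entropy(q):
        s = q.sum()
        if s == 0:
            return 0.0
        q = q[q > 0] / s
        return float(-(q * np.log(q)).sum())

    best_t, best_diff = None, np.inf
    for t in range(1, levels - 1):
        diff = abs(region_entropy(p[:t]) - region_entropy(p[t:]))
        if diff < best_diff:
            best_t, best_diff = t, diff
    return best_t

# Bimodal toy image: dark background around 60, bright objects around 180
rng = np.random.default_rng(0)
img = np.clip(np.concatenate([rng.normal(60, 10, 20000),
                              rng.normal(180, 15, 5000)]), 0, 255)
print(entropy_difference_threshold(img))   # a threshold between the two peaks
```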

Journal ArticleDOI
TL;DR: A new learning algorithm for regression modeling based on deterministic annealing is proposed; it consistently and substantially outperformed the competing methods for training NRBF and HME regression functions over a variety of benchmark regression examples.
Abstract: We propose a new learning algorithm for regression modeling. The method is especially suitable for optimizing neural network structures that are amenable to a statistical description as mixture models. These include mixture of experts, hierarchical mixture of experts (HME), and normalized radial basis functions (NRBF). Unlike recent maximum likelihood (ML) approaches, we directly minimize the (squared) regression error. We use the probabilistic framework as means to define an optimization method that avoids many shallow local minima on the complex cost surface. Our method is based on deterministic annealing (DA), where the entropy of the system is gradually reduced, with the expected regression cost (energy) minimized at each entropy level. The corresponding Lagrangian is the system's "free-energy", and this annealing process is controlled by variation of the Lagrange multiplier, which acts as a "temperature" parameter. The new method consistently and substantially outperformed the competing methods for training NRBF and HME regression functions over a variety of benchmark regression examples.
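
In schematic form (a restatement of the description above, not the paper's derivation): at each level of the annealing schedule the method minimizes the free energy, with D the expected regression cost, H the entropy of the association probabilities, and T the temperature-like Lagrange multiplier that is gradually lowered.

```latex
\begin{equation}
  F \;=\; D \;-\; T\,H ,
  \qquad T \downarrow 0 .
\end{equation}
```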

Proceedings ArticleDOI
14 Dec 1997
TL;DR: A new kind of language model is introduced, which models whole sentences or utterances directly using the maximum entropy (ME) paradigm; it is conceptually simpler, and more naturally suited to modeling whole-sentence phenomena, than the conditional ME models proposed to date.
Abstract: Introduces a new kind of language model, which models whole sentences or utterances directly using the maximum entropy (ME) paradigm. The new model is conceptually simpler, and more naturally suited to modeling whole-sentence phenomena, than the conditional ME models proposed to date. By avoiding the chain rule, the model treats each sentence or utterance as a "bag of features", where features are arbitrary computable properties of the sentence. The model is unnormalizable, but this does not interfere with training (done via sampling) or with use. Using the model is computationally straightforward. The main computational cost of training the model is in generating sample sentences from a Gibbs distribution. Interestingly, this cost has different dependencies, and is potentially lower than in the comparable conditional ME model.
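
The exponential-family form implied by the description above, written with a baseline distribution p_0 as in standard presentations of whole-sentence ME models (the baseline term is an assumption here); the f_i are arbitrary computable sentence features and Z is the normalizer, which is never computed explicitly.

```latex
\begin{equation}
  p(s) \;=\; \frac{1}{Z}\; p_{0}(s)\,
  \exp\!\Bigl(\sum_{i} \lambda_{i}\, f_{i}(s)\Bigr).
\end{equation}
```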

Journal ArticleDOI
TL;DR: The effect of state quantization in scalar discrete-time linear control systems is studied by analyzing the system as a partially observed stochastic system, and it is shown that this problem is equivalent to an optimal control problem for a controlled Markov chain.
Abstract: In this paper the effect of state quantization in scalar discrete-time linear control systems is studied by analyzing the system as a partially observed stochastic system. The problem of optimal state information gathering and filtering is investigated using information theoretic measures and formulating the state estimation problem as an entropy optimization problem. The active probing effect of the feedback control is thoroughly studied. Optimal feedback controls which minimize various types of entropy costs are determined, and it is shown that this problem is equivalent to an optimal control problem for a controlled Markov chain.

Journal ArticleDOI
TL;DR: A conditional distribution learning formulation for real-time signal processing with neural networks based on an extension of maximum likelihood theory-partial likelihood (PL) estimation-which allows for dependent observations and sequential processing is presented.
Abstract: We present a conditional distribution learning formulation for real-time signal processing with neural networks based on an extension of maximum likelihood theory-partial likelihood (PL) estimation-which allows for (i) dependent observations and (ii) sequential processing. For a general neural network conditional distribution model, we establish a fundamental information-theoretic connection, the equivalence of maximum PL estimation, and accumulated relative entropy (ARE) minimization, and obtain large sample properties of PL for the general case of dependent observations. As an example, the binary case with the sigmoidal perceptron as the probability model is presented. It is shown that the single and multilayer perceptron (MLP) models satisfy conditions for the equivalence of the two cost functions: ARE and negative log partial likelihood. The practical issue of their gradient descent minimization is then studied within the well-formed cost functions framework. It is shown that these are well-formed cost functions for networks without hidden units; hence, their gradient descent minimization is guaranteed to converge to a solution if one exists on such networks. The formulation is applied to adaptive channel equalization, and simulation results are presented to show the ability of the least relative entropy equalizer to realize complex decision boundaries and to recover during training from convergence at the wrong extreme in cases where the mean square error-based MLP equalizer cannot.