
Showing papers on "Entropy (information theory) published in 1994"


Journal ArticleDOI
TL;DR: In this paper, it was shown that the reference prior is the maximum entropy prior, and that it is the only prior that achieves the asymptotic maximin value.

283 citations


Journal ArticleDOI
TL;DR: The problem of entropy-constrained multiple-description scalar quantizer design is posed as an optimization problem, necessary conditions for optimality are derived, and an iterative design algorithm is presented.
Abstract: The problem of entropy-constrained multiple-description scalar quantizer design is posed as an optimization problem, necessary conditions for optimality are derived, and an iterative design algorithm is presented. Performance results are presented for a Gaussian source, along with comparisons to the multiple-description rate distortion bound and a reference system.
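As a rough illustration of the Lagrangian idea behind entropy-constrained quantizer design, the sketch below implements a plain single-description entropy-constrained scalar quantizer rather than the paper's multiple-description design: samples are assigned to the codeword minimizing distortion plus lam times the codeword length, after which codewords and probabilities are re-estimated. All names and parameter values are illustrative assumptions.

    import numpy as np

    def ecsq_design(samples, n_levels=8, lam=0.1, n_iter=50):
        """Toy entropy-constrained scalar quantizer design (single description).

        Alternates between (1) assigning each sample to the codeword minimizing
        the Lagrangian cost d(x, c_j) + lam * len_j, with len_j = -log2 p_j,
        and (2) re-estimating codewords and probabilities.  A simplified
        sketch, not the paper's multiple-description algorithm."""
        rng = np.random.default_rng(0)
        codebook = rng.choice(samples, n_levels)
        probs = np.full(n_levels, 1.0 / n_levels)
        for _ in range(n_iter):
            lengths = -np.log2(np.maximum(probs, 1e-12))
            # Lagrangian assignment: squared error plus lam * code length
            cost = (samples[:, None] - codebook[None, :]) ** 2 + lam * lengths[None, :]
            assign = np.argmin(cost, axis=1)
            for j in range(n_levels):
                members = samples[assign == j]
                if members.size:
                    codebook[j] = members.mean()
                probs[j] = max(members.size / samples.size, 1e-12)
        return codebook, probs

    samples = np.random.default_rng(1).normal(size=10000)
    codebook, probs = ecsq_design(samples)
    print("index entropy (bits):", -(probs * np.log2(probs)).sum())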

200 citations


Journal ArticleDOI
TL;DR: The entropy method for image thresholding suggested by Kapur et al. has been modified and a more pertinent information measure of the image is obtained.
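For orientation, here is a minimal sketch of the original Kapur et al. criterion that the paper modifies: the threshold is chosen to maximize the sum of the entropies of the normalized background and foreground histograms. The paper's modified, "more pertinent" information measure is not reproduced here.

    import numpy as np

    def kapur_threshold(image, n_bins=256):
        """Pick the gray level maximizing the sum of the entropies of the
        background and foreground histograms (the original Kapur et al.
        criterion; the paper above modifies this measure)."""
        hist, _ = np.histogram(image, bins=n_bins, range=(0, n_bins))
        p = hist / hist.sum()
        best_t, best_h = 0, -np.inf
        for t in range(1, n_bins - 1):
            p0, p1 = p[:t].sum(), p[t:].sum()
            if p0 <= 0 or p1 <= 0:
                continue
            q0, q1 = p[:t] / p0, p[t:] / p1
            h = -np.sum(q0[q0 > 0] * np.log(q0[q0 > 0])) \
                - np.sum(q1[q1 > 0] * np.log(q1[q1 > 0]))
            if h > best_h:
                best_t, best_h = t, h
        return best_t

    img = np.random.default_rng(0).integers(0, 256, size=(64, 64))
    print("threshold:", kapur_threshold(img))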

199 citations


Journal ArticleDOI
TL;DR: The purpose of this article is to develop a general appreciation for the meanings of information functions rather than their mathematical use and to discuss the intricacies of quantifying information in some statistical problems.
Abstract: The purpose of this article is to discuss the intricacies of quantifying information in some statistical problems. The aim is to develop a general appreciation for the meanings of information functions rather than their mathematical use. This theme integrates fundamental aspects of the contributions of Kullback, Lindley, and Jaynes and bridges chaos to probability modeling. A synopsis of information-theoretic statistics is presented in the form of a pyramid with Shannon at the vertex and a triangular base that signifies three distinct variants of quantifying information: discrimination information (Kullback), mutual information (Lindley), and maximum entropy information (Jaynes). Examples of capturing information by the maximum entropy (ME) method are discussed. It is shown that the ME approach produces a general class of logit models capable of capturing various forms of sample and nonsample information. Diagnostics for quantifying information captured by the ME logit models are given, and decom...
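A tiny numerical sketch of the maximum-entropy mechanism discussed above: maximizing entropy subject to a moment constraint yields an exponential (logit-like) form p_i proportional to exp(lam * x_i), solved here for Jaynes' classic loaded-die example. This illustrates only the ME principle, not the paper's specific logit models or diagnostics; the function name and multiplier bracket are arbitrary choices.

    import numpy as np
    from scipy.optimize import brentq

    def max_entropy_pmf(values, target_mean):
        """Maximum-entropy pmf on `values` with a prescribed mean.
        The solution has the exponential form p_i ~ exp(lam * x_i); the
        multiplier lam is found numerically.  Illustrative sketch only."""
        values = np.asarray(values, dtype=float)

        def pmf(lam):
            z = lam * values
            w = np.exp(z - z.max())      # shift exponents to avoid overflow
            return w / w.sum()

        lam = brentq(lambda lam: pmf(lam) @ values - target_mean, -50.0, 50.0)
        return pmf(lam)

    # Loaded-die example: maximum-entropy distribution on {1,...,6} with mean 4.5
    p = max_entropy_pmf([1, 2, 3, 4, 5, 6], target_mean=4.5)
    print(np.round(p, 4))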

176 citations


Journal ArticleDOI
TL;DR: In this article, the authors present two methods for stepwise selection of sampling units, and corresponding schemes for removal of units that can be used in connection with sample rotation, and describe practical, geometrically convergent algorithms for computing the wi from the πi.
Abstract: Attention is drawn to a method of sampling a finite population of N units with unequal probabilities and without replacement. The method was originally proposed by Stern & Cover (1989) as a model for lotteries. The method can be characterized as maximizing entropy given coverage probabilities πi, or equivalently as having the probability of a selected sample proportional to the product of a set of 'weights' wi. We show the essential uniqueness of the wi given the πi, and describe practical, geometrically convergent algorithms for computing the wi from the πi. We present two methods for stepwise selection of sampling units, and corresponding schemes for removal of units that can be used in connection with sample rotation. Inclusion probabilities of any order can be written explicitly in closed form. Second-order inclusion probabilities πij satisfy the condition 0 < πij < πi πj, which guarantees Yates & Grundy's variance estimator to be unbiased, definable for all samples and always nonnegative for any sample size.
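The toy computation below illustrates, by brute-force enumeration over a very small population, how weights wi can be fitted so that sample probabilities proportional to the product of the wi reproduce prescribed inclusion probabilities πi for a fixed sample size. It uses a simple multiplicative fixed-point correction rather than the paper's geometrically convergent algorithms, and is only a sketch.

    import numpy as np
    from itertools import combinations

    def weights_from_inclusion(pi, n, n_iter=200):
        """Toy fixed-point fit of weights w_i for a maximum-entropy design
        with target inclusion probabilities pi and fixed sample size n.
        Brute-force enumeration, so only for small N; the paper gives
        practical algorithms instead."""
        pi = np.asarray(pi, dtype=float)
        N = len(pi)
        w = pi / (1 - pi)                      # Poisson-sampling starting point
        samples = list(combinations(range(N), n))
        for _ in range(n_iter):
            probs = np.array([np.prod(w[list(s)]) for s in samples])
            probs /= probs.sum()
            incl = np.zeros(N)
            for p, s in zip(probs, samples):
                incl[list(s)] += p             # achieved inclusion probabilities
            w *= pi / incl                     # multiplicative correction
        return w

    pi = np.array([0.8, 0.6, 0.3, 0.2, 0.1])   # sums to the sample size n = 2
    print(np.round(weights_from_inclusion(pi, n=2), 4))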

158 citations


Journal ArticleDOI
TL;DR: A new methodology for data collection network design employs a measure of the information flow between gauging stations in the network, referred to as the directional information transfer; non-parametric estimation is used to approximate the multivariate probability density functions required for the entropy calculations.
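A hedged sketch of the underlying quantity: a histogram-based mutual information between two station records, normalized by the entropy of the "sending" station. The normalization follows the commonly cited definition of directional information transfer; the paper itself uses non-parametric density estimation rather than histograms, so details will differ.

    import numpy as np

    def entropy_bits(counts):
        p = counts / counts.sum()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def directional_information_transfer(x, y, bins=10):
        """Mutual information T(X;Y), estimated from a joint histogram,
        normalized by the entropy H(X) of the sending station.  Sketch only."""
        joint, _, _ = np.histogram2d(x, y, bins=bins)
        h_x = entropy_bits(joint.sum(axis=1))
        h_y = entropy_bits(joint.sum(axis=0))
        h_xy = entropy_bits(joint.ravel())
        t_xy = h_x + h_y - h_xy                 # mutual information
        return t_xy / h_x

    rng = np.random.default_rng(0)
    x = rng.normal(size=2000)
    y = 0.8 * x + 0.6 * rng.normal(size=2000)   # correlated "station" record
    print(round(directional_information_transfer(x, y), 3))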

144 citations


Journal ArticleDOI
TL;DR: It is shown that the Shannon lower bound is asymptotically tight for norm-based distortions, when the source vector has a finite differential entropy and a finite αth moment for some α > 0, with respect to the given norm.
Abstract: New results are proved on the convergence of the Shannon (1959) lower bound to the rate distortion function as the distortion decreases to zero. The key convergence result is proved using a fundamental property of informational divergence. As a corollary, it is shown that the Shannon lower bound is asymptotically tight for norm-based distortions, when the source vector has a finite differential entropy and a finite αth moment for some α > 0, with respect to the given norm. Moreover, we derive a theorem of Linkov (1965) on the asymptotic tightness of the Shannon lower bound for general difference distortion measures with more relaxed conditions on the source density. We also show that the Shannon lower bound relative to a stationary source and single-letter difference distortion is asymptotically tight under very weak assumptions on the source distribution.
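For squared-error distortion the Shannon lower bound takes the simple form R(D) >= h(X) - (1/2) log(2*pi*e*D). The snippet below evaluates it for a Gaussian source, where the bound coincides with the exact rate-distortion function; the paper's contribution is its asymptotic tightness as D tends to zero for much more general sources and norm-based distortions. This is only a numerical illustration of the bound, not of the paper's proofs.

    import numpy as np

    def shannon_lower_bound(diff_entropy_nats, D):
        """Shannon lower bound for squared-error distortion, in nats:
        R(D) >= h(X) - 0.5 * log(2*pi*e*D)."""
        return diff_entropy_nats - 0.5 * np.log(2 * np.pi * np.e * D)

    sigma2 = 1.0
    h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)   # differential entropy of N(0, sigma2)
    for D in (0.5, 0.1, 0.01):
        slb = shannon_lower_bound(h_gauss, D)
        exact = 0.5 * np.log(sigma2 / D)                 # Gaussian rate-distortion function
        print(f"D={D}: SLB={slb:.4f} nats, exact R(D)={exact:.4f} nats")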

140 citations


Book ChapterDOI
A.D. Wyner1, Jacob Ziv1
01 Jun 1994
TL;DR: The sliding-window version of the Lempel-Ziv data-compression algorithm is described, and it is shown that as the "window size," a quantity related to the memory and complexity of the procedure, goes to infinity, the compression rate approaches the source entropy.
Abstract: The sliding-window version of the Lempel-Ziv data-compression algorithm (sometimes called LZ '77) has been thrust into prominence recently. A version of this algorithm is used in the highly successful "Stacker" program for personal computers. It is also incorporated into Microsoft's new MS-DOS 6. Although other versions of the Lempel-Ziv algorithm are known to be optimal in the sense that they compress a data source to its entropy, optimality in this sense has never been demonstrated for this version. In this self-contained paper, we describe the algorithm and show that as the "window size," a quantity related to the memory and complexity of the procedure, goes to infinity, the compression rate approaches the source entropy. The proof is surprisingly general, applying to all finite-alphabet stationary ergodic sources.
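A minimal greedy sliding-window parser in the spirit of LZ '77, emitting (offset, length, next-byte) triples and searching only the most recent window bytes. This is a didactic sketch with brute-force matching and arbitrary parameter names, not the Stacker or MS-DOS implementation discussed in the paper.

    def lz77_parse(data: bytes, window: int = 4096, max_match: int = 255):
        """Sliding-window (LZ '77 style) parsing: emit (offset, length,
        next_byte) triples, searching only the last `window` bytes.
        Illustrative only; real implementations use hash chains etc."""
        i, out = 0, []
        while i < len(data):
            best_len, best_off = 0, 0
            start = max(0, i - window)
            for j in range(start, i):
                k = 0
                while (k < max_match and i + k < len(data)
                       and data[j + k] == data[i + k]):
                    k += 1
                if k > best_len:
                    best_len, best_off = k, i - j
            nxt = data[i + best_len] if i + best_len < len(data) else 0
            out.append((best_off, best_len, nxt))
            i += best_len + 1
        return out

    print(lz77_parse(b"abracadabra abracadabra"))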

118 citations


Proceedings ArticleDOI
19 Apr 1994
TL;DR: When applied to a time-frequency representation from Cohen's (1989) class, the Renyi entropy conforms closely to the visually based notion of complexity that the authors use when inspecting time-frequency images.
Abstract: Many functions have been proposed for estimating signal information content and complexity on the time-frequency plane, including moment-based measures such as the time-bandwidth product and the Shannon and Renyi (see 4th Berkeley Symp. Math., Stat., Prob., vol. 1) entropies. When applied to a time-frequency representation from Cohen's (1989) class, the Renyi entropy conforms closely to the visually based notion of complexity that we use when inspecting time-frequency images. A detailed discussion reveals many of the desirable properties of the Renyi information measure for both deterministic and random signals.
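A small sketch of the measure in question: the order-alpha Renyi entropy of a normalized time-frequency distribution. A spectrogram is used here as a stand-in for a general Cohen-class representation, and the clipping of negative values is an assumption of this sketch, not something the paper prescribes.

    import numpy as np
    from scipy.signal import spectrogram

    def renyi_entropy_tfr(tfr, alpha=3.0):
        """Renyi entropy (order alpha) of a normalized time-frequency
        representation, in bits.  Negative values that can occur in some
        Cohen-class TFRs are clipped; the spectrogram below is nonnegative
        anyway."""
        p = np.clip(tfr, 0, None)
        p = p / p.sum()
        return np.log2(np.sum(p ** alpha)) / (1 - alpha)

    fs = 1000.0
    t = np.arange(0, 1, 1 / fs)
    chirp = np.cos(2 * np.pi * (50 * t + 100 * t ** 2))
    two_tones = np.cos(2 * np.pi * 80 * t) + np.cos(2 * np.pi * 200 * t)
    for name, x in [("chirp", chirp), ("two tones", two_tones)]:
        _, _, S = spectrogram(x, fs=fs, nperseg=128)
        print(name, round(renyi_entropy_tfr(S), 3))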

110 citations


Journal ArticleDOI
TL;DR: In this paper, it is shown that entropy calculations are seriously affected by systematic errors due to the finite size of the samples, and that these difficulties can be dealt with by assuming simple probability distributions underlying the generating process (e.g. equidistribution, power-law distribution, exponential distribution).
Abstract: This paper is devoted to the statistical analysis of symbol sequences, such as Markov strings, DNA sequences, or texts from natural languages. It is shown that entropy calculations are seriously affected by systematic errors due to the finite size of the samples. These difficulties can be dealt with by assuming simple probability distributions underlying the generating process (e.g. equidistribution, power-law distribution, exponential distribution). Analytical expressions for the dominant correction terms are derived and tested.
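To illustrate the finite-sample effect, the sketch below compares the naive plug-in entropy estimate with a first-order (Miller-type) bias correction of (M-1)/(2N ln 2) bits for alphabet size M and sample length N. The paper derives more refined correction terms under specific distributional assumptions; this shows only the leading-order behaviour.

    import numpy as np
    from collections import Counter

    def plugin_entropy(seq):
        """Naive (plug-in) entropy estimate in bits."""
        counts = np.array(list(Counter(seq).values()), dtype=float)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def corrected_entropy(seq, alphabet_size):
        """Plug-in estimate plus the leading (M-1)/(2N ln 2) bias correction.
        The paper's corrections are more refined; this is only first order."""
        n = len(seq)
        return plugin_entropy(seq) + (alphabet_size - 1) / (2 * n * np.log(2))

    rng = np.random.default_rng(0)
    true_h = 2.0                                   # uniform over 4 symbols
    for n in (50, 500, 5000):
        seq = rng.integers(0, 4, size=n)
        print(n, round(plugin_entropy(seq), 3),
              round(corrected_entropy(seq, 4), 3), true_h)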

109 citations


Journal ArticleDOI
TL;DR: In this paper, a method is proposed for statistically evaluating the accuracy levels of maximum likelihood classifications and representing them graphically, based on the concept that the heterogeneity of the membership probabilities can be taken as an indicator of confidence in the classification; such a parameter is estimated for all pixels as relative probability entropy and represented in a separate channel.
Abstract: A method is proposed for statistically evaluating the accuracy levels of maximum likelihood classifications and representing them graphically. Based on the concept that the heterogeneity of maximum likelihood membership probabilities can be taken as an indicator of the confidence for the classification, such a parameter is estimated for all pixels as relative probability entropy and represented in a separate channel. After a brief presentation of the statistical basis of the methodology, this is applied to a conventional and two modified maximum likelihood classifications in a case study using Landsat TM scenes. The results demonstrate the efficiency of the approach and, particularly, its usefulness for operational applications.
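A sketch of the central idea: the entropy of each pixel's class-membership probabilities, normalized by its maximum log K, serves as a per-pixel confidence indicator (here 0 means a fully confident assignment, 1 a maximally uncertain one). The exact normalization and channel encoding used in the paper may differ.

    import numpy as np

    def relative_entropy_confidence(probs):
        """Per-pixel relative entropy of class-membership probabilities.
        `probs` has shape (n_pixels, n_classes) with rows summing to one.
        Returns values in [0, 1]; sketch of the idea only."""
        p = np.clip(probs, 1e-12, 1.0)
        h = -np.sum(p * np.log(p), axis=1)
        return h / np.log(probs.shape[1])

    probs = np.array([[0.98, 0.01, 0.01],     # confident pixel
                      [0.40, 0.35, 0.25]])    # ambiguous pixel
    print(np.round(relative_entropy_confidence(probs), 3))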

Journal ArticleDOI
TL;DR: Although the random-worlds method makes sense in general, the connection to maximum entropy seems to disappear in the non-unary case, and these observations suggest unexpected limitations to the applicability of maximum-entropy methods.
Abstract: Given a knowledge base KB containing first-order and statistical facts, we consider a principled method, called the random-worlds method, for computing a degree of belief that some formula Φ holds given KB. If we are reasoning about a world or system consisting of N individuals, then we can consider all possible worlds, or first-order models, with domain {1,..., N} that satisfy KB, and compute the fraction of them in which Φ is true. We define the degree of belief to be the asymptotic value of this fraction as N grows large. We show that when the vocabulary underlying Φ and KB uses constants and unary predicates only, we can naturally associate an entropy with each world. As N grows larger, there are many more worlds with higher entropy. Therefore, we can use a maximum-entropy computation to compute the degree of belief. This result is in a similar spirit to previous work in physics and artificial intelligence, but is far more general. Of equal interest to the result itself are the limitations on its scope. Most importantly, the restriction to unary predicates seems necessary. Although the random-worlds method makes sense in general, the connection to maximum entropy seems to disappear in the non-unary case. These observations suggest unexpected limitations to the applicability of maximum-entropy methods.

Book
01 Jan 1994
TL;DR: This volume presents a new algorithm for retrieval that is optimal with respect to both program length and running time, together with algorithms for hashing and adaptive on-line compression, and places particular emphasis on the trade-off between complexity and the quality of coding.
Abstract: From the Publisher: This volume constitutes a comprehensive, self-contained course on source encoding. This is a rapidly developing field, and the purpose of this book is to present the theory from its beginnings to the latest developments, some of which appear in book form for the first time. The major differences between this volume and previously published works are as follows. First, information retrieval is incorporated into source coding rather than discussed separately. Secondly, this volume places an emphasis on the trade-off between complexity and the quality of coding, i.e. what is the price of achieving a maximum degree of data compression? Thirdly, special attention is paid to universal families, which contain a good compressing map for every source in a set. The volume presents a new algorithm for retrieval, which is optimal with respect to both program length and running time, and algorithms for hashing and adaptive on-line compression. All the main tools of source coding and data compression, such as Shannon, Ziv-Lempel, and Gilbert-Moore codes, Kolmogorov complexity and entropy, and lexicographic and digital search, are discussed. Moreover, data compression methods are described for developing short programs for partially specified Boolean functions, short formulas for threshold functions, identification keys, stochastic algorithms for finding the occurrence of a word in a text, and T-independent sets. The book is intended for researchers and graduate students of information theory and theoretical computer science, and will also serve as a useful reference for communication engineers and database designers.

Proceedings ArticleDOI
26 Jun 1994
TL;DR: A new approach to fuzzy clustering is presented, which provides the basis for the development of the maximum entropy clustering algorithm (MECA); the algorithm is based on an objective function incorporating a measure of the entropy of the membership functions and a measure of the distortion between the prototypes and the feature vectors.
Abstract: This paper presents a new approach to fuzzy clustering, which provides the basis for the development of the maximum entropy clustering algorithm (MECA). The derivation of the proposed algorithm is based on an objective function incorporating a measure of the entropy of the membership functions and a measure of the distortion between the prototypes and the feature vectors. This formulation allows the gradual transition from a maximum uncertainty or minimum selectivity phase to a minimum uncertainty or maximum selectivity phase during the clustering process. Such a transition is achieved by controlling the relative effect of the maximization of the membership entropy and the minimization of the distortion between the prototypes and the feature vectors. The IRIS data set provides the basis for evaluating the proposed algorithms and comparing their performance with that of competing techniques.
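The sketch below captures the generic maximum-entropy clustering idea: memberships are the softmax of negative squared distances at a "temperature" T, and lowering T moves the procedure from the maximum-uncertainty phase toward near-hard assignments. It follows this generic formulation, not the paper's exact MECA update equations; all names and constants are illustrative.

    import numpy as np

    def mec_step(X, centers, T):
        """One update of an entropy-regularized fuzzy clustering sketch.
        Memberships are the softmax of negative squared distances at
        temperature T; centers are membership-weighted means."""
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        u = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / T)
        u /= u.sum(axis=1, keepdims=True)
        centers = (u.T @ X) / u.sum(axis=0)[:, None]
        return u, centers

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
    centers = rng.normal(2, 1, (2, 2))
    for T in (10.0, 1.0, 0.1):               # anneal from high to low temperature
        for _ in range(20):
            u, centers = mec_step(X, centers, T)
    print(np.round(centers, 2))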

Journal ArticleDOI
TL;DR: An on-line state and parameter identification scheme for hidden Markov models (HMMs) with states in a finite-discrete set is developed using recursive prediction error (RPE) techniques and an improved version of an earlier proposed scheme is presented with a parameterization that ensures positivity of transition probability estimates.
Abstract: An on-line state and parameter identification scheme for hidden Markov models (HMMs) with states in a finite-discrete set is developed using recursive prediction error (RPE) techniques. The parameters of interest are the transition probabilities and discrete state values of a Markov chain. The noise density associated with the observations can also be estimated. Implementation aspects of the proposed algorithms are discussed, and simulation studies are presented to show that the algorithms converge for a wide variety of initializations. In addition, an improved version of an earlier proposed scheme (the Recursive Kullback-Leibler (RKL) algorithm) is presented with a parameterization that ensures positivity of transition probability estimates.

Journal ArticleDOI
TL;DR: In this article, a collective measure for uncertainty importance and distributional sensitivity is presented; it is based on information theory, in which entropy is a measure of the uncertainty represented by a probability density function.

Book ChapterDOI
01 Jan 1994
TL;DR: An iterative approach, Segmentation Pursuit, is described for identifying edges with the fast segmentation algorithm and removing them from the data; the chapter also considers the search for a segmented wavelet basis which, among all such segmented bases, minimizes the “entropy” of the resulting coefficients.
Abstract: We describe segmented multiresolution analyses of [0, 1]. Such multiresolution analyses lead to segmented wavelet bases which are adapted to discontinuities, cusps, etc., at a given location τ ∈ [0, 1]. Our approach emphasizes the idea of average-interpolation: synthesizing a smooth function on the line having prescribed boxcar averages. This particular approach leads to methods with subpixel resolution and to wavelet transforms with the advantage that, for a signal of length n, all n pixel-level segmented wavelet transforms can be computed simultaneously in a total time and space which are both O(n log(n)). We consider the search for a segmented wavelet basis which, among all such segmented bases, minimizes the “entropy” of the resulting coefficients. Fast access to all segmentations enables fast search for a best segmentation. When the “entropy” is Stein's Unbiased Risk Estimate, one obtains a new method of edge-preserving de-noising. When the “entropy” is the l2-energy, one obtains a new multi-resolution edge detector, which works not only for step discontinuities but also for cusp and higher-order discontinuities, and in a near-optimal fashion in the presence of noise. We describe an iterative approach, Segmentation Pursuit, for identifying edges by the fast segmentation algorithm and removing them from the data.

Journal ArticleDOI
TL;DR: The asymptotic number of random bits per input sample required for accurate simulation, as a function of the distribution of the input process, is found.
Abstract: Studies the minimum random bit rate required to simulate a random system (channel), where the simulator operates with a given external input. As measures of simulation accuracy the authors use both the variational distance and the d̄-distance between joint input-output distributions. They find the asymptotic number of random bits per input sample required for accurate simulation, as a function of the distribution of the input process. These results hold for arbitrary channels and input processes, including nonstationary and nonergodic processes, and do not hinge on a specific simulation scheme. A by-product of the analysis is a general formula for the minimal achievable source coding rate with side information.

Patent
19 May 1994
TL;DR: In this paper, data compression is effected on arbitrary high entropy digitized data by compression and entropy transformation in an iterative system, which includes nonlinear addressing, merge technique, swapping technique and various arithmetic modification techniques.
Abstract: Data compression is effected on arbitrary high entropy digitized data by compression and entropy transformation in an iterative system. Compression includes nonlinear addressing. Entropy transformation may involve any of a number of techniques to reorder distribution of data for testing to determine if the newly ordered data is compressible. Among the techniques are a merge technique, a swapping technique and various arithmetic modification techniques.

Journal ArticleDOI
J. Dvorak1
TL;DR: The article describes an automated classification tool that helps minimize conceptual entropy, which is manifested as increasing conceptual inconsistency further down the hierarchy.
Abstract: All systems that undergo frequent change characteristically tend toward disorder. This is known as entropy and is recognized in all branches of science. Class hierarchies are shared structures which, if useful, undergo frequent change in the form of additional subclassing, modification to existing classes, and sometimes the restructuring of the hierarchy itself. Given this frequent change, we can expect class hierarchies to exhibit entropic tendencies, which we term conceptual entropy. Conceptual entropy is manifested by increasing conceptual inconsistency as we travel down the hierarchy. That is, the deeper the level of the hierarchy, the greater the probability that a subclass will not consistently extend and/or specialize the concept of its superclass. Constructing and maintaining consistent class hierarchies is one of the most difficult activities of object-oriented design. The article describes an automated classification tool that helps minimize conceptual entropy.

Journal ArticleDOI
TL;DR: In this article, it was shown that the low-distortion performance of the Zamir-Feder universal vector quantizer is asymptotically the same as that of the deterministic lattice quantizer.
Abstract: Two results are given. First, using a result of Csiszar (1973), the asymptotic (i.e., high-resolution/low-distortion) performance for entropy-constrained tessellating vector quantization, heuristically derived by Gersho (1979), is proven for all sources with finite differential entropy. This implies, using Gersho's conjecture and Zador's formula, that tessellating vector quantizers are asymptotically optimal for this broad class of sources, and generalizes a rigorous result of Gish and Pierce (1968) from the scalar to the vector case. Second, the asymptotic performance is established for Zamir and Feder's (1992) randomized lattice quantization. With the only assumption that the source has finite differential entropy, it is proven that the low-distortion performance of the Zamir-Feder universal vector quantizer is asymptotically the same as that of the deterministic lattice quantizer.

Journal ArticleDOI
M. Schmelling1
TL;DR: A new unfolding method is presented for the case where the unfolded distribution is known to be non-negative everywhere; it combines the least-squares method with the principle of minimum cross-entropy.
Abstract: A new unfolding method is presented for the case where the unfolded distribution is known to be non-negative everywhere. The method combines the least-squares method with the principle of minimum cross-entropy. Its properties are discussed together with an algorithm for its realization and illustrated by means of a numerical example.
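A rough sketch of the combination described above: a least-squares data term plus a cross-entropy (Kullback-Leibler) penalty toward a non-negative prior, minimized under non-negativity constraints. The weighting, prior, and optimizer here are illustrative assumptions, not the algorithm given in the paper.

    import numpy as np
    from scipy.optimize import minimize

    def unfold(R, y, prior, tau=1.0):
        """Sketch: minimize ||R x - y||^2 + tau * KL(x || prior) over x >= 0."""
        def objective(x):
            x = np.maximum(x, 1e-12)
            kl = np.sum(x * np.log(x / prior) - x + prior)
            return np.sum((R @ x - y) ** 2) + tau * kl

        res = minimize(objective, prior.copy(),
                       bounds=[(0, None)] * len(prior), method="L-BFGS-B")
        return res.x

    # Toy example: a smearing matrix that mixes neighbouring bins.
    truth = np.array([1.0, 4.0, 9.0, 4.0, 1.0])
    R = 0.6 * np.eye(5) + 0.2 * (np.eye(5, k=1) + np.eye(5, k=-1))
    y = R @ truth + np.random.default_rng(0).normal(0, 0.1, 5)
    print(np.round(unfold(R, y, prior=np.full(5, truth.sum() / 5)), 2))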

Journal ArticleDOI
TL;DR: It is shown that a discrete infinite distribution with finite entropy cannot be estimated consistently in information divergence, and that consequently there is no universal source code for an infinite source alphabet over the class of all discrete memoryless sources with finite entropy.
Abstract: Shows that a discrete infinite distribution with finite entropy cannot be estimated consistently in information divergence. As a corollary, the authors show that there is no universal source code for an infinite source alphabet over the class of all discrete memoryless sources with finite entropy.

Journal ArticleDOI
TL;DR: The problem addressed here is this: let S be a finite set and B a belief function on 2^S; then B induces a density on 2^S, which in turn induces a host of densities on S, and the task is to choose one of these with maximum entropy.
Abstract: A common procedure for selecting a particular density from a given class of densities is to choose one with maximum entropy. The problem addressed here is this. Let S be a finite set and let B be a belief function on 2^S. Then B induces a density on 2^S, which in turn induces a host of densities on S. Provide an algorithm for choosing from this host of densities one with maximum entropy.

Book ChapterDOI
TL;DR: The authors provide an introduction to the use of empirical process methods in econometrics, which can be used to establish the large sample properties of econometric estimators and test statistics.
Abstract: This paper provides an introduction to the use of empirical process methods in econometrics. These methods can be used to establish the large sample properties of econometric estimators and test statistics. In the first part of the paper, key terminology and results are introduced and discussed heuristically. Applications in the econometrics literature are briefly reviewed. A select set of three classes of applications is discussed in more detail. The second part of the paper shows how one can verify a key property called stochastic equicontinuity. The paper takes several stochastic equicontinuity results from the probability literature, which rely on entropy conditions of one sort or another, and provides primitive sufficient conditions under which the entropy conditions hold. This yields stochastic equicontinuity results that are readily applicable in a variety of contexts. Examples are provided.

Proceedings Article
Gustavo Deco1, Wilfried Brauer
01 Jan 1994
TL;DR: A neural network learning paradigm based on information theory is proposed as a way to perform, in an unsupervised fashion, redundancy reduction among the elements of the output layer without loss of information from the sensory input.
Abstract: A neural network learning paradigm based on information theory is proposed as a way to perform, in an unsupervised fashion, redundancy reduction among the elements of the output layer without loss of information from the sensory input. The model developed performs nonlinear decorrelation up to higher orders of the cumulant tensors and results in probabilistically independent components of the output layer. This means that no Gaussian distribution needs to be assumed at either the input or the output. The theory presented is related to the unsupervised-learning theory of Barlow, which proposes redundancy reduction as the goal of cognition. When nonlinear units are used, nonlinear principal component analysis is obtained. In this case nonlinear manifolds can be reduced to minimum-dimension manifolds. If such units are used, the network performs a generalized principal component analysis in the sense that non-Gaussian distributions can be linearly decorrelated and higher orders of the correlation tensors are also taken into account. The basic structure of the architecture involves a general transformation that is volume conserving and therefore entropy preserving, yielding a map without loss of information. Minimization of the mutual information among the output neurons eliminates the redundancy between the outputs and results in statistical decorrelation of the extracted features. This is known as factorial learning.

Journal ArticleDOI
TL;DR: Bajkova's generalized maximum entropy method for reconstruction of complex signals is further generalized through the use of Kullback-Leibler cross-entropy, which permits a priori information in the form of bias functions to be inserted into the algorithm, with resulting benefits to reconstruction quality.
Abstract: Bajkova's generalized maximum entropy method for reconstruction of complex signals is further generalized through the use of Kullback–Leibler cross-entropy. This permits a priori information in the form of bias functions to be inserted into the algorithm, with resulting benefits to reconstruction quality. Also, the cross-entropy term is embedded within an overall maximum a posteriori probability approach that includes a noise-rejection term. A further modification is transformation of the large two-dimensional problem arising from modest-sized two-dimensional images into a sequence of one-dimensional problems. Finally, the added operation of three-point median window filtration of each intermediary one-dimensional output is shown to suppress edge-top overshoots while augmenting edge gradients. Applications to simulated complex images are shown.

Patent
Cheung Auyeung1
01 Dec 1994
TL;DR: In this paper, a method for adaptive entropy encoding/decoding of a plurality of quantised transform coefficients in a video/image compression system is presented, where a predetermined number of quantized transform coefficients are received in a predetermined order, giving a generally decreasing average power.
Abstract: The present invention is a method (100) and apparatus (300) for adaptive entropy encoding/decoding of a plurality of quantised transform coefficients in a video/image compression system. For encoding, first, a predetermined number of quantized transform coefficients are received in a predetermined order, giving a generally decreasing average power. Then the quantized transform coefficients are parsed into a plurality of coefficient groups. When the last coefficient group comprises all zero quantized coefficients, it is discarded. The coefficient groups are then converted into a plurality of parameter sets in the predetermined order. A current parameter set is obtained from the parameter sets in the reverse order of the predetermined order. A current entropy encoder is selected adaptively based on the previously selected entropy encoder and the previous parameter set. The current parameter set is encoded by the current entropy encoder to provide entropy encoded information bits.

Journal ArticleDOI
TL;DR: A population of self-replicating segments of code subject to random mutation and survival of the fittest is investigated and a number of statements on the evolution of complexity and the trade-off between entropy and information are obtained.
Abstract: We present a theoretical as well as experimental investigation of a population of self-replicating segments of code subject to random mutation and survival of the fittest. Under the assumption that such a system constitutes a minimal system with characteristics of life, we obtain a number of statements on the evolution of complexity and the trade-off between entropy and information.

Journal ArticleDOI
TL;DR: A general definition of fuzzification and level set, based on t-norms/conorms and their diagonal functions, is introduced, and the chaos theorem of Benhabib and Day for set-valued mappings is considerably strengthened and generalised.