
Showing papers on "Entropy (information theory)" published in 2001


Proceedings ArticleDOI
14 May 2001
TL;DR: This work proposes several information-theoretic measures, namely entropy, conditional entropy, relative conditional entropy, information gain, and information cost, for anomaly detection in protection mechanisms against novel attacks.
Abstract: Anomaly detection is an essential component of protection mechanisms against novel attacks. We propose to use several information-theoretic measures, namely, entropy, conditional entropy, relative conditional entropy, information gain, and information cost for anomaly detection. These measures can be used to describe the characteristics of an audit data set, suggest the appropriate anomaly detection model(s) to be built, and explain the performance of the model(s). We use case studies on Unix system call data, BSM data, and network tcpdump data to illustrate the utilities of these measures.

627 citations
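
As a concrete illustration of the entropy-based measures described above, here is a minimal Python sketch computing entropy and conditional entropy over a toy event trace; the event names and the window length are illustrative, not taken from the paper's case studies.

```python
# Hedged sketch: entropy and conditional entropy of an audit event stream.
# The event names and the window length k are illustrative.
from collections import Counter
from math import log2

def entropy(seq):
    counts = Counter(seq)
    n = len(seq)
    return -sum(c / n * log2(c / n) for c in counts.values())

def conditional_entropy(seq, k=1):
    """H(X_t | X_{t-k}, ..., X_{t-1}): uncertainty of the next event given the previous k."""
    joint = Counter(tuple(seq[i:i + k + 1]) for i in range(len(seq) - k))
    prefix = Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k))
    n = sum(joint.values())
    h = 0.0
    for gram, c in joint.items():
        h -= c / n * log2(c / prefix[gram[:k]])
    return h

calls = ["open", "read", "read", "write", "close", "open", "read", "write", "close"]
print(entropy(calls), conditional_entropy(calls, k=1))
```

A more regular trace yields a lower conditional entropy, which is the sense in which these measures suggest how detectable anomalies will be.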


Journal ArticleDOI
TL;DR: A flexible 'cross entropy' (CE) approach to estimating a consistent SAM starting from inconsistent data estimated with error, which represents an efficient information-processing rule, using only and all information available.
Abstract: The problem in estimating a social accounting matrix (SAM) for a recent year is to find an efficient and cost-effective way to incorporate and reconcile information from a variety of sources, including data from prior years. Based on information theory, the paper presents a flexible 'cross entropy' (CE) approach to estimating a consistent SAM starting from inconsistent data estimated with error, a common experience in many countries. The method represents an efficient information-processing rule, using only and all information available. It allows incorporating errors in variables, inequality constraints, and prior knowledge about any part of the SAM. An example is presented, applying the CE approach to data from Mozambique, using a Monte Carlo approach to compare the CE approach to the standard RAS method and to evaluate the gains in precision from utilizing additional information.

513 citations
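
To make the cross-entropy estimation idea concrete, the sketch below balances a tiny SAM by minimizing the Kullback-Leibler divergence of the column coefficients from an inconsistent prior, subject to row/column balance; the matrix, totals, and solver settings are illustrative and not the paper's own implementation.

```python
# Hedged sketch of a cross-entropy SAM balancing step.
# abar (prior column shares) and y (fixed account totals) are illustrative.
import numpy as np
from scipy.optimize import minimize

abar = np.array([[0.2, 0.5, 0.3],
                 [0.5, 0.1, 0.4],
                 [0.3, 0.4, 0.3]])      # prior column coefficients; each column sums to 1
y = np.array([100.0, 120.0, 80.0])      # fixed row/column totals

def cross_entropy(a_flat):
    a = a_flat.reshape(abar.shape)
    return float(np.sum(a * np.log(a / abar)))

def balance_constraints():
    n = len(y)
    cons = [{"type": "eq",
             "fun": lambda a, j=j: a.reshape(n, n)[:, j].sum() - 1.0}
            for j in range(n)]                        # columns of A sum to one
    cons += [{"type": "eq",
              "fun": lambda a, i=i: a.reshape(n, n)[i, :] @ y - y[i]}
             for i in range(n - 1)]                   # row balance; the last row is implied
    return cons

res = minimize(cross_entropy, abar.ravel(), constraints=balance_constraints(),
               bounds=[(1e-9, 1.0)] * abar.size, method="SLSQP")
sam = res.x.reshape(abar.shape) * y                   # balanced flows t_ij = a_ij * y_j
print(np.round(sam, 2))
```

The full CE method additionally allows measurement error in the totals and inequality constraints; this sketch only shows the balancing core.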


Journal ArticleDOI
TL;DR: In this paper, the authors define predictive information Ipred(T) as the mutual information between the past and the future of a time series, and show that the divergent part of the predictive information provides the unique measure for the complexity of the dynamics underlying the time series.
Abstract: We define predictive information Ipred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T:Ipred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then Ipred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of Ipred(T) provides the unique measure for the complexity of dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.

485 citations
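
A hedged numerical illustration of the definition: the sketch below estimates the mutual information between past and future windows of a simulated binary Markov chain with a plug-in estimator; the chain parameters, window sizes, and sample length are illustrative.

```python
# Hedged sketch: plug-in estimate of predictive information I(past; future)
# for a simulated binary Markov chain (parameters and window sizes are illustrative).
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
p_stay = 0.9
x = [0]
for _ in range(200000):
    x.append(x[-1] if rng.random() < p_stay else 1 - x[-1])

def past_future_information(seq, w):
    """Mutual information (bits) between the w symbols before t and the w symbols from t."""
    pairs = [(tuple(seq[i - w:i]), tuple(seq[i:i + w])) for i in range(w, len(seq) - w)]
    n = len(pairs)
    joint = Counter(pairs)
    past = Counter(p for p, _ in pairs)
    futr = Counter(f for _, f in pairs)
    return sum(c / n * np.log2(c * n / (past[p] * futr[f])) for (p, f), c in joint.items())

for w in (1, 2, 4):
    print(w, round(past_future_information(x, w), 4))
```

For this finite-parameter source the estimate saturates as the window grows, i.e., Ipred(T) stays finite, which is one of the three behaviors distinguished in the paper.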


Journal ArticleDOI
Joshua T. Goodman
TL;DR: The authors compare a combination of these techniques to a Katz-smoothed trigram model with no count cutoffs, achieving perplexity reductions of between 38% and 50% depending on training-data size, as well as a word error rate reduction of 8.9%.

443 citations
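
For readers unfamiliar with the metric, a minimal sketch of how perplexity is computed from per-token model probabilities; the probabilities below are made up.

```python
# Hedged sketch: perplexity of a language model over held-out text,
# the metric behind the reported 38-50% reductions (probabilities are made up).
from math import log2

def perplexity(token_probs):
    """token_probs: probability the model assigned to each held-out token, in order."""
    n = len(token_probs)
    return 2 ** (-sum(log2(p) for p in token_probs) / n)

print(perplexity([0.10, 0.25, 0.05, 0.40]))
```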


Proceedings Article
21 Nov 2001
TL;DR: This paper uses the theoretical basis provided by Information Theory to define a new measure, viewpoint entropy, that allows us to compute good viewing positions automatically, and designs an algorithm that uses this measure to automatically explore objects or scenes.
Abstract: Computation of good viewpoints is important in several fields: computational geometry, visual servoing, robot motion, graph drawing, etc. In addition, selection of good views is rapidly becoming a key issue in computer graphics due to the new techniques of Image Based Rendering. Although there is no consensus about what a good view means in Computer Graphics, the quality of a viewpoint is intuitively related to how much information it gives us about a scene. In this paper we use the theoretical basis provided by Information Theory to define a new measure, viewpoint entropy, that allows us to compute good viewing positions automatically. We also show how it can be used to select a set of good views of a scene for scene understanding. Finally, we design an algorithm that uses this measure to automatically explore objects or scenes.

353 citations
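
A minimal sketch of the underlying idea, assuming viewpoint entropy is the Shannon entropy of the distribution of projected face areas; the faces, areas, and candidate views below are invented for illustration.

```python
# Hedged sketch: Shannon entropy of the projected-area distribution seen from a viewpoint;
# the candidate views and their projected areas are illustrative.
from math import log2

def viewpoint_entropy(projected_areas):
    """projected_areas: visible projected area of each face (background may be included)."""
    total = sum(projected_areas)
    return -sum(a / total * log2(a / total) for a in projected_areas if a > 0)

views = {"front": [40.0, 35.0, 25.0, 0.0],
         "corner": [22.0, 20.0, 18.0, 15.0, 25.0]}
scores = {name: round(viewpoint_entropy(a), 3) for name, a in views.items()}
print(scores, "best:", max(scores, key=scores.get))
```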


Journal ArticleDOI
TL;DR: In this paper, the authors compute the rate at which the posterior distribution concentrates around the true parameter value, and show that the rate is driven by the size of the space, as measured by bracketing entropy, and the degree to which the prior concentrates in a small ball around the true parameter.
Abstract: We compute the rate at which the posterior distribution concentrates around the true parameter value. The spaces we work in are quite general and include infinite-dimensional cases. The rates are driven by two quantities: the size of the space, as measured by bracketing entropy, and the degree to which the prior concentrates in a small ball around the true parameter. We consider two examples.

345 citations
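
Schematically, results of this kind balance an entropy condition against a prior-mass condition; the display below is a generic template of such conditions (notation is ours, not necessarily the paper's exact statement).

```latex
% Schematic template for posterior contraction rates (illustrative notation).
% Size of the model, measured by bracketing entropy:
\log N_{[\,]}\bigl(\varepsilon_n, \mathcal{F}, d\bigr) \;\lesssim\; n\varepsilon_n^{2},
% prior mass near the true parameter \theta_0:
\Pi\bigl(\theta : d(\theta,\theta_0) \le \varepsilon_n\bigr) \;\gtrsim\; e^{-c\,n\varepsilon_n^{2}},
% together give concentration of the posterior at rate \varepsilon_n:
\Pi\bigl(d(\theta,\theta_0) > M\varepsilon_n \mid X_1,\dots,X_n\bigr) \;\longrightarrow\; 0.
```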


Journal ArticleDOI
TL;DR: It is proved that for a tensor product of two unital stochastic maps on qubit states, using an entanglement that involves only states which emerge with minimal entropy cannot decrease the entropy below the minimum achievable using product states.
Abstract: We consider the minimal entropy of qubit states transmitted through two uses of a noisy quantum channel, which is modeled by the action of a completely positive trace-preserving (or stochastic) map. We provide strong support for the conjecture that this minimal entropy is additive, namely, that the minimum entropy can be achieved when product states are transmitted. Explicitly, we prove that for a tensor product of two unital stochastic maps on qubit states, using an entanglement that involves only states which emerge with minimal entropy cannot decrease the entropy below the minimum achievable using product states. We give a separate argument, based on the geometry of the image of the set of density matrices under stochastic maps, which suggests that the minimal entropy conjecture holds for nonunital as well as for unital maps. We also show that the maximal norm of the output states is multiplicative for most product maps on n-qubit states, including all those for which at least one map is unital. For the class of unital channels on C^2, we show that additivity of minimal entropy implies that the Holevo (see IEEE Trans. Inform. Theory, vol.44, p.269-73, 1998 and Russian Math. Surv., p.1295-1331, 1999) capacity of the channel is additive over two inputs, achievable with orthogonal states, and equal to the Shannon capacity. This implies that superadditivity of the capacity is possible only for nonunital channels.

339 citations
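
A hedged numerical companion to the additivity question: the sketch below compares the output entropy of two uses of a unital qubit channel (a depolarizing channel, chosen for illustration) on a product input and on a maximally entangled input.

```python
# Hedged sketch: output entropy of two channel uses on a product state versus a Bell state,
# for a depolarizing (unital) channel; parameters are illustrative.
import numpy as np

def von_neumann_entropy(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log2(ev)).sum())

def depolarize_both(rho, p):
    """Apply the depolarizing channel L(s) = (1-p) s + p I/2 to each qubit of a 2-qubit state."""
    r = rho.reshape(2, 2, 2, 2)                # indices (a, b, a', b')
    rho_a = np.einsum("abcb->ac", r)           # partial trace over the second qubit
    rho_b = np.einsum("abac->bc", r)           # partial trace over the first qubit
    half = np.eye(2) / 2
    return ((1 - p) ** 2 * rho
            + (1 - p) * p * np.kron(rho_a, half)
            + p * (1 - p) * np.kron(half, rho_b)
            + p ** 2 * np.kron(half, half))

p = 0.3
zero = np.diag([1.0, 0.0])
product_state = np.kron(zero, zero)                      # |00><00|
bell = np.outer([1, 0, 0, 1], [1, 0, 0, 1]) / 2.0        # |Phi+><Phi+|

print("product input  :", round(von_neumann_entropy(depolarize_both(product_state, p)), 4))
print("entangled input:", round(von_neumann_entropy(depolarize_both(bell, p)), 4))
```

For this unital example the product input attains the lower output entropy, consistent with the additivity conjecture supported in the paper.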


Journal ArticleDOI
01 Jun 2001
TL;DR: This paper presents an efficient fuzzy classifier with the ability of feature selection based on a fuzzy entropy measure and investigates the use of fuzzy entropy to select relevant features.
Abstract: This paper presents an efficient fuzzy classifier with the ability of feature selection based on a fuzzy entropy measure. Fuzzy entropy is employed to evaluate the information of pattern distribution in the pattern space. With this information, we can partition the pattern space into nonoverlapping decision regions for pattern classification. Since the decision regions do not overlap, both the complexity and computational load of the classifier are reduced and thus the training time and classification time are extremely short. Although the decision regions are partitioned into nonoverlapping subspaces, we can achieve good classification performance since the decision regions can be correctly determined via our proposed fuzzy entropy measure. In addition, we also investigate the use of fuzzy entropy to select relevant features. The feature selection procedure not only reduces the dimensionality of a problem but also discards noise-corrupted, redundant and unimportant features. Finally, we apply the proposed classifier to the Iris database and Wisconsin breast cancer database to evaluate the classification performance. Both of the results show that the proposed classifier can work well for the pattern classification application.

298 citations
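
The sketch below uses one classical fuzzy-entropy measure (in the De Luca and Termini style) to score features by how crisply their membership values separate the data; it illustrates the general idea and is not the specific measure defined in the paper.

```python
# Hedged sketch: a classical fuzzy-entropy measure used for feature scoring;
# membership values are illustrative, and the paper's own measure may differ.
from math import log2

def fuzzy_entropy(memberships):
    """memberships: degrees in [0, 1] of each sample's membership in a fuzzy set."""
    h = 0.0
    for mu in memberships:
        for q in (mu, 1.0 - mu):
            if 0.0 < q < 1.0:
                h -= q * log2(q)
    return h / len(memberships)

# a feature whose memberships sit near 0 or 1 is nearly crisp (low fuzzy entropy), hence informative
print(fuzzy_entropy([0.95, 0.02, 0.90, 0.05]))   # informative feature
print(fuzzy_entropy([0.50, 0.45, 0.55, 0.50]))   # uninformative feature
```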


Proceedings ArticleDOI
11 Jun 2001
TL;DR: The entropy rate of a finite-state hidden Markov model can be estimated by forward sum-product trellis processing of simulated model output data to compute information rates of binary-input AWGN channels with memory.
Abstract: The entropy rate of a finite-state hidden Markov model can be estimated by forward sum-product trellis processing (i.e., the forward recursion of the Baum-Welch/BCJR algorithm) of simulated model output data. This can be used to compute information rates of binary-input AWGN channels with memory.

260 citations
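
A minimal sketch of the procedure described above: simulate a two-state hidden Markov model and estimate its entropy rate from the normalizers of the forward recursion; all parameters are illustrative.

```python
# Hedged sketch: Monte-Carlo entropy-rate estimate for a 2-state HMM via the forward
# (sum-product) recursion on simulated output; all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])           # state transitions P(s_{t+1} | s_t)
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])             # emissions P(x_t | s_t)
pi = np.array([0.5, 0.5])

n = 200000
s = rng.choice(2, p=pi)
obs = []
for _ in range(n):
    obs.append(rng.choice(2, p=B[s]))
    s = rng.choice(2, p=A[s])

# forward recursion with per-step normalization; the log normalizers add up to log p(x_1..n)
alpha = pi * B[:, obs[0]]
log_p = np.log2(alpha.sum())
alpha /= alpha.sum()
for x in obs[1:]:
    alpha = (alpha @ A) * B[:, x]
    c = alpha.sum()
    log_p += np.log2(c)
    alpha /= c

print("entropy-rate estimate (bits/symbol):", -log_p / n)
```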


Journal ArticleDOI
TL;DR: For additive functionals satisfying mild conditions (including the cases of the mean, the entropy, and mutual information), the plug-in estimates of F are universally consistent; without further assumptions, however, no rate-of-convergence results can be obtained for any sequence of estimators.
Abstract: Suppose P is an arbitrary discrete distribution on a countable alphabet. Given an i.i.d. sample (X1,…,Xn) drawn from P, we consider the problem of estimating the entropy H(P) or some other functional F=F(P) of the unknown distribution P. We show that, for additive functionals satisfying mild conditions (including the cases of the mean, the entropy, and mutual information), the plug-in estimates of F are universally consistent. We also prove that, without further assumptions, no rate-of-convergence results can be obtained for any sequence of estimators. In the case of entropy estimation, under a variety of different assumptions, we get rate-of-convergence results for the plug-in estimate and for a nonparametric estimator based on match-lengths. The behavior of the variance and the expected error of the plug-in estimate is shown to be in sharp contrast to the finite-alphabet case. A number of other important examples of functionals are also treated in some detail. © 2001 John Wiley & Sons, Inc. Random Struct. Alg., 19: 163–193, 2001

255 citations
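
A quick illustration of the plug-in estimator on a countably infinite alphabet, using a geometric distribution whose true entropy is known in closed form; the distribution and sample size are illustrative.

```python
# Hedged sketch: plug-in entropy estimate H(P_hat) on a countable alphabet,
# compared with the exact entropy of an illustrative geometric distribution.
import numpy as np

rng = np.random.default_rng(2)
p = 0.5
sample = rng.geometric(p, size=5000)            # support {1, 2, 3, ...}

def plug_in_entropy(xs):
    _, counts = np.unique(xs, return_counts=True)
    freqs = counts / counts.sum()
    return float(-(freqs * np.log2(freqs)).sum())

true_entropy = -((1 - p) * np.log2(1 - p) + p * np.log2(p)) / p   # H of Geometric(p), in bits
print(round(plug_in_entropy(sample), 4), true_entropy)
```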


Proceedings Article
03 Jan 2001
TL;DR: This work studies properties of popular near-uniform (Dirichlet) priors for learning undersampled probability distributions on discrete nonmetric spaces, shows that they can lead to disastrous results, and resolves most of the problems with an Occam-style phase-space argument that yields a surprisingly good estimator of the entropies of discrete distributions.
Abstract: We study properties of popular near–uniform (Dirichlet) priors for learning undersampled probability distributions on discrete nonmetric spaces and show that they lead to disastrous results. However, an Occam–style phase space argument expands the priors into their infinite mixture and resolves most of the observed problems. This leads to a surprisingly good estimator of entropies of discrete distributions.
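
To see the problem the authors address, the sketch below shows how, in the undersampled regime, the entropy of the posterior-mean ("add-beta") distribution under a symmetric Dirichlet prior is controlled largely by the concentration parameter rather than by the data; the alphabet size, sample, and beta values are illustrative.

```python
# Hedged sketch of the undersampled-regime problem: under a symmetric Dirichlet(beta) prior
# the posterior-mean distribution is an "add-beta" rule, and the entropy of that distribution
# depends strongly on beta when the sample is small. All numbers are illustrative.
import numpy as np

def add_beta_entropy(counts, beta):
    """Entropy (bits) of the posterior-mean distribution under a symmetric Dirichlet(beta) prior."""
    counts = np.asarray(counts, dtype=float)
    q = (counts + beta) / (counts.sum() + beta * counts.size)
    return float(-(q * np.log2(q)).sum())

K, N = 1000, 30                                   # alphabet much larger than the sample
rng = np.random.default_rng(3)
counts = np.bincount(rng.integers(0, K, size=N), minlength=K)
for beta in (0.001, 0.02, 0.5, 1.0):
    print(beta, round(add_beta_entropy(counts, beta), 2))
```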

Journal ArticleDOI
TL;DR: A generalized entropy criterion for solving the rational Nevanlinna-Pick problem for n+1 interpolating conditions and the degree of interpolants bounded by n is presented, which requires a selection of a monic Schur polynomial of degree n.
Abstract: We present a generalized entropy criterion for solving the rational Nevanlinna-Pick problem for n+1 interpolating conditions and the degree of interpolants bounded by n. The primal problem of maximizing this entropy gain has a very well-behaved dual problem. This dual is a convex optimization problem in a finite-dimensional space and gives rise to an algorithm for finding all interpolants which are positive real and rational of degree at most n. The criterion requires a selection of a monic Schur polynomial of degree n. It follows that this class of monic polynomials completely parameterizes all such rational interpolants, and it therefore provides a set of design parameters for specifying such interpolants. The algorithm is implemented in a state-space form and applied to several illustrative problems in systems and control, namely sensitivity minimization, maximal power transfer and spectral estimation.

Proceedings ArticleDOI
28 May 2001
TL;DR: A simple example of pheromone-based coordination is constructed, a way to measure the Shannon entropy at the macro (agent) and micro (pheromone) levels is defined, and an entropy-based view of the coordination is exhibited.
Abstract: Emergent self-organization in multi-agent systems appears to contradict the second law of thermodynamics. This paradox has been explained in terms of a coupling between the macro level that hosts self-organization (and an apparent reduction in entropy), and the micro level (where random processes greatly increase entropy). Metaphorically, the micro level serves as an entropy “sink”, permitting overall system entropy to increase while sequestering this increase from the interactions where self-organization is desired. We make this metaphor precise by constructing a simple example of pheromone-based coordination, defining a way to measure the Shannon entropy at the macro (agent) and micro (pheromone) levels, and exhibiting an entropy-based view of the coordination.
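
A minimal sketch of the macro-level measurement described above: bin agent positions into grid cells and take the Shannon entropy of the occupancy distribution; the scenario and parameters are invented for illustration.

```python
# Hedged sketch: Shannon entropy of a spatial distribution of agents, measured by binning
# positions into grid cells; the two scenarios below are illustrative.
import numpy as np

def grid_entropy(positions, bins=10):
    hist, _, _ = np.histogram2d(positions[:, 0], positions[:, 1],
                                bins=bins, range=[[0, 1], [0, 1]])
    q = hist.ravel() / hist.sum()
    q = q[q > 0]
    return float(-(q * np.log2(q)).sum())

rng = np.random.default_rng(4)
scattered = rng.random((500, 2))                        # disordered: agents spread uniformly
gathered = 0.5 + 0.03 * rng.standard_normal((500, 2))   # coordinated: agents clustered at a target
print(round(grid_entropy(scattered), 3), round(grid_entropy(gathered), 3))
```

A drop in this macro-level entropy is the signature of coordination that the paper balances against the entropy increase at the pheromone level.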

Patent
10 Aug 2001
TL;DR: In this paper, a method of order-ranking document clusters using entropy data and Bayesian self-organizing feature maps (SOM) is provided, in which the accuracy of information retrieval is improved by adopting a Bayesian SOM to perform real-time document clustering of relevant documents.
Abstract: A method of order-ranking document clusters using entropy data and Bayesian self-organizing feature maps (SOM) is provided, in which the accuracy of information retrieval is improved by adopting a Bayesian SOM to perform real-time document clustering of relevant documents in accordance with the degree of semantic similarity between the entropy data, extracted using entropy values, and the user profiles and query words given by the user, wherein the Bayesian SOM is a combination of the Bayesian statistical technique and the Kohonen network, a type of unsupervised learning.

Journal ArticleDOI
TL;DR: A classification of second-law-based performance evaluation criteria for heat exchangers is presented, and emphasis is placed on the importance of second-law-based thermoeconomic analysis of heat exchangers.

Journal ArticleDOI
TL;DR: This work describes a form of the recently described 'island' method in detail, and uses it to investigate the functional dependence of these parameters on finite-length edge effects.
Abstract: The distribution of optimal local alignment scores of random sequences plays a vital role in evaluating the statistical significance of sequence alignments. These scores can be well described by an extreme-value distribution. The distribution's parameters depend upon the scoring system employed and the random letter frequencies; in general they cannot be derived analytically, but must be estimated by curve fitting. For obtaining accurate parameter estimates, a form of the recently described 'island' method has several advantages. We describe this method in detail, and use it to investigate the functional dependence of these parameters on finite-length edge effects.
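
As a generic stand-in for the curve-fitting step discussed above (not the island method itself), the sketch below fits a Gumbel-type extreme-value distribution to simulated alignment scores by maximum likelihood.

```python
# Hedged sketch: maximum-likelihood fit of a Gumbel extreme-value distribution to simulated
# optimal local alignment scores; a stand-in for the curve-fitting step, not the island method.
import numpy as np
from scipy.stats import gumbel_r

scores = gumbel_r.rvs(loc=30.0, scale=4.0, size=2000, random_state=5)  # simulated score sample
loc_hat, scale_hat = gumbel_r.fit(scores)
print(loc_hat, scale_hat)        # in the usual (K, lambda) parametrization, lambda ~ 1/scale_hat
```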

Journal ArticleDOI
TL;DR: The Shannon entropy seems to be a useful electroencephalographic measure of anesthetic drug effect, as it increased continuously over the observed concentration range of desflurane.
Abstract: Background: The Shannon entropy is a standard measure for the order state of sequences. It quantifies the degree of skew of the distribution of values. Increasing hypnotic drug concentrations increase electroencephalographic amplitude. The probability density function of the amplitude values broadens
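
A minimal sketch of the measurement: histogram the amplitude values of an EEG epoch and take the Shannon entropy of the resulting distribution; the signals below are synthetic stand-ins, with the "deeper" one given a broader amplitude distribution as in the abstract.

```python
# Hedged sketch: Shannon entropy of the amplitude distribution of one epoch,
# computed from a fixed-range histogram; both signals are synthetic stand-ins.
import numpy as np

def amplitude_entropy(signal, bins=32, amp_range=(-12.0, 12.0)):
    hist, _ = np.histogram(signal, bins=bins, range=amp_range)
    q = hist / hist.sum()
    q = q[q > 0]
    return float(-(q * np.log2(q)).sum())

rng = np.random.default_rng(6)
light = rng.standard_normal(5000)          # narrow amplitude distribution
deep = 3.0 * rng.standard_normal(5000)     # broader amplitude distribution
print(round(amplitude_entropy(light), 3), round(amplitude_entropy(deep), 3))
```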

Journal ArticleDOI
30 Sep 2001-Entropy
TL;DR: The present paper offers a self-contained and comprehensive treatment of fundamentals of both principles, especially regarding the study of continuity properties of the entropy function, and this leads to new results which allow a discussion of models with so-called entropy loss.
Abstract: In its modern formulation, the Maximum Entropy Principle was promoted by E.T. Jaynes, starting in the mid-fifties. The principle dictates that one should look for a distribution, consistent with available information, which maximizes the entropy. However, this principle focuses only on distributions and it appears advantageous to bring information theoretical thinking more prominently into play by also focusing on the "observer" and on coding. This view was brought forward by the second named author in the late seventies and is the view we will follow up on here. It leads to the consideration of a certain game, the Code Length Game and, via standard game theoretical thinking, to a principle of Game Theoretical Equilibrium. This principle is more basic than the Maximum Entropy Principle in the sense that the search for one type of optimal strategies in the Code Length Game translates directly into the search for distributions with maximum entropy. In the present paper we offer a self-contained and comprehensive treatment of fundamentals of both principles mentioned, based on a study of the Code Length Game. Though new concepts and results are presented, the reading should be instructional and accessible to a rather wide audience, at least if certain mathematical details are left aside at a first reading. The most frequently studied instance of entropy maximization pertains to the Mean Energy Model which involves a moment constraint related to a given function, here taken to represent "energy". This type of application is very well known from the literature with hundreds of applications pertaining to several different fields and will also here serve as an important illustration of the theory. But our approach reaches further, especially regarding the study of continuity properties of the entropy function, and this leads to new results which allow a discussion of models with so-called entropy loss. These results have tempted us to speculate over the development of natural languages. In fact, we are able to relate our theoretical findings to the empirically found Zipf's law which involves statistical aspects of words in a language. The apparent irregularity inherent in models with entropy loss turns out to imply desirable stability properties of languages.
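
A small worked version of the Mean Energy Model mentioned above: the maximum-entropy distribution under a mean-energy constraint has Gibbs/Boltzmann form, and the multiplier can be found numerically; the energy levels and target mean are illustrative.

```python
# Hedged sketch of the Mean Energy Model: maximum entropy subject to a mean-"energy"
# constraint gives p_i proportional to exp(-beta * E_i); beta is found by root finding.
# The energy levels and target mean are illustrative.
import numpy as np
from scipy.optimize import brentq

energies = np.array([0.0, 1.0, 2.0, 5.0])
target_mean = 1.2

def mean_energy(beta):
    w = np.exp(-beta * energies)
    return float((energies * w).sum() / w.sum())

beta = brentq(lambda b: mean_energy(b) - target_mean, -5.0, 5.0)
p = np.exp(-beta * energies)
p /= p.sum()
print(round(beta, 4), np.round(p, 4), "entropy:", round(float(-(p * np.log2(p)).sum()), 4))
```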

Journal ArticleDOI
TL;DR: The relationship between a probability partition and a fuzzy c-partition in thresholding is given, and this relationship and the entropy approach are used to derive a thresholding technique that selects the best fuzzy c-partition.
Abstract: Thresholding is a commonly used technique in image segmentation. Selecting the correct thresholds is a critical issue. In this paper, the relationship between a probability partition (PP) and a fuzzy c-partition (FP) in thresholding is given. This relationship and the entropy approach are used to derive a thresholding technique to select the best fuzzy c-partition. The measure of the selection quality is the compatibility between the FP and the PP generated by the problem. An entropy function defined by the PP and FP is used to measure the compatibility. A necessary condition of the entropy function arriving at a maximum is derived. Based on this condition, an efficient algorithm for three-level thresholding is deduced. Experiments to verify the efficiency of the proposed method and comparison to some existing techniques are also presented. The experiment results show that our proposed method gives the best performance in three-level thresholding using fuzzy c-partition.
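
For context, the sketch below shows the classical maximum-entropy choice of a single grey-level threshold (a Kapur-style criterion); it is a simpler relative of the fuzzy-partition criterion used in the paper, not the paper's three-level fuzzy method.

```python
# Hedged sketch: classical maximum-entropy (Kapur-style) selection of one threshold from a
# grey-level histogram; a simpler relative of the paper's fuzzy-partition criterion.
import numpy as np

def max_entropy_threshold(hist):
    prob = hist / hist.sum()
    best_t, best_h = None, -np.inf
    for t in range(1, len(prob)):
        lo, hi = prob[:t], prob[t:]
        if lo.sum() == 0 or hi.sum() == 0:
            continue
        lo, hi = lo / lo.sum(), hi / hi.sum()
        h = -(lo[lo > 0] * np.log2(lo[lo > 0])).sum() - (hi[hi > 0] * np.log2(hi[hi > 0])).sum()
        if h > best_h:
            best_t, best_h = t, h
    return best_t

rng = np.random.default_rng(7)
pixels = np.concatenate([rng.normal(60, 10, 4000), rng.normal(170, 15, 2000)]).clip(0, 255)
hist, _ = np.histogram(pixels, bins=256, range=(0, 256))
print("threshold:", max_entropy_threshold(hist))
```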

Proceedings ArticleDOI
06 Jul 2001
TL;DR: An algorithm is presented for learning a phrase-structure grammar from tagged text that clusters sequences of tags together based on local distributional information, and selects clusters that satisfy a novel mutual information criterion.
Abstract: An algorithm is presented for learning a phrase-structure grammar from tagged text. It clusters sequences of tags together based on local distributional information, and selects clusters that satisfy a novel mutual information criterion. This criterion is shown to be related to the entropy of a random variable associated with the tree structures, and it is demonstrated that it selects linguistically plausible constituents. This is incorporated in a Minimum Description Length algorithm. The evaluation of unsupervised models is discussed, and results are presented when the algorithm has been trained on 12 million words of the British National Corpus.
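
The flavor of the criterion can be illustrated by scoring a candidate constituent through the mutual information between the tags immediately to its left and right across its occurrences; this sketch is a simplified illustration, not the paper's exact formulation.

```python
# Hedged sketch: score a candidate tag sequence by the mutual information between its
# left-context and right-context tags; the tag sequence and corpus are toy examples.
from collections import Counter
from math import log2

def context_mi(tagged, candidate):
    k = len(candidate)
    pairs = [(tagged[i - 1], tagged[i + k])
             for i in range(1, len(tagged) - k)
             if tuple(tagged[i:i + k]) == candidate]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(l for l, _ in pairs)
    right = Counter(r for _, r in pairs)
    return sum(c / n * log2(c * n / (left[l] * right[r])) for (l, r), c in joint.items())

tags = ["DT", "NN", "VB", "DT", "JJ", "NN", "VB", "IN", "DT", "NN", "VB", "DT", "NN", "IN"]
print(context_mi(tags, ("DT", "NN")))
```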

Journal ArticleDOI
TL;DR: A self-organizing mixture network (SOMN) derived for learning arbitrary density functions that minimizes the Kullback-Leibler information metric by means of stochastic approximation methods and can provide insight into the role of the neighborhood function used in the SOM.
Abstract: A self-organizing mixture network (SOMN) is derived for learning arbitrary density functions. The network minimizes the Kullback-Leibler information metric by means of stochastic approximation methods. The density functions are modeled as mixtures of parametric distributions. A mixture need not be homogeneous, i.e., its components can have different density profiles. The first layer of the network is similar to Kohonen's self-organizing map (SOM), but with the parameters of the component densities as the learning weights. The winning mechanism is based on maximum posterior probability, and updating of the weights is limited to a small neighborhood around the winner. The second layer accumulates the responses of these local nodes, weighted by the learned mixing parameters. The network possesses a simple structure and computational form, yet yields fast and robust convergence. The network has a generalization ability due to the relative entropy criterion used. Applications to density profile estimation and pattern classification are presented. The SOMN can also provide insight into the role of the neighborhood function used in the SOM.

Journal ArticleDOI
TL;DR: The Arimoto-Blahut algorithm, generalized for cost constraints, can be used to derive and interpret the distribution of symbols for optimal energy-efficient coding in the presence of noise, and the possibilities and problems are outlined.
Abstract: Energy-efficient information transmission may be relevant to biological sensory signal processing as well as to low-power electronic devices. We explore its consequences in two different regimes. In an "immediate" regime, we argue that the information rate should be maximized subject to a power constraint, and in an "exploratory" regime, the transmission rate per power cost should be maximized. In the absence of noise, discrete inputs are optimally encoded into Boltzmann distributed output symbols. In the exploratory regime, the partition function of this distribution is numerically equal to 1. The structure of the optimal code is strongly affected by noise in the transmission channel. The Arimoto-Blahut algorithm, generalized for cost constraints, can be used to derive and interpret the distribution of symbols for optimal energy-efficient coding in the presence of noise. We outline the possibilities and problems in extending our results to information coding and transmission in neurobiological systems.
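
A minimal sketch of the cost-constrained Blahut-Arimoto iteration referred to above, trading mutual information against average input cost via a Lagrange multiplier; the channel, costs, and multiplier values are illustrative.

```python
# Hedged sketch: Blahut-Arimoto iteration with a Lagrange multiplier s on input cost,
# maximizing I(X;Y) - s * E[cost(X)]; channel, costs and multiplier values are illustrative.
import numpy as np

def blahut_arimoto_cost(P, cost, s, iters=500):
    """P[x, y] = P(y | x); returns the optimizing input distribution, I (bits), and mean cost."""
    r = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        q = r @ P                                      # induced output distribution
        d = (P * np.log(P / q)).sum(axis=1)            # D(P(.|x) || q), in nats
        r = r * np.exp(d - s * cost)
        r /= r.sum()
    q = r @ P
    mi = float((r[:, None] * P * np.log(P / q)).sum() / np.log(2))
    return r, mi, float(r @ cost)

P = np.array([[0.9, 0.1],        # binary asymmetric channel P(y|x)
              [0.2, 0.8]])
cost = np.array([1.0, 4.0])      # the second input symbol is expensive to transmit
for s in (0.0, 0.2, 0.5):
    r, mi, avg_cost = blahut_arimoto_cost(P, cost, s)
    print(s, np.round(r, 3), round(mi, 3), round(avg_cost, 3))
```

Raising s shifts probability toward the cheap input symbol, at the price of a lower information rate.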

Journal ArticleDOI
TL;DR: An analysis is given of three major representations of tries in the form of array-tries, list tries, and bst-Tries (“ternary search tries”) to determine the probabilistic behaviour of the main parameters.
Abstract: Digital trees, also known as tries, are a general purpose flexible data structure that implements dictionaries built on sets of words. An analysis is given of three major representations of tries in the form of array-tries, list tries, and bst-tries (“ternary search tries”). The size and the search costs of the corresponding representations are analysed precisely in the average case, while a complete distributional analysis of the height of tries is given. The unifying data model used is that of dynamical sources and it encompasses classical models like those of memoryless sources with independent symbols, of finite Markov chains, and of nonuniform densities. The probabilistic behaviour of the main parameters, namely, size, path length, or height, appears to be determined by two intrinsic characteristics of the source: the entropy and the probability of letter coincidence. These characteristics are themselves related in a natural way to spectral properties of specific transfer operators of the Ruelle type.

Proceedings ArticleDOI
10 Mar 2001
TL;DR: An existing minimum variance estimation algorithm for out-of-sequence processing of sensor measurements is built on, extending the algorithm to handle multiple lags and multiple dynamic models and establishes a connection between the maximum entropy of a partially known multivariable Gaussian distribution and a particular Bayesian network.
Abstract: Two key challenges associated with fusion of information in large-scale systems are the asynchronous nature of information flow and the consistency requirements associated with decentralized processing. This paper provides contributions in both these areas. First, we build on an existing minimum variance estimation algorithm for out-of-sequence processing of sensor measurements, extending the algorithm to handle multiple lags and multiple dynamic models. We study the performance of the algorithms with numerical examples. Second, we establish a connection between the maximum entropy of a partially known multivariable Gaussian distribution and a particular Bayesian network, whose structure is based on the available information. The connection leads to a useful methodology for identifying missing information in systems described by Bayesian networks, a key tool in developing algorithms for information flow in decentralized systems.

Proceedings ArticleDOI
24 Jun 2001
TL;DR: New entropy and mutual information formulae for regenerative stochastic processes are obtained and tighter bounds on capacity and better algorithms are obtained than in Goldsmith and Varaiya.
Abstract: We obtain new entropy and mutual information formulae for regenerative stochastic processes. We use them on Markov channels to generalize the results in Goldsmith and Varaiya (1996). Also we obtain tighter bounds on capacity and better algorithms than in Goldsmith and Varaiya.

Journal ArticleDOI
TL;DR: This work shows how entropy densities may be constructed in a numerically efficient way as the minimization of a potential, and characterizes the skewness-kurtosis domain for which such densities are defined.
Abstract: The entropy principle yields, for a given set of moments, a density that involves the smallest amount of prior information. We first show how entropy densities may be constructed in a numerically efficient way as the minimization of a potential. Next, for the case where the first four moments are given, we characterize the skewness-kurtosis domain for which densities are defined. This domain is found to be much larger than for Hermite or Edgeworth expansions. Last, we show how this technique can be used to estimate a GARCH model where skewness and kurtosis are time varying.
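
A hedged numerical sketch of the construction: on a discretized support, the density proportional to exp(lambda · (x, x^2, x^3, x^4)) matching four target moments is obtained by minimizing the dual potential over the multipliers; the grid, target moments, and optimizer settings are illustrative, and the paper's own scheme may differ.

```python
# Hedged sketch: maximum-entropy density matching four moments, obtained by minimizing the
# dual potential over the Lagrange multipliers on a grid; all numbers are illustrative.
import numpy as np
from scipy.optimize import minimize

x = np.linspace(-6, 6, 2001)
dx = x[1] - x[0]
basis = np.vstack([x, x**2, x**3, x**4])
m = np.array([0.0, 1.0, 0.3, 3.5])            # target moments E[x], E[x^2], E[x^3], E[x^4]

def potential(lam):
    # convex dual: log-normalizer minus <lam, target moments>
    expo = np.clip(lam @ basis, -700.0, 700.0)
    return float(np.log((np.exp(expo) * dx).sum()) - lam @ m)

res = minimize(potential, x0=np.array([0.0, -0.5, 0.0, 0.0]), method="BFGS")
dens = np.exp(np.clip(res.x @ basis, -700.0, 700.0))
dens /= (dens * dx).sum()
print("fitted moments:", [round(float((x**k * dens * dx).sum()), 3) for k in (1, 2, 3, 4)])
```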

Journal ArticleDOI
TL;DR: A new Genetic Algorithm to optimize multimodal continuous functions is proposed, based on a splitting of the traditional GA into a sequence of three processes, which determine the best point s* among the best solutions issued from each of the preceding subpopulations.
Abstract: In this paper a new Genetic Algorithm (GA) to optimize multimodal continuous functions is proposed. It is based on a splitting of the traditional GA into a sequence of three processes. The first process creates several appropriate subpopulations using information entropy theory. The second process applies the genetic operators (selection, crossover and mutation) to every subpopulation, which is thus gradually enriched with better individuals. We then determine the best point s* among the best solutions issued from each of the preceding subpopulations. In the third process, a population generated in the neighborhood of this point s* is used to initialize a traditional GA. In this last process, the population is entirely renewed after each generation, the new population being generated in the neighborhood of the best point found. The neighborhood size is decreased after each generation. A detailed comparison of performances with several stochastic global search methods is presented, using test functions whose local and global minima are known.

Journal ArticleDOI
Thomas Kühn
TL;DR: The behaviour of the entropy numbers e_k(id: l_p^n → l_q^n), 0 < p < q ≤ ∞, is studied, and the lower estimate e_k(id: l_p^n → l_q^n) ≥ c (log(n/k+1)/k)^(1/p - 1/q) is established, with a constant c > 0 depending only on p.

Journal ArticleDOI
TL;DR: The extraction of feature information from biofilm images benefits from automatic thresholding and can be extended to other fields, such as medical imaging.