
Showing papers on "Entropy (information theory) published in 2002"


Journal ArticleDOI
TL;DR: An unsupervised feature selection algorithm suitable for data sets large in both dimension and size; it measures similarity between features to remove redundancy, requires no search, and is therefore fast.
Abstract: In this article, we describe an unsupervised feature selection algorithm suitable for data sets, large in both dimension and size. The method is based on measuring similarity between features whereby redundancy therein is removed. This does not need any search and, therefore, is fast. A new feature similarity measure, called maximum information compression index, is introduced. The algorithm is generic in nature and has the capability of multiscale representation of data sets. The superiority of the algorithm, in terms of speed and performance, is established extensively over various real-life data sets of different sizes and dimensions. It is also demonstrated how redundancy and information loss in feature selection can be quantified with an entropy measure.
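As a rough illustration of the idea (not the authors' code), the sketch below prunes features by pairwise similarity. It takes the maximum information compression index of a feature pair to be the smallest eigenvalue of their 2x2 covariance matrix, and uses a simplified greedy pruning loop; the parameter k (number of most-similar features discarded per step) and all names are illustrative assumptions, and the published algorithm's adaptive threshold handling is omitted.

```python
import numpy as np

def mici(x, y):
    """Feature-pair similarity: smallest eigenvalue of the 2x2 covariance
    matrix of features x and y (zero iff they are linearly dependent)."""
    cov = np.cov(np.vstack([x, y]))
    return np.linalg.eigvalsh(cov)[0]

def select_features(X, k=2):
    """Simplified greedy pruning: keep the feature whose k-th most similar
    neighbour is closest, drop those k neighbours, and repeat."""
    remaining = list(range(X.shape[1]))
    selected = []
    while remaining:
        k_eff = min(k, len(remaining) - 1)
        if k_eff == 0:
            selected.extend(remaining)
            break
        # distance of each candidate feature to its k-th most similar feature
        scores = []
        for i in remaining:
            d = sorted(mici(X[:, i], X[:, j]) for j in remaining if j != i)
            scores.append((d[k_eff - 1], i))
        _, best = min(scores)
        selected.append(best)
        # discard the k features most similar to the retained one
        for _, j in sorted((mici(X[:, best], X[:, j]), j)
                           for j in remaining if j != best)[:k_eff]:
            remaining.remove(j)
        remaining.remove(best)
    return selected
```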

1,432 citations


Journal ArticleDOI
TL;DR: Results show that entropy falls before clinical signs of neonatal sepsis and that missing points are well tolerated; the authors propose more informed selection of parameters and a reexamination of studies where approximate entropy was interpreted solely as a regularity measure.
Abstract: Abnormal heart rate characteristics of reduced variability and transient decelerations are present early in the course of neonatal sepsis. To investigate the dynamics, we calculated sample entropy, a similar but less biased measure than the popular approximate entropy. Both calculate the probability that epochs of window length m that are similar within a tolerance r remain similar at the next point. We studied 89 consecutive admissions to a tertiary care neonatal intensive care unit, among whom there were 21 episodes of sepsis, and we performed numerical simulations. We addressed the fundamental issues of optimal selection of m and r and the impact of missing data. The major findings are that entropy falls before clinical signs of neonatal sepsis and that missing points are well tolerated. The major mechanism, surprisingly, is unrelated to the regularity of the data: entropy estimates inevitably fall in any record with spikes. We propose more informed selection of parameters and reexamination of studies where approximate entropy was interpreted solely as a regularity measure.
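For reference, a minimal sample entropy implementation along the lines described in the abstract (epochs of length m similar within tolerance r). Taking r as a fraction of the series' standard deviation is common practice and an assumption here, not something stated in the abstract.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r): negative log of the conditional probability that epochs
    matching within tolerance r for m points also match at the next point.
    r is interpreted as a fraction of the standard deviation (assumption)."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()
    n = len(x)

    def matches(length):
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates)):
            d = np.max(np.abs(templates - templates[i]), axis=1)
            count += np.sum(d <= tol) - 1   # exclude the self-match
        return count

    b = matches(m)        # pairs similar over m points
    a = matches(m + 1)    # pairs still similar at the next point
    return -np.log(a / b) if a > 0 and b > 0 else np.inf
```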

1,151 citations


Book
01 Jan 2002
TL;DR: This book provides the first comprehensive treatment of the theory of I-Measure, network coding theory, Shannon and non-Shannon type information inequalities, and a relation between entropy and group theory.
Abstract: This book provides an up-to-date introduction to information theory. In addition to the classical topics discussed, it provides the first comprehensive treatment of the theory of I-Measure, network coding theory, Shannon and non-Shannon type information inequalities, and a relation between entropy and group theory. ITIP, a software package for proving information inequalities, is also included. With a large number of examples, illustrations, and original problems, this book is excellent as a textbook or reference book for a senior or graduate level course on the subject, as well as a reference for researchers in related fields.

543 citations


Proceedings ArticleDOI
04 Nov 2002
TL;DR: The connection between clustering categorical data and entropy is explored: clusters of similar points have lower entropy than those of dissimilar ones. This connection is used to design an incremental heuristic algorithm, COOLCAT, which is capable of efficiently clustering large data sets of records with categorical attributes, and data streams.
Abstract: In this paper we explore the connection between clustering categorical data and entropy: clusters of similar points have lower entropy than those of dissimilar ones. We use this connection to design an incremental heuristic algorithm, COOLCAT, which is capable of efficiently clustering large data sets of records with categorical attributes, and data streams. In contrast with other categorical clustering algorithms published in the past, COOLCAT's clustering results are very stable for different sample sizes and parameter settings. Also, the criterion for clustering is a very intuitive one, since it is deeply rooted in the well-known notion of entropy. Most importantly, COOLCAT is well equipped to deal with clustering of data streams (continuously arriving streams of data points), since it is an incremental algorithm capable of clustering new points without having to look at every point that has been clustered so far. We demonstrate the efficiency and scalability of COOLCAT by a series of experiments on real and synthetic data sets.
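The core of the incremental step can be sketched as follows (a simplified reconstruction, not the authors' code): each cluster keeps per-attribute value counts, and a new point goes to the cluster that yields the smallest expected entropy after insertion. The seeding of initial clusters from a sample of maximally dissimilar records, as COOLCAT does, is omitted; the data layout and names are assumptions.

```python
import math
from collections import Counter

def cluster_entropy(counts_per_attr, size):
    """Sum of the empirical entropies of the categorical attributes in one
    cluster (attributes treated as independent)."""
    h = 0.0
    for counts in counts_per_attr:
        for c in counts.values():
            p = c / size
            h -= p * math.log2(p)
    return h

def expected_entropy(clusters):
    """COOLCAT criterion: size-weighted average of the cluster entropies."""
    n = sum(c["size"] for c in clusters) or 1
    return sum(c["size"] / n * cluster_entropy(c["counts"], c["size"])
               for c in clusters if c["size"] > 0)

def assign(point, clusters):
    """Place the point in the cluster minimizing the expected entropy.
    Each cluster is {"size": int, "counts": [Counter per attribute]}."""
    best, best_h = 0, float("inf")
    for k in range(len(clusters)):
        trial = [{"size": c["size"], "counts": [Counter(cnt) for cnt in c["counts"]]}
                 for c in clusters]
        trial[k]["size"] += 1
        for a, v in enumerate(point):
            trial[k]["counts"][a][v] += 1
        h = expected_entropy(trial)
        if h < best_h:
            best, best_h = k, h
    clusters[best]["size"] += 1
    for a, v in enumerate(point):
        clusters[best]["counts"][a][v] += 1
    return best
```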

418 citations


Journal ArticleDOI
TL;DR: In this paper, an efficient hybrid compact-WENO scheme is proposed to obtain high resolution in shock-turbulence interaction problems, which is based on a fifth-order compact upwind algorithm in conservation form to solve for the smooth part of the flow field.

403 citations


Journal ArticleDOI
TL;DR: Different aspects of the predictability problem in dynamical systems are reviewed, with emphasis on how a characterization of the unpredictability of a system gives a measure of its complexity.

353 citations


Journal ArticleDOI
TL;DR: It is shown that the global minimum of this nonparametric estimator for Renyi's entropy is the same as the actual entropy, and the performance of the error-entropy-minimization criterion is compared with mean-square-error minimization in the short-term prediction of a chaotic time series and in nonlinear system identification.
Abstract: The paper investigates error-entropy-minimization in adaptive systems training. We prove the equivalence between minimization of error's Renyi (1970) entropy of order /spl alpha/ and minimization of a Csiszar (1981) distance measure between the densities of desired and system outputs. A nonparametric estimator for Renyi's entropy is presented, and it is shown that the global minimum of this estimator is the same as the actual entropy. The performance of the error-entropy-minimization criterion is compared with mean-square-error-minimization in the short-term prediction of a chaotic time series and in nonlinear system identification.
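The nonparametric estimator in question can be written in a few lines for the quadratic case (order 2) with a Gaussian Parzen window; the kernel width sigma is a free parameter and the value below is purely illustrative.

```python
import numpy as np

def renyi_quadratic_entropy(e, sigma=0.5):
    """Parzen-window estimate of Renyi's quadratic entropy of the error
    sample e: H2 = -log( (1/N^2) * sum_ij G(e_i - e_j; 2*sigma^2) ),
    where G is a Gaussian kernel. Minimizing this over the adaptive
    system's weights implements the error-entropy-minimization criterion."""
    e = np.asarray(e, dtype=float)
    n = len(e)
    diff = e[:, None] - e[None, :]
    kernel = np.exp(-diff**2 / (4 * sigma**2)) / np.sqrt(4 * np.pi * sigma**2)
    return -np.log(kernel.sum() / n**2)
```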

327 citations


Journal ArticleDOI
TL;DR: This article presents applications of entropic spanning graphs to imaging and feature clustering; such graphs are naturally suited to applications where entropy and information divergence are used as discriminants.
Abstract: This article presents applications of entropic spanning graphs to imaging and feature clustering applications. Entropic spanning graphs span a set of feature vectors in such a way that the normalized spanning length of the graph converges to the entropy of the feature distribution as the number of random feature vectors increases. This property makes these graphs naturally suited to applications where entropy and information divergence are used as discriminants: texture classification, feature clustering, image indexing, and image registration. Among other areas, these problems arise in geographical information systems, digital libraries, medical information processing, video indexing, multisensor fusion, and content-based retrieval.
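A minimal sketch of the estimator behind entropic spanning graphs: the total length of the minimal spanning tree over the feature vectors yields a Rényi entropy estimate of order alpha = (d-1)/d up to an additive constant. The constant beta below depends only on the dimension and is not known in closed form, so it is left as a placeholder; this makes the value useful mainly for comparing feature sets of the same dimension.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_renyi_entropy(X, beta=1.0):
    """Entropic-spanning-graph estimate for an (n x d) sample, d >= 2:
    H_alpha ~ (log(L_n / n**alpha) - log(beta)) / (1 - alpha),
    where L_n is the total MST length and alpha = (d - 1) / d.
    beta is a dimension-dependent constant, set to 1.0 as a placeholder."""
    n, d = X.shape
    alpha = (d - 1) / d
    dist = squareform(pdist(X))              # pairwise Euclidean distances
    L = minimum_spanning_tree(dist).sum()    # normalized spanning length
    return (np.log(L / n**alpha) - np.log(beta)) / (1 - alpha)
```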

313 citations


Journal ArticleDOI
TL;DR: Based on the complement behavior of information gain, a new definition of information entropy is proposed along with its justification in rough set theory and it is proved to also be a fuzzy entropy.
Abstract: Based on the complement behavior of information gain, a new definition of information entropy is proposed along with its justification in rough set theory. Some properties of this definition imply those of Shannon's entropy. Based on the new information entropy, conditional entropy and mutual information are then introduced and applied to knowledge bases. The new information entropy is proved to also be a fuzzy entropy.

299 citations


Journal ArticleDOI
TL;DR: In this article, the authors examined the predictability of stock market returns by employing a new metric entropy measure of dependence with several desirable properties, including the ability to detect nonlinear dependence within the returns series, and the ability of detecting nonlinear "affinity" between the returns and their predictions obtained from various models.

291 citations


Journal ArticleDOI
TL;DR: A segmentation method is presented that is able to segment a nonstationary symbolic sequence into stationary subsequences; it is applied to DNA sequences, which are known to be nonstationary on a wide range of different length scales.
Abstract: We study statistical properties of the Jensen-Shannon divergence D, which quantifies the difference between probability distributions, and which has been widely applied to analyses of symbolic sequences. We present three interpretations of D in the framework of statistical physics, information theory, and mathematical statistics, and obtain approximations of the mean, the variance, and the probability distribution of D in random, uncorrelated sequences. We present a segmentation method based on D that is able to segment a nonstationary symbolic sequence into stationary subsequences, and apply this method to DNA sequences, which are known to be nonstationary on a wide range of different length scales.
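The divergence and the resulting cut criterion are easy to state in code. The sketch below (illustrative names, base-2 logarithms assumed) computes D between the two halves of a symbolic sequence and returns the cut point maximizing it; the published method additionally tests D against a significance threshold and recurses on the halves.

```python
import math
from collections import Counter

def entropy(counts):
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def jensen_shannon(seq, cut):
    """D for the split of seq at position cut, with the halves weighted by
    their lengths: D = H(whole) - (n1/n) H(left) - (n2/n) H(right)."""
    left, right = Counter(seq[:cut]), Counter(seq[cut:])
    n1, n2 = cut, len(seq) - cut
    n = n1 + n2
    return entropy(left + right) - (n1 / n) * entropy(left) - (n2 / n) * entropy(right)

def best_cut(seq):
    """Candidate segmentation point: the cut that maximizes D."""
    return max(range(1, len(seq)), key=lambda i: jensen_shannon(seq, i))
```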

Journal Article
TL;DR: The Approximate Entropy Reduction Principle (AERP) is introduced and NP-hardness of optimization tasks concerning application of various modifications of AERP to data analysis is shown.
Abstract: We use information entropy measure to extend the rough set based notion of a reduct. We introduce the Approximate Entropy Reduction Principle (AERP). It states that any simplification (reduction of attributes) in the decision model, which approximately preserves its conditional entropy (the measure of inconsistency of defining decision by conditional attributes) should be performed to decrease its prior entropy (the measure of the model's complexity). We show NP-hardness of optimization tasks concerning application of various modifications of AERP to data analysis.

Proceedings ArticleDOI
19 May 2002
TL;DR: The introduction and initial study of randomness conductors, a notion which generalizes extractors, expanders, condensers and other similar objects, is introduced and it is shown that the flexibility afforded by the conductor definition leads to interesting combinations of these objects, and to better constructions such as those above.
Abstract: The main concrete result of this paper is the first explicit construction of constant-degree lossless expanders. In these graphs, the expansion factor is almost as large as possible: (1-e)D, where D is the degree and e is an arbitrarily small constant. The best previous explicit constructions gave expansion factor D/2, which is too weak for many applications. The D/2 bound was obtained via the eigenvalue method, and it is known that that method cannot give better bounds. The main abstract contribution of this paper is the introduction and initial study of randomness conductors, a notion which generalizes extractors, expanders, condensers and other similar objects. In all these functions, a certain guarantee on the input "entropy" is converted to a guarantee on the output "entropy". For historical reasons, specific objects used specific guarantees of different flavors. We show that the flexibility afforded by the conductor definition leads to interesting combinations of these objects, and to better constructions such as those above. The main technical tool in these constructions is a natural generalization to conductors of the zig-zag graph product, previously defined for expanders and extractors.

Journal ArticleDOI
TL;DR: A generalization of the error entropy criterion is proposed that enables the use of any order of Renyi's entropy and any suitable kernel function in density estimation, and it is shown that the proposed entropy estimator preserves the global minimum of the actual entropy.
Abstract: We have previously proposed the quadratic Renyi's error entropy as an alternative cost function for supervised adaptive system training. An entropy criterion instructs the minimization of the average information content of the error signal rather than merely trying to minimize its energy. In this paper, we propose a generalization of the error entropy criterion that enables the use of any order of Renyi's entropy and any suitable kernel function in density estimation. It is shown that the proposed entropy estimator preserves the global minimum of actual entropy. The equivalence between global optimization by convolution smoothing and the convolution by the kernel in Parzen windowing is also discussed. Simulation results are presented for time-series prediction and classification where experimental demonstration of all the theoretical concepts is presented.

Journal ArticleDOI
TL;DR: By building and maintaining a dictionary of individual user's path updates, the proposed adaptive on-line algorithm can learn subscribers' profiles; the compressibility of the variable-to-fixed-length encoding of the acclaimed Lempel–Ziv family of algorithms reduces the update cost.
Abstract: The complexity of the mobility tracking problem in a cellular environment has been characterized under an information-theoretic framework. Shannon's entropy measure is identified as a basis for comparing user mobility models. By building and maintaining a dictionary of individual user's path updates (as opposed to the widely used location updates), the proposed adaptive on-line algorithm can learn subscribers' profiles. This technique evolves out of the concepts of lossless compression. The compressibility of the variable-to-fixed-length encoding of the acclaimed Lempel–Ziv family of algorithms reduces the update cost, whereas their built-in predictive power can be effectively used to reduce the paging cost.
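The dictionary-building step rests on Lempel–Ziv incremental (LZ78-style) parsing of the user's movement history; a minimal sketch follows, with cell identifiers represented as arbitrary hashable symbols. The protocol details of when updates are sent, and the variable-to-fixed encoding itself, are not reproduced.

```python
def lz78_parse(path):
    """Incrementally parse a sequence of cell identifiers into phrases.
    The resulting trie is the user's mobility profile: its growth rate
    reflects the entropy of the movement pattern, and the phrase counts
    can be used to predict the next cell when paging."""
    trie = {}        # phrase (tuple of cells) -> occurrence count
    phrase = ()
    for cell in path:
        phrase = phrase + (cell,)
        if phrase in trie:
            trie[phrase] += 1
        else:
            trie[phrase] = 1   # new phrase: this is where an update is emitted
            phrase = ()
    return trie

# e.g. lz78_parse("aababbbaab") yields the phrases a, ab, abb, b, aa
```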

Journal ArticleDOI
TL;DR: A new method to minimize the closed-loop randomness for general dynamic stochastic systems using the entropy concept is presented, and it is shown that this minimum entropy control concept generates a minimum variance control when the stochastic system is represented by an ARMAX model which is subjected to Gaussian noises.
Abstract: This paper presents a new method to minimize the closed loop randomness for general dynamic stochastic systems using the entropy concept. The system is assumed to be subjected to any bounded random inputs. Using the recently developed linear B-spline model for the shape control of the system output probability density function, a control input is formulated which minimizes the output entropy of the closed-loop system. Since the entropy is the measure of randomness for a given random variable, this controller can thus reduce the uncertainty of the closed-loop system. A sufficient condition is established to guarantee the local stability of the closed-loop system. It is shown that this minimum entropy control concept generates a minimum variance control when the stochastic system is represented by an ARMAX model which is subjected to Gaussian noises. An illustrative example is utilized to demonstrate the use of the control algorithm, and satisfactory results are obtained.

Proceedings ArticleDOI
06 Jul 2002
TL;DR: A constancy rate principle governing language generation implies that local measures of entropy (ignoring context) should increase with the sentence number, and it is demonstrated that this is indeed the case by measuring entropy in three different ways.
Abstract: We present a constancy rate principle governing language generation. We show that this principle implies that local measures of entropy (ignoring context) should increase with the sentence number. We demonstrate that this is indeed the case by measuring entropy in three different ways. We also show that this effect has both lexical (which words are used) and non-lexical (how the words are used) causes.
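One of the simplest local (context-ignoring) measures mentioned can be sketched as follows: the average per-word negative log unigram probability of each sentence, with probabilities estimated from the whole corpus. This is an illustrative reconstruction, not the authors' exact estimator; under the constancy rate principle the resulting values should rise with the sentence number.

```python
import math
from collections import Counter

def per_sentence_entropy(sentences):
    """sentences: list of token lists, e.g. [["the", "cat", "sat"], ...].
    Returns, for each sentence, the average -log2 unigram probability of
    its words, estimated from the corpus itself (context ignored)."""
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    logp = {w: math.log2(c / total) for w, c in counts.items()}
    return [-sum(logp[w] for w in s) / len(s) for s in sentences]
```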

Journal ArticleDOI
TL;DR: A heuristic algorithm based on rough entropy for knowledge reduction in incomplete information systems is proposed; its time complexity is O(|A|²|U|).
Abstract: Rough set theory is emerging as a powerful tool for reasoning about data, and knowledge reduction is one of the important topics in research on rough set theory. It has been proven that finding the minimal reduct of an information system is an NP-hard problem, as is finding the minimal reduct of an incomplete information system; the main cause of this NP-hardness is the combinatorial problem of attribute selection. In this paper, knowledge reduction is defined from the viewpoint of information, and a heuristic algorithm based on rough entropy for knowledge reduction in incomplete information systems is proposed; its time complexity is O(|A|²|U|). An illustrative example is provided that shows the application potential of the algorithm.
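A generic entropy-guided reduction heuristic of this kind can be sketched as below: attributes are added greedily while the conditional entropy of the decision attribute is above that of the full attribute set. This is an illustration of the general approach for complete decision tables, not the paper's rough-entropy formulation for incomplete information systems; names and the table layout are assumptions.

```python
import math
from collections import Counter, defaultdict

def cond_entropy(table, attrs, decision):
    """H(decision | attrs) for a decision table given as a list of dicts."""
    groups = defaultdict(list)
    for row in table:
        groups[tuple(row[a] for a in attrs)].append(row[decision])
    n = len(table)
    h = 0.0
    for rows in groups.values():
        for c in Counter(rows).values():
            h -= len(rows) / n * (c / len(rows)) * math.log2(c / len(rows))
    return h

def greedy_reduct(table, attrs, decision):
    """Greedily add the attribute giving the largest entropy drop until the
    conditional entropy matches that of the full attribute set."""
    target = cond_entropy(table, attrs, decision)
    reduct = []
    while cond_entropy(table, reduct, decision) > target + 1e-12:
        best = min((a for a in attrs if a not in reduct),
                   key=lambda a: cond_entropy(table, reduct + [a], decision))
        reduct.append(best)
    return reduct
```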

Journal ArticleDOI
TL;DR: It is shown that the DEA determines the correct scaling exponent even when the statistical properties, as well as the dynamic properties, are anomalous, which means that all of them can be safely applied only to the case where ordinary statistical properties hold true even if strange kinetics are involved.
Abstract: The methods currently used to determine the scaling exponent of a complex dynamic process described by a time series are based on the numerical evaluation of variance. This means that all of them can be safely applied only to the case where ordinary statistical properties hold true even if strange kinetics are involved. We illustrate a method of statistical analysis based on the Shannon entropy of the diffusion process generated by the time series, called diffusion entropy analysis (DEA). We adopt artificial Gauss and Levy time series, as prototypes of ordinary and anomalous statistics, respectively, and we analyze them with the DEA and four ordinary methods of analysis, some of which are very popular. We show that the DEA determines the correct scaling exponent even when the statistical properties, as well as the dynamic properties, are anomalous. The other four methods produce correct results in the Gauss case but fail to detect the correct scaling in the case of Levy statistics.
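The DEA procedure described can be reproduced in outline: build the diffusion variable by summing the series over windows of every length t, histogram it, compute the Shannon entropy S(t), and read the scaling exponent delta off the slope of S against ln t. The bin count and window lengths below are illustrative choices, not values from the paper.

```python
import numpy as np

def diffusion_entropy(xi, window_lengths, bins=50):
    """Diffusion entropy analysis sketch: returns the scaling exponent delta
    (slope of S(t) versus ln t) and the entropies S(t) themselves."""
    xi = np.asarray(xi, dtype=float)
    entropies = []
    for t in window_lengths:
        # diffusion variable: sums of the series over all windows of length t
        x = np.array([xi[s:s + t].sum() for s in range(len(xi) - t + 1)])
        p, edges = np.histogram(x, bins=bins, density=True)
        dx = edges[1] - edges[0]
        p = p[p > 0]
        entropies.append(-np.sum(p * np.log(p) * dx))
    delta = np.polyfit(np.log(window_lengths), entropies, 1)[0]
    return delta, entropies
```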

Journal ArticleDOI
TL;DR: In this article, an attribute-based approach is proposed to predict consumers' perceptions of assortment variety, which is based on the marginal and joint distributions of the attributes of the products in an assortment.
Abstract: In recent years, interest in category management has surged, and as a consequence, large retailers now systematically review their product assortments. Variety is a key property of assortments. Assortment variety can determine consumers' store choice and is only gaining in importance with today's increasing numbers of product offerings. To support retailers in managing their assortments, insight is needed into the influence of assortment composition on consumers' variety perceptions, and appropriate measures of assortment variety are required. This paper aims to extend the assortment variety model recently proposed by Hoch et al. (1999) in Marketing Science. It conceptualizes assortment variety from an attribute-based perspective and compares this with the product-based approach of Hoch, Bradlow, and Wansink (HBW). The attribute-based approach offers an alternative viewpoint for assortment variety. Attribute- and product-based approaches reflect basic conceptualizations of assortment variety that assume substantially different perception processes: a consumer comparing products one by one versus a consumer examining attributes across products in the assortment. While the product-based approach focuses on the dissimilarity between product pairs in an assortment, the attribute-based approach that we propose focuses on the marginal and joint distributions of the attributes. We conjecture and aim to show that an attribute-based approach suffices to predict consumers' perceptions of assortment variety.

In operationalizing the attribute-based approach, two measures of assortment variety are described and compared to product-based measures. These two measures relate to the dispersion of attribute levels, e.g., whether all products have the same color or different colors, and the dissociation between attributes, e.g., whether product color and size are unrelated. The ability of product-based and attribute-based measures to predict consumers' perceptions of assortment variety is assessed. The product-based measures (Hamming) tap the dissimilarity of products in an assortment across attributes. The attribute-based measures tap the dispersion of attribute levels across products (Entropy) and the dissociation between product attributes (1 - Lambda) in an assortment. In two studies, we examine the correlations between these measures in a well-behaved environment (study 1) and the predictive validity of the measures for perceived variety in a consumer experiment (study 2).

Study 1, using synthetic data, shows that the attribute-based measures tap specific aspects of assortment variety and that the attribute-based measures are less sensitive to the size of assortments than product-based measures are. Whereas HBW focus on assortments of equal size, study 1 indicates that an extension to assortments of unequal size results in summed Hamming measures that correlate highly with assortment size. The latter is important when assortments of different size are compared. Next, we examine how well the measures capture consumers' perception of variety.

Study 2, a consumer experiment, shows that the attribute-based measures account best for consumers' perceptions of variety. Attribute-based measures significantly add to the prediction of consumers' perceptions of variety, over and above the product-based measures, while the reverse is not the case. Interestingly, this study also indicates that assortment size may not be a good proxy for perceived assortment variety.

The findings illustrate the value of an attribute-based conceptualization of assortment variety, since these measures (1) correlate only moderately with assortment size and (2) suffice to predict consumers' perceptions of assortment variety. In the final section we briefly discuss how attribute-based and product-based measures can be used in assortment management, and when product- and attribute-based approaches may predict consumers' variety perceptions. We discuss how an attribute-based approach can identify which attribute levels and attribute combinations influence consumers' perceptions of variety most, while a product-based approach can identify influential products. Both approaches have applications in specific situations. For instance, an attribute-based approach can identify influential attributes in an ordered, simultaneous presentation of products, while a product-based approach can better assess the impact of sequential presentations of products. In addition, we indicate how the random-intercept model estimated in study 2 can be further extended to capture the influence of, e.g., consumer characteristics.
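To make the two attribute-based measures concrete, the sketch below computes the dispersion of a single attribute's levels as a Shannon entropy and a dissociation score between two attributes; the latter uses one minus the Goodman-Kruskal lambda as a stand-in, which may not match the paper's exact operationalization of 1 - Lambda. The product representation and names are assumptions.

```python
import math
from collections import Counter

def dispersion_entropy(assortment, attribute):
    """Dispersion of attribute levels across products: the Shannon entropy of
    the level frequencies (zero if all products share one level)."""
    counts = Counter(p[attribute] for p in assortment)
    n = len(assortment)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def dissociation(assortment, attr_a, attr_b):
    """1 minus a lambda-style association measure for predicting attr_b from
    attr_a; high values mean the two attributes vary independently."""
    n = len(assortment)
    joint = Counter((p[attr_a], p[attr_b]) for p in assortment)
    marg_b = Counter(p[attr_b] for p in assortment)
    sum_max = sum(max(c for (a, b), c in joint.items() if a == level)
                  for level in {p[attr_a] for p in assortment})
    denom = n - max(marg_b.values())
    if denom == 0:      # attr_b has a single level: nothing to dissociate
        return 0.0
    return 1 - (sum_max - max(marg_b.values())) / denom

# assortment: list of product dicts, e.g. [{"color": "red", "size": "M"}, ...]
```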

Journal ArticleDOI
TL;DR: In this article, a sufficient criterion is given for a martingale measure for a locally bounded semimartingale to minimize relative entropy among all martingale measures, under the assumption that X is continuous and that the density process of some equivalent martingale measure satisfies a reverse LLogL-inequality.
Abstract: Let X be a locally bounded semimartingale. Using the theory of BMO-martingales we give a sufficient criterion for a martingale measure for X to minimize relative entropy among all martingale measures. This is applied to prove convergence of the q-optimal martingale measure to the minimal entropy martingale measure in entropy for q ↓ 1, under the assumption that X is continuous and that the density process of some equivalent martingale measure satisfies a reverse LLogL-inequality.

Journal ArticleDOI
TL;DR: In this article, the first four moments are given, and the skewness-kurtosis domain for which densities are defined is found to be much larger than for Hermite or Edgeworth expansions.

Journal ArticleDOI
TL;DR: Among input spectra consistent with a given state covariance of a linear filter, an explicit formula is derived for the power spectrum with maximal entropy, and a linear fractional parametrization of all solutions is provided.
Abstract: Input spectra that are consistent with a given state covariance of a linear filter correspond to solutions of an analytic interpolation problem. We derive an explicit formula for the power spectrum with maximal entropy, and provide a linear fractional parametrization of all solutions.

Journal ArticleDOI
Amos Golan

Book ChapterDOI
Kenji Abe, Shinji Kawasoe, Tatsuya Asai, Hiroki Arimura, Setsuo Arikawa
19 Aug 2002
TL;DR: An efficient algorithm is presented that discovers the best labeled ordered trees that optimize a given statistical measure, such as the information entropy and the classification accuracy, in a collection of semi-structured data.
Abstract: In this paper, we consider the problem of discovering interesting substructures from a large collection of semi-structured data in the framework of optimized pattern discovery. We model semi-structured data and patterns with labeled ordered trees, and present an efficient algorithm that discovers the best labeled ordered trees that optimize a given statistical measure, such as the information entropy and the classification accuracy, in a collection of semi-structured data. We give theoretical analyses of the computational complexity of the algorithm for patterns with bounded and unbounded size. Experiments show that the algorithm performs well and discovered interesting patterns on real datasets.
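Only the pattern-scoring part of such a discovery loop is easy to show compactly; the sketch below evaluates a candidate pattern by the reduction in class entropy (information gain) over the records it matches. The efficient enumeration of labeled ordered tree patterns that the paper contributes is not reproduced, and the names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(matches, labels):
    """Score of a candidate pattern: drop in class entropy when the records
    are split by whether the pattern matches them (matches: list of bools)."""
    pos = [l for m, l in zip(matches, labels) if m]
    neg = [l for m, l in zip(matches, labels) if not m]
    n = len(labels)
    split = sum(len(part) / n * entropy(part) for part in (pos, neg) if part)
    return entropy(labels) - split
```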

Journal ArticleDOI
TL;DR: A connectivity encoding method which extends valence enumeration to 2-manifold meshes consisting of faces with arbitrary degree and achieves the Tutte entropy bound for arbitrary planar graphs of two bits per edge in the worst case is introduced.
Abstract: Encoders for triangle mesh connectivity based on enumeration of vertex valences are among the best reported to date. They are both simple to implement and report the best compressed file sizes for a large corpus of test models. Additionally, they have recently been shown to be near-optimal since they realize the Tutte entropy bound for all planar triangulations. In this paper we introduce a connectivity encoding method which extends these ideas to 2-manifold meshes consisting of faces with arbitrary degree. The encoding algorithm exploits duality by applying valence enumeration to both the primal and the dual mesh in a symmetric fashion. It generates two sequences of symbols, vertex valences and face degrees, and encodes them separately using two context-based arithmetic coders. This allows us to exploit vertex or face regularity if present. When the mesh exhibits perfect face regularity (e.g., a pure triangle or quad mesh) or perfect vertex regularity (valence six or four, respectively) the corresponding bit rate vanishes to zero asymptotically. For triangle meshes, our technique is equivalent to earlier valence-driven approaches. We report compression results for a corpus of standard meshes. In all cases we are able to show coding gains over earlier coders, sometimes as large as 50%. Remarkably, we even slightly gain over coders specialized to triangle or quad meshes. A theoretical analysis reveals that our approach is near-optimal as we achieve the Tutte entropy bound for arbitrary planar graphs of two bits per edge in the worst case.

Journal ArticleDOI
TL;DR: A measure of entropy for any discrete Choquet capacity is introduced and it is interpreted in the setting of aggregation by the Choquet integral.

Journal ArticleDOI
TL;DR: In this paper, a new algorithm for automatic phase correction of NMR spectra based on entropy minimization is proposed, where the optimal zero-order and first-order phase corrections for a NMR spectrum are determined by minimizing entropy.

Journal ArticleDOI
TL;DR: In this article, an analysis method is developed for the robust and efficient estimation of 3D seismic local structural entropy, which is a measure of local discontinuity, which avoids the computation of large covariance matrices and eigenvalues associated with the eigenstructure-based and semblance-based coherency estimates.
Abstract: In this work, an analysis method is developed for the robust and efficient estimation of 3-D seismic local structural entropy, which is a measure of local discontinuity. This method avoids the computation of large covariance matrices and eigenvalues, associated with the eigenstructure-based and semblance-based coherency estimates. We introduce a number of local discontinuity measures, based on the relations between subvolumes (quadrants) of the analysis cube. The scale of the analysis is determined by the type of geological feature that is of interest to the interpreter. By combining local structural entropy volumes using various scales, we obtain a higher lateral resolution and better discrimination between incoherent and coherent seismic events. Furthermore, the method developed is computationally much more efficient than the eigenstructure-based coherency method. Its robustness is demonstrated by synthetic and real data examples.

Journal ArticleDOI
TL;DR: The paper presents an existence result relating the difficulty of the problem, as characterized by the minimum distance between patterns of different classes, to the weight range necessary to ensure that a solution exists; this makes it possible to calculate a weight range for a given category of problems and be confident that the network can solve those problems with integer weights in that range.