
Showing papers on "Entropy (information theory) published in 2011"


Proceedings ArticleDOI
20 Jun 2011
TL;DR: An efficient greedy algorithm for superpixel segmentation is developed by exploiting submodular and monotonic properties of the objective function and proving an approximation bound of ½ for the optimality of the solution.
Abstract: We propose a new objective function for superpixel segmentation. This objective function consists of two components: entropy rate of a random walk on a graph and a balancing term. The entropy rate favors formation of compact and homogeneous clusters, while the balancing function encourages clusters with similar sizes. We present a novel graph construction for images and show that this construction induces a matroid — a combinatorial structure that generalizes the concept of linear independence in vector spaces. The segmentation is then given by the graph topology that maximizes the objective function under the matroid constraint. By exploiting submodular and monotonic properties of the objective function, we develop an efficient greedy algorithm. Furthermore, we prove an approximation bound of ½ for the optimality of the solution. Extensive experiments on the Berkeley segmentation benchmark show that the proposed algorithm outperforms the state of the art in all the standard evaluation metrics.

894 citations


Journal ArticleDOI
TL;DR: This paper treats supplier selection as a group multiple criteria decision making (GMCDM) problem, obtains decision makers' opinions in the form of linguistic terms which are converted to trapezoidal fuzzy numbers, and extends the VIKOR method with a mechanism to extract and deploy objective weights based on the Shannon entropy concept.
Abstract: Recently, resolving the problem of evaluating and ranking potential suppliers has become a key strategic factor for business firms. With the development of intelligent and automated information systems in the information era, the need for more efficient decision making methods is growing. The VIKOR method was developed to solve multiple criteria decision making (MCDM) problems with conflicting and non-commensurable criteria, assuming that compromising is acceptable to resolve conflicts. On the other hand, objective weights based on the Shannon entropy concept can be used to regulate subjective weights assigned by decision makers, or even to take into account the end-users' opinions. In this paper, we treat supplier selection as a group multiple criteria decision making (GMCDM) problem and obtain decision makers' opinions in the form of linguistic terms. These linguistic terms are then converted to trapezoidal fuzzy numbers. We extend the VIKOR method with a mechanism to extract and deploy objective weights based on the Shannon entropy concept. The final result is obtained through subsequent steps based on the factors R, S, and Q. A numerical example illustrates an application of the proposed method.
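For readers who want to see the entropy-weighting step in isolation, the sketch below derives objective criteria weights from a crisp decision matrix with the standard Shannon-entropy scheme. It is a minimal illustration under assumed data, not the authors' full fuzzy VIKOR procedure; the supplier ratings are made up.

import numpy as np

def entropy_weights(X):
    # Objective weights from a decision matrix X (m alternatives x n criteria)
    # using the standard Shannon-entropy weighting scheme.
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    P = X / X.sum(axis=0, keepdims=True)      # each criterion column as a distribution
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)
    e = -plogp.sum(axis=0) / np.log(m)        # normalized entropy of each criterion
    d = 1.0 - e                               # degree of diversification
    return d / d.sum()

# illustrative decision matrix: 4 suppliers rated on 3 benefit criteria
ratings = [[7, 5, 8],
           [6, 9, 4],
           [8, 6, 7],
           [5, 7, 6]]
print(entropy_weights(ratings))   # criteria that discriminate more between suppliers get larger weights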

612 citations


Posted Content
TL;DR: This paper develops desiderata for probabilistic optimization algorithms, then presents a concrete algorithm which addresses each of the computational intractabilities with a sequence of approximations and explicitly addresses the decision problem of maximizing information gain from each evaluation.
Abstract: Contemporary global optimization algorithms are based on local measures of utility, rather than a probability measure over location and value of the optimum. They thus attempt to collect low function values, not to learn about the optimum. The reason for the absence of probabilistic global optimizers is that the corresponding inference problem is intractable in several ways. This paper develops desiderata for probabilistic optimization algorithms, then presents a concrete algorithm which addresses each of the computational intractabilities with a sequence of approximations and explicitly addresses the decision problem of maximizing information gain from each evaluation.

424 citations


Journal ArticleDOI
TL;DR: Two new information metrics, the generalized entropy metric and the information distance metric, are proposed to detect low-rate DDoS attacks by measuring the difference between legitimate traffic and attack traffic.
Abstract: A low-rate distributed denial of service (DDoS) attack has a significant ability to conceal its traffic because it closely resembles normal traffic, and it can elude current anomaly-based detection schemes. An information metric can quantify the differences between network traffic with various probability distributions. In this paper, we propose two new information metrics, the generalized entropy metric and the information distance metric, to detect low-rate DDoS attacks by measuring the difference between legitimate traffic and attack traffic. The proposed generalized entropy metric can detect attacks several hops earlier (three hops earlier when the order α = 10) than the traditional Shannon metric. The proposed information distance metric outperforms (by six hops when the order α = 10) the popular Kullback-Leibler divergence approach, as it clearly enlarges the adjudication distance and thus obtains the optimal detection sensitivity. The experimental results show that the proposed information metrics can effectively detect low-rate DDoS attacks and clearly reduce the false positive rate. Furthermore, the proposed IP traceback algorithm can find all attacks as well as attackers from their own local area networks (LANs) and discard attack traffic.
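As a rough illustration of the two quantities involved, the sketch below computes an order-α (Rényi/generalized) entropy and a symmetrized order-α divergence, used here as an "information distance", on toy per-source packet histograms. The traffic data and the symmetrization are assumptions for illustration, not the paper's exact detector.

import numpy as np

def renyi_entropy(p, alpha):
    # Order-alpha (generalized) entropy in bits; alpha = 1 recovers Shannon entropy.
    p = np.asarray(p, float); p = p / p.sum(); p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return float(-(p * np.log2(p)).sum())
    return float(np.log2((p ** alpha).sum()) / (1.0 - alpha))

def renyi_divergence(p, q, alpha):
    # Order-alpha divergence D_alpha(p || q) in bits (q must be positive where p is).
    p = np.asarray(p, float) / np.sum(p)
    q = np.asarray(q, float) / np.sum(q)
    m = p > 0
    return float(np.log2((p[m] ** alpha * q[m] ** (1.0 - alpha)).sum()) / (alpha - 1.0))

def information_distance(p, q, alpha):
    return renyi_divergence(p, q, alpha) + renyi_divergence(q, p, alpha)

legit  = [40, 35, 30, 25, 20]     # per-source packet counts, legitimate window
attack = [90,  5,  5,  5,  5]     # one source dominates during the attack window
print(renyi_entropy(legit, 10), renyi_entropy(attack, 10))
print(information_distance(legit, attack, 10))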

351 citations


Proceedings ArticleDOI
Michael Muter, Naim Asaj
05 Jun 2011
TL;DR: In this article, the authors explore the applicability of entropy-based attack detection for in-vehicle networks and illustrate the crucial aspects for an adaptation of such an approach to the automotive domain.
Abstract: Due to increased connectivity and the seamless integration of information technology into modern vehicles, a current research trend in the automotive domain is the development of holistic IT security concepts. Within the scope of this development, vehicular attack detection is one concept that is gaining increased attention, because its reactive nature allows threats to be addressed at runtime. In this paper, we explore the applicability of entropy-based attack detection for in-vehicle networks and illustrate the aspects that are crucial for adapting such an approach to the automotive domain. Moreover, we show first exemplary results by applying the approach to measurements derived from a standard vehicle's CAN-Body network.
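A toy sketch of the underlying idea follows: compute the entropy of the CAN message-ID distribution over sliding windows and flag windows that deviate from a reference learned on normal traffic. The window length, tolerance, and simulated ID stream are invented for illustration and are not taken from the paper.

import numpy as np
from collections import Counter

def window_entropy(ids):
    # Shannon entropy (bits) of the message-ID distribution in one window.
    counts = np.array(list(Counter(ids).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def detect(id_stream, window, ref_entropy, tol):
    # Flag windows whose entropy deviates from the normal-traffic reference.
    alarms = []
    for start in range(0, len(id_stream) - window + 1, window):
        h = window_entropy(id_stream[start:start + window])
        if abs(h - ref_entropy) > tol:
            alarms.append((start, h))
    return alarms

# simulated traffic: cyclic normal IDs, then a flood of one injected ID
normal = [0x110, 0x120, 0x130, 0x1A0, 0x2F0] * 80
attack = normal[:200] + [0x666] * 200
ref = window_entropy(normal[:200])            # reference entropy from normal traffic
print(detect(attack, window=200, ref_entropy=ref, tol=0.5))   # flags the flooded window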

274 citations


Journal ArticleDOI
TL;DR: This work presents the open-source MATLAB toolbox TRENTOOL, an implementation of transfer entropy and mutual information analysis that aims to support the user in the application of this information theoretic measure.
Abstract: Background: Transfer entropy (TE) is a measure for the detection of directed interactions. Transfer entropy is an information theoretic implementation of Wiener's principle of observational causality. It offers an approach to the detection of neuronal interactions that is free of an explicit model of the interactions. Hence, it offers the power to analyze linear and nonlinear interactions alike. This allows, for example, the comprehensive analysis of directed interactions in neural networks at various levels of description. Here we present the open-source MATLAB toolbox TRENTOOL, which allows the user to handle the considerable complexity of this measure and to validate the obtained results using non-parametric statistical testing. We demonstrate the use of the toolbox and the performance of the algorithm on simulated data with nonlinear (quadratic) coupling and on local field potentials (LFP) recorded from the retina and the optic tectum of the turtle (Pseudemys scripta elegans), where a neuronal one-way connection is likely present.
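TRENTOOL itself relies on embedding optimization and nearest-neighbour (Kraskov-type) estimators; the sketch below is only a minimal plug-in estimate of transfer entropy on quantized data with history length 1, meant to show the quantity being estimated. The coupled test signals are synthetic assumptions.

import numpy as np
from collections import Counter

def transfer_entropy(x, y, bins=4):
    # Plug-in estimate of TE(X -> Y) in bits, history length 1:
    # TE = sum p(y1, y0, x0) * log2[ p(y1 | y0, x0) / p(y1 | y0) ]
    edges_x = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    edges_y = np.quantile(y, np.linspace(0, 1, bins + 1)[1:-1])
    xs, ys = np.digitize(x, edges_x), np.digitize(y, edges_y)
    triples = list(zip(ys[1:], ys[:-1], xs[:-1]))
    n = len(triples)
    c_xyz = Counter(triples)
    c_yz = Counter((y1, y0) for y1, y0, _ in triples)
    c_z  = Counter((y0, x0) for _, y0, x0 in triples)
    c_y  = Counter(y0 for _, y0, _ in triples)
    te = 0.0
    for (y1, y0, x0), c in c_xyz.items():
        p_full = c / c_z[(y0, x0)]          # p(y1 | y0, x0)
        p_red  = c_yz[(y1, y0)] / c_y[y0]   # p(y1 | y0)
        te += (c / n) * np.log2(p_full / p_red)
    return te

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = np.roll(x, 1) ** 2 + 0.1 * rng.normal(size=5000)    # y is driven by past x
print(transfer_entropy(x, y), transfer_entropy(y, x))   # the first value is clearly larger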

231 citations


Journal ArticleDOI
TL;DR: It is shown that the relative entropy offers tight control over the errors due to coarse-graining in arbitrary microscopic properties, and a systematic approach to reducing them is suggested.
Abstract: The ability to generate accurate coarse-grained models from reference fully atomic (or otherwise "first-principles") ones has become an important component in modeling the behavior of complex molecular systems with large length and time scales. We recently proposed a novel coarse-graining approach based upon variational minimization of a configuration-space functional called the relative entropy, S(rel), that measures the information lost upon coarse-graining. Here, we develop a broad theoretical framework for this methodology and numerical strategies for its use in practical coarse-graining settings. In particular, we show that the relative entropy offers tight control over the errors due to coarse-graining in arbitrary microscopic properties, and suggests a systematic approach to reducing them. We also describe fundamental connections between this optimization methodology and other coarse-graining strategies like inverse Monte Carlo, force matching, energy matching, and variational mean-field theory. We suggest several new numerical approaches to its minimization that provide new coarse-graining strategies. Finally, we demonstrate the application of these theoretical considerations and algorithms to a simple, instructive system and characterize convergence and errors within the relative entropy framework.

230 citations


Journal ArticleDOI
TL;DR: The fault classification results show that the support vector machine identified the fault categories of rolling element bearing more accurately and has a better diagnosis performance as compared to the learning vector quantization and self-organizing maps.

205 citations


Journal ArticleDOI
TL;DR: The approach follows a sequential procedure for nonuniform embedding of multivariate time series, whereby embedding vectors are built progressively on the basis of a minimization criterion applied to the entropy of the present state of the system conditioned on its past states.
Abstract: We present an approach, framed in information theory, to assess nonlinear causality between the subsystems of a whole stochastic or deterministic dynamical system. The approach follows a sequential procedure for nonuniform embedding of multivariate time series, whereby embedding vectors are built progressively on the basis of a minimization criterion applied to the entropy of the present state of the system conditioned on its past states. A corrected conditional entropy estimator compensating for the biasing effect of single points in the quantized hyperspace is used to guarantee the existence of a minimum entropy rate at which to terminate the procedure. The causal coupling is detected according to the Granger notion of predictability improvement, and is quantified in terms of information transfer. We apply the approach to simulations of deterministic and stochastic systems, showing its superiority over standard uniform embedding. Effects of quantization, data length, and noise contamination are investigated. As practical applications, we consider the assessment of cardiovascular regulatory mechanisms from the analysis of heart period, arterial pressure, and respiration time series, and the investigation of the information flow across brain areas from multichannel scalp electroencephalographic recordings.

204 citations


Journal ArticleDOI
Abstract: This paper proposes entropy balancing, a data preprocessing method to achieve covariate balance in observational studies with binary treatments. Entropy balancing relies on a maximum entropy reweighting scheme that calibrates unit weights so that the reweighted treatment and control group satisfy a potentially large set of prespecified balance conditions that incorporate information about known sample moments. Entropy balancing thereby exactly adjusts inequalities in representation with respect to the first, second, and possibly higher moments of the covariate distributions. These balance improvements can reduce model dependence for the subsequent estimation of treatment effects. The method assures that balance improves on all covariate moments included in the reweighting. It also obviates the need for continual balance checking and iterative searching over propensity score models that may stochastically balance the covariate moments. We demonstrate the use of entropy balancing with Monte Carlo simulations and empirical applications.
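A compact sketch of the reweighting idea, restricted to first moments, is given below: the control-unit weights solve a convex dual problem so that the reweighted control means exactly match the treated means while staying as close as possible (in entropy) to uniform weights. The covariate data and the optimizer choice are assumptions; this is not the authors' software.

import numpy as np
from scipy.optimize import minimize

def entropy_balance(X_control, treated_means):
    # Maximum-entropy weights for control units whose weighted means match treated_means.
    d = X_control - treated_means                 # control covariates centered at the target
    def dual(lam):
        return np.log(np.exp(-d @ lam).sum())     # log-partition function (convex in lam)
    def grad(lam):
        w = np.exp(-d @ lam); w /= w.sum()
        return -(w[:, None] * d).sum(axis=0)
    lam = minimize(dual, np.zeros(d.shape[1]), jac=grad, method="BFGS").x
    w = np.exp(-d @ lam)
    return w / w.sum()

rng = np.random.default_rng(1)
X_treat   = rng.normal(loc=[1.0, 0.5], size=(200, 2))
X_control = rng.normal(loc=[0.0, 0.0], size=(800, 2))
w = entropy_balance(X_control, X_treat.mean(axis=0))
print(X_control.mean(axis=0))   # raw control means, near (0, 0)
print(w @ X_control)            # reweighted means, near the treated means (1, 0.5)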

203 citations


Journal ArticleDOI
TL;DR: A new and brief proof of the EPI is developed through a mutual information inequality, which replaces Stam and Blachman's Fisher information inequality (FII) and an inequality for MMSE by Guo, Shamai, and Verdú used in earlier proofs.
Abstract: While most useful information theoretic inequalities can be deduced from the basic properties of entropy or mutual information, up to now Shannon's entropy power inequality (EPI) has been an exception: existing information theoretic proofs of the EPI hinge on representations of differential entropy using either Fisher information or minimum mean-square error (MMSE), which are derived from de Bruijn's identity. In this paper, we first present a unified view of these proofs, showing that they share two essential ingredients: 1) a data processing argument applied to a covariance-preserving linear transformation; 2) an integration over a path of a continuous Gaussian perturbation. Using these ingredients, we develop a new and brief proof of the EPI through a mutual information inequality, which replaces Stam and Blachman's Fisher information inequality (FII) and an inequality for MMSE by Guo, Shamai, and Verdú used in earlier proofs. The result has the advantage of being very simple in that it relies only on the basic properties of mutual information. These ideas are then generalized to various extended versions of the EPI: Zamir and Feder's generalized EPI for linear transformations of the random variables, Takano and Johnson's EPI for dependent variables, Liu and Viswanath's covariance-constrained EPI, and Costa's concavity inequality for the entropy power.
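For reference, the inequality in question is N(X+Y) >= N(X) + N(Y) for independent X and Y, with entropy power N(X) = exp(2h(X)) / (2πe). The short check below evaluates the Gaussian case, where the EPI holds with equality; it is a numerical sanity check, not part of the paper's argument.

import numpy as np

def entropy_power(h_nats):
    return np.exp(2.0 * h_nats) / (2.0 * np.pi * np.e)

def gaussian_entropy(var):
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

v1, v2 = 1.0, 4.0                                   # variances of two independent Gaussians
lhs = entropy_power(gaussian_entropy(v1 + v2))      # N(X + Y): the sum is Gaussian, variances add
rhs = entropy_power(gaussian_entropy(v1)) + entropy_power(gaussian_entropy(v2))
print(lhs, rhs)                                     # both equal 5.0: Gaussians attain equality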

Journal ArticleDOI
TL;DR: Transfer entropy analysis of MEG source-level signals detected changes in the network between the different task types that prominently involved the left temporal pole and cerebellum--structures that have previously been implicated in auditory short-term or working memory.
Abstract: The analysis of cortical and subcortical networks requires the identification of their nodes, and of the topology and dynamics of their interactions. Exploratory tools for the identification of nodes are available, e.g. magnetoencephalography (MEG) in combination with beamformer source analysis. Competing network topologies and interaction models can be investigated using dynamic causal modelling. However, we lack a method for the exploratory investigation of network topologies to choose from the very large number of possible network graphs. Ideally, this method should not require a pre-specified model of the interaction. Transfer entropy--an information theoretic implementation of Wiener-type causality--is a method for the investigation of causal interactions (or information flow) that is independent of a pre-specified interaction model. We analysed MEG data from an auditory short-term memory experiment to assess whether the reconfiguration of networks implied in this task can be detected using transfer entropy. Transfer entropy analysis of MEG source-level signals detected changes in the network between the different task types. These changes prominently involved the left temporal pole and cerebellum--structures that have previously been implicated in auditory short-term or working memory. Thus, the analysis of information flow with transfer entropy at the source-level may be used to derive hypotheses for further model-based testing.

Journal ArticleDOI
TL;DR: Analysis of the intrinsic time scales of the chaotic dynamics of a semiconductor laser subject to optical feedback by estimating quantifiers derived from a permutation information approach finds that permutation entropy and permutation statistical complexity allow the extraction of important characteristics of the dynamics of the system.
Abstract: We analyze the intrinsic time scales of the chaotic dynamics of a semiconductor laser subject to optical feedback by estimating quantifiers derived from a permutation information approach. Based on numerically and experimentally obtained time series, we find that permutation entropy and permutation statistical complexity allow the extraction of important characteristics of the dynamics of the system. We provide evidence that permutation statistical complexity is complementary to permutation entropy, giving valuable insights into the role of the different time scales involved in the chaotic regime of the semiconductor laser dynamics subject to delayed optical feedback. The results obtained confirm that this novel approach is a conceptually simple and computationally efficient method to identify the characteristic time scales of this relevant physical system.
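Below is a minimal Bandt-Pompe permutation entropy sketch; the companion statistical complexity measure (based on a Jensen-Shannon distance to the uniform pattern distribution) is omitted for brevity. The logistic-map and white-noise series are illustrative stand-ins for the laser data.

import math
import numpy as np
from collections import Counter

def permutation_entropy(x, order=4, delay=1):
    # Normalized Bandt-Pompe permutation entropy (between 0 and 1).
    x = np.asarray(x, float)
    n = len(x) - (order - 1) * delay
    patterns = Counter(tuple(np.argsort(x[i:i + order * delay:delay])) for i in range(n))
    p = np.array(list(patterns.values()), float) / n
    return float(-(p * np.log2(p)).sum() / math.log2(math.factorial(order)))

def logistic(n, r=4.0, x0=0.4):
    out = np.empty(n); x = x0
    for i in range(n):
        x = r * x * (1 - x); out[i] = x
    return out

rng = np.random.default_rng(2)
print(permutation_entropy(logistic(10000)))    # chaotic but ordinally structured: below 1
print(permutation_entropy(rng.random(10000)))  # white noise: close to 1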

Journal ArticleDOI
TL;DR: This work introduces a procedure to infer the interactions among a set of binary variables, based on their sampled frequencies and pairwise correlations, and successfully recovers benchmark Ising models even at criticality and in the low temperature phase.
Abstract: We introduce a procedure to infer the interactions among a set of binary variables, based on their sampled frequencies and pairwise correlations. The algorithm builds the clusters of variables contributing most to the entropy of the inferred Ising model and rejects the small contributions due to the sampling noise. Our procedure successfully recovers benchmark Ising models even at criticality and in the low temperature phase, and is applied to neurobiological data.

Journal ArticleDOI
TL;DR: In this article, the authors show that directed information theory can be used to assess Granger causality graphs of stochastic processes and that it is the adequate information theoretic framework needed for neuroscience applications, such as connectivity inference problems.
Abstract: Directed information theory deals with communication channels with feedback. When applied to networks, a natural extension based on causal conditioning is needed. We show here that measures built from directed information theory in networks can be used to assess Granger causality graphs of stochastic processes. We show that directed information theory includes measures such as the transfer entropy, and that it is the adequate information theoretic framework needed for neuroscience applications, such as connectivity inference problems.

Journal ArticleDOI
TL;DR: In this article, it was shown that Shannon's entropy function has a complementary dual function which is called "extropy" and that the entropy and the extropy of a binary distribution are identical, but the measure bifurcates into a pair of distinct measures for any quantity that is not merely an event indicator.
Abstract: This article provides a completion to theories of information based on entropy, resolving a longstanding question in its axiomatization as proposed by Shannon and pursued by Jaynes. We show that Shannon's entropy function has a complementary dual function which we call "extropy." The entropy and the extropy of a binary distribution are identical. However, the measure bifurcates into a pair of distinct measures for any quantity that is not merely an event indicator. As with entropy, the maximum extropy distribution is also the uniform distribution, and both measures are invariant with respect to permutations of their mass functions. However, they behave quite differently in their assessments of the refinement of a distribution, the axiom which concerned Shannon and Jaynes. Their duality is specified via the relationship among the entropies and extropies of coarse and fine partitions. We also analyze the extropy function for densities, showing that relative extropy constitutes a dual to the Kullback-Leibler divergence, widely recognized as the continuous entropy measure. These results are unified within the general structure of Bregman divergences. In this context they identify half the $L_2$ metric as the extropic dual to the entropic directed distance. We describe a statistical application to the scoring of sequential forecast distributions which provoked the discovery.
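A small sketch of the two measures, assuming the definition J(p) = -sum_i (1 - p_i) log(1 - p_i) for extropy, with a quick check of the binary-case identity noted in the abstract:

import numpy as np

def entropy(p):
    p = np.asarray(p, float); p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def extropy(p):
    q = 1.0 - np.asarray(p, float); q = q[q > 0]
    return float(-(q * np.log(q)).sum())

print(entropy([0.3, 0.7]), extropy([0.3, 0.7]))            # identical for a binary distribution
print(entropy([0.2, 0.3, 0.5]), extropy([0.2, 0.3, 0.5]))  # the two measures now differ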

Journal ArticleDOI
TL;DR: In this article, a more general approach to sofic entropy which produces both measure and topological dynamical invariants was developed, and the variational principle was established in this context.
Abstract: Recently Lewis Bowen introduced a notion of entropy for measure-preserving actions of a countable sofic group on a standard probability space admitting a generating partition with finite entropy. By applying an operator algebra perspective we develop a more general approach to sofic entropy which produces both measure and topological dynamical invariants, and we establish the variational principle in this context. In the case of residually finite groups we use the variational principle to compute the topological entropy of principal algebraic actions whose defining group ring element is invertible in the full group C∗-algebra.

Proceedings ArticleDOI
22 Oct 2011
TL;DR: It is shown how to efficiently simulate the sending of a message to a receiver who has partial information about the message, so that the expected number of bits communicated in the simulation is close to the amount of additional information that the message reveals to the receiver.
Abstract: We show how to efficiently simulate the sending of a message to a receiver who has partial information about the message, so that the expected number of bits communicated in the simulation is close to the amount of additional information that the message reveals to the receiver. This is a generalization and strengthening of the Slepian-Wolf theorem, which shows how to carry out such a simulation with low amortized communication in the case that the message is a deterministic function of an input. A caveat is that our simulation is interactive. As a consequence, we prove that the internal information cost (namely, the information revealed to the parties) involved in computing any relation or function using a two-party interactive protocol is exactly equal to the amortized communication complexity of computing independent copies of the same relation or function. We also show that the only way to prove a strong direct sum theorem for randomized communication complexity is by solving a particular variant of the pointer jumping problem that we define. Our work implies that a strong direct sum theorem for communication complexity holds if and only if efficient compression of communication protocols is possible.

Journal ArticleDOI
TL;DR: According to the minimum entropy principle, a programming model to generate optimal criteria weights is established, and the corresponding multicriteria decision making method is presented.
Abstract: This paper proposes the concept of intuitionistic fuzzy weighted entropy and presents a new method for intuitionistic fuzzy multicriteria decision making problems. We first introduce some classical intuitionistic fuzzy entropy measures and verify that the entropy of an intuitionistic fuzzy set is the average value of the entropies of its intuitionistic fuzzy values. Then, we present the concept of the intuitionistic fuzzy weighted entropy, which is a natural extension of the entropy for intuitionistic fuzzy sets. Some important weighted entropy measures, such as the weighted Szmidt and Kacprzyk entropy, the weighted De Luca-Termini entropy, the weighted score function based entropy, and the weighted min-max entropy, are also given. According to the minimum entropy principle, we establish a programming model to generate optimal criteria weights and present the corresponding multicriteria decision making method. Finally, two illustrative examples verify the developed approach.
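As a hedged illustration, the sketch below evaluates one element-wise intuitionistic fuzzy entropy (the Szmidt-Kacprzyk form) and a simple criteria-weighted average of it; the ratings and weights are invented, and the paper's weight-optimization model is not reproduced.

import numpy as np

def sk_entropy(mu, nu):
    # Szmidt-Kacprzyk entropy of one intuitionistic fuzzy value (mu, nu).
    pi = 1.0 - mu - nu                        # hesitancy degree
    return (min(mu, nu) + pi) / (max(mu, nu) + pi)

def weighted_if_entropy(values, weights):
    # Weighted entropy of an intuitionistic fuzzy set: sum_j w_j * E(mu_j, nu_j).
    w = np.asarray(weights, float)
    e = np.array([sk_entropy(mu, nu) for mu, nu in values])
    return float(w @ e / w.sum())

ratings = [(0.7, 0.2), (0.5, 0.4), (0.6, 0.1)]   # one alternative rated on three criteria
weights = [0.5, 0.3, 0.2]
print(weighted_if_entropy(ratings, weights))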

Journal ArticleDOI
TL;DR: This paper introduces a new entropy-based approach to detecting various covert timing channels, based on the observation that the creation of a covert timing channel has certain effects on the entropy of the original process; hence, a change in the entropy of a process provides a critical clue for covert timing channel detection.
Abstract: The detection of covert timing channels is of increasing interest in light of recent exploits of covert timing channels over the Internet. However, due to the high variation in legitimate network traffic, detecting covert timing channels is a challenging task. Existing detection schemes are ineffective at detecting most of the covert timing channels known to the security community. In this paper, we introduce a new entropy-based approach to detecting various covert timing channels. Our new approach is based on the observation that the creation of a covert timing channel has certain effects on the entropy of the original process, and hence, a change in the entropy of a process provides a critical clue for covert timing channel detection. Exploiting this observation, we investigate the use of entropy and conditional entropy in detecting covert timing channels. Our experimental results show that our entropy-based approach is sensitive to the current covert timing channels and is capable of detecting them in an accurate manner.
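A toy version of the detection idea is sketched below: bin inter-packet delays and compare the entropy and first-order conditional entropy of a test trace against a legitimate-traffic profile. The binning, the simulated traces, and the use of plain (rather than corrected) conditional entropy are simplifications, not the paper's implementation.

import numpy as np
from collections import Counter

def entropy_bits(symbols):
    c = np.array(list(Counter(symbols).values()), float)
    p = c / c.sum()
    return float(-(p * np.log2(p)).sum())

def conditional_entropy_bits(symbols):
    # H(S_t | S_{t-1}) from first-order pair counts.
    pairs = list(zip(symbols[:-1], symbols[1:]))
    return entropy_bits(pairs) - entropy_bits(symbols[:-1])

rng = np.random.default_rng(3)
edges = np.linspace(0.0, 0.2, 16)                           # illustrative delay bins (seconds)
legit  = np.digitize(rng.exponential(0.05, 20000), edges)   # irregular legitimate timing
covert = np.digitize(np.tile([0.03, 0.09], 10000), edges)   # regular, patterned timing channel
for name, s in (("legit", legit), ("covert", covert)):
    print(name, entropy_bits(s), conditional_entropy_bits(s))
# the covert trace shifts both entropies well away from the legitimate profile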

Journal ArticleDOI
Jian Ma, Zengqi Sun
TL;DR: It is proved that mutual information is actually negative copula entropy, based on which a method for mutual information estimation is proposed.
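A rough plug-in illustration of the identity MI(X;Y) = -Hc, where Hc is the (differential) entropy of the copula of (X, Y): rank-transform the data to pseudo-observations and estimate the copula entropy with a histogram. The paper's estimator is more refined; the bin count and the Gaussian test data here are assumptions.

import numpy as np
from scipy.stats import rankdata

def copula_entropy_hist(x, y, bins=20):
    # Histogram plug-in estimate of the copula entropy of (X, Y), in nats.
    n = len(x)
    u, v = rankdata(x) / n, rankdata(y) / n          # pseudo-observations in (0, 1]
    counts, _, _ = np.histogram2d(u, v, bins=bins, range=[[0, 1], [0, 1]])
    p = counts.ravel() / n
    p = p[p > 0]
    dens = p * bins * bins                           # estimated copula density per cell
    return float(-(p * np.log(dens)).sum())

rng = np.random.default_rng(4)
x = rng.normal(size=20000)
y = 0.8 * x + 0.6 * rng.normal(size=20000)           # correlated pair with rho = 0.8
print(-copula_entropy_hist(x, y))                    # MI estimate as negative copula entropy
print(-0.5 * np.log(1 - 0.8 ** 2))                   # closed-form Gaussian MI for comparison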

Journal ArticleDOI
TL;DR: In this paper, the authors used entropy methods to monitor the evolution of crude oil price movements and found that high entropy values should be related to a more complex and, hence, less predictable market evolution.

Journal ArticleDOI
TL;DR: A reconstruction scheme is derived where both the likelihood and the von Neumann entropy functionals are maximized in order to systematically select the most-likely estimator with the largest entropy, that is, the least-bias estimator, consistent with a given set of measurement data.
Abstract: Quantum-state reconstruction on a finite number of copies of a quantum system with informationally incomplete measurements, as a rule, does not yield a unique result. We derive a reconstruction scheme where both the likelihood and the von Neumann entropy functionals are maximized in order to systematically select the most-likely estimator with the largest entropy, that is, the least-bias estimator, consistent with a given set of measurement data. This is equivalent to the joint consideration of our partial knowledge and ignorance about the ensemble to reconstruct its identity. An interesting structure of such estimators will also be explored.

Journal ArticleDOI
07 Sep 2011 - PLOS ONE
TL;DR: This work provides a universal analytical description of this classic scenario in terms of the horizontal visibility graphs associated with the dynamics within the attractors, which it calls Feigenbaum graphs, independent of map nonlinearity or other particulars, and shows that the network entropy mimics the Lyapunov exponent of the map independently of its sign.
Abstract: The recently formulated theory of horizontal visibility graphs transforms time series into graphs and allows the possibility of studying dynamical systems through the characterization of their associated networks. This method leads to a natural graph-theoretical description of nonlinear systems with qualities in the spirit of symbolic dynamics. We support our claim via the case study of the period-doubling and band-splitting attractor cascades that characterize unimodal maps. We provide a universal analytical description of this classic scenario in terms of the horizontal visibility graphs associated with the dynamics within the attractors, that we call Feigenbaum graphs, independent of map nonlinearity or other particulars. We derive exact results for their degree distribution and related quantities, recast them in the context of the renormalization group and find that its fixed points coincide with those of network entropy optimization. Furthermore, we show that the network entropy mimics the Lyapunov exponent of the map independently of its sign, hinting at a Pesin-like relation equally valid out of chaos.
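A direct way to experiment with the construction is sketched below: build the horizontal visibility graph of a series (nodes i and j are linked when every value strictly between them is smaller than both) and compute the Shannon entropy of its degree distribution. The map parameters and series length are illustrative choices, not the paper's analytical treatment.

import numpy as np
from collections import Counter

def horizontal_visibility_graph(x):
    # Edges (i, j) such that x[k] < min(x[i], x[j]) for every k with i < k < j.
    edges, n = [], len(x)
    for i in range(n - 1):
        edges.append((i, i + 1))           # consecutive points always see each other
        top = x[i + 1]
        for j in range(i + 2, n):
            if x[i] > top and x[j] > top:
                edges.append((i, j))
            top = max(top, x[j])
            if top >= x[i]:
                break                      # nothing further to the right is visible from i
    return edges

def degree_entropy(edges):
    deg = Counter()
    for i, j in edges:
        deg[i] += 1; deg[j] += 1
    p = np.array(list(Counter(deg.values()).values()), float)
    p /= p.sum()
    return float(-(p * np.log(p)).sum())

def logistic(n, r, x0=0.65):
    out = np.empty(n); x = x0
    for i in range(n):
        x = r * x * (1 - x); out[i] = x
    return out

for r in (3.5699456, 4.0):                 # near the onset of chaos vs. fully chaotic
    series = logistic(3000, r)
    print(r, degree_entropy(horizontal_visibility_graph(series)))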

Journal ArticleDOI
TL;DR: In this article, the authors studied discrete-time control systems subject to average data-rate limits and developed a framework to deal with average data rate constraints in a tractable manner that combines ideas from both information and control theories.
Abstract: This paper studies discrete-time control systems subject to average data-rate limits. We focus on a situation where a noisy linear system has been designed assuming transparent feedback and, due to implementation constraints, a source-coding scheme (with unity signal transfer function) has to be deployed in the feedback path. For this situation, and by focusing on a class of source-coding schemes built around entropy coded dithered quantizers, we develop a framework to deal with average data-rate constraints in a tractable manner that combines ideas from both information and control theories. As an illustration of the uses of our framework, we apply it to study the interplay between stability and average data-rates in the considered architecture. It is shown that the proposed class of coding schemes can achieve mean square stability at average data-rates that are, at most, 1.254 bits per sample away from the absolute minimum rate for stability established by Nair and Evans. This rate penalty is compensated by the simplicity of our approach.

DOI
01 Jan 2011
TL;DR: The generality of the polarization principle is investigated and it is shown that separately applying polarization constructions to two correlated processes polarizes both the processes themselves as well as the correlations between them, leading to polar coding theorems for multiple-access channels and separate compression of correlated sources.
Abstract: Polar coding is a recently invented technique for communication over binary-input memoryless channels. This technique allows one to transmit data at rates close to the symmetric-capacity of such channels with arbitrarily high reliability, using low-complexity encoding and decoding algorithms. As such, polar coding is the only explicit low-complexity method known to achieve the capacity of symmetric binary-input memoryless channels. The principle underlying polar coding is channel polarization: recursively combining several copies of a mediocre binary-input channel to create noiseless and useless channels. The same principle can also be used to obtain optimal low-complexity compression schemes for memoryless binary sources. In this dissertation, the generality of the polarization principle is investigated. It is first shown that polarization with recursive procedures is not limited to binary channels and sources. A family of low-complexity methods that polarize all discrete memoryless processes is introduced. In both data transmission and data compression, codes based on such methods achieve optimal rates, i.e., channel capacity and source entropy, respectively. The error probability behavior of such codes is as in the binary case. Next, it is shown that a large class of recursive constructions polarize memoryless processes, establishing the original polar codes as an instance of a large class of codes based on polarization methods. A formula to compute the error probability dependence of generalized constructions on the coding length is derived. Evaluating this formula reveals that substantial error probability improvements over the original polar codes can be achieved at large coding lengths by using generalized constructions, particularly over channels and sources with non-binary alphabets. Polarizing capabilities of recursive methods are shown to extend beyond memoryless processes: Any construction that polarizes memoryless processes will also polarize a large class of processes with memory. The principles developed are applied to settings with multiple memoryless processes. It is shown that separately applying polarization constructions to two correlated processes polarizes both the processes themselves as well as the correlations between them. These observations lead to polar coding theorems for multiple-access channels and separate compression of correlated sources. The proposed coding schemes achieve optimal sum rates in both problems.
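A minimal illustration of the polarization phenomenon for the binary erasure channel (BEC), where the Bhattacharyya parameters of the synthesized channels obey an exact recursion under Arikan's 2x2 transform; this is the standard binary example, not the generalized constructions developed in the dissertation.

import numpy as np

def polarize_bec(eps, n_levels):
    # Bhattacharyya parameters of the 2**n_levels synthetic channels for a BEC(eps).
    # For the BEC: Z(W-) = 2Z - Z^2 (degraded copy), Z(W+) = Z^2 (upgraded copy).
    z = np.array([eps])
    for _ in range(n_levels):
        z = np.concatenate([2 * z - z ** 2, z ** 2])
    return z

z = polarize_bec(0.5, 16)                  # 65536 synthetic channels
print(np.mean(z < 1e-6), np.mean(z > 1 - 1e-6))
# as the block length grows, the fraction of nearly noiseless channels
# approaches the channel capacity 1 - eps, while the rest become nearly useless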

Book
08 Dec 2011
TL;DR: This book presents a quantitative model of subjective information, a unified approach to discrete and continuous entropy, a theory of information for stochastic functions, and a model of Shannon entropy of deterministic maps which is quite different from Kolmogorov entropy.
Abstract: There are many open problems related to Shannon information theory. For instance, it has long been recognized that the theory does not take account of the subjectivity of the observer, but all previous attempts to deal with this remained at a rather qualitative level. Another problem is the apparent discrepancy between discrete and continuous entropy. And a task of paramount importance is to define the Shannon entropy of a stochastic trajectory and of a deterministic function. This book provides thorough answers to these questions by suitably modifying Shannon theory. It presents a quantitative model of subjective information, a unified approach to discrete and continuous entropy, a theory of information for stochastic functions, and a model of Shannon entropy of deterministic maps which is quite different from Kolmogorov entropy.

Proceedings ArticleDOI
21 Mar 2011
TL;DR: This article proposes a feature selection criterion, called Entropy based Category Coverage Difference (ECCD), which is based on the distribution of the documents containing the term in the categories while also taking into account its entropy.
Abstract: In text categorization, feature selection can be essential not only for reducing the index size but also for improving the performance of the classifier. In this article, we propose a feature selection criterion, called Entropy based Category Coverage Difference (ECCD). On the one hand, this criterion is based on the distribution of the documents containing the term in the categories; on the other hand, it takes into account its entropy. ECCD compares favorably with usual feature selection methods based on document frequency (DF), information gain (IG), mutual information (MI), χ2, odds ratio and GSS on a large collection of XML documents from the Wikipedia encyclopedia. Moreover, this comparative study confirms the effectiveness of feature selection techniques derived from the χ2 statistic.
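The ECCD criterion itself is defined in the paper; the sketch below shows only its generic entropy ingredient, the entropy of a term's document distribution across categories, on a made-up term-category count table.

import numpy as np

def category_entropy(doc_counts):
    # Entropy (bits) of the distribution, across categories, of documents containing a term.
    # Low entropy: the term is concentrated in few categories and tends to be more discriminative.
    c = np.asarray(doc_counts, float)
    p = c / c.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

term_counts = {                      # documents containing each term, per category
    "genome":     [120,  3,  2,  1],
    "method":     [ 80, 75, 90, 85],
    "goalkeeper": [  0,  0, 60,  1],
}
for term, counts in term_counts.items():
    print(term, round(category_entropy(counts), 3))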

Posted Content
TL;DR: This work shows that the Zhang-Yeung inequality can actually be derived from just one auxiliary variable, and uses the same basic technique of adding auxiliary variables to give many other non-Shannon inequalities in four variables.
Abstract: Any unconstrained information inequality in three or fewer random variables can be written as a linear combination of instances of Shannon's inequality I(A;B|C) >= 0. Such inequalities are sometimes referred to as "Shannon" inequalities. In 1998, Zhang and Yeung gave the first example of a "non-Shannon" information inequality in four variables. Their technique was to add two auxiliary variables with special properties and then apply Shannon inequalities to the enlarged list. Here we show that the Zhang-Yeung inequality can actually be derived from just one auxiliary variable. We then use the same basic technique of adding auxiliary variables to give many other non-Shannon inequalities in four variables. Our list includes the inequalities found by Xu, Wang, and Sun, but it is by no means exhaustive. Furthermore, some of the inequalities obtained may be superseded by stronger inequalities that have yet to be found; indeed, we show that the Zhang-Yeung inequality is one of those that is superseded. We also present several infinite families of inequalities, including some, but not all, of the infinite families found by Matus. We then describe what additional information these inequalities tell us about entropy space, including a conjecture on the maximum possible failure of Ingleton's inequality. Finally, we present an application of non-Shannon inequalities to network coding, demonstrating how these inequalities are useful in finding bounds on the information that can flow through a particular network called the Vamos network.
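For reference, the Zhang-Yeung inequality discussed here is commonly stated, for any four jointly distributed random variables $A, B, C, D$, as
$$2I(C;D) \le I(A;B) + I(A;C,D) + 3I(C;D|A) + I(C;D|B),$$
an inequality that holds for all entropic vectors but cannot be derived from Shannon-type inequalities alone.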

Journal ArticleDOI
Dong Yu, Jinyu Li, Li Deng
TL;DR: Three confidence calibration methods have been developed and the importance of key features exploited: the generic confidence-score, the application-dependent word distribution, and the rule coverage ratio are demonstrated.
Abstract: Most speech recognition applications in use today rely heavily on confidence measure for making optimal decisions. In this paper, we aim to answer the question: what can be done to improve the quality of confidence measure if we cannot modify the speech recognition engine? The answer provided in this paper is a post-processing step called confidence calibration, which can be viewed as a special adaptation technique applied to confidence measure. Three confidence calibration methods have been developed in this work: the maximum entropy model with distribution constraints, the artificial neural network, and the deep belief network. We compare these approaches and demonstrate the importance of key features exploited: the generic confidence-score, the application-dependent word distribution, and the rule coverage ratio. We demonstrate the effectiveness of confidence calibration on a variety of tasks with significant normalized cross entropy increase and equal error rate reduction.