Showing papers on "Multivariate mutual information published in 2015"


Journal ArticleDOI
TL;DR: This work shows that Gaussian systems frequently exhibit net synergy, i.e., the information carried jointly by both sources is greater than the sum of information carried by each source individually, and provides independent formulas for synergy and redundancy applicable to continuous time-series data.
Abstract: To fully characterize the information that two source variables carry about a third target variable, one must decompose the total information into redundant, unique, and synergistic components, i.e., obtain a partial information decomposition (PID). However, Shannon's theory of information does not provide formulas to fully determine these quantities. Several recent studies have begun addressing this. Some possible definitions for PID quantities have been proposed and some analyses have been carried out on systems composed of discrete variables. Here we present an in-depth analysis of PIDs on Gaussian systems, both static and dynamical. We show that, for a broad class of Gaussian systems, previously proposed PID formulas imply that (i) redundancy reduces to the minimum information provided by either source variable and hence is independent of correlation between sources, and (ii) synergy is the extra information contributed by the weaker source when the stronger source is known and can either increase or decrease with correlation between sources. We find that Gaussian systems frequently exhibit net synergy, i.e., the information carried jointly by both sources is greater than the sum of information carried by each source individually. Drawing from several explicit examples, we discuss the implications of these findings for measures of information transfer and information-based measures of complexity, both generally and within a neuroscience setting. Importantly, by providing independent formulas for synergy and redundancy applicable to continuous time-series data, we provide an approach to characterizing and quantifying information sharing amongst complex system variables.
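
As a minimal illustration of the net-synergy quantity discussed above (not the paper's full PID formulas), the sketch below computes the whole-minus-sum measure I(S1,S2;T) − I(S1;T) − I(S2;T) for a jointly Gaussian system directly from a covariance matrix; the variable names and the toy covariance are assumptions made for the example.

```python
import numpy as np

def gaussian_mi(cov, idx_a, idx_b):
    """Mutual information (nats) between two blocks of a zero-mean Gaussian:
       I(A;B) = 0.5 * log( det(Cov_A) * det(Cov_B) / det(Cov_AB) )."""
    def block_det(idx):
        return np.linalg.det(cov[np.ix_(idx, idx)])
    return 0.5 * np.log(block_det(idx_a) * block_det(idx_b)
                        / block_det(idx_a + idx_b))

# Toy system: uncorrelated sources S1, S2 and target T = S1 + S2 + noise.
cov = np.array([[1.0, 0.0, 1.0],    # variable order: S1, S2, T
                [0.0, 1.0, 1.0],
                [1.0, 1.0, 2.5]])

net_synergy = (gaussian_mi(cov, [0, 1], [2])
               - gaussian_mi(cov, [0], [2])
               - gaussian_mi(cov, [1], [2]))
print(net_synergy)   # > 0: net synergy even though the sources are uncorrelated
```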

159 citations


Journal ArticleDOI
16 Sep 2015
TL;DR: Under a general source model without helpers, the capacity for multiterminal secret-key agreement is shown to be equal to the normalized divergence from the joint distribution of the random sources to the product of marginal distributions, minimized over partitions of the random sources.
Abstract: The capacity for multiterminal secret-key agreement inspires a natural generalization of Shannon’s mutual information from two random variables to multiple random variables. Under a general source model without helpers, the capacity is shown to be equal to the normalized divergence from the joint distribution of the random sources to the product of marginal distributions minimized over partitions of the random sources. The mathematical underpinnings are the works on co-intersecting submodular functions and the principal lattices of partitions of the Dilworth truncation. We clarify the connection to these works and enrich them with information-theoretic interpretations and properties that are useful in solving other related problems in information theory as well as machine learning.
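
For a concrete reading of the capacity formula, the sketch below brute-forces the minimization over partitions for a tiny discrete source model: for a given joint pmf, the divergence from the joint to the product of marginals over a partition P reduces to sum_C H(X_C) − H(X_V), normalized by |P| − 1. The pmf, the helper functions, and the toy example are assumptions for illustration; brute force is feasible only for a handful of terminals.

```python
from math import log2

def marginal_entropy(pmf, subset):
    """Shannon entropy (bits) of the marginal over the given coordinate subset."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in subset)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def set_partitions(items):
    """Yield every partition of `items` as a list of blocks."""
    if len(items) == 1:
        yield [items]
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller

def multivariate_mi(pmf, n):
    """min over partitions P with |P| >= 2 of
       (sum_{C in P} H(X_C) - H(X_V)) / (|P| - 1)."""
    h_all = marginal_entropy(pmf, range(n))
    best = float("inf")
    for part in set_partitions(list(range(n))):
        if len(part) < 2:
            continue
        gap = sum(marginal_entropy(pmf, block) for block in part) - h_all
        best = min(best, gap / (len(part) - 1))
    return best

# Toy source: three terminals all observing the same fair bit.
pmf = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(multivariate_mi(pmf, 3))   # 1.0 bit: the secret-key capacity for this source
```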

80 citations


Journal ArticleDOI
22 May 2015-Entropy
TL;DR: Recent proposals are compared with an older approach that defines synergistic information via projections onto exponential families containing only up to k-th order interactions, showing that these measures are not compatible with a decomposition into unique, shared and synergistic information if one requires that all terms are always non-negative (local positivity).
Abstract: Recently, a series of papers addressed the problem of decomposing the information of two random variables into shared information, unique information and synergistic information. Several measures were proposed, although still no consensus has been reached. Here, we compare these proposals with an older approach to define synergistic information based on the projections on exponential families containing only up to k-th order interactions. We show that these measures are not compatible with a decomposition into unique, shared and synergistic information if one requires that all terms are always non-negative (local positivity). We illustrate the difference between the two measures for multivariate Gaussians.

70 citations


01 Jan 2015
TL;DR: The capacity for multiterminal secret-key agreement inspires a natural generalization of Shannon's mutual information from two random variables to multiple random variables, as discussed by the authors; this generalization is useful in solving other related problems in information theory as well as machine learning.
Abstract: The capacity for multiterminal secret-key agreement inspires a natural generalization of Shannon's mutual information from two random variables to multiple random variables. Under a general source model without helpers, the capacity is shown to be equal to the normalized divergence from the joint distribution of the random sources to the product of marginal distributions minimized over partitions of the random sources. The mathematical underpinnings are the works on co-intersecting submodular functions and the principal lattices of partitions of the Dilworth truncation. We clarify the connection to these works and enrich them with information-theoretic interpretations and properties that are useful in solving other related problems in information theory as well as machine learning.

54 citations


Journal ArticleDOI
TL;DR: By using a pathwise formulation of the population dynamics, it is shown that the optimal switching strategy is characterized by a consistency condition for time-forward and backward path probabilities; the formulation also clarifies the underlying information-theoretic aspect of selection as a passive information compression.
Abstract: Phenotype switching with and without sensing the environment is a common strategy of organisms to survive in a fluctuating environment. Understanding the evolutionary advantages of switching and sensing requires a quantitative evaluation of their fitness gain and its fluctuation, together with the conditions for the switching and sensing strategies being adapted to a given environment. In this work, by using a pathwise formulation of the population dynamics, we show that the optimal switching strategy is characterized by a consistency condition for time-forward and backward path probabilities. The formulation also clarifies the underlying information-theoretic aspect of selection as a passive information compression. The loss of fitness by a suboptimal strategy is also shown to satisfy a fluctuation relation, which provides information on how environmental fluctuation impacts the advantages of the optimal strategy. These results are naturally extended to the situation in which organisms can use an environmental signal by actively sensing the environment. The fluctuation relations of the fitness gain by sensing are derived, in which the multivariate mutual information among the phenotype, the environment, and the signal plays the role of quantifying the relevant information in the signal for the fitness gain.

39 citations


Journal ArticleDOI
TL;DR: An important empirical observation is that propositions in a natural language are carriers of predominantly fuzzy possibilistic information (FPI) and fuzzy bimodal information (FBI).

31 citations


01 Jun 2015
TL;DR: It is shown that the multivariate mutual information can be used to characterize the minimum storage required as well as the conditions under which local omniscience can be achieved for free without increasing the total communication rate required for global omniscience.
Abstract: The problem of successive omniscience is formulated for the study of a recently proposed multivariate mutual information measure. In this problem, a set of users want to achieve omniscience, i.e., recover the private sources of each other by exchanging messages. However, the omniscience is achieved in a successive manner such that local subgroups of users can first achieve local omniscience, i.e., recover the private sources of other users in the same subgroups. Global omniscience among all users is achieved by an additional exchange of messages. This formulation can be motivated by a distributed storage system that enables file sharing among groups of users. It is shown that the multivariate mutual information can be used to characterize the minimum storage required as well as the conditions under which local omniscience can be achieved for free without increasing the total communication rate required for global omniscience. Our results provide new interpretations of the multivariate mutual information.

29 citations


Journal ArticleDOI
TL;DR: The hierarchical mutual information, a generalization of the traditional mutual information that makes it possible to compare hierarchical partitions and hierarchical community structures, is introduced, and some of its practical applications are illustrated.
Abstract: The quest for a quantitative characterization of community and modular structure of complex networks produced a variety of methods and algorithms to classify different networks. However, it is not clear if such methods provide consistent, robust, and meaningful results when considering hierarchies as a whole. Part of the problem is the lack of a similarity measure for the comparison of hierarchical community structures. In this work we contribute by introducing the hierarchical mutual information, which is a generalization of the traditional mutual information and makes it possible to compare hierarchical partitions and hierarchical community structures. The normalized version of the hierarchical mutual information should behave analogously to the traditional normalized mutual information. Here the correct behavior of the hierarchical mutual information is corroborated on an extensive battery of numerical experiments. The experiments are performed on artificial hierarchies and on the hierarchical community structure of artificial and empirical networks. Furthermore, the experiments illustrate some of the practical applications of the hierarchical mutual information, namely the comparison of different community detection methods and the study of the consistency, robustness, and temporal evolution of the hierarchical modular structure of networks.
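
The hierarchical measure itself is not available in standard libraries; as a point of reference, the traditional normalized mutual information between two flat partitions, which the proposed measure generalizes, can be computed with scikit-learn. The toy label vectors below are assumptions for the example.

```python
# Baseline only: the traditional normalized mutual information between two
# *flat* partitions, which the paper's hierarchical measure generalizes.
from sklearn.metrics import normalized_mutual_info_score

# Two community assignments of the same six nodes (labels are arbitrary ids).
partition_a = [0, 0, 0, 1, 1, 1]
partition_b = [0, 0, 1, 1, 2, 2]

print(normalized_mutual_info_score(partition_a, partition_b))  # value in [0, 1]
```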

27 citations


Journal ArticleDOI
TL;DR: A new unified definition of mutual information is presented to cover all the various definitions and to fix their mathematical flaws, and the joint distribution of two random variables is defined by taking the marginal probabilities into consideration.
Abstract: There are various definitions of mutual information. Essentially, these definitions can be divided into two classes: (1) definitions with random variables and (2) definitions with ensembles. However, there are some mathematical flaws in these definitions. For instance, Class 1 definitions either neglect the probability spaces or assume the two random variables have the same probability space. Class 2 definitions redefine marginal probabilities from the joint probabilities. In fact, the marginal probabilities are given from the ensembles and should not be redefined from the joint probabilities. Both Class 1 and Class 2 definitions assume a joint distribution exists. Yet, they all ignore an important fact that the joint or the joint probability measure is not unique. In this paper, we first present a new unified definition of mutual information to cover all the various definitions and to fix their mathematical flaws. Our idea is to define the joint distribution of two random variables by taking the marginal probabilities into consideration. Next, we establish some properties of the newly defined mutual information. We then propose a method to calculate mutual information in machine learning. Finally, we apply our newly defined mutual information to credit scoring.
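
For reference, the conventional plug-in calculation that the paper scrutinizes, in which the marginals are recovered from an assumed joint distribution, looks as follows; the toy joint pmf is an assumption for illustration.

```python
# Standard plug-in computation of I(X;Y) from a joint pmf (not the paper's
# unified definition): I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) ).
import numpy as np

joint = np.array([[0.30, 0.10],     # rows: values of X, columns: values of Y
                  [0.10, 0.50]])

p_x = joint.sum(axis=1)             # marginals recovered from the joint,
p_y = joint.sum(axis=0)             # exactly the step the paper criticizes

mask = joint > 0
mi = np.sum(joint[mask] * np.log2(joint[mask] / np.outer(p_x, p_y)[mask]))
print(mi)                           # mutual information in bits
```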

25 citations


Journal ArticleDOI
02 Jul 2015-Entropy
TL;DR: This work considers the problem of defining a measure of redundant information that quantifies how much common information two or more random variables specify about a target random variable and proposes new measures with some desirable properties.
Abstract: We consider the problem of defining a measure of redundant information that quantifies how much common information two or more random variables specify about a target random variable. We discuss desired properties of such a measure and propose new measures with some desirable properties.

24 citations


Journal ArticleDOI
TL;DR: Simulation studies showed that the proposed method reduced the estimation errors 45-fold and improved the correlation coefficients with the true values 99-fold, compared with the conventional calculation of mutual information.
Abstract: Mutual information, a general measure of the relatedness between two random variables, has been actively used in the analysis of biomedical data. The mutual information between two discrete variables is conventionally calculated by their joint probabilities estimated from the frequency of observed samples in each combination of variable categories. However, this conventional approach is no longer efficient for discrete variables with many categories, which can be easily found in large-scale biomedical data such as diagnosis codes, drug compounds, and genotypes. Here, we propose a method to provide stable estimations for the mutual information between discrete variables with many categories. Simulation studies showed that the proposed method reduced the estimation errors 45-fold and improved the correlation coefficients with the true values 99-fold, compared with the conventional calculation of mutual information. The proposed method was also demonstrated through a case study for diagnostic data in electronic health records. This method is expected to be useful in the analysis of various biomedical data with discrete variables.
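
The proposed stabilized estimator is not reproduced here; the sketch below only illustrates the problem it addresses: for two independent variables with many categories, the conventional frequency-based (plug-in) mutual information is strongly positively biased at realistic sample sizes. The category and sample counts are arbitrary choices for the example.

```python
# Illustration of the problem the paper addresses (not the proposed method):
# the plug-in MI between two *independent* variables with many categories is
# strongly positively biased when samples are scarce.
import numpy as np
from sklearn.metrics import mutual_info_score   # plug-in MI in nats

rng = np.random.default_rng(0)
n_categories, n_samples = 100, 500

x = rng.integers(n_categories, size=n_samples)  # independent by construction
y = rng.integers(n_categories, size=n_samples)

print(mutual_info_score(x, y))   # clearly above the true value of 0
```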

Journal ArticleDOI
TL;DR: A conditional dynamic mutual information (CDMI) feature selection algorithm is proposed to overcome the inaccurate valuation of mutual information in traditional selection processes, by re-estimating conditional mutual information dynamically throughout selection and removing samples that have already been identified.
Abstract: Aiming at the inaccurate valuation of mutual information in existing feature selection algorithms, we introduce the concept of conditional dynamic mutual information. On this basis, we propose a feature selection algorithm based on conditional dynamic mutual information (CDMI) to overcome the dynamic correlation problem of traditional mutual information during the selection process. The conditional dynamic mutual information is re-valuated dynamically throughout selection: after each feature is selected, the samples that can already be identified are removed so that they no longer participate in the conditional mutual information calculation. This gives an accurate measurement of each feature's importance while at the same time preserving the information content of the features. The experimental results verify the correctness and effectiveness of the algorithm.
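
A rough sketch of the dynamic idea as described in the abstract is given below; all names and implementation details are my own reading, not the authors' algorithm. After each feature is selected on the currently active samples, samples whose class is already uniquely determined by the selected features are removed, so they no longer influence later mutual-information estimates.

```python
import numpy as np
from collections import defaultdict
from sklearn.metrics import mutual_info_score

def cdmi_select(X, y, k):
    """Greedy MI-based selection with dynamic removal of identified samples
    (a hypothetical sketch, not the published CDMI implementation)."""
    X, y = np.asarray(X), np.asarray(y)
    active = np.ones(len(y), dtype=bool)
    selected = []
    for _ in range(k):
        if not active.any():
            break
        # Pick the feature with the highest MI with the class on active samples.
        scores = [mutual_info_score(X[active, j], y[active])
                  if j not in selected else -np.inf
                  for j in range(X.shape[1])]
        selected.append(int(np.argmax(scores)))
        # Drop samples whose selected-feature pattern maps to a single class.
        classes_of = defaultdict(set)
        for pattern, label in zip(map(tuple, X[:, selected]), y):
            classes_of[pattern].add(label)
        identified = np.array([len(classes_of[tuple(row)]) == 1
                               for row in X[:, selected]])
        active &= ~identified
    return selected

X = [[0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1]]
y = [0, 0, 1, 1]
print(cdmi_select(X, y, 2))   # -> [0]: feature 0 identifies every sample, so selection stops early
```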

Posted Content
TL;DR: It is shown that any reasonable measure of redundant information cannot be derived by optimization over a single random variable, and any common information based measure of redundancy cannot induce a nonnegative decomposition of the total mutual information.
Abstract: We consider the problem of decomposing the total mutual information conveyed by a pair of predictor random variables about a target random variable into redundant, unique and synergistic contributions. We focus on the relationship between "redundant information" and the more familiar information-theoretic notions of "common information". Our main contribution is an impossibility result. We show that for independent predictor random variables, any common information based measure of redundancy cannot induce a nonnegative decomposition of the total mutual information. Interestingly, this entails that any reasonable measure of redundant information cannot be derived by optimization over a single random variable.

Proceedings ArticleDOI
14 Jun 2015
TL;DR: The two-way Information Bottleneck problem, where two nodes exchange information iteratively about two arbitrarily dependent memoryless sources, is considered and the optimal trade-off between rates of relevance and complexity, and the number of exchange rounds, is obtained through a single-letter characterization.
Abstract: The two-way Information Bottleneck problem, where two nodes exchange information iteratively about two arbitrarily dependent memoryless sources, is considered. Based on the observations and the information exchange, each node is required to extract “relevant information”, measured in terms of the normalized mutual information, from two arbitrarily dependent hidden sources. The optimal trade-off between rates of relevance and complexity, and the number of exchange rounds, is obtained through a single-letter characterization. We further extend the results to the Gaussian case. Applications of our setup arise in the development of collaborative clustering algorithms.

Proceedings ArticleDOI
10 Dec 2015
TL;DR: This paper investigates proposed improvements to three limitations of the mutual information estimator, through the use of resampling techniques and a formulation of mutual information based on differential entropy for regression problems.
Abstract: Selecting relevant features for machine learning modeling improves the performance of the learning methods. Mutual information (MI) is known to be used as a relevance criterion for selecting feature subsets from an input dataset with a nonlinear relationship to the predicted attribute. However, the mutual information estimator suffers from the following limitations: it depends on smoothing parameters; greedy feature selection methods lack theoretically justified stopping criteria; and although in theory it can be used for both classification and regression problems, in practice its formulation is more often limited to classification problems. This paper investigates proposed improvements to these three limitations of the mutual information estimator, through the use of resampling techniques and a formulation of mutual information based on differential entropy for regression problems.
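
The paper's resampling scheme is not reproduced here; as an illustration of mutual-information relevance scores for a continuous (regression) target, scikit-learn's nearest-neighbour estimator, which is built on differential entropies, can be used. The synthetic data are an assumption for the example.

```python
# Not the paper's method: an illustration of mutual-information relevance
# scores for a *regression* target, using scikit-learn's k-NN estimator.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 1000
x_relevant   = rng.normal(size=n)
x_irrelevant = rng.normal(size=n)
y = np.sin(x_relevant) + 0.1 * rng.normal(size=n)    # nonlinear dependence

X = np.column_stack([x_relevant, x_irrelevant])
print(mutual_info_regression(X, y, random_state=0))  # first score >> second
```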

Proceedings ArticleDOI
11 Sep 2015
TL;DR: The results show that the proposed model outperforms the baseline models in terms of receiver operating characteristic (ROC), indicating promising application of the BNMI in the credit scoring area.
Abstract: Credit scoring profiles client relationships in terms of empirical attributes (variables) and leverages a scoring model to assess a client's credibility. However, empirical attributes often contain a certain degree of uncertainty and require feature selection. A Bayesian network (BN) is an important tool for dealing with uncertain problems and information. Mutual information (MI) measures dependencies between random variables and is therefore a suitable feature selection technique for evaluating the relationship between variables in complex classification tasks. Using a Bayesian network as a statistical model, this study leverages mutual information to build a credit scoring model called BNMI. The learned Bayesian network structure is adaptively adjusted according to mutual information. An empirical study compared the results of BNMI with three existing baseline models. The results show that the proposed model outperforms the baseline models in terms of the receiver operating characteristic (ROC), indicating a promising application of BNMI in the credit scoring area.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce a novel axiomatic framework for decomposing the joint entropy, which characterizes the various ways in which random variables can share information, and distinguish between interdependencies where the information is shared redundantly, and synergistic relationships where the sharing structure exists in the whole but not between the parts.
Abstract: The interactions between three or more random variables are often nontrivial, poorly understood, and yet, are paramount for future advances in fields such as network information theory, neuroscience, genetics and many others. In this work, we propose to analyze these interactions as different modes of information sharing. Towards this end, we introduce a novel axiomatic framework for decomposing the joint entropy, which characterizes the various ways in which random variables can share information. The key contribution of our framework is to distinguish between interdependencies where the information is shared redundantly, and synergistic interdependencies where the sharing structure exists in the whole but not between the parts. We show that our axioms determine unique formulas for all the terms of the proposed decomposition for a number of cases of interest. Moreover, we show how these results can be applied to several network information theory problems, providing a more intuitive understanding of their fundamental limits.

Book ChapterDOI
01 Jan 2015
TL;DR: Recent developments in information decomposition and local information dynamics that allow the quantitative decomposition of the computation in a network into the component processes of information storage, information transfer, and information modification are summarized.
Abstract: Neural systems perform acts of information processing. To characterize information processing, we first discuss established measures like entropy and mutual information, mostly in the context of neural coding. In this context, we also point to the open questions in information theory that are related to synergistic and redundant coding. Then, we summarize recent developments in information decomposition and local information dynamics that allow the quantitative decomposition of the computation in a network into the component processes of information storage, information transfer, and information modification. These recent approaches make it possible to analyze information processing in a network independently of stimuli and reveal the algorithmic structure of a neural information processing system. Most importantly, all measures are discussed with respect to the questions they can address, to the interpretation of their results, and also to their respective pitfalls in the context of neuroscience.
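
As a toy illustration of one of the measures mentioned above, the sketch below gives a plug-in estimate of transfer entropy, TE_{X→Y} = I(Y_t ; X_{t−1} | Y_{t−1}), for binary time series. It is a bare-bones assumption for exposition, not one of the estimators used in practice for neural data.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in TE_{X->Y} = I(Y_t ; X_{t-1} | Y_{t-1}) in bits, history length 1."""
    triples  = Counter(zip(y[1:], x[:-1], y[:-1]))   # (y_t, x_{t-1}, y_{t-1})
    pairs_xy = Counter(zip(x[:-1], y[:-1]))          # (x_{t-1}, y_{t-1})
    pairs_yy = Counter(zip(y[1:], y[:-1]))           # (y_t, y_{t-1})
    singles  = Counter(y[:-1])                       # y_{t-1}
    n = len(y) - 1
    # The sample-size factors cancel, so raw counts can be used in the ratio.
    return sum((c / n) * np.log2(c * singles[yp]
                                 / (pairs_xy[xp, yp] * pairs_yy[yt, yp]))
               for (yt, xp, yp), c in triples.items())

rng = np.random.default_rng(0)
x = rng.integers(2, size=10000)
y = np.roll(x, 1)                  # y copies x with a one-step delay
print(transfer_entropy(x, y))      # close to 1 bit
print(transfer_entropy(y, x))      # close to 0 bits (small plug-in bias)
```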

Posted Content
TL;DR: The forward difference expansion of the entropy function defined on all subsets of the variables under study is considered; the elements of this expansion are invariant to permutation of their suffixes and relate higher-order mutual informations to lower-order ones.
Abstract: Conditional mutual information is important in the selection and interpretation of graphical models. Its empirical version is well known as a generalised likelihood ratio test and may be represented as a difference of entropies. We consider the forward difference expansion of the entropy function defined on all subsets of the variables under study. The elements of this expansion are invariant to permutation of their suffixes and relate higher-order mutual informations to lower-order ones. The third-order difference is expressible as an apparently asymmetric difference between a marginal and a conditional mutual information. Its role in the decomposition for explained information provides a technical definition for synergy between three random variables. Positive values occur when two variables provide alternative explanations for a third; negative values, termed synergies, occur when the sum of explained information is greater than the sum of its parts. Synergies tend to be infrequent; they connect the seemingly unrelated concepts of suppressor variables in regression, on the one hand, and unshielded colliders in Bayes networks (immoralities), on the other. We give novel characterizations of these phenomena that generalise to categorical variables and to higher dimensions. We propose an algorithm for systematically computing low order differences from a given graph. Examples from small scale real-life studies indicate the potential of these techniques for empirical statistical analysis.
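
Up to sign convention, the third-order term described above is the interaction information I(X;Y) − I(X;Y|Z). The toy computation below (my own example, not taken from the paper) shows the negative, synergistic case for an unshielded collider, with XOR as the extreme instance.

```python
from itertools import product
from math import log2
from collections import defaultdict

def entropy(pmf):
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def joint_entropy(pmf, keep):
    """Entropy of the marginal over the coordinates listed in `keep`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in keep)] += p
    return entropy(marg)

# Collider example: X, Y independent fair bits, Z = X xor Y.
pmf = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}
H = lambda *keep: joint_entropy(pmf, keep)

i_xy   = H(0) + H(1) - H(0, 1)                   # I(X;Y)   = 0
i_xy_z = H(0, 2) + H(1, 2) - H(0, 1, 2) - H(2)   # I(X;Y|Z) = 1
print(i_xy - i_xy_z)                             # -1.0: negative, i.e. synergy
```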

Proceedings ArticleDOI
24 Jun 2015
TL;DR: A method based on multivariate mutual information (MMI) is proposed for face recognition that is not hindered by rigorous computation for feature extraction and learning spaces; its performance is compared with existing principal component analysis (PCA) based face recognition algorithms.
Abstract: A method based on multivariate mutual information (MMI) is proposed for face recognition. Unlike the existing frameworks, the proposed method is not hindered by rigorous computation for feature extraction and learning spaces. The proposed method uses an information-theoretic framework for face recognition. The training set is used to estimate the underlying joint and marginal densities, which are utilized to calculate the mutual information. The mutual information for each pixel value is used to highlight the regions corresponding to maximum information, which are then used in the face recognition process. Performance of the proposed method is evaluated on two image datasets. The recognition performance of the proposed method is also compared with existing principal component analysis (PCA) based face recognition algorithms.

Proceedings ArticleDOI
01 Oct 2015
TL;DR: Experimental results show that neural networks can be organized by the present method so as to store information content on input patterns, and it was observed that generalization performance was much improved by this increase in mutual information.
Abstract: The paper proposes a new information-theoretic method to improve the generalization performance of multi-layered neural networks, called "self-organized mutual information maximization learning". In the method, the self-organizing map (SOM) is successively applied to give the knowledge to the subsequent multi-layered neural networks. In this process, the mutual information between input patterns and competitive neurons is forced to increase by changing the spread parameter. Though several methods to increase information have been proposed in multi-layered neural networks, the present paper is the first to confirm that mutual information plays an important role in learning in multi-layered neural networks and to show how to compute it. The method was applied to the extended Senate data. In the experiments, it is examined whether mutual information is actually increased by the present method, because mutual information can be seemingly increased by changing the spread parameter. Experimental results show that even if the parameter responsible for changing mutual information was fixed, mutual information could be increased. This means that neural networks can be organized so as to store information content on input patterns by the present method. In addition, it could be observed that generalization performance was much improved by this increase in mutual information.

Journal ArticleDOI
TL;DR: The article generalizes recent novel findings and presents the first complete decomposition of fitness in terms of information theory, and establishes a formal link between evolutionary ecology and long-standing engineering efforts in the optimization of communication channels.
Abstract: Interactions among populations and their environment can be represented as a communication channel between the evolving organism and its environment. The more information is communicated over this channel, the more uncertainty is reduced, and the higher the attainable fitness. Following this intuitive notion, the article generalizes recent novel findings and presents the first complete decomposition of fitness in terms of information theory. Optimal population growth quantifies the amount of structure in the updated population that unequivocally comes from the environment through the mutual information between both. Turning this finding around, fitness can be optimized by ensuring that not more new information is included in the population distribution than is justified by environmental information. As such, the article not only offers an ontological re-conceptualization of fitness in terms of information theory, but also shows a way to optimize fitness. Additional side information about environmental dynamics can be used to increase fitness further, just like technological communication channels are enhanced by refining information about the source. This establishes a formal link between evolutionary ecology and long-standing engineering efforts in the optimization of communication channels. Two empirical applications to practical examples reveal inherent trade-offs among the involved information quantities during fitness optimization.

Book ChapterDOI
13 May 2015
TL;DR: This paper draws on recent work on hypergraph clustering to select the relevant feature subset (RFS) from a set of features using high-order (rather than pairwise) similarities, and constructs a coupled feature hypergraph to model the high- order relations among features.
Abstract: Real-world objects and their features tend to exhibit multiple relationships rather than simple pairwise ones, and as a result basic graph representation can lead to substantial loss of information. Hypergraph representations, on the other hand, allow vertices to be multiply connected by hyperedges and can hence capture multiple or higher order relationships among features. Due to their effectiveness in representing multiple relationships, in this paper, we draw on recent work on hypergraph clustering to select the relevant feature subset (RFS) from a set of features using high-order (rather than pairwise) similarities. Specifically, we first devise a coupled feature representation to represent the data by utilizing self-coupled and inter-feature coupling relationships, which can be more effective in capturing the intrinsic linear and nonlinear information on the data structure. Based on the new data representation, we use a new information-theoretic criterion referred to as multivariate mutual information to measure the high-order feature combinations with respect to the class labels. Therefore, we construct a coupled feature hypergraph to model the high-order relations among features. Finally, we locate the relevant feature subset (RFS) from the feature hypergraph by maximizing the features’ average relevance, yielding a subset with both low redundancy and strong discriminating power. The size of the relevant feature subset (RFS) is determined automatically. Experimental results demonstrate the effectiveness of our feature selection method on a number of standard data-sets.

Proceedings ArticleDOI
02 Nov 2015
TL;DR: A new conditional independence and redundancy (CIR) based definition of gene-gene interaction in genotype data or microarray expression data, together with a definition of interaction group, is proposed according to a proved inequality.
Abstract: Gene-gene interaction is an important factor to consider when selecting genes in genotype data or microarray expression data for association with diseases. However, current definitions of gene-gene interaction are not very clear and accurate. An inequality is proved in this paper, and according to this inequality a new definition of gene-gene interaction, the conditional independence and redundancy (CIR) based definition, is proposed together with a definition of interaction group. A new algorithm to detect gene-gene interactions of order greater than two is also proposed based on these new definitions and a theorem. Experimental results show the usefulness of these new definitions and the effectiveness and efficiency of the new algorithm.

Proceedings ArticleDOI
01 Sep 2015
TL;DR: A very interesting observation is the strong correlation among all three signals, which is very clear from the 3D phase plots and is also verified by both the multivariate mutual information and the gradient.
Abstract: In this paper we study the correlation of respiration, heartbeat and blood pressure signals in 3D space using phase coupling. To the best of our knowledge, it is the first time that the correlation of these signals is investigated in 3D space. We produce 2D and 3D phase plots and examine phase coupling in all cases. We calculate the mutual information, which is a widely used metric for estimating such correlations between two signals, and the slopes of the 2D phase plots. For examining signals in 3D space we compute the multivariate mutual information. The multivariate mutual information is not a simple generalization of mutual information and reveals different information, detecting the correlation among all three examined signals. In addition to the common metrics for signal synchronization, we examined the use of the gradient in order to extract and quantify the relations as depicted in the 3D plots. Results showed a stronger correlation between blood pressure and heartbeat signals and a relatively small correlation between respiratory and heartbeat signals, as well as between respiratory and blood pressure signals. Also, a very interesting observation is the strong correlation among all three signals, which is very clear from the 3D phase plots and is also verified by both the multivariate mutual information and the gradient.
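
The clinical recordings are of course not available here; the sketch below only illustrates the pairwise step of such an analysis on synthetic surrogate signals: quantile-binning three coupled series and computing their pairwise mutual information. The coupling structure and bin count are assumptions for the example.

```python
# Sketch on synthetic surrogates (not the paper's data): bin three coupled
# continuous signals and compare their pairwise mutual information.
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
n = 20000
common = rng.normal(size=n)                       # shared physiological driver
resp   = common + 0.8 * rng.normal(size=n)
hr     = common + 0.3 * rng.normal(size=n)
bp     = hr + 0.3 * rng.normal(size=n)            # bp tracks hr most closely

def digitize(sig, bins=16):
    """Quantile-bin a continuous signal into integer labels."""
    edges = np.quantile(sig, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(sig, edges)

r, h, b = (digitize(s) for s in (resp, hr, bp))
print(mutual_info_score(h, b))    # strongest pairwise dependence (hr vs bp)
print(mutual_info_score(r, h))    # weaker
print(mutual_info_score(r, b))    # weaker
```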