
Showing papers on "Multivariate mutual information published in 2021"


Journal ArticleDOI
TL;DR: In this article, the authors present a measure of shared information that is differentiable with respect to the underlying probability mass function, emerges solely from information-theoretic principles, and has the form of a local mutual information, which can be understood from the perspective of exclusions of probability mass.
Abstract: Partial information decomposition of the multivariate mutual information describes the distinct ways in which a set of source variables contains information about a target variable. The groundbreaking work of Williams and Beer has shown that this decomposition cannot be determined from classic information theory without making additional assumptions, and several candidate measures have been proposed, often drawing on principles from related fields such as decision theory. None of these measures is differentiable with respect to the underlying probability mass function. We here present a measure that satisfies this property, emerges solely from information-theoretic principles, and has the form of a local mutual information. We show how the measure can be understood from the perspective of exclusions of probability mass, a principle that is foundational to the original definition of mutual information by Fano. Since our measure is well defined for individual realizations of random variables, it lends itself, for example, to local learning in artificial neural networks. We also show that it has a meaningful Möbius inversion on a redundancy lattice and obeys a target chain rule. We give an operational interpretation of the measure based on the decisions that an agent should take if given only the shared information.
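As a minimal illustration of the local (pointwise) mutual information that the proposed measure builds on, the sketch below computes $i(s;t) = \log_2 p(s,t)/(p(s)p(t))$ for a toy joint distribution; the pmf values are illustrative, not taken from the paper.

```python
import numpy as np

# Toy joint pmf of a source S and target T (illustrative values only).
p_st = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_s = p_st.sum(axis=1)  # marginal of S
p_t = p_st.sum(axis=0)  # marginal of T

# Local (pointwise) mutual information per realization (s, t):
# i(s; t) = log2 p(s, t) / (p(s) p(t)); it can be negative for
# "misinformative" realizations, and its expectation is I(S; T).
i_local = np.log2(p_st / np.outer(p_s, p_t))
I = np.sum(p_st * i_local)
print(i_local)  # per-realization values, some possibly negative
print(I)        # averaging recovers I(S; T), about 0.278 bits here
```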

21 citations


Journal ArticleDOI
TL;DR: This work derives discrete-set signal processing (SP), a novel shift-invariant linear signal processing framework for set functions that brings a new set of tools for analyzing and processing them and, in particular, for dealing with their exponential nature.
Abstract: Set functions are functions (or signals) indexed by the powerset (set of all subsets) of a finite set $\boldsymbol N$. They are fundamental and ubiquitous in many application domains and have been used, for example, to formally describe or quantify loss functions for semantic image segmentation, the informativeness of sensors in sensor networks, the utility of sets of items in recommender systems, cooperative games in game theory, or bidders in combinatorial auctions. In particular, the subclass of submodular functions occurs in many optimization and machine learning problems. In this paper, we derive discrete-set signal processing (SP), a novel shift-invariant linear signal processing framework for set functions. Discrete-set SP considers different notions of shift obtained from set union and difference operations. For each shift it provides associated notions of shift-invariant filters, convolution, Fourier transform, and frequency response. We provide intuition for our framework using the concept of a generalized coverage function that we define, identify multivariate mutual information as a special case of a discrete-set spectrum, and motivate frequency ordering. Our work brings a new set of tools for analyzing and processing set functions, and, in particular, for dealing with their exponential nature. We show two prototypical applications and experiments: compression in submodular function optimization and sampling for preference elicitation in combinatorial auctions.
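Since the paper identifies multivariate mutual information as a special case of a discrete-set spectrum, a companion sketch may help fix ideas: given an entropy set function indexed by subsets, the multivariate (interaction) information follows by inclusion-exclusion. The entropy table below is a toy example chosen for illustration, not taken from the paper.

```python
from itertools import chain, combinations

def powerset(items):
    """All subsets of items, as tuples."""
    s = list(items)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def multivariate_mi(H, N):
    """Multivariate (interaction) mutual information of the variables in N,
    by inclusion-exclusion over an entropy set function H:
        I(N) = -sum over subsets S of N of (-1)^|S| * H(S).
    For |N| = 2 this reduces to the usual I(X;Y) = H(X) + H(Y) - H(XY)."""
    return -sum((-1) ** len(S) * H[frozenset(S)] for S in powerset(N))

# Toy entropy table (bits): X, Y independent fair coins, Z = X XOR Y.
H = {frozenset(): 0.0,
     frozenset('X'): 1.0, frozenset('Y'): 1.0, frozenset('Z'): 1.0,
     frozenset('XY'): 2.0, frozenset('XZ'): 2.0, frozenset('YZ'): 2.0,
     frozenset('XYZ'): 2.0}
print(multivariate_mi(H, 'XY'))   # 0.0: each pair is independent
print(multivariate_mi(H, 'XYZ'))  # -1.0: a purely synergistic triple
```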

17 citations


Journal ArticleDOI
TL;DR: Partial information decomposition (PID) as discussed by the authors decomposes the multivariate mutual information that a set of source variables contains about a target variable into basic pieces, the so-called atoms.
Abstract: Partial information decomposition (PID) seeks to decompose the multivariate mutual information that a set of source variables contains about a target variable into basic pieces, the so-called atoms...

10 citations


Proceedings ArticleDOI
13 Oct 2021
TL;DR: In this article, a simple, Boolean network model-based solution for efficient inference of Gene Regulatory Networks (GRNs) from time series gene expression data is introduced, which is an effective approach for unveiling important underlying gene-gene relationships and dynamics.
Abstract: The inference of Gene Regulatory Networks (GRNs) from time series gene expression data is an effective approach for unveiling important underlying gene-gene relationships and dynamics. While various computational models exist for accurate inference of GRNs, many are computationally inefficient and do not focus on simultaneous inference of both network topology and dynamics. In this paper, we introduce a simple, Boolean network model-based solution for efficient inference of GRNs. First, the microarray expression data are discretized using the average gene expression value as a threshold. This step permits an experimental approach to defining the maximum indegree of the network. Next, regulatory genes, including the self-regulations for each target gene, are inferred using an estimated multivariate mutual information-based Min-Redundancy Max-Relevance criterion, and the inference is further refined by a swapping operation. Subsequently, we introduce a new method, combining Boolean network regulation modelling and the Pearson correlation coefficient, to identify the interaction types (inhibition or activation) of the regulatory genes. This method is utilized for the efficient determination of the optimal regulatory rule, consisting of AND, OR, and NOT operators, by defining the accurate application of the NOT operation in conjunction and disjunction Boolean functions. The proposed approach is evaluated using two real gene expression datasets for an Escherichia coli gene regulatory network and a fission yeast cell cycle network. Although the Structural Accuracy is approximately the same as that of existing methods (MIBNI, REVEAL, Best-Fit, BIBN, and CST), the proposed method outperforms all these methods with respect to efficiency and Dynamic Accuracy.
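For concreteness, here is a hedged sketch of the first two steps described above: binarizing expression at each gene's average, then scoring candidate regulators with a mutual information-based min-redundancy max-relevance criterion. The function names, the exact score weighting, and the data layout (genes as rows, time points as columns) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def binarize(expr):
    """Discretize a genes-by-timepoints expression matrix: 1 if a value
    exceeds that gene's average expression, else 0 (the thresholding
    step described in the abstract)."""
    return (expr > expr.mean(axis=1, keepdims=True)).astype(int)

def mutual_info(x, y):
    """Mutual information (bits) between two binary series, from the
    empirical joint distribution."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pxy = np.mean((x == a) & (y == b))
            px, py = np.mean(x == a), np.mean(y == b)
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi

def mrmr_score(candidate, target_next, selected, expr_bin):
    """mRMR-style score for a candidate regulator: relevance to the
    target's next state minus average redundancy with regulators already
    selected. A generic sketch, not the paper's exact multivariate-MI
    criterion or its swapping refinement."""
    relevance = mutual_info(expr_bin[candidate], target_next)
    redundancy = (np.mean([mutual_info(expr_bin[candidate], expr_bin[s])
                           for s in selected]) if selected else 0.0)
    return relevance - redundancy
```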

7 citations


Proceedings ArticleDOI
10 Jan 2021
TL;DR: This paper develops an encoder-decoder architecture for a variant of the Variational Auto-Encoder with two latent codes so as to derive optimizable lower bounds of the conditional mutual information in the image synthesis processes and incorporate them into the training objective.
Abstract: In this paper, we look into the problem of disentangled representation learning and controllable image synthesis in a deep generative model. We develop an encoder-decoder architecture for a variant of the Variational Auto-Encoder (VAE) with two latent codes $z_{1}$ and $z_{2}$. Our framework uses $z_{2}$ to capture specified factors of variation while $z_{1}$ captures the complementary factors of variation. To this end, we analyze the learning problem from the perspective of multivariate mutual information, derive optimizable lower bounds of the conditional mutual information in the image synthesis processes, and incorporate them into the training objective. We validate our method empirically on the Color MNIST dataset and the CelebA dataset by showing controllable image syntheses. Our proposed paradigm is simple yet effective and is applicable to many situations, including those where no explicit factorization of features is available or where the features are non-categorical.
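As background for the "optimizable lower bounds" mentioned above, one standard construction is the Barber-Agakov variational bound $I(z_{2}; y) \ge \mathbb{E}[\log q(y|z_{2})] + H(y)$, maximized by training an auxiliary model $q(y|z_{2})$. The PyTorch sketch below shows that generic construction; it is an assumption about the flavor of bound involved, not the authors' exact objective, and aux_classifier is a hypothetical module.

```python
import torch.nn.functional as F

def mi_lower_bound_term(z2, labels, aux_classifier):
    """Barber-Agakov-style lower bound on I(z2; y):
        I(z2; y) >= E[log q(y | z2)] + H(y),
    where H(y) is constant w.r.t. the model parameters, so adding
    E[log q(y | z2)] to the training objective tightens the bound."""
    logits = aux_classifier(z2)              # parameterizes q(y | z2)
    return -F.cross_entropy(logits, labels)  # Monte Carlo E[log q(y | z2)]
```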

5 citations


Journal ArticleDOI
09 Mar 2021-Entropy
TL;DR: The ensemble approach improves on penalty methods in several important real data and model scenarios, namely when covariates are strongly associated with the response and when model complexity is high; in such cases, the trimmed average version of ensemble Lasso is often the best predictor.
Abstract: Regression models provide prediction frameworks for multivariate mutual information analysis, which uses information concepts in choosing covariates (also called features) that are important for analysis and prediction. We consider a high dimensional regression framework where the number of covariates (p) exceeds the sample size (n). Recent work in high dimensional regression analysis has embraced an ensemble subspace approach that consists of selecting random subsets of covariates with fewer than p covariates, doing statistical analysis on each subset, and then merging the results from the subsets. We examine conditions under which penalty methods such as Lasso perform better when used in the ensemble approach by computing mean squared prediction errors for simulations and a real data example. Linear models with both random and fixed designs are considered. We examine two versions of penalty methods: one where the tuning parameter is selected by cross-validation, and one where the final predictor is a trimmed average of individual predictors corresponding to the members of a set of fixed tuning parameters. We find that the ensemble approach improves on penalty methods in several important real data and model scenarios. The improvement occurs when covariates are strongly associated with the response and when the complexity of the model is high. In such cases, the trimmed average version of ensemble Lasso is often the best predictor.
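A minimal sketch of the ensemble subspace scheme with the trimmed-average combination described above, assuming scikit-learn's Lasso; the subset size, number of subsets, fixed tuning parameter alpha, and trimming fraction are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.linear_model import Lasso
from scipy.stats import trim_mean

def ensemble_lasso_predict(X, y, X_new, n_subsets=100, subset_size=50,
                           alpha=0.1, trim=0.1, rng=None):
    """Ensemble-subspace Lasso: fit Lasso on random covariate subsets of
    size < p, then combine the individual predictions with a trimmed
    average (the fixed-tuning-parameter variant; a cross-validated
    variant would swap Lasso for LassoCV)."""
    rng = np.random.default_rng(rng)
    p = X.shape[1]
    preds = []
    for _ in range(n_subsets):
        cols = rng.choice(p, size=min(subset_size, p), replace=False)
        model = Lasso(alpha=alpha).fit(X[:, cols], y)
        preds.append(model.predict(X_new[:, cols]))
    # Trimmed average across ensemble members.
    return trim_mean(np.asarray(preds), proportiontocut=trim, axis=0)
```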

3 citations


Journal ArticleDOI
TL;DR: In this paper, an explainable diagnostic method based on a fully convolutional neural network is proposed, which combines class activation mapping, multivariate mutual information, global average pooling and t-distributed stochastic neighbor embedding.

2 citations


Journal ArticleDOI
07 Jan 2021-Entropy
TL;DR: Neural Information Decomposition (NID) as mentioned in this paper is a new approach to information decomposition, which is both theoretically grounded and can be efficiently estimated in practice using neural networks.
Abstract: If regularity in data takes the form of higher-order functions among groups of variables, models which are biased towards lower-order functions may easily mistake the data for noise. To distinguish whether this is the case, one must be able to quantify the contribution of different orders of dependence to the total information. Recent work in information theory attempts to do this through measures of multivariate mutual information (MMI) and information decomposition (ID). Despite substantial theoretical progress, practical issues related to tractability and learnability of higher-order functions are still largely unaddressed. In this work, we introduce a new approach to information decomposition—termed Neural Information Decomposition (NID)—which is both theoretically grounded, and can be efficiently estimated in practice using neural networks. We show on synthetic data that NID can learn to distinguish higher-order functions from noise, while many unsupervised probability models cannot. Additionally, we demonstrate the usefulness of this framework as a tool for exploring biological and artificial neural networks.

1 citation


Proceedings Article
18 May 2021
TL;DR: In this article, a non-parametric estimator relying on the Bernstein copula is proposed to learn continuous nonparametric graphical models from continuous observational data, which is based on concepts from information theory in order to discover independences and causality between variables.
Abstract: We propose a new framework to learn non-parametric graphical models from continuous observational data. Our method is based on concepts from information theory in order to discover independences and causality between variables: the conditional and multivariate mutual information (as in Verny et al., 2017, for discrete models). To estimate these quantities, we propose non-parametric estimators relying on the Bernstein copula, constructed by exploiting the relation between the mutual information and the copula entropy (Ma and Sun, 2011; Belalia et al., 2017). To our knowledge, this relation is only documented for the bivariate case and, for the needs of our algorithms, is here extended to the conditional and multivariate mutual information. This framework leads to a new algorithm to learn continuous non-parametric Bayesian networks. Moreover, we use this estimator to speed up the BIC algorithm proposed in Elidan (2010) by taking advantage of the decomposition of the likelihood function into a sum of mutual information terms (Koller and Friedman, 2009). Finally, our method is compared in terms of performance and complexity with other state-of-the-art techniques for learning Copula Bayesian Networks and shows superior results. In particular, it needs less data to recover the true structure and generalizes better on data that are not sampled from Gaussian distributions.
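The bivariate identity underlying these estimators is that mutual information equals the negative copula entropy, $I(X;Y) = -H(c(U,V))$. The sketch below illustrates the identity with a crude histogram estimate on rank-transformed (pseudo-uniform) data; the paper instead uses smoother Bernstein-copula estimators and extends the identity to the conditional and multivariate cases.

```python
import numpy as np
from scipy.stats import rankdata

def mi_via_copula_entropy(x, y, bins=10):
    """Illustration of I(X; Y) = -H(c(U, V)) in nats: map samples to
    approximately uniform margins via ranks, then estimate the negative
    entropy of the empirical copula density with a 2-D histogram."""
    n = len(x)
    u = rankdata(x) / (n + 1)  # pseudo-observations with uniform margins
    v = rankdata(y) / (n + 1)
    hist, _, _ = np.histogram2d(u, v, bins=bins,
                                range=[[0, 1], [0, 1]], density=True)
    cell = (1.0 / bins) ** 2   # area of each copula-density cell
    mask = hist > 0
    # Copula entropy H(c) = -sum c log c * cell, so MI = -H(c):
    return np.sum(hist[mask] * np.log(hist[mask])) * cell
```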

1 citation


Journal ArticleDOI
TL;DR: It is shown that, under the info-clustering framework, correlated random variables can be clustered in an agglomerative manner: while the existing divisive approach successively segregates the random variables into subsets with increasing multivariate mutual information, the new agglomerative approach successively merges subsets of random variables sharing a large amount of normalized total correlation.
Abstract: We show that, under the info-clustering framework, correlated random variables can be clustered in an agglomerative manner. While the existing divisive approach successively segregates the random variables into subsets with increasing multivariate mutual information, our agglomerative approach successively merges subsets of random variables sharing a large amount of normalized total correlation. We show that both approaches result in the same hierarchy of clusters, but the agglomerative approach is an order of magnitude faster than the divisive one. The uniqueness of the hierarchy produced by the two approaches is due to a fundamental connection that we uncover between the well-known total correlation and the recently proposed measure of multivariate mutual information. We implement the new algorithm and provide a data structure for efficient storage and retrieval of the hierarchical clustering solution.
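A sketch of one greedy agglomerative step under the merge criterion described above: join the pair of clusters whose union shares the largest normalized total correlation $TC(X_S)/(|S|-1)$. The entropy oracle H is an assumed input, and the paper's algorithm (and its data structure for storing the hierarchy) is considerably more refined.

```python
from itertools import combinations

def total_correlation(H, S):
    """TC(X_S) = sum_i H(X_i) - H(X_S), from an entropy oracle H that
    maps a frozenset of variable indices to a joint entropy."""
    return sum(H(frozenset([i])) for i in S) - H(frozenset(S))

def agglomerative_step(H, clusters):
    """One greedy merge: join the pair of clusters (frozensets) whose
    union shares the most normalized total correlation TC(S)/(|S|-1)."""
    best, best_pair = float('-inf'), None
    for A, B in combinations(clusters, 2):
        S = A | B
        score = total_correlation(H, S) / (len(S) - 1)
        if score > best:
            best, best_pair = score, (A, B)
    A, B = best_pair
    return [C for C in clusters if C not in (A, B)] + [A | B]
```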

1 citation


Book ChapterDOI
02 Mar 2021
TL;DR: The negative minima of the 3-way multivariate mutual information correspond to Borromean links as mentioned in this paper, paving the way for providing probabilistic analogs of linking numbers.
Abstract: In a joint work with D. Bennequin, we suggested that the (negative) minima of the 3-way multivariate mutual information correspond to Borromean links, paving the way for providing probabilistic analogs of linking numbers. This short note generalizes the correspondence, relating the minima of the k-variable multivariate interaction information to k-component Brunnian links in the binary variable case. Following Jakulin and Bratko, the negativity of the associated Kullback-Leibler divergence between the joint probability law and its Kirkwood approximation implies an obstruction to local decomposition into interactions of order lower than k, defining a local decomposition inconsistency that reverses Abramsky's contextuality local-global relation. These negative k-links provide a straightforward definition of collective emergence in complex k-body interacting systems or datasets.
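A worked example of such a negative minimum: take $X$ and $Y$ to be independent fair bits and $Z = X \oplus Y$ (XOR). Every pair of variables is then independent, yet any two determine the third, mirroring the Borromean property that no two rings are linked while the three together are. The 3-way interaction information attains its minimum for binary variables,

$$I(X;Y;Z) = I(X;Y) - I(X;Y \mid Z) = 0 - 1 = -1 \text{ bit.}$$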


Proceedings ArticleDOI
09 Feb 2021
TL;DR: This study proposes general mixed effect parameters (fixed and random) multivariate SDE model of a population growth which includes random forces governing the dynamic of multivariate distribution of tree size variables.
Abstract: Stochastic differential equations (SDEs) have been used intensively to analyze data from physics, finance, engineering, medicine, biology, and forestry. This study proposes a general mixed-effect-parameters (fixed and random) multivariate SDE model of population growth that includes the random forces governing the dynamics of the multivariate distribution of tree size variables. The dynamics of the multivariate probability density function of the size variables in a population are described by a mixed-effect-parameters Gompertz-type SDE. The advantages of the multivariate SDE model are that it does not require trying many different candidate equations, it relates the dynamics of the size variables to the time dimension, and it accounts for the underlying covariance structure driving changes in the size variables. The SDE model allows a better understanding of the processes driving the dynamics of natural phenomena. The newly derived multivariate probability density function and its marginal (univariate, bivariate, and trivariate) and conditional (univariate, bivariate, and trivariate) distributions can be applied to the modeling of population attributes such as mean values and quantiles. This study introduces general multivariate mutual information measures, based on the differential entropy, to capture multivariate interactions between size variables. The purpose of the present study is therefore to experimentally confirm the effectiveness of using multivariate information measures to reconstruct multivariate interactions among size variables. The study of information sharing among size variables is illustrated using a dataset of Scots pine (Pinus sylvestris L.) stand measurements in Lithuania.
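As a small illustration of the kind of dynamic involved, the sketch below simulates a univariate Gompertz-type SDE with the Euler-Maruyama scheme; the drift/diffusion form and all parameter values are illustrative assumptions, and the paper's model is multivariate with mixed (fixed and random) effect parameters, which this sketch omits.

```python
import numpy as np

def simulate_gompertz_sde(x0=1.0, alpha=0.5, beta=0.1, sigma=0.05,
                          T=40.0, n_steps=4000, rng=None):
    """Euler-Maruyama path of a univariate Gompertz-type SDE,
        dX_t = (alpha - beta * ln X_t) X_t dt + sigma X_t dW_t,
    a common Gompertz diffusion form (illustrative only)."""
    rng = np.random.default_rng(rng)
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))
        x[k + 1] = x[k] + (alpha - beta * np.log(x[k])) * x[k] * dt \
                   + sigma * x[k] * dw
    return x

sizes = simulate_gompertz_sde(rng=0)
print(sizes[-1])  # size variable after time T under the simulated dynamic
```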