
Showing papers on "Multivariate mutual information published in 2017"


Journal ArticleDOI
TL;DR: A novel computational model for DTI prediction is proposed, based on machine learning methods, which uses molecular substructure fingerprints, multivariate mutual information of proteins, and network topology to represent drugs, targets, and the relationships between them.

150 citations


Journal ArticleDOI
TL;DR: An ensemble parallel processing bi-objective genetic algorithm based feature selection method is proposed that outperforms that of other state-of-the-art methods in classification accuracy and statistical measures.
Abstract: An ensemble parallel processing bi-objective genetic algorithm based feature selection method is proposed. Rough set theory and mutual information gain are used to select informative data and discard the vague portion. Parallel processing in the genetic algorithm reduces time complexity. The method is compared with existing state-of-the-art methods using suitable datasets. Classification accuracy and statistical measures outperform those of other state-of-the-art methods. The feature selection problem in data mining is addressed here by proposing a bi-objective genetic algorithm based feature selection method. Boundary region analysis of rough set theory and multivariate mutual information of information theory are used as the two objective functions in the proposed work, to select only precise and informative data from the data set. The data set is sampled with a replacement strategy and the method is applied to determine non-dominated feature subsets from each sampled data set. Finally, an ensemble of such bi-objective genetic algorithm based feature selectors is developed with the help of parallel implementations to produce a more generalized feature subset. In fact, individual feature selector outputs are aggregated using a novel dominance-based principle to produce the final feature subset. The proposed work is validated on a repository dedicated to feature selection datasets as well as on UCI machine learning repository datasets, and the experimental results are compared with related state-of-the-art feature selection methods to show the effectiveness of the proposed ensemble feature selection method.
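As a rough illustration of the two objectives described above (not the authors' implementation), the sketch below scores a candidate feature subset by its rough-set boundary-region size and by a plug-in estimate of the mutual information between the selected features and the class label; the toy data and helper names are made up for the example.

```python
"""Hedged sketch of the two objective functions: rough-set boundary-region size
(to be minimized) and multivariate mutual information with the class (to be
maximized). Illustrative only, not the paper's code."""
from collections import Counter, defaultdict
import math

def boundary_region_size(rows, feature_idx, labels):
    """Samples in the rough-set boundary region induced by the feature subset."""
    blocks = defaultdict(list)          # equivalence classes of the indiscernibility relation
    for i, row in enumerate(rows):
        blocks[tuple(row[j] for j in feature_idx)].append(i)
    boundary = 0
    for members in blocks.values():
        if len({labels[i] for i in members}) > 1:   # block overlaps several classes
            boundary += len(members)
    return boundary

def mutual_information(rows, feature_idx, labels):
    """Plug-in estimate (bits) of I(selected features; class) for discrete data."""
    n = len(rows)
    joint, px, py = Counter(), Counter(), Counter()
    for row, y in zip(rows, labels):
        x = tuple(row[j] for j in feature_idx)
        joint[(x, y)] += 1; px[x] += 1; py[y] += 1
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

# toy data: feature 0 determines the class, the others are noise
rows = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (0, 0, 0), (1, 1, 1)]
labels = [0, 0, 1, 1, 0, 1]
subset = (0,)                           # candidate subset picked by the genetic algorithm
print(boundary_region_size(rows, subset, labels), mutual_information(rows, subset, labels))
```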

132 citations


Journal ArticleDOI
29 Jun 2017-Entropy
TL;DR: This work presents a new measure of redundancy which measures the common change in surprisal shared between variables at the local or pointwise level, and shows how this redundancy measure can be used within the framework of the Partial Information Decomposition (PID) to give an intuitive decomposition of the multivariate mutual information into redundant, unique and synergistic contributions.
Abstract: The problem of how to properly quantify redundant information is an open question that has been the subject of much recent research. Redundant information refers to information about a target variable S that is common to two or more predictor variables Xi. It can be thought of as quantifying overlapping information content or similarities in the representation of S between the Xi. We present a new measure of redundancy which measures the common change in surprisal shared between variables at the local or pointwise level. We provide a game-theoretic operational definition of unique information, and use this to derive constraints which are used to obtain a maximum entropy distribution. Redundancy is then calculated from this maximum entropy distribution by counting only those local co-information terms which admit an unambiguous interpretation as redundant information. We show how this redundancy measure can be used within the framework of the Partial Information Decomposition (PID) to give an intuitive decomposition of the multivariate mutual information into redundant, unique and synergistic contributions. We compare our new measure to existing approaches over a range of example systems, including continuous Gaussian variables. Matlab code for the measure is provided, including all considered examples.
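For concreteness, the minimal Python sketch below (not the paper's Matlab code) computes the local co-information terms referred to above as i(s;x1) + i(s;x2) - i(s;x1,x2) for discrete samples; the paper's measure additionally keeps only terms meeting certain sign conditions.

```python
"""Minimal sketch of local (pointwise) co-information for three discrete
variables; the sign of each term is what redundancy measures of this kind inspect."""
import math
from collections import Counter

def local_coinformation(samples):
    """samples: list of (x1, x2, s) tuples; returns {(x1, x2, s): local co-information in bits}."""
    n = len(samples)
    p3 = Counter(samples)
    p_x1s = Counter((x1, s) for x1, _, s in samples)
    p_x2s = Counter((x2, s) for _, x2, s in samples)
    p_x12 = Counter((x1, x2) for x1, x2, _ in samples)
    p_x1 = Counter(x1 for x1, _, _ in samples)
    p_x2 = Counter(x2 for _, x2, _ in samples)
    p_s = Counter(s for _, _, s in samples)
    out = {}
    for (x1, x2, s), c in p3.items():
        i_x1_s = math.log2((p_x1s[(x1, s)] / n) / ((p_x1[x1] / n) * (p_s[s] / n)))
        i_x2_s = math.log2((p_x2s[(x2, s)] / n) / ((p_x2[x2] / n) * (p_s[s] / n)))
        i_x12_s = math.log2((c / n) / ((p_x12[(x1, x2)] / n) * (p_s[s] / n)))
        out[(x1, x2, s)] = i_x1_s + i_x2_s - i_x12_s
    return out

# XOR target: each source alone is uninformative, so every local term is -1 bit (pure synergy)
xor_samples = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
print(local_coinformation(xor_samples))
```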

125 citations


Journal ArticleDOI
TL;DR: This study constitutes the first rigorous application of information partitioning to environmental time‐series data, and addresses how noise, pdf estimation technique, or source dependencies can influence detected measures.
Abstract: Information theoretic measures can be used to identify non-linear interactions between source and target variables through reductions in uncertainty. In information partitioning, multivariate mutual information is decomposed into synergistic, unique, and redundant components. Synergy is information shared only when sources influence a target together, uniqueness is information only provided by one source, and redundancy is overlapping shared information from multiple sources. While this partitioning has been applied to provide insights into complex dependencies, several proposed partitioning methods overestimate redundant information and omit a component of unique information because they do not account for source dependencies. Additionally, information partitioning has only been applied to time-series data in a limited context, using basic pdf estimation techniques or a Gaussian assumption. We develop a Rescaled Redundancy measure (Rs) to solve the source dependency issue, and present Gaussian, autoregressive, and chaotic test cases to demonstrate its advantages over existing techniques in the presence of noise, various source correlations, and different types of interactions. This study constitutes the first rigorous application of information partitioning to environmental time-series data, and addresses how noise, pdf estimation technique, or source dependencies can influence detected measures. We illustrate how our techniques can unravel the complex nature of forcing and feedback within an ecohydrologic system with an application to 1-minute environmental signals of air temperature, relative humidity, and windspeed. The methods presented here are applicable to the study of a broad range of complex systems composed of interacting variables.
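For reference, the standard two-source bookkeeping that such a partitioning satisfies (Williams-and-Beer form, stated here only to make the synergistic S, unique U1, U2, and redundant R components concrete; Y is the target and X1, X2 the sources) is:

```latex
% Standard two-source partial-information-decomposition accounting, not the paper's
% specific Rescaled Redundancy measure.
\begin{align}
  I(X_1, X_2 ; Y) &= R + U_1 + U_2 + S \\
  I(X_1 ; Y)      &= R + U_1 \\
  I(X_2 ; Y)      &= R + U_2
\end{align}
```

Once a redundancy measure such as the paper's Rescaled Redundancy Rs is fixed, the unique and synergistic components follow from the three measured mutual informations above.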

65 citations


Journal ArticleDOI
TL;DR: A framework for Temporal Information Partitioning Networks (TIPNets) is introduced, in which time-series variables are viewed as nodes, and lagged multivariate mutual information measures are links, which enables us to interpret process connectivity in a multivariate context.
Abstract: In an ecohydrologic system, components of atmospheric, vegetation, and root-soil subsystems participate in forcing and feedback interactions at varying time scales and intensities. The structure of this network of complex interactions varies in terms of connectivity, strength, and timescale due to perturbations or changing conditions such as rainfall, drought, or land use. However, characterization of these interactions is difficult due to multivariate and weak dependencies in the presence of noise, non-linearities, and limited data. We introduce a framework for Temporal Information Partitioning Networks (TIPNets), in which time-series variables are viewed as nodes, and lagged multivariate mutual information measures are links. These links are partitioned into redundant, synergistic, and unique information components, where synergy is information provided only jointly, unique information is only provided by a single source, and redundancy is overlapping information. We construct TIPNets from 1-minute weather station data over several-hour time windows. From a comparison of dry, wet, and rainy conditions, we find that information strengths increase when solar radiation and surface moisture are present, and surface moisture and wind variability are redundant and synergistic influences, respectively. Over a growing season, network trends reveal patterns that vary with vegetation and rainfall patterns. The framework presented here enables us to interpret process connectivity in a multivariate context, which can lead to better inference of behavioral shifts due to perturbations in ecohydrologic systems. This work contributes to more holistic characterizations of system behavior, and can benefit a wide variety of studies of complex systems.
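A minimal sketch of the kind of link weight such a network uses: lagged mutual information between a source and a target series, here with an arbitrary histogram bin count and lag (illustrative only, not the TIPNet code).

```python
"""Lagged mutual information as a directed link weight between two time series."""
import numpy as np

def lagged_mutual_information(source, target, lag, bins=8):
    """I(source(t - lag); target(t)) in bits, via a 2-D histogram estimate."""
    x, y = np.asarray(source[:-lag]), np.asarray(target[lag:])
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz])).sum())

rng = np.random.default_rng(0)
drive = rng.normal(size=5000)
response = np.roll(drive, 3) + 0.5 * rng.normal(size=5000)   # lag-3 coupling plus noise
print(lagged_mutual_information(drive, response, lag=3))
```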

48 citations


ReportDOI
TL;DR: A class of “neighborhood-based” cost functions is introduced, which makes it more costly to undertake experiments that can produce different results in similar states, and it is shown that the predictions of this alternative rational inattention theory conform better with evidence from perceptual discrimination experiments.
Abstract: We propose a new principle for measuring the cost of information structures in rational inattention problems, based on the cost of generating the information used to make a decision through a dynamic evidence accumulation process. We introduce a continuous-time model of sequential information sampling, and show that, in a broad class of cases, the choice frequencies resulting from optimal information accumulation are the same as those implied by a static rational inattention problem with a particular static information-cost function. Among the static cost functions that can be justified in this way is the mutual information cost function proposed by Sims [2010], but we show that other cost functions can be justified in this way as well. We introduce a class of "neighborhood-based" cost functions, which also summarize the results of dynamic evidence accumulation, and (unlike mutual information) incorporate a conception of the similarity of states to one another, making it more costly to undertake experiments that can produce different results in similar but non-identical states. With this alternative cost function, optimal information accumulation results in choice frequencies that are similar in similar states; in a continuous-state extension of the model, optimality implies choice frequencies that vary continuously with the state, even when the choice payoffs jump discontinuously with variation in the state. This feature of our version of the rational inattention model conforms with evidence from perceptual discrimination experiments.

41 citations


Journal ArticleDOI
16 Feb 2017-Entropy
TL;DR: This work generalizes the type of constructible lattices and examines the relations between different lattices, provides an alternative procedure to build multivariate decompositions, and shows how information gain and information loss dual lattices lead to a self-consistent unique decomposition, which allows a deeper understanding of the origin and meaning of synergy and redundancy.
Abstract: Williams and Beer (2010) proposed a nonnegative mutual information decomposition, based on the construction of information gain lattices, which allows separating the information that a set of variables contains about another variable into components, interpretable as the unique information of one variable, or as redundancy and synergy components. In this work, we extend this framework focusing on the lattices that underpin the decomposition. We generalize the type of constructible lattices and examine the relations between different lattices, for example, relating bivariate and trivariate decompositions. We point out that, in information gain lattices, redundancy components are invariant across decompositions, but unique and synergy components are decomposition-dependent. Exploiting the connection between different lattices, we propose a procedure to construct, in the general multivariate case, information gain decompositions from measures of synergy or unique information. We then introduce an alternative type of lattices, information loss lattices, with the role and invariance properties of redundancy and synergy components reversed with respect to gain lattices, and which provide an alternative procedure to build multivariate decompositions. We finally show how information gain and information loss dual lattices lead to a self-consistent unique decomposition, which allows a deeper understanding of the origin and meaning of synergy and redundancy.

37 citations


Journal ArticleDOI
03 Jul 2017-Entropy
TL;DR: A new measure of shared information is proposed, called extractable shared information, that is left monotonic; that is, the information shared about S is bounded from below by the information shared about f(S) for any function f.
Abstract: We consider the problem of quantifying the information shared by a pair of random variables X1, X2 about another variable S. We propose a new measure of shared information, called extractable shared information, that is left monotonic; that is, the information shared about S is bounded from below by the information shared about f(S) for any function f. We show that our measure leads to a new nonnegative decomposition of the mutual information I(S; X1X2) into shared, complementary and unique components. We study properties of this decomposition and show that a left monotonic shared information is not compatible with a Blackwell interpretation of unique information. We also discuss whether it is possible to have a decomposition in which both shared and unique information are left monotonic.

32 citations


Posted Content
TL;DR: In this article, the authors proposed a novel estimator for mutual information of discrete-continuous mixtures, which significantly widens the applicability of mutual information estimation in real-world applications, where some variables are discrete, some continuous and others are a mixture between continuous and discrete components.
Abstract: Estimating mutual information from observed samples is a basic primitive, useful in several machine learning tasks including correlation mining, information bottleneck clustering, learning a Chow-Liu tree, and conditional independence testing in (causal) graphical models. While mutual information is a well-defined quantity in general probability spaces, existing estimators can only handle two special cases of purely discrete or purely continuous pairs of random variables. The main challenge is that these methods first estimate the (differential) entropies of X, Y and the pair (X;Y) and add them up with appropriate signs to get an estimate of the mutual information. These 3H-estimators cannot be applied in general mixture spaces, where entropy is not well-defined. In this paper, we design a novel estimator for mutual information of discrete-continuous mixtures. We prove that the proposed estimator is consistent. We provide numerical experiments suggesting superiority of the proposed estimator compared to other heuristics of adding small continuous noise to all the samples and applying standard estimators tailored for purely continuous variables, and quantizing the samples and applying standard estimators tailored for purely discrete variables. This significantly widens the applicability of mutual information estimation in real-world applications, where some variables are discrete, some continuous, and others are a mixture between continuous and discrete components.
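The sketch below follows the tie-aware k-nearest-neighbour idea the abstract describes; it is only an approximation of the proposed estimator (the exact digamma/log and counting conventions may differ from the paper), and the function name mixed_mi and the toy data are made up.

```python
"""Rough sketch of a tie-aware k-NN mutual information estimate for 1-D variables
that may be discrete, continuous, or mixed. Assumes numpy and scipy are available."""
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def mixed_mi(x, y, k=5):
    x = np.asarray(x, float).reshape(-1, 1)
    y = np.asarray(y, float).reshape(-1, 1)
    n = len(x)
    joint = np.hstack([x, y])
    tree_xy, tree_x, tree_y = cKDTree(joint), cKDTree(x), cKDTree(y)
    # distance to the k-th neighbour in the joint space (max-norm), excluding the point itself
    rho = tree_xy.query(joint, k=k + 1, p=np.inf)[0][:, -1]
    total = 0.0
    for i in range(n):
        if rho[i] == 0.0:
            # discrete tie: count how many other samples coincide exactly with sample i
            k_i = len(tree_xy.query_ball_point(joint[i], r=0.0, p=np.inf)) - 1
        else:
            k_i = k
        nx = len(tree_x.query_ball_point(x[i], r=rho[i], p=np.inf)) - 1
        ny = len(tree_y.query_ball_point(y[i], r=rho[i], p=np.inf)) - 1
        total += digamma(k_i) + np.log(n) - np.log(nx + 1) - np.log(ny + 1)
    return total / n   # in nats

rng = np.random.default_rng(1)
label = rng.integers(0, 2, size=2000)              # discrete variable
reading = label + 0.1 * rng.normal(size=2000)      # continuous variable carrying it
shuffled = rng.permutation(reading)                # permutation breaks the dependence
# the first estimate should be clearly larger than the near-zero second one
print(mixed_mi(label, reading), mixed_mi(label, shuffled))
```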

23 citations


Journal ArticleDOI
TL;DR: The present study examines the application of Normalized Multiinformation (NMI) as a scalar measure of shared information content in a multivariate network that is robust with respect to changes in network size and is shown to be more sensitive to developmental effects than first order synchronous and nonsynchronous measures of network complexity.

21 citations


Proceedings ArticleDOI
27 Nov 2017
TL;DR: This work presents a unified analysis framework based on mutual information and two of its decompositions, specific and pointwise mutual information, to quantify the amount of information content between different value combinations from multiple variables over time.
Abstract: Identification of salient features from a time-varying multivariate system plays an important role in scientific data understanding. In this work, we present a unified analysis framework based on mutual information and two of its decompositions, specific and pointwise mutual information, to quantify the amount of information content between different value combinations from multiple variables over time. The pointwise mutual information (PMI), computed for each value combination, is used to construct informative scalar fields, which allow close examination of combined and complementary information possessed by multiple variables. Since PMI gives us a way of quantifying information shared among all combinations of scalar values for multiple variables, it is used to identify salient isovalue tuples. Visualization of isosurfaces on those selected tuples depicts combined or complementary relationships in the data. For intuitive interaction with the data, an interactive interface is designed based on the proposed information-theoretic measures. Finally, successful application of the proposed method on two time-varying data sets demonstrates the efficacy of the system.
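A small sketch of the pointwise-mutual-information field idea described above, assuming a simple fixed-bin joint histogram; the bin count, field names, and toy data are arbitrary choices, not the paper's setup.

```python
"""Per-voxel PMI between two binned scalar fields: PMI is computed per pair of
binned values and then mapped back onto the grid."""
import numpy as np

def pmi_field(var_a, var_b, bins=32):
    a, b = var_a.ravel(), var_b.ravel()
    a_idx = np.digitize(a, np.linspace(a.min(), a.max(), bins + 1)[1:-1])
    b_idx = np.digitize(b, np.linspace(b.min(), b.max(), bins + 1)[1:-1])
    joint = np.zeros((bins, bins))
    np.add.at(joint, (a_idx, b_idx), 1)
    joint /= joint.sum()
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(joint / np.outer(pa, pb))
    pmi[~np.isfinite(pmi)] = 0.0                       # empty bins carry no PMI
    return pmi[a_idx, b_idx].reshape(var_a.shape)      # scalar PMI value per grid point

# toy "multivariate volume" fields on a 64^3 grid
z, y, x = np.mgrid[0:64, 0:64, 0:64] / 64.0
temperature = np.sin(4 * x) + 0.1 * np.random.default_rng(0).normal(size=x.shape)
pressure = np.cos(4 * x) + 0.1 * np.random.default_rng(1).normal(size=x.shape)
print(pmi_field(temperature, pressure).shape)          # informative scalar field, same shape as inputs
```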

Proceedings ArticleDOI
19 Aug 2017
TL;DR: The principle of total correlation explanation (CorEx) as discussed by the authors is to learn representations of data that "explain" as much dependence in the data as possible, which has been shown to be useful for unsupervised learning.
Abstract: Learning by children and animals occurs effortlessly and largely without obvious supervision. Successes in automating supervised learning have not translated to the more ambiguous realm of unsupervised learning where goals and labels are not provided. Barlow (1961) suggested that the signal that brains leverage for unsupervised learning is dependence, or redundancy, in the sensory environment. Dependence can be characterized using the information-theoretic multivariate mutual information measure called total correlation. The principle of Total Correlation Explanation (CorEx) is to learn representations of data that "explain" as much dependence in the data as possible. We review some manifestations of this principle along with successes in unsupervised learning problems across diverse domains including human behavior, biology, and language.
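The quantity CorEx optimizes over latent representations is total correlation; the sketch below only computes that measure for discrete samples and is not the CorEx representation-learning algorithm itself.

```python
"""Total correlation TC(X1..Xd) = sum_i H(Xi) - H(X1..Xd): a multivariate mutual
information that is zero iff the variables are independent."""
from collections import Counter
import math

def entropy(samples):
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def total_correlation(rows):
    """rows: list of tuples, one tuple of discrete values per observation."""
    marginal_sum = sum(entropy([row[i] for row in rows]) for i in range(len(rows[0])))
    return marginal_sum - entropy([tuple(row) for row in rows])

# three binary variables: X3 copies X1, X2 is independent noise
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
print(total_correlation(rows))   # 1 bit, coming entirely from the X1-X3 dependence
```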

Proceedings Article
09 Jan 2017
TL;DR: First, it is shown that this bias is a particular case of the maximization of mutual information between words and meanings, and second, the optimality is proven within a more general information theoretic framework where mutual information maximization competes with other information theory principles.
Abstract: Vocabulary learning by children can be characterized by many biases. When encountering a new word, children, as well as adults, are biased towards assuming that it means something totally different from the words that they already know. To the best of our knowledge, the first mathematical proof of the optimality of this bias is presented here. First, it is shown that this bias is a particular case of the maximization of mutual information between words and meanings. Second, the optimality is proven within a more general information theoretic framework where mutual information maximization competes with other information theoretic principles. The bias is a prediction from modern information theory. The relationship between information theoretic principles and the principles of contrast and mutual exclusivity is also shown.

Posted ContentDOI
13 Oct 2017
TL;DR: This work develops K-nearest-neighbor-based estimators for this functional, employing importance sampling and a coupling trick, and proves the finite-k consistency of such an estimator.
Abstract: The conditional mutual information I(X; Y|Z) measures the average information that X and Y contain about each other given Z. This is an important primitive in many learning problems including conditional independence testing, graphical model inference, causal strength estimation and time-series problems. In several applications, it is desirable to have a functional purely of the conditional distribution p(Y|X, Z) rather than of the joint distribution p(X, Y, Z). We define the potential conditional mutual information as the conditional mutual information calculated with a modified joint distribution p(Y|X, Z) q(X, Z), where q(X, Z) is a potential distribution, fixed a priori. We develop K-nearest-neighbor-based estimators for this functional, employing importance sampling and a coupling trick, and prove the finite-k consistency of such an estimator. We demonstrate that the estimator has excellent practical performance and show an application in dynamical system inference.

Journal ArticleDOI
TL;DR: Experimental results on the KDDcup99 and Network Security Laboratory-Knowledge Discovery and Data Mining datasets showed that the proposed feature selection methods have higher detection rates and accuracy and a lower false-positive rate compared with the pairwise linear correlation coefficient and the pairwise MI employed in several previous algorithms.
Abstract: Feature selection is one of the major problems in an intrusion detection system (IDS) since there are additional and irrelevant features. This problem causes incorrect classification and low detect...

Proceedings ArticleDOI
01 Jun 2017
TL;DR: This work uses interaction information, which can be negative, to find the direction of causal influences among variables in a triangle topology under some mild assumptions.
Abstract: Interaction information is one of the multivariate generalizations of mutual information, which expresses the amount of information shared among a set of variables, beyond the information shared in any proper subset of those variables. Unlike (conditional) mutual information, which is always non-negative, interaction information can be negative. We utilize this property to find the direction of causal influences among variables in a triangle topology under some mild assumptions.
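A quick worked example of the signed quantity involved, using the convention II(X;Y;Z) = I(X;Y) - I(X;Y|Z) (sign conventions vary in the literature); for a collider such as Z = X XOR Y the value is negative.

```python
"""Interaction information for discrete triples, with a collider example."""
from collections import Counter
import math

def mi(pairs):
    n = len(pairs)
    pxy, px, py = Counter(pairs), Counter(p[0] for p in pairs), Counter(p[1] for p in pairs)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def conditional_mi(triples):
    """I(X;Y|Z) as a Z-weighted average of within-stratum mutual informations."""
    n = len(triples)
    by_z = {}
    for x, y, z in triples:
        by_z.setdefault(z, []).append((x, y))
    return sum(len(v) / n * mi(v) for v in by_z.values())

def interaction_information(triples):
    return mi([(x, y) for x, y, _ in triples]) - conditional_mi(triples)

# collider: Z = X XOR Y with X, Y independent fair coins
collider = [(x, y, x ^ y) for x in (0, 1) for y in (0, 1)]
print(interaction_information(collider))   # -1 bit: the synergy / collider signature
```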

Posted Content
TL;DR: The principle of total correlation explanation (CorEx) as discussed by the authors is to learn representations of data that "explain" as much dependence in the data as possible, which has been shown to be useful for unsupervised learning.
Abstract: Learning by children and animals occurs effortlessly and largely without obvious supervision. Successes in automating supervised learning have not translated to the more ambiguous realm of unsupervised learning where goals and labels are not provided. Barlow (1961) suggested that the signal that brains leverage for unsupervised learning is dependence, or redundancy, in the sensory environment. Dependence can be characterized using the information-theoretic multivariate mutual information measure called total correlation. The principle of Total Correlation Explanation (CorEx) is to learn representations of data that "explain" as much dependence in the data as possible. We review some manifestations of this principle along with successes in unsupervised learning problems across diverse domains including human behavior, biology, and language.

Journal ArticleDOI
TL;DR: A new method to measure the influence of a third variable on the interactions of two variables is proposed and a normalized TMI and other derivatives of the TMI are introduced to reveal more information among variables.
Abstract: In this paper, we propose a new method to measure the influence of a third variable on the interactions of two variables. The method, called transfer mutual information (TMI), is defined by the difference between the mutual information and the partial mutual information. It is established on the assumption that if the presence or the absence of one variable does make a change to the interactions of two other variables, then quantifying this change captures the influence from this variable to those two variables. Moreover, a normalized TMI and other derivatives of the TMI are introduced as well. The empirical analysis, including simulations as well as real-world applications, is carried out to examine this measure and to reveal more information among variables.

Proceedings ArticleDOI
01 Nov 2017
TL;DR: A mutual information-based multivariate phase synchrony measure is used to assess local-scale connectivity and classify EEG into BCI control conditions, and shows that the channels on the right hemisphere (left hemisphere) are more synchronized during left (right) hand movement motor imagery.
Abstract: In electroencephalography (EEG)-based brain computer interfaces (BCIs), interactions between different areas of the user's brain can be measured using a phase synchronization measure. In this paper, a mutual information-based multivariate phase synchrony measure is used to assess local-scale connectivity and classify EEG into BCI control conditions. The results obtained using a well-known database show that the method proposed in this paper significantly outperforms the existing technique when used for classifying right and left hand movement motor imageries of 5 different subjects using their recorded EEG signals. Specifically, the mean accuracy of the proposed method is 70% higher than that of the existing techniques based on synchrony measures. Also, statistical tests show that the channels on the right hemisphere (left hemisphere) are more synchronized during left (right) hand movement motor imagery.

Journal ArticleDOI
TL;DR: It is demonstrated that this is in fact possible: the information X's minimal sufficient statistic preserves about Y is exactly the information that Y's minimal sufficient statistic preserves about X, which is equivalent to the mutual information between the forward- and reverse-time causal states of computational mechanics.
Abstract: One of the most basic characterizations of the relationship between two random variables, X and Y, is the value of their mutual information. Unfortunately, calculating it analytically and estimating it empirically are often stymied by the extremely large dimension of the variables. One might hope to replace such a high-dimensional variable by a smaller one that preserves its relationship with the other. It is well known that either X (or Y) can be replaced by its minimal sufficient statistic about Y (or X) while preserving the mutual information. While intuitively reasonable, it is not obvious or straightforward that both variables can be replaced simultaneously. We demonstrate that this is in fact possible: the information X's minimal sufficient statistic preserves about Y is exactly the information that Y's minimal sufficient statistic preserves about X. We call this procedure information trimming. As an important corollary, we consider the case where one variable is a stochastic process' past and the other its future. In this case, the mutual information is the channel transmission rate between the channel's effective states. That is, the past-future mutual information (the excess entropy) is the amount of information about the future that can be predicted using the past. Translating our result about minimal sufficient statistics, this is equivalent to the mutual information between the forward- and reverse-time causal states of computational mechanics. We close by discussing multivariate extensions to this use of minimal sufficient statistics.

Journal ArticleDOI
TL;DR: A new computational method to realize mutual information is proposed, based on considering multiple neural networks when defining mutual information, thereby simplifying the method and showing that mutual information can be increased via the present approach.

Journal ArticleDOI
TL;DR: The compressed secret key agreement problem helps shed new light on resolving the difficult problem of secret key agreement with rate-limited discussion, by offering a more structured achieving scheme and some simpler conjectures to prove.
Abstract: The multiterminal secret key agreement problem by public discussion is formulated with an additional source compression step where, prior to the public discussion phase, users independently compress their private sources to filter out strongly correlated components for generating a common secret key. The objective is to maximize the achievable key rate as a function of the joint entropy of the compressed sources. Since the maximum achievable key rate captures the total amount of information mutual to the compressed sources, an optimal compression scheme essentially maximizes the multivariate mutual information per bit of randomness of the private sources, and can therefore be viewed more generally as a dimension reduction technique. Single-letter lower and upper bounds on the maximum achievable key rate are derived for the general source model, and an explicit polynomial-time computable formula is obtained for the pairwise independent network model. In particular, the converse results and the upper bounds are obtained from those of the related secret key agreement problem with rate-limited discussion. A precise duality is shown for the two-user case with one-way discussion, and such duality is extended to obtain the desired converse results in the multi-user case. In addition to posing new challenges in information processing and dimension reduction, the compressed secret key agreement problem helps shed new light on resolving the difficult problem of secret key agreement with rate-limited discussion, by offering a more structured achieving scheme and some simpler conjectures to prove.

Book ChapterDOI
11 Sep 2017
TL;DR: In this paper, the authors proposed a method to find the optimal number of features for a task based on mutual information, which is useful for high-dimensional data analytics, where feature selection is an indispensable preprocessing step.
Abstract: For high-dimensional data analytics, feature selection is an indispensable preprocessing step to reduce dimensionality and keep the simplicity and interpretability of models. This is particularly important for fuzzy modeling since fuzzy models are widely recognized for their transparency and interpretability. Despite the substantial work on feature selection, there is little research on determining the optimal number of features for a task. In this paper, we propose a method to help find the optimal number of features effectively based on mutual information.
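The abstract does not spell out the paper's specific criterion, so the sketch below is only a generic, hedged illustration of the idea: greedily add the feature whose joint mutual information with the label grows the most and stop once the gain falls below a threshold, taking the resulting count as the number of features.

```python
"""Generic greedy forward selection with a mutual-information stopping rule
(illustrative, not the paper's method)."""
from collections import Counter
import math

def mi(xs, ys):
    """Plug-in mutual information (bits) between two discrete sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def select_features(columns, labels, threshold=0.01):
    """Stop when the joint-MI gain of the best remaining feature drops below threshold."""
    selected, context = [], [()] * len(labels)
    while len(selected) < len(columns):
        remaining = [name for name in columns if name not in selected]
        gains = {name: mi([ctx + (v,) for ctx, v in zip(context, columns[name])], labels)
                       - mi(context, labels)
                 for name in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < threshold:
            break
        selected.append(best)
        context = [ctx + (v,) for ctx, v in zip(context, columns[best])]
    return selected

# toy data: label = a OR b; "flat" carries no information about the label
columns = {"a": [0, 0, 0, 0, 1, 1, 1, 1],
           "b": [0, 0, 1, 1, 0, 0, 1, 1],
           "flat": [0, 1, 0, 1, 0, 1, 0, 1]}
labels = [0, 0, 1, 1, 1, 1, 1, 1]
chosen = select_features(columns, labels)
print(len(chosen), chosen)   # 2 ['a', 'b'] -- the suggested number of features is 2
```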

Journal ArticleDOI
TL;DR: In this article, the mutual information measure, which is an information-theoretic concept and able to detect linear and non-linear dependencies, is introduced and the empirical study of Baitinger and Papenbrock is replicated using mutual information-based networks.
Abstract: Today's asset management academia and practice are dominated by mean-variance thinking. In consequence, this leads to quantifying the dependence structure of asset returns by the covariance or the Pearson's correlation coefficient matrix. However, the respective dependence measures are linear by construction and hence unable to detect non-linear dependencies. This article tackles the described concern with regard to the previous publication of Baitinger and Papenbrock (2017). We introduce the mutual information measure, which is an information-theoretic concept able to detect linear and non-linear dependencies. Next, correlation-based networks are extensively compared to mutual information-based networks. Lastly, the empirical study of Baitinger and Papenbrock (2017) is replicated using mutual information-based networks.
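A sketch of the network ingredient being compared, assuming a quantile-binned histogram estimate of pairwise mutual information in place of the Pearson correlation matrix; the bin count and toy return series are illustrative choices, not the authors' code.

```python
"""Pairwise mutual information matrix for asset return series (histogram estimate);
the resulting matrix can be thresholded or fed to the usual network-construction
routines (e.g., minimum spanning tree) instead of a correlation matrix."""
import numpy as np

def pairwise_mi_matrix(returns, bins=10):
    """returns: array of shape (T, n_assets); output: (n_assets, n_assets) MI matrix in bits."""
    t, n = returns.shape
    digitized = np.empty_like(returns, dtype=int)
    for j in range(n):
        edges = np.quantile(returns[:, j], np.linspace(0, 1, bins + 1))[1:-1]
        digitized[:, j] = np.digitize(returns[:, j], edges)
    mi = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            joint = np.zeros((bins, bins))
            np.add.at(joint, (digitized[:, a], digitized[:, b]), 1)
            joint /= t
            pa, pb = joint.sum(axis=1), joint.sum(axis=0)
            nz = joint > 0
            mi[a, b] = mi[b, a] = (joint[nz] * np.log2(joint[nz] / np.outer(pa, pb)[nz])).sum()
    return mi

rng = np.random.default_rng(0)
market = rng.normal(size=1000)                        # common factor driving all toy assets
returns = np.column_stack([market + rng.normal(size=1000) * s for s in (0.5, 0.5, 2.0)])
print(np.round(pairwise_mi_matrix(returns), 2))       # symmetric MI matrix to build the network from
```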

Journal ArticleDOI
TL;DR: The approach presented here can distinguish CA-like systems with respect to their ability to implement contingent mappings and might lead to a novel physical property indicating how suitable a physical medium is to implement a semiotic system.
Abstract: The organic code concept and its operationalization by molecular codes have been introduced to study the semiotic nature of living systems. This contribution develops further the idea that the semantic capacity of a physical medium can be measured by assessing its ability to implement a code as a contingent mapping. For demonstration and evaluation, the approach is applied to a formal medium: elementary cellular automata (ECA). The semantic capacity is measured by counting the number of ways codes can be implemented. Additionally, a link to information theory is established by taking multivariate mutual information for quantifying contingency. It is shown how ECAs differ in their semantic capacities, how this is related to various ECA classifications, and how this depends on how a meaning is defined. Interestingly, if the meaning should persist for a certain while, the highest semantic capacity is found in CAs with apparently simple behavior, i.e., the fixed-point and two-cycle class. Synergy as a predictor for a CA's ability to implement codes can only be used if contexts implementing codes are common. For large context spaces with sparse coding contexts, synergy is a weak predictor. Concluding, the approach presented here can distinguish CA-like systems with respect to their ability to implement contingent mappings. Applying this to physical systems appears straightforward and might lead to a novel physical property indicating how suitable a physical medium is to implement a semiotic system.

Journal ArticleDOI
14 Oct 2017-Entropy
TL;DR: In this article, the authors formulated the multiterminal secret key agreement problem by public discussion with an additional source compression step where, prior to the public discussion phase, users independently compress their private sources to filter out strongly correlated components in order to generate a common secret key.
Abstract: The multiterminal secret key agreement problem by public discussion is formulated with an additional source compression step where, prior to the public discussion phase, users independently compress their private sources to filter out strongly correlated components in order to generate a common secret key. The objective is to maximize the achievable key rate as a function of the joint entropy of the compressed sources. Since the maximum achievable key rate captures the total amount of information mutual to the compressed sources, an optimal compression scheme essentially maximizes the multivariate mutual information per bit of randomness of the private sources, and can therefore be viewed more generally as a dimension reduction technique. Single-letter lower and upper bounds on the maximum achievable key rate are derived for the general source model, and an explicit polynomial-time computable formula is obtained for the pairwise independent network model. In particular, the converse results and the upper bounds are obtained from those of the related secret key agreement problem with rate-limited discussion. A precise duality is shown for the two-user case with one-way discussion, and such duality is extended to obtain the desired converse results in the multi-user case. In addition to posing new challenges in information processing and dimension reduction, the compressed secret key agreement problem helps shed new light on resolving the difficult problem of secret key agreement with rate-limited discussion by offering a more structured achieving scheme and some simpler conjectures to prove.

Posted Content
TL;DR: In this paper, a non-parametric estimator for mutual information is used to create a nonparametric test for multivariate conditional independence, which is then combined with an efficient constraint-based algorithm for learning the graph structure.
Abstract: We propose a method for learning Markov network structures for continuous data without invoking any assumptions about the distribution of the variables. The method makes use of previous work on a non-parametric estimator for mutual information which is used to create a non-parametric test for multivariate conditional independence. This independence test is then combined with an efficient constraint-based algorithm for learning the graph structure. The performance of the method is evaluated on several synthetic data sets and it is shown to learn considerably more accurate structures than competing methods when the dependencies between the variables involve non-linearities.

Journal ArticleDOI
18 Nov 2017-Entropy
TL;DR: This article analyzes a still unexplored aspect of these information measures: their dynamic behavior.
Abstract: Information Theory is a branch of mathematics, more specifically of probability theory, that studies the quantification of information. Recently, several studies have successfully used Information Theoretic Learning (ITL) as a new technique for unsupervised learning. In these works, information measures are used as the criterion of optimality in learning. In this article, we analyze a still unexplored aspect of these information measures: their dynamic behavior. Autoregressive models (linear and non-linear) will be used to represent the dynamics in information measures. As a source of dynamic information, videos with different characteristics, such as fading and monotonous sequences, will be used.

Posted Content
TL;DR: It is shown that correlated random variables can be clustered more efficiently in an agglomerative manner than in a divisive one, and a fundamental connection is revealed between the well-known total correlation and the recently proposed multivariate mutual information.
Abstract: An agglomerative clustering of random variables is proposed, where clusters of random variables sharing the maximum amount of multivariate mutual information are merged successively to form larger clusters. Compared to the previous info-clustering algorithms, the agglomerative approach allows the computation to stop earlier when clusters of desired size and accuracy are obtained. An efficient algorithm is also derived based on the submodularity of entropy and the duality between the principal sequence of partitions and the principal sequence for submodular functions.

Proceedings Article
01 Feb 2017
TL;DR: This work defines illum information, the multivariate extension of lautum information and the Csiszár conjugate of multiinformation, and provides operational interpretations of this functional, including in the problem of independence testing of a set of random variables.
Abstract: Shannon's mutual information measures the degree of mutual dependence between two random variables. Two related information functionals have also been developed in the literature: multiinformation, a multivariate extension of mutual information; and lautum information, the Csiszar conjugate of mutual information. In this work, we define illum information, the multivariate extension of lautum information and the Csiszar conjugate of multiinformation. We provide operational interpretations of this functional, including in the problem of independence testing of a set of random variables. Further, we also provide informational characterizations of illum information such as the data processing inequality and the chain rule for distributions on tree-structured graphical models. Finally, as illustrative examples, we compute the illum information for Ising models and Gauss-Markov random fields.
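In divergence form, the three functionals the abstract relates can be summarized as follows; multiinformation and lautum information are standard definitions, while the illum expression is inferred here from the abstract's description of it as the Csiszár conjugate of multiinformation, so it should be read as a hedged paraphrase rather than a quote.

```latex
% D(. || .) denotes Kullback-Leibler divergence.
\begin{align}
  \text{multiinformation: } & D\left(P_{X_1 \cdots X_n} \,\middle\|\, \prod_{i=1}^{n} P_{X_i}\right) \\
  \text{lautum information: } & L(X;Y) = D\left(P_X P_Y \,\middle\|\, P_{XY}\right) \\
  \text{illum information: } & D\left(\prod_{i=1}^{n} P_{X_i} \,\middle\|\, P_{X_1 \cdots X_n}\right)
\end{align}
```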