
Showing papers on "Entropy (information theory)" published in 2013


Journal ArticleDOI
TL;DR: The results demonstrate that both ApEn and SampEn are extremely sensitive to parameter choices, especially for very short data sets (N ≤ 200), and that extreme caution is needed when choosing parameters for experimental studies with either algorithm.
Abstract: Approximate entropy (ApEn) and sample entropy (SampEn) are mathematical algorithms created to measure the repeatability or predictability within a time series. Both algorithms are extremely sensitive to their input parameters: m (length of the data segment being compared), r (similarity criterion), and N (length of data). There is no established consensus on parameter selection in short data sets, especially for biological data. Therefore, the purpose of this research was to examine the robustness of these two entropy algorithms by exploring the effect of changing parameter values on short data sets. Data with known theoretical entropy qualities as well as experimental data from both healthy young and older adults were utilized. Our results demonstrate that both ApEn and SampEn are extremely sensitive to parameter choices, especially for very short data sets, N ≤ 200. We suggest using N larger than 200, an m of 2, and examining several r values before selecting parameters. Extreme caution should be used when choosing parameters for experimental studies with both algorithms. Based on our current findings, it appears that SampEn is more reliable for short data sets. SampEn was less sensitive to changes in data length and demonstrated fewer problems with relative consistency.
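A minimal sketch of the standard SampEn(m, r) computation for a 1-D series, to make the roles of m, r and N concrete. The function name, the NumPy implementation, and the convention of giving r as a fraction of the series' standard deviation are assumptions for illustration, not the authors' code.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r) sketch; r is a fraction of the series' standard deviation."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    tol = r * np.std(x)

    def count_matches(mm):
        # Use the same number of templates (N - m) for both lengths so A/B is comparable.
        templates = np.array([x[i:i + mm] for i in range(N - m)])
        count = 0
        for i in range(len(templates)):
            # Chebyshev distance to every other template; exclude the self-match.
            dist = np.max(np.abs(templates - templates[i]), axis=1)
            count += np.sum(dist <= tol) - 1
        return count

    B = count_matches(m)      # template matches of length m
    A = count_matches(m + 1)  # template matches of length m + 1
    return -np.log(A / B) if A > 0 and B > 0 else np.inf
```

The returned value is -ln(A/B); the sensitivity discussed in the abstract shows up directly here, since for short series small changes in r or m can push A or B toward zero.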

669 citations


Journal ArticleDOI
TL;DR: The proposed local Shannon entropy measure overcomes several weaknesses of the conventional global Shannon entropy measure, including unfair randomness comparisons between images of different sizes, failure to discern image randomness before and after image shuffling, and possible inaccurate scores for synthesized images.
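The record does not spell out the construction, but one simple way to localize the measure, consistent with the block-wise idea described, is to average the Shannon entropy of several randomly chosen image blocks rather than computing a single entropy over the whole image. The block size, the number of blocks, and the (possibly overlapping) sampling scheme below are assumptions for illustration.

```python
import numpy as np

def block_entropy(block, levels=256):
    # Shannon entropy (bits) of one 8-bit image block's gray-level histogram.
    hist = np.bincount(block.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def local_shannon_entropy(img, k=30, block=16, rng=None):
    # Average the entropies of k randomly placed block x block patches
    # (patches may overlap here, for simplicity).
    rng = np.random.default_rng(rng)
    h, w = img.shape
    rows = rng.integers(0, h - block, size=k)
    cols = rng.integers(0, w - block, size=k)
    return np.mean([block_entropy(img[r:r + block, c:c + block])
                    for r, c in zip(rows, cols)])
```

Because each block has a fixed size, images of different dimensions are scored on the same scale, and shuffling the image changes local statistics even when the global histogram (and hence the global entropy) is unchanged.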

476 citations


Journal ArticleDOI
TL;DR: The Stata package ebalance implements entropy balancing, a multivariate reweighting method described in Hainmueller (2012) that allows users to reweight a dataset such that the covariate distributions in the reweighted data satisfy a set of specified moment conditions.
Abstract: The Stata package ebalance implements entropy balancing, a multivariate reweighting method described in Hainmueller (2012) that allows users to reweight a dataset such that the covariate distributions in the reweighted data satisfy a set of specified moment conditions. This can be useful to create balanced samples in observational studies with a binary treatment where the control group data can be reweighted to match the covariate moments in the treatment group. Entropy balancing can also be used to reweight a survey sample to known characteristics from a target population.
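A minimal sketch of the reweighting idea, not the ebalance implementation itself: choose control-unit weights that stay as close as possible, in entropy terms, to uniform weights while matching the treatment group's covariate means. The direct primal formulation via scipy and the toy data are assumptions; the actual package solves an equivalent dual problem.

```python
import numpy as np
from scipy.optimize import minimize

def entropy_balance(X_control, target_moments):
    """Weights for control units that match the target covariate means.

    Minimizes sum_i w_i * log(n * w_i) (KL divergence from uniform weights)
    subject to sum_i w_i = 1 and X_control.T @ w = target_moments.
    """
    n = X_control.shape[0]
    w0 = np.full(n, 1.0 / n)
    objective = lambda w: np.sum(w * np.log(n * w + 1e-12))
    constraints = [
        {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
        {"type": "eq", "fun": lambda w: X_control.T @ w - target_moments},
    ]
    res = minimize(objective, w0, constraints=constraints,
                   bounds=[(1e-10, 1.0)] * n, method="SLSQP")
    return res.x

# Toy example: reweight control units to match the treated covariate means.
rng = np.random.default_rng(0)
X_t = rng.normal(0.5, 1.0, size=(50, 2))   # treated covariates
X_c = rng.normal(0.0, 1.0, size=(200, 2))  # control covariates
w = entropy_balance(X_c, X_t.mean(axis=0))
print(X_c.T @ w, X_t.mean(axis=0))          # reweighted means ~ treated means
```

Keeping the weights close to uniform is what makes the method "entropy" balancing: balance is achieved with as little distortion of the original sample as the moment conditions allow.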

428 citations


Journal ArticleDOI
Bahriye Akay
01 Jun 2013
TL;DR: Experiments based on Kapur's entropy indicate that the ABC algorithm can be efficiently used in multilevel thresholding, and CPU time results show that the algorithms are scalable and that the running times of the algorithms seem to grow at a linear rate as the problem size increases.
Abstract: Segmentation is a critical task in image processing. Bi-level segmentation involves dividing the whole image into partitions based on a threshold value, whereas multilevel segmentation involves multiple threshold values. A successful segmentation assigns proper threshold values to optimise a criterion such as entropy or between-class variance. The high computational cost and inefficiency of an exhaustive search for the optimal thresholds lead to the use of global search heuristics to set the optimal thresholds. An emerging area in global heuristics is swarm intelligence, which models the collective behaviour of organisms. In this paper, two successful swarm-intelligence-based global optimisation algorithms, particle swarm optimisation (PSO) and artificial bee colony (ABC), have been employed to find the optimal multilevel thresholds. Kapur's entropy, one of the maximum entropy techniques, and between-class variance have been investigated as fitness functions. Experiments have been performed on test images using various numbers of thresholds. The results were assessed using statistical tools and suggest that Otsu's technique, PSO and ABC show equal performance when the number of thresholds is two, while the ABC algorithm performs better than PSO and Otsu's technique when the number of thresholds is greater than two. Experiments based on Kapur's entropy indicate that the ABC algorithm can be efficiently used in multilevel thresholding. Moreover, segmentation methods are required to have a minimum running time in addition to high performance. Therefore, the CPU times of ABC and PSO have been investigated to check their suitability for real-time use. The CPU time results show that the algorithms are scalable and that their running times grow at an approximately linear rate as the problem size increases.
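To make the fitness function concrete, here is a small sketch of Kapur's entropy criterion: the thresholds partition the gray-level histogram into classes, and the objective is the sum of the Shannon entropies of the normalized class distributions. The brute-force single-threshold search and the synthetic histogram are only for illustration; the point of PSO/ABC in the paper is to replace that search when several thresholds are needed.

```python
import numpy as np

def kapur_entropy(hist, thresholds):
    """Sum of Shannon entropies of the classes induced by the thresholds.

    hist: 256-bin gray-level histogram; thresholds: sorted list of cut points.
    """
    p = hist.astype(float) / hist.sum()
    edges = [0] + list(thresholds) + [len(p)]
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        w = p[lo:hi].sum()
        if w > 0:
            q = p[lo:hi] / w       # class-conditional gray-level distribution
            q = q[q > 0]
            total += -np.sum(q * np.log(q))
    return total

# Bi-level example: exhaustive search over a single threshold.
hist = np.random.default_rng(1).integers(0, 500, size=256)
best_t = max(range(1, 256), key=lambda t: kapur_entropy(hist, [t]))
```

With k thresholds the search space grows combinatorially, which is exactly where the swarm-based optimisers are applied.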

391 citations


Journal ArticleDOI
TL;DR: The primary goal of the study is to suggest the systematic transformation of the entropy into the similarity measure for HFSs and vice versa, and two clustering algorithms are developed under a hesitant fuzzy environment.

328 citations


Journal ArticleDOI
TL;DR: A comparison technique is used to derive a new Entropy Stable Weighted Essentially Non-Oscillatory (SSWENO) finite difference method, appropriate for simulations of problems with shocks.

286 citations


Journal ArticleDOI
TL;DR: It is shown that a recent definition of relative Rényi entropy is monotone under completely positive, trace preserving maps, which proves a recent conjecture of Müller-Lennert et al.
Abstract: We show that a recent definition of relative Rényi entropy is monotone under completely positive, trace preserving maps. This proves a recent conjecture of Müller-Lennert et al.
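For reference, the definition in question is usually called the sandwiched Rényi relative entropy; the normalization below follows the Müller-Lennert et al. convention as I recall it, so treat the exact form as an assumption rather than a quotation of the paper:

$$ \widetilde{D}_{\alpha}(\rho \,\|\, \sigma) \;=\; \frac{1}{\alpha - 1}\, \log \operatorname{Tr}\!\left[ \left( \sigma^{\frac{1-\alpha}{2\alpha}}\, \rho\, \sigma^{\frac{1-\alpha}{2\alpha}} \right)^{\!\alpha} \right]. $$

Monotonicity (the data-processing inequality) is the statement that $\widetilde{D}_{\alpha}(\mathcal{E}(\rho) \,\|\, \mathcal{E}(\sigma)) \le \widetilde{D}_{\alpha}(\rho \,\|\, \sigma)$ for every completely positive, trace preserving map $\mathcal{E}$.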

268 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a new method to derive the probability distribution of a function of random variables representing the structural response, based on the maximum entropy principle in which constraints are specified in terms of the fractional moments, in place of commonly used integer moments.

221 citations


Book
04 Nov 2013
TL;DR: This monograph focuses on some of the key modern mathematical tools that are used for the derivation of concentration inequalities, on their links to information theory, and on their various applications to communications and coding.
Abstract: Concentration inequalities have been the subject of exciting developments during the last two decades, and have been intensively studied and used as a powerful tool in various areas. These include convex geometry, functional analysis, statistical physics, mathematical statistics, pure and applied probability theory (e.g., concentration of measure phenomena in random graphs, random matrices, and percolation), information theory, theoretical computer science, learning theory, and dynamical systems. This monograph focuses on some of the key modern mathematical tools that are used for the derivation of concentration inequalities, on their links to information theory, and on their various applications to communications and coding. In addition to being a survey, this monograph also includes various new recent results derived by the authors. The first part of the monograph introduces classical concentration inequalities for martingales, as well as some recent refinements and extensions. The power and versatility of the martingale approach is exemplified in the context of codes defined on graphs and iterative decoding algorithms, as well as codes for wireless communication. The second part of the monograph introduces the entropy method, an information-theoretic technique for deriving concentration inequalities for functions of many independent random variables. The basic ingredients of the entropy method are discussed first in conjunction with the closely related topic of logarithmic Sobolev inequalities, which are typical of the so-called functional approach to studying the concentration of measure phenomenon. The discussion on logarithmic Sobolev inequalities is complemented by a related viewpoint based on probability in metric spaces. This viewpoint centers around the so-called transportation-cost inequalities, whose roots are in information theory. Some representative results on concentration for dependent random variables are briefly summarized, with emphasis on their connections to the entropy method. Finally, we discuss several applications of the entropy method and related information-theoretic tools to problems in communications and coding. These include strong converses, empirical distributions of good channel codes with non-vanishing error probability, and an information-theoretic converse for concentration of measure.
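As one concrete instance of the bounded-differences/martingale inequalities covered in the first part, McDiarmid's inequality (a standard result, stated here from memory rather than quoted from the monograph): if changing the i-th argument of f changes its value by at most c_i, and X_1, ..., X_n are independent, then

$$ \Pr\Big( \big| f(X_1,\dots,X_n) - \mathbb{E}\, f(X_1,\dots,X_n) \big| \ge t \Big) \;\le\; 2 \exp\!\left( - \frac{2 t^2}{\sum_{i=1}^{n} c_i^2} \right). $$

The entropy method of the second part derives inequalities of the same flavour by bounding the entropy of exp(λ f) via logarithmic Sobolev inequalities rather than via a martingale decomposition.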

211 citations


Journal ArticleDOI
TL;DR: It is proved that, for independent identically distributed gray-scale host signals, the proposed method asymptotically approaches the rate-distortion bound of RDH as long as perfect compression can be realized, and establishes the equivalency between reversible data hiding and lossless data compression.
Abstract: State-of-the-art schemes for reversible data hiding (RDH) usually consist of two steps: first construct a host sequence with a sharp histogram via prediction errors, and then embed messages by modifying the histogram with methods, such as difference expansion and histogram shift. In this paper, we focus on the second stage, and propose a histogram modification method for RDH, which embeds the message by recursively utilizing the decompression and compression processes of an entropy coder. We prove that, for independent identically distributed (i.i.d.) gray-scale host signals, the proposed method asymptotically approaches the rate-distortion bound of RDH as long as perfect compression can be realized, i.e., the entropy coder can approach entropy. Therefore, this method establishes the equivalency between reversible data hiding and lossless data compression. Experiments show that this coding method can be used to improve the performance of previous RDH schemes and the improvements are more significant for larger images.

Journal ArticleDOI
01 Jan 2013-Language
TL;DR: This work adduces evidence for the crucial status of words and paradigms for understanding morphological organization from three sources: a comparison between languages of varying degrees of E-complexity, a case study from the particularly challenging conjugational system of Chiquihuitlán Mazatec, and a Monte Carlo simulation modeling the encoding of morphosyntactic properties into formal expressions.
Abstract: Crosslinguistically, inflectional morphology exhibits a spectacular range of complexity in both the structure of individual words and the organization of systems that words participate in. We distinguish two dimensions in the analysis of morphological complexity. Enumerative complexity (E-complexity) reflects the number of morphosyntactic distinctions that languages make and the strategies employed to encode them, concerning either the internal composition of words or the arrangement of classes of words into inflection classes. This, we argue, is constrained by integrative complexity (I-complexity). The I-complexity of an inflectional system reflects the difficulty that a paradigmatic system poses for language users (rather than lexicographers) in information-theoretic terms. This becomes clear by distinguishing average paradigm entropy from average conditional entropy. The average entropy of a paradigm is the uncertainty in guessing the realization for a particular cell of the paradigm of a particular lexeme (given knowledge of the possible exponents). This gives one a measure of the complexity of a morphological system—systems with more exponents and more inflection classes will in general have higher average paradigm entropy—but it presupposes a problem that adult native speakers will never encounter. In order to know that a lexeme exists, the speaker must have heard at least one word form, so in the worst case a speaker will be faced with predicting a word form based on knowledge of one other word form of that lexeme. Thus, a better measure of morphological complexity is the average conditional entropy, the average uncertainty in guessing the realization of one randomly selected cell in the paradigm of a lexeme given the realization of one other randomly selected cell. This is the I-complexity of paradigm organization. Viewed from this information-theoretic perspective, languages that appear to differ greatly in their E-complexity—the number of exponents, inflectional classes, and principal parts—can actually be quite similar in terms of the challenge they pose for a language user who already knows how the system works. We adduce evidence for this hypothesis from three sources: a comparison between languages of varying degrees of E-complexity, a case study from the particularly challenging conjugational system of Chiquihuitlán Mazatec, and a Monte Carlo simulation modeling the encoding of morphosyntactic properties into formal expressions. The results of these analyses provide evidence for the crucial status of words and paradigms for understanding morphological organization.
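A toy sketch of the contrast drawn above: given a table of inflection classes and their exponents, average paradigm entropy measures the uncertainty of each cell taken alone, while average conditional entropy measures how predictable one cell is given another. The four-class, three-cell system, the made-up suffixes, and the uniform class weighting below are invented for illustration.

```python
import numpy as np
from collections import Counter
from itertools import permutations

def H(counts):
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return -np.sum(p * np.log2(p))

def paradigm_entropies(classes):
    """classes: one dict per inflection class, mapping paradigm cell -> exponent.
    Assumes equal class frequencies (type/token weighting would change the
    numbers, not the logic)."""
    cells = list(classes[0].keys())
    # Average paradigm entropy: uncertainty of each cell in isolation.
    avg_cell = np.mean([H(Counter(cl[c] for cl in classes)) for c in cells])
    # Average conditional entropy: uncertainty of cell c given cell d, H(c,d) - H(d).
    cond = []
    for c, d in permutations(cells, 2):
        joint = H(Counter((cl[c], cl[d]) for cl in classes))
        cond.append(joint - H(Counter(cl[d] for cl in classes)))
    return avg_cell, np.mean(cond)

# Hypothetical system: 4 inflection classes, 3 paradigm cells.
classes = [
    {"sg": "-a", "pl": "-i",  "gen": "-ae"},
    {"sg": "-a", "pl": "-i",  "gen": "-is"},
    {"sg": "-o", "pl": "-es", "gen": "-is"},
    {"sg": "-o", "pl": "-es", "gen": "-um"},
]
print(paradigm_entropies(classes))
```

Even in this tiny example the conditional entropy is lower than the paradigm entropy, reflecting the point that knowing one word form of a lexeme sharply reduces the uncertainty about its other forms.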

Journal ArticleDOI
TL;DR: A novel approach to the inference of spectral functions from Euclidean time correlator data that makes close contact with modern Bayesian concepts is presented, which is devoid of the asymptotically flat directions present in the Shannon-Jaynes entropy.
Abstract: We present a novel approach to the inference of spectral functions from Euclidean time correlator data that makes close contact with modern Bayesian concepts. Our method differs significantly from the maximum entropy method (MEM). A new set of axioms is postulated for the prior probability, leading to an improved expression, which is devoid of the asymptotically flat directions present in the Shannon-Jaynes entropy. Hyperparameters are integrated out explicitly, liberating us from the Gaussian approximations underlying the evidence approach of the maximum entropy method. We present a realistic test of our method in the context of the nonperturbative extraction of the heavy quark potential. Based on hard-thermal-loop correlator mock data, we establish firm requirements in the number of data points and their accuracy for a successful extraction of the potential from lattice QCD. Finally we reinvestigate quenched lattice QCD correlators from a previous study and provide an improved potential estimation at T ≈ 2.33 T_C.

Journal ArticleDOI
TL;DR: It is shown that a pairwise maximum entropy model, which takes into account region-specific activity rates and pairwise interactions, can be robustly and accurately fitted to resting-state human brain activities obtained by functional magnetic resonance imaging and reflects anatomical connexions more accurately than the conventional functional connectivity method.
Abstract: The resting-state human brain networks underlie fundamental cognitive functions and consist of complex interactions among brain regions. However, the level of complexity of the resting-state networks has not been quantified, which has prevented comprehensive descriptions of the brain activity as an integrative system. Here, we address this issue by demonstrating that a pairwise maximum entropy model, which takes into account region-specific activity rates and pairwise interactions, can be robustly and accurately fitted to resting-state human brain activities obtained by functional magnetic resonance imaging. Furthermore, to validate the approximation of the resting-state networks by the pairwise maximum entropy model, we show that the functional interactions estimated by the pairwise maximum entropy model reflect anatomical connexions more accurately than the conventional functional connectivity method. These findings indicate that a relatively simple statistical model not only captures the structure of the resting-state networks but also provides a possible method to derive physiological information about various large-scale brain networks.
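The pairwise maximum entropy model referred to here is the standard Ising-type form over binarized regional activities σ_i (the binarization threshold and the sign convention are not specified in the record and are left abstract here):

$$ P(\sigma_1,\dots,\sigma_N) \;=\; \frac{1}{Z} \exp\!\Big( \sum_{i} h_i \sigma_i \;+\; \sum_{i<j} J_{ij}\, \sigma_i \sigma_j \Big), \qquad Z = \sum_{\{\sigma\}} \exp\!\Big( \sum_{i} h_i \sigma_i + \sum_{i<j} J_{ij}\, \sigma_i \sigma_j \Big). $$

The biases h_i and couplings J_ij are fitted so that the model reproduces the empirical regional activity rates and pairwise correlations of the binarized fMRI signals, and it is these fitted J_ij that are compared against anatomical connectivity.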

Journal ArticleDOI
TL;DR: A simple and solvable model of a device that, like the "neat-fingered being" in Maxwell's famous thought experiment, transfers energy from a cold system to a hot system by rectifying thermal fluctuations is described.
Abstract: We describe a simple and solvable model of a device that, like the "neat-fingered being" in Maxwell's famous thought experiment, transfers energy from a cold system to a hot system by rectifying thermal fluctuations. In order to accomplish this task, our device requires a memory register to which it can write information: the increase in the Shannon entropy of the memory compensates the decrease in the thermodynamic entropy arising from the flow of heat against a thermal gradient. We construct the nonequilibrium phase diagram for this device, and find that it can alternatively act as an eraser of information. We discuss our model in the context of the second law of thermodynamics.
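Schematically, the bookkeeping behind such information-driven devices can be written as an information-augmented second law (a standard form, not necessarily the exact expression used in the paper; ΔH is the change in the memory register's Shannon entropy, measured in nats):

$$ \Delta S_{\mathrm{thermo}} \;+\; k_B\, \Delta H_{\mathrm{memory}} \;\ge\; 0. $$

Heat can flow from cold to hot (ΔS_thermo < 0) only if writing to the memory raises its Shannon entropy enough to compensate; running the cycle so that the memory's entropy decreases instead corresponds to the eraser regime mentioned above, paid for thermodynamically.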

Journal ArticleDOI
TL;DR: It is shown that the super Rényi entropy is duality invariant and reduces to entanglement entropy in the q → 1 limit.
Abstract: We consider 3d $\mathcal{N} \geq 2$ superconformal field theories on a branched covering of a three-sphere. The Rényi entropy of a CFT is given by the partition function on this space, but conical singularities break the supersymmetry preserved in the bulk. We turn on a compensating R-symmetry gauge field and compute the partition function using localization. We define a supersymmetric observable, called the super Rényi entropy, parametrized by a real number q. We show that the super Rényi entropy is duality invariant and reduces to entanglement entropy in the q → 1 limit. We provide some examples.

Journal ArticleDOI
TL;DR: A new entropy measure, Fuzzy Measure Entropy (FuzzyMEn), is proposed for the analysis of heart rate variability (HRV) signals and could be considered a valid and reliable method for clinical HRV applications.

Proceedings Article
05 Dec 2013
TL;DR: This work proposes a novel modification of the Valiant-Valiant estimation approach that seeks to estimate the shape of the unobserved portion of the distribution, going beyond the Good-Turing frequency estimation scheme; the method is robust, general, and theoretically principled, and may be fruitfully used as a component within larger machine learning and data analysis systems.
Abstract: Recently, Valiant and Valiant [1, 2] showed that a class of distributional properties, which includes such practically relevant properties as entropy, the number of distinct elements, and distance metrics between pairs of distributions, can be estimated given a sublinear sized sample. Specifically, given a sample consisting of independent draws from any distribution over at most n distinct elements, these properties can be estimated accurately using a sample of size O(n / log n). We propose a novel modification of this approach and show: 1) theoretically, this estimator is optimal (to constant factors, over worst-case instances), and 2) in practice, it performs exceptionally well for a variety of estimation tasks, on a variety of natural distributions, for a wide range of parameters. Perhaps unsurprisingly, the key step in our approach is to first use the sample to characterize the "unseen" portion of the distribution. This goes beyond such tools as the Good-Turing frequency estimation scheme, which estimates the total probability mass of the unobserved portion of the distribution: we seek to estimate the shape of the unobserved portion of the distribution. This approach is robust, general, and theoretically principled; we expect that it may be fruitfully used as a component within larger machine learning and data analysis systems.
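For contrast with the "shape of the unseen" idea, the Good-Turing estimate mentioned in the abstract only gives the total probability mass of unseen symbols, which is essentially the fraction of the sample made up of singletons. The helper below is a standard textbook formula shown for illustration, not the authors' estimator.

```python
from collections import Counter

def good_turing_unseen_mass(sample):
    """Good-Turing estimate of the total probability of unseen symbols.

    P(unseen) ~= N1 / n, where N1 is the number of symbols observed exactly
    once and n is the sample size. It says nothing about how that mass is
    distributed over the unseen symbols, which is what the 'unseen' shape
    estimator described above goes on to recover.
    """
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(sample)

print(good_turing_unseen_mass("abracadabra"))  # many singletons -> sizable unseen mass
```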

Posted Content
TL;DR: In this paper, the authors examined the sample complexity of MDL based learning procedures for Bayesian networks and showed that the number of samples needed to learn an epsilon-close approximation (in terms of entropy distance) with confidence delta is O((1/epsilon)^(4/3) log(1/epsilon) log(1/delta) loglog(1/delta)).
Abstract: In recent years there has been an increasing interest in learning Bayesian networks from data. One of the most effective methods for learning such networks is based on the minimum description length (MDL) principle. Previous work has shown that this learning procedure is asymptotically successful: with probability one, it will converge to the target distribution, given a sufficient number of samples. However, the rate of this convergence has been hitherto unknown. In this work we examine the sample complexity of MDL based learning procedures for Bayesian networks. We show that the number of samples needed to learn an epsilon-close approximation (in terms of entropy distance) with confidence delta is O((1/epsilon)^(4/3) log(1/epsilon) log(1/delta) loglog(1/delta)). This means that the sample complexity is a low-order polynomial in the error threshold and sub-linear in the confidence bound. We also discuss how the constants in this term depend on the complexity of the target distribution. Finally, we address questions of asymptotic minimality and propose a method for using the sample complexity results to speed up the learning process.

Journal ArticleDOI
TL;DR: In this paper, transfer entropy is used to quantify information flows between financial markets, and a suitable bootstrap procedure is proposed for statistical inference; this allows the authors to determine, measure and test for information transfer without being restricted to linear dynamics.
Abstract: We use transfer entropy to quantify information flows between financial markets and propose a suitable bootstrap procedure for statistical inference. Transfer entropy is a model-free measure designed as the Kullback-Leibler distance of transition probabilities. Our approach allows us to determine, measure and test for information transfer without being restricted to linear dynamics. In our empirical application, we examine the importance of the credit default swap market relative to the corporate bond market for the pricing of credit risk. We also analyze the dynamic relation between market risk and credit risk, proxied by the VIX and the iTraxx Europe, respectively. We conduct the analyses for pre-crisis, crisis and post-crisis periods.
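A minimal plug-in (histogram) sketch of transfer entropy for two discretized series with one lag of history on each side. The function name, the one-lag Markov assumption, the discretization, and the synthetic example are mine; the paper's bootstrap inference step is omitted.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in estimate of TE_{X->Y} for two aligned discrete series.

    TE = sum p(y_{t+1}, y_t, x_t) * log[ p(y_{t+1}|y_t, x_t) / p(y_{t+1}|y_t) ].
    """
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))   # (y_{t+1}, y_t, x_t)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))         # (y_t, x_t)
    pairs_yy = Counter(zip(y[1:], y[:-1]))          # (y_{t+1}, y_t)
    hist_y = Counter(y[:-1])                        # y_t
    n = len(y) - 1

    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_joint = c / n
        p_y1_given_yx = c / pairs_yx[(y0, x0)]
        p_y1_given_y = pairs_yy[(y1, y0)] / hist_y[y0]
        te += p_joint * np.log(p_y1_given_yx / p_y1_given_y)
    return te

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 5000)
y = np.roll(x, 1)            # y copies x with a one-step delay
print(transfer_entropy(x, y), transfer_entropy(y, x))  # large vs ~0
```

The asymmetry of the two directions is the point of the measure: it captures directed, possibly nonlinear predictability, which is what the bootstrap procedure in the paper then tests for significance.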

Journal ArticleDOI
TL;DR: Two validity theorems guarantee that the proposed conditional entropy can be used as a reasonable uncertainty measure for incomplete decision systems, and applications of the proposed uncertainty measure in ranking attributes and feature selection are studied with experiments.
Abstract: Uncertainty measures can supply new viewpoints for analyzing data. They can help us in disclosing the substantive characteristics of data. The uncertainty measurement issue is also a key topic in the rough-set theory. Although there are some measures to evaluate the uncertainty for complete decision systems (also called decision tables), they cannot be trivially transplanted into incomplete decision systems. There are relatively few studies on uncertainty measurement in incomplete decision systems. In this paper, we propose a new form of conditional entropy, which can be used to measure the uncertainty in incomplete decision systems. Some important properties of the conditional entropy are obtained. In particular, two validity theorems guarantee that the proposed conditional entropy can be used as a reasonable uncertainty measure for incomplete decision systems. Experiments on some real-life data sets are conducted to test and verify the validity of the proposed measure. Applications of the proposed uncertainty measure in ranking attributes and feature selection are also studied with experiments.

Journal ArticleDOI
TL;DR: A comparison of popular period finding algorithms applied to the light curves of variable stars from the Catalina Real-Time Transient Survey, MACHO and ASAS data sets shows that a new conditional entropy-based algorithm performs best in terms of completeness and speed.
Abstract: This paper presents a comparison of popular period finding algorithms applied to the light curves of variable stars from the Catalina Real-Time Transient Survey, MACHO and ASAS data sets. We analyse the accuracy of the methods against magnitude, sampling rates, quoted period, quality measures (signal-to-noise and number of observations), variability and object classes. We find that measure-of-dispersion-based techniques – analysis of variance with harmonics and conditional entropy – consistently give the best results but there are clear dependences on object class and light-curve quality. Period aliasing and identifying a period harmonic also remain significant issues. We consider the performance of the algorithms and show that a new conditional entropy-based algorithm performs best in terms of completeness and speed. We also consider a simple ensemble approach and find that it performs no better than individual algorithms.
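A sketch of the conditional-entropy period statistic being compared, following the general description above; the binning, the magnitude normalization, and the toy sinusoid are assumptions. The light curve is folded at a trial period, the (phase, magnitude) plane is binned, and the period that minimizes H(magnitude | phase) is selected.

```python
import numpy as np

def conditional_entropy(t, mag, period, phase_bins=10, mag_bins=10):
    """H(magnitude | phase) of the light curve folded at a trial period."""
    phase = (t / period) % 1.0
    m = (mag - mag.min()) / (mag.max() - mag.min() + 1e-12)
    joint, _, _ = np.histogram2d(phase, m, bins=[phase_bins, mag_bins])
    p = joint / joint.sum()
    p_phase = p.sum(axis=1, keepdims=True)          # marginal over magnitude
    nz = p > 0
    return np.sum(p[nz] * np.log(np.broadcast_to(p_phase, p.shape)[nz] / p[nz]))

def best_period(t, mag, trial_periods):
    # The true period (or a harmonic of it) minimizes the conditional entropy.
    ce = [conditional_entropy(t, mag, P) for P in trial_periods]
    return trial_periods[int(np.argmin(ce))]

# Toy example: noisy sinusoid with period 2.5, irregular sampling.
rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0, 100, 400))
mag = np.sin(2 * np.pi * t / 2.5) + 0.1 * rng.normal(size=t.size)
print(best_period(t, mag, np.linspace(2.0, 3.0, 2001)))
```

The harmonic ambiguity noted in the abstract is visible here too: folding at half or twice the true period also produces a structured, low-entropy diagram.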

Book ChapterDOI
20 Aug 2013
TL;DR: Implementation of the generator in two FPGA families confirmed its feasibility in digital technologies and also confirmed it can provide high quality random bit sequences that pass the statistical tests required by AIS31 at rates as high as 200 Mbit/s.
Abstract: The proposed true random number generator (TRNG) exploits the jitter of events propagating in a self-timed ring (STR) to generate random bit sequences at a very high bit rate. It takes advantage of a special feature of STRs that allows the time elapsed between successive events to be set as short as needed, even in the order of picoseconds. If the time interval between the events is set in concordance with the clock jitter magnitude, a simple entropy extraction scheme can be applied to generate random numbers. The proposed STR-based TRNG (STRNG) follows AIS31 recommendations: by using the proposed stochastic model, designers can compute a lower entropy bound as a function of the STR characteristics (number of stages, oscillation period and jitter magnitude). Using the resulting entropy assessment, they can then set the compression rate in the arithmetic post-processing block to reach the required security level determined by the entropy per output bit. Implementation of the generator in two FPGA families confirmed its feasibility in digital technologies and also confirmed it can provide high quality random bit sequences that pass the statistical tests required by AIS31 at rates as high as 200 Mbit/s.

Journal ArticleDOI
TL;DR: A dimension incremental strategy for reduct computation is developed that can find a new reduct in a much shorter time when an attribute set is added to a decision table; the developed algorithm is effective and efficient.
Abstract: Many real data sets in databases may vary dynamically. With the rapid development of data processing tools, databases increase quickly not only in rows (objects) but also in columns (attributes) nowadays. This phenomenon occurs in several fields including image processing, gene sequencing and risk prediction in management. Rough set theory has been conceived as a valid mathematical tool to analyze various types of data. A key problem in rough set theory is executing attribute reduction for a data set. This paper focuses on attribute reduction for data sets with dynamically-increasing attributes. Information entropy is a common measure of uncertainty and has been widely used to construct attribute reduction algorithms. Based on three representative entropies, this paper develops a dimension incremental strategy for reduct computation. When an attribute set is added to a decision table, the developed algorithm can find a new reduct in a much shorter time. Experiments on six data sets downloaded from UCI show that, compared with the traditional non-incremental reduction algorithm, the developed algorithm is effective and efficient.

Journal ArticleDOI
TL;DR: It is shown that knowledge information entropy and knowledge granularity measures can be used to evaluate the certainty degree of knowledge in set-valued information systems, and that knowledge rough entropy can be used to evaluate its uncertainty degree.

Journal ArticleDOI
TL;DR: In this paper, it is shown that the largest maximin information rate through a memoryless, error-prone channel in this framework coincides with the block-coding zero-error capacity of the channel.
Abstract: In communications, unknown variables are usually modelled as random variables, and concepts such as independence, entropy and information are defined in terms of the underlying probability distributions. In contrast, control theory often treats uncertainties and disturbances as bounded unknowns having no statistical structure. The area of networked control combines both fields, raising the question of whether it is possible to construct meaningful analogues of stochastic concepts such as independence, Markovness, entropy and information without assuming a probability space. This paper introduces a framework for doing so, leading to the construction of a maximin information functional for nonstochastic variables. It is shown that the largest maximin information rate through a memoryless, error-prone channel in this framework coincides with the block-coding zero-error capacity of the channel. Maximin information is then used to derive tight conditions for uniformly estimating the state of a linear time-invariant system over such a channel, paralleling recent results of Matveev and Savkin.

Journal ArticleDOI
TL;DR: In this article, the maximum entropy (ME) method and a Gram-Charlier (GC) expansion are applied to generate voltage magnitude, voltage angle and power flow probability density functions (PDFs) based on cumulant arithmetic treatment of linearized power flow equations.
Abstract: Probabilistic load flow (PLF) modeling is gaining renewed popularity as power grid complexity increases due to growth in intermittent renewable energy generation and unpredictable probabilistic loads such as plug-in hybrid electric vehicles (PEVs). In PLF analysis of grid design, operation and optimization, mathematically correct and accurate predictions of probability tail regions are required. In this paper, probability theory is used to solve electrical grid power load flow. The method applies two Maximum Entropy (ME) methods and a Gram-Charlier (GC) expansion to generate voltage magnitude, voltage angle and power flow probability density functions (PDFs) based on cumulant arithmetic treatment of linearized power flow equations. Systematic ME and GC parameter tuning effects on solution accuracy and performance are reported relative to converged deterministic Monte Carlo (MC) results. Comparing ME and GC results versus MC techniques demonstrates that ME methods are superior to the GC methods used in historical literature, and tens of thousands of MC iterations are required to reconstitute statistically accurate PDF tail regions. Direct probabilistic solution methods with ME PDF reconstructions are therefore proposed as mathematically correct, statistically accurate and computationally efficient methods that could be applied in the load flow analysis of large-scale networks.
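For context, a maximum-entropy PDF reconstruction under power-moment constraints takes the standard exponential-family form shown below (the paper's two ME variants and its cumulant-to-moment mapping are not reproduced here):

$$ p(x) \;=\; \exp\!\Big( -\lambda_0 - \sum_{k=1}^{K} \lambda_k x^k \Big), \qquad \int x^k\, p(x)\, dx = \mu_k \quad (k = 0,\dots,K,\; \mu_0 = 1), $$

with the Lagrange multipliers λ_k chosen numerically so that the moments μ_k derived from the cumulants are matched. The Gram-Charlier alternative instead perturbs a Gaussian with Hermite-polynomial correction terms, which is one reason its tail behaviour can degrade, consistent with the comparison reported above.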

Journal ArticleDOI
27 May 2013-PLOS ONE
TL;DR: This paper studies the use of the predictive information (PI) of the sensorimotor process as a driving force to generate behavior and introduces the time-local predictive information (TiPI), which allows exact results to be derived together with explicit update rules for the parameters of the controller in the dynamical systems framework.
Abstract: Information theory is a powerful tool to express principles to drive autonomous systems because it is domain invariant and allows for an intuitive interpretation. This paper studies the use of the predictive information (PI), also called excess entropy or effective measure complexity, of the sensorimotor process as a driving force to generate behavior. We study nonlinear and nonstationary systems and introduce the time-local predictive information (TiPI), which allows us to derive exact results together with explicit update rules for the parameters of the controller in the dynamical systems framework. In this way the information principle, formulated at the level of behavior, is translated to the dynamics of the synapses. We underpin our results with a number of case studies with high-dimensional robotic systems. We show the spontaneous cooperativity in a complex physical system with decentralized control. Moreover, a jointly controlled humanoid robot develops a high behavioral variety depending on its physics and the environment it is dynamically embedded into. The behavior can be decomposed into a succession of low-dimensional modes that increasingly explore the behavior space. This is a promising way to avoid the curse of dimensionality, which hinders learning systems from scaling well.
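In the terms of the abstract, the driving quantity is the mutual information between the past and the future of the sensor process; schematically,

$$ \mathrm{PI} \;=\; I\big(X_{\text{past}}\,;\,X_{\text{future}}\big) \;=\; H\big(X_{\text{future}}\big) - H\big(X_{\text{future}} \mid X_{\text{past}}\big). $$

TiPI, as described, restricts this quantity to a short time-local window of the (generally nonstationary) process, which is what makes exact expressions and explicit gradient-style update rules for the controller parameters tractable.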

Journal ArticleDOI
Graham Cormode
01 May 2013
TL;DR: This survey introduces the model of continuous distributed monitoring and describes a selection of results in this setting, from the simple counting problem to a variety of other functions that have been studied.
Abstract: In the model of continuous distributed monitoring, a number of observers each see a stream of observations. Their goal is to work together to compute a function of the union of their observations. This can be as simple as counting the total number of observations, or more complex non-linear functions such as tracking the entropy of the induced distribution. Assuming that it is too costly to simply centralize all the observations, it becomes quite challenging to design solutions which provide a good approximation to the current answer, while bounding the communication cost of the observers, and their other resources such as their space usage. This survey introduces this model, and describes a selection of results in this setting, from the simple counting problem to a variety of other functions that have been studied.

Journal ArticleDOI
TL;DR: In this paper, four estimators of the directed information rate between a pair of jointly stationary ergodic finite-alphabet processes are proposed, based on universal probability assignments, each exhibiting relative merits such as smoothness, nonnegativity, and boundedness.
Abstract: Four estimators of the directed information rate between a pair of jointly stationary ergodic finite-alphabet processes are proposed, based on universal probability assignments. The first one is a Shannon-McMillan-Breiman-type estimator, similar to those used by Verdu in 2005 and Cai in 2006 for estimation of other information measures. We show the almost sure and L1 convergence properties of the estimator for any underlying universal probability assignment. The other three estimators map universal probability assignments to different functionals, each exhibiting relative merits such as smoothness, nonnegativity, and boundedness. We establish the consistency of these estimators in almost sure and L1 senses, and derive near-optimal rates of convergence in the minimax sense under mild conditions. These estimators carry over directly to estimating other information measures of stationary ergodic finite-alphabet processes, such as entropy rate and mutual information rate, with near-optimal performance and provide alternatives to classical approaches in the existing literature. Guided by these theoretical results, the proposed estimators are implemented using the context-tree weighting algorithm as the universal probability assignment. Experiments on synthetic and real data are presented, demonstrating the potential of the proposed schemes in practice and the utility of directed information estimation in detecting and measuring causal influence and delay.
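For completeness, the quantity being estimated is Massey's directed information; the standard definition (stated for reference, not quoted from the paper) is

$$ I(X^n \to Y^n) \;=\; \sum_{i=1}^{n} I\big( X^i ; Y_i \mid Y^{i-1} \big), $$

where X^i = (X_1, ..., X_i). The directed information rate is the limit of (1/n) I(X^n → Y^n), which exists for jointly stationary ergodic processes and is the target of the four universal-probability-assignment estimators described above.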