
Testing time symmetry in time series using data compression dictionaries.

Matthew B. Kennel
Physical Review E, Vol. 69, Iss. 5, p. 056208 (01 May 2004)

UC San Diego Previously Published Works

Title: Testing time symmetry in time series using data compression dictionaries
Permalink: https://escholarship.org/uc/item/031564f1
Journal: Physical Review E, 69(5)
ISSN: 1063-651X
Author: Kennel, Matthew B
Publication Date: 2004-05-01
Peer reviewed

eScholarship.org, Powered by the California Digital Library, University of California

Testing time symmetry in time series using data compression dictionaries

Matthew B. Kennel*
Institute For Nonlinear Science, University of California, San Diego, La Jolla, California 92093-0402, USA
(Received 21 July 2003; published 14 May 2004; publisher error corrected 20 May 2004)
Time symmetry, often called statistical time reversibility, in a dynamical process means that any segment of time-series output has the same probability of occurrence in the process as its time reversal. A technique, based on symbolic dynamics, is proposed to distinguish such symmetrical processes from asymmetrical ones, given a time-series observation of the otherwise unknown process. Because linear stochastic Gaussian processes, and static nonlinear transformations of them, are statistically reversible, but nonlinear dynamics such as dissipative chaos are usually statistically irreversible, a test will separate large classes of hypotheses for the data. A general-purpose and robust statistical test procedure requires adapting to arbitrary dynamics which may have significant time correlation of undetermined form. Given a symbolization of the observed time series, the technology behind adaptive dictionary data compression algorithms offers a suitable estimate of reversibility, as well as a statistical likelihood test. The data compression methods create approximately independent segments permitting a simple and direct null test without resampling or surrogate data. We demonstrate the results on various time-series-reversible and irreversible systems.
DOI: 10.1103/PhysRevE.69.056208 PACS number(s): 05.45.Tp
I. INTRODUCTION
A well-known issue in the analysis of observed data is to distinguish colored noise produced by a Gaussian linear process from data produced by nonlinear sources. The tools of traditional linear signal processing and time-series statistics (power spectra, transfer functions, autoregressive modeling, etc.) often fail in such cases when their assumptions are violated; but when these assumptions are fulfilled they are often provably optimal.
The technique [1–3] most commonly employed for this task is to generate Monte Carlo simulations of "surrogate data": linear Gaussian noisy data sets with characteristics similar to the original data (e.g., power spectrum, autocorrelation, or autoregressive coefficients), comparing the original and surrogates on some statistic of the user's choice which is sensitive to various nonlinear features. This method is quite general, but there are a number of subtle and tricky technical issues [4–7] which are not always appreciated, and it may be computationally intensive.
Testing for time asymmetry (e.g., Refs. [8,9] and their references) is a useful alternative to surrogate methods for distinguishing linear noise and static nonlinear transformations
thereof from nonlinear dynamics. This idea relies on the fact
that a stationary linear Gaussian stochastic process is statis-
tically time symmetrical, also often called time reversible: the
literal time reverse of the observed series would have the
same probability to be emitted from the source as the ob-
served one [10]. Any fixed static nonlinear transformation of
such a process—including nonmonotonic transformations
which have proven to be problematic in the surrogate-data
method [7]—stays time reversible. Importantly for this work,
one such transformation is the symbolization or discretization
of a continuous state space to a coarse alphabet of a small
number of symbols. Dissipative chaos, by contrast, will pro-
duce a statistically time-irreversible signal as the creation of
information via instability in the time-forward direction is
distinct from the destruction of past state information via
dissipation. The meaning of statistical “irreversibility” used
herein is not exactly the same as the “irreversibility” of
physical processes in the traditional thermodynamic sense.
Herein, we assume that the measured process is already in its statistically stationary condition and use "reversibility" in its statistical sense; the word "reversible" is thus a synonym for "time symmetrical."
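Such a symbolization is easy to realize in practice. The sketch below is illustrative only (the function name, the alphabet size, and the use of NumPy quantiles are our choices, not the paper's); it discretizes a continuous series into a small alphabet with equal-probability bins, a static transformation of the kind just described:

```python
import numpy as np

def symbolize(x, n_symbols):
    """Map a continuous-valued series to integer symbols 0..n_symbols-1
    using equal-probability (quantile) bins.  Being a fixed static
    transformation, it preserves statistical time reversibility."""
    # Interior bin edges at empirical quantiles give ~equal counts per symbol.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_symbols + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")

rng = np.random.default_rng(0)
series = rng.normal(size=10000)
symbols = symbolize(series, 4)   # each of the 4 symbols occurs ~2500 times
```

Equal-probability bins are the symbolization used in the paper's numerical examples (see the Fig. 2 caption).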
This work does not give an explicit description of the
“null hypothesis” (e.g., a linear Gaussian process) as would
be done with a parametric estimate for the entire process, i.e.,
it is not feasible to directly evaluate the two likelihoods for
seeing the observed set in its original orientation and its
time-reversed orientation. With the usual requirements of sta-
tionarity and the absence of very long time dependence, one
may empirically estimate the likelihood of reversible dynam-
ics by looking at statistics of short-term segments from the
data set, using ergodicity in the usual way so that a single
long observed data set provides an ensemble. Our goal in-
cludes not merely a number quantifying the amount of time
asymmetry, but a statistical test procedure with a null hy-
pothesis and p value for rejection of the null. The generic
complication is that general dynamics, linear or nonlinear,
can possess rather arbitrary serial dependence. Additionally, we want a general procedure which requires as few assumptions about the structure of the dynamics as possible. The
common theme is to try to construct a test out of sufficiently
independent elements so that the assumptions of classical
statistical test procedures hold.
Daw et al. [9] suggested using the observed frequency of
symbolic words formed from nearby symbols as seen in the
forward and reverse directions. For instance, in a binary al-
phabet, if a word length of 5 and a time delay of 1 were
chosen, then one would accumulate the observed frequency
of 11001, and its time reverse 10011, as the word window
*Electronic address: mkennel@ucsd.edu
PHYSICAL REVIEW E 69, 056208 (2004)
1539-3755/2004/69(5)/056208(9)/$22.50 ©2004 The American Physical Society

slid incrementally over the symbolized observed data. The
assumption under the null hypothesis of time symmetry is
that the observed frequencies came from an equiprobable
distribution which could be tested with a simple binomial
test. This would be done for all nonpalindromic pairs of
words of a fixed length, and the results of tests on all words
combined. The difficulty comes in serial correlation which
can make the assumption of independent observations in the
binomial test incorrect, and the statistical dependence in the
combination of results from many pairs. The first was ame-
liorated with a decorrelation window and additional correla-
tion test, but the second does not have a clear solution. The
appropriate word length is also an undesirable free param-
eter. As usual, short words provide a more accurate estima-
tion of probabilities (high counts) but may improperly aver-
age over different dynamics which would be more visible
with longer words. This work proposes a different method,
adapting techniques from data compression, to rectify all
these issues. It provides approximately independent quanti-
ties for a statistical test as well as automatic word-length
selection.
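For concreteness, a minimal sketch of the word-counting idea follows (our own illustrative code, not the implementation of Ref. [9]); note that it applies the binomial test naively and therefore inherits exactly the serial-correlation and multiple-comparison difficulties just described:

```python
import math
from collections import Counter

def word_asymmetry_pvalues(symbols, word_len=5):
    """For each nonpalindromic word/reversed-word pair, test the null
    hypothesis that both orientations are equally probable, using an
    exact two-sided binomial test with p = 1/2 on the observed counts."""
    words = ["".join(map(str, symbols[i:i + word_len]))
             for i in range(len(symbols) - word_len + 1)]
    freq = Counter(words)
    # Consider every observed word and its reversal (which may be unseen).
    candidates = set(freq) | {w[::-1] for w in freq}
    pvals = {}
    for w in sorted(candidates):
        r = w[::-1]
        if r <= w:            # skip palindromes; visit each pair only once
            continue
        k, m = freq[w], freq[w] + freq[r]
        # Two-sided exact binomial tail probability at p = 1/2.
        tail = sum(math.comb(m, j) for j in range(min(k, m - k) + 1)) / 2.0**m
        pvals[(w, r)] = min(1.0, 2.0 * tail)
    return pvals
```

On a strongly irreversible sequence such as the repeating ramp 0,1,2,0,1,2,..., the pair ("012","210") receives a tiny p value, while a strictly alternating 0,1 sequence yields only palindromic words and hence no testable pairs.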
II. ADAPTIVE DICTIONARY-BASED TIME-SYMMETRY TESTING
The Lempel-Ziv [11] dictionary compression algorithm
sequentially parses the input symbol sequence from left to
right, at each step finding the longest segment in the remain-
ing input which already exists in a dictionary of codewords
[21]. Then a new codeword, consisting of the longest exist-
ing match concatenated with the next subsequent symbol in
the input, is added to the dictionary [12]. An index for the
codeword which was originally located and the subsequent
symbol are emitted. The input pointer is advanced by the
length of the codeword just added plus one. The compressed
output is a sequence of pairs of codeword indices and the additional symbol: (w_1, s_1)(w_2, s_2) ... (w_n, s_n). The dictionary is
initialized with A length-one strings, one for each unique symbol in the alphabet of size A. Absent a priori bounds on the maximum size of the integers, the length, in bits, of the compressed stream is proportional to n log_2 n, with n the number of phrases. This compression is universal:
the length of the compressed sequence divided by the length
of the input will asymptotically approach the Shannon en-
tropy rate (the best possible compression rate) for almost any
source, meaning that the method is guaranteed to learn char-
acteristics of the source. Frequently occurring sequences
generate longer dictionary entries whose codeword indices
(represented as integers) may be transmitted more compactly
than their plaintexts.
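The construction just described can be sketched in a few lines (illustrative code; representing the dictionary as a map from symbol tuples to integer indices is an implementation choice of ours):

```python
def lz78_dictionary(sequence):
    """Incremental Lempel-Ziv parsing: at each step find the longest
    dictionary phrase matching the remaining input, emit (index, next
    symbol), and add the extended phrase as a new dictionary entry."""
    alphabet = sorted(set(sequence))
    # Seed the dictionary with the A length-one strings of the alphabet.
    dictionary = {(a,): i for i, a in enumerate(alphabet)}
    phrases = []
    i = 0
    while i < len(sequence):
        j = i + 1
        # Greedily extend the match while the extension is still a codeword.
        while j < len(sequence) and tuple(sequence[i:j + 1]) in dictionary:
            j += 1
        match = tuple(sequence[i:j])
        nxt = sequence[j] if j < len(sequence) else None
        phrases.append((dictionary[match], nxt))
        if nxt is not None:
            dictionary[match + (nxt,)] = len(dictionary)
        i = j + 1   # advance past the match and the appended symbol
    return dictionary, phrases
```

For example, parsing "ababab" emits the phrases (a)(b), (ab)(a), (b), adding the codewords "ab" and "aba" along the way.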
One may parse a new sequence relative to a given fixed
dictionary, for instance, that obtained after compressing an-
other sequence as previously discussed. The longest code-
word in the dictionary which is a prefix of the remaining
input is identified and emitted. The input pointer is advanced
by the length of this codeword. This is like compressing the
latter half of a sequence except that the adaptation (adding
new phrases to the dictionary) is not performed. Fundamen-
tal results in information theory [13,14] imply that when the
parsed sequence arises from the same information source
which produced the sequence used to train the dictionary, it
will nearly always take fewer bits (and phrases) than a pars-
ing using a dictionary trained on a different source. This
statement is technically only true asymptotically but in prac-
tice exceptions grow exponentially unlikely for mixing
sources. This property concerning the relative entropy was
recently used to distinguish and categorize natural languages
from only representative samples of their texts [15], although there a slightly different algorithm was used and adaptation to the second sequence continued during its parsing, lowering the discrimination power somewhat.
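Parsing against a fixed dictionary can be sketched as follows; the hand-built dictionaries in the usage below merely stand in for ones trained on a "same" and a "different" source, and the code assumes every single symbol appears in the dictionary:

```python
def parse_fixed(sequence, dictionary):
    """Parse `sequence` against a fixed (non-adapting) dictionary of
    tuple-keyed phrases: repeatedly take the longest codeword that is a
    prefix of the remaining input, and advance by its length."""
    phrases = []
    i = 0
    while i < len(sequence):
        j = i + 1
        while j < len(sequence) and tuple(sequence[i:j + 1]) in dictionary:
            j += 1
        phrases.append(tuple(sequence[i:j]))   # no new entries are added
        i = j
    return phrases
```

A dictionary trained on the same kind of input covers the sequence in far fewer phrases than one trained on a different source, which is the fewer-bits property the text describes.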
We use this fact to test for time symmetry by comparing
the compression performance using dictionaries which were
trained on normal, and time-reversed, examples. There are
many possible specific ways one could consider using compression to see if there is a difference: for example, parse a test sequence completely with the two different dictionaries and see which one emits the fewest phrases, or perhaps look at the statistical distribution of the lengths of the words emitted. The following statistic and test, though, proved powerful in detecting irreversibility, the relatively easy task, while also having a good calibration of the null hypothesis under various diverse instantiations of reversible dynamics, which is the more difficult requirement.
Consider for a moment the generic problem of sequentially parsing a sequence S with respect to two dictionaries D_1 and D_2 simultaneously. At each step, there is a longest matching codeword for each individual dictionary. Of those two, either the first dictionary provides the longest match, or the second does, or the lengths are tied (both dictionaries provide the same codeword). The input is advanced by the length of the longest match. We define our notation as follows: n_1 = C_1(S; D_1, D_2) is the count of the number of times the first dictionary (D_1) provided the codeword, and similarly n_2 = C_2(S; D_1, D_2) accumulates the counts where the second was the best match. The number of ties is discarded. The two counts C_1 and C_2 are computed simultaneously for identical
The key notion is that since dictionary-based universal com-
pression attempts to make approximately independent code-
words, the “observation” of a parsed phrase is as if it were
nearly an independent event in a renewal-type process. This
assumption of independence (which will be tested empiri-
cally) justifies simple classical statistical tests.
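A sketch of this simultaneous two-dictionary parse, accumulating only the counts C_1 and C_2 (the function name and tuple-keyed dictionary representation are our own choices):

```python
def competitive_counts(sequence, dict1, dict2):
    """Parse `sequence` against two tuple-keyed dictionaries at once.
    At each step count which dictionary supplies the strictly longer
    match (ties are discarded) and advance by the longer match length."""
    def longest_match(i, dictionary):
        j = i
        while j < len(sequence) and tuple(sequence[i:j + 1]) in dictionary:
            j += 1
        return j - i

    n1 = n2 = 0
    i = 0
    while i < len(sequence):
        m1, m2 = longest_match(i, dict1), longest_match(i, dict2)
        if m1 > m2:
            n1 += 1
        elif m2 > m1:
            n2 += 1          # ties (m1 == m2) are discarded
        i += max(m1, m2, 1)  # always advance at least one symbol
    return n1, n2
```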
Specializing to the problem at hand, the key idea is to
parse a test sequence with respect to dictionaries which were
constructed on either forward or backward versions of a dif-
ferent training sequence. If the data are reversible, then either of those dictionaries is as good as the other, statistically, in providing longest matches, and hence on average gives as good compression. Moreover, the assumption is
that in time symmetry the distribution of “which dictionary
provides a superior match here” is an independent Bernoulli
binary random variable with equal probability, and thus the
accumulated counts would be distributed like Poisson ran-
dom variables.
Divide the input sequence S into its two contiguous halves S_1 and S_2, create literal time-reversed versions of them, R_1 and R_2, and create four dictionaries D_{S1}, D_{S2}, D_{R1}, and D_{R2} using the Lempel-Ziv construction as before. Parse each of the four sequences with respect to the two dictionaries trained on the other half of the data. Accumulate the total number of same-direction matches (n_s),

    n_s = C_1(S_2; D_{S1}, D_{R1}) + C_1(R_2; D_{R1}, D_{S1}) + C_1(S_1; D_{S2}, D_{R2}) + C_1(R_1; D_{R2}, D_{S2}),   (1)
and of different-direction matches (n_d),

    n_d = C_2(S_2; D_{S1}, D_{R1}) + C_2(R_2; D_{R1}, D_{S1}) + C_2(S_1; D_{S2}, D_{R2}) + C_2(R_1; D_{R2}, D_{S2}).   (2)
With n = n_s + n_d, define the time-symmetry statistic

    ψ̂ = (n_s − n_d)/n.   (3)

Under the null hypothesis, ψ̂ → 0 as n → ∞. For n ≳ 25, the null distribution of

    z(ψ̂, n) = n^{1/2} [ψ̂ − sgn(ψ̂)/(2n)]   (4)

is well approximated by a zero-mean unit-variance Gaussian [16], with an associated upper tail probability p(z) = (1/2) erfc(z/√2). For smaller n the exact binomial tail probability should be used. When the sequence comes from an irreversible source, there will typically be a larger fraction of same-direction matches, hence positive ψ̂. Observing ψ̂ > 0 with a sufficiently small corresponding p(z) implies a rejection of time symmetry at the given level of significance. This test is one sided since irreversibility should [17] increase n_s relative to n_d.
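In code, the asymmetry statistic, its continuity-corrected z score, and the one-sided tail probability follow directly from the two counts. This sketch (ours) uses the Gaussian approximation, which the text recommends only for n ≳ 25; for smaller n one would substitute the exact binomial tail:

```python
import math

def reversibility_test(n_s, n_d):
    """From the counts of same-direction (n_s) and different-direction
    (n_d) winning matches, return the asymmetry statistic, its
    continuity-corrected z score, and the one-sided upper-tail p value."""
    n = n_s + n_d
    psi = (n_s - n_d) / n
    # z = sqrt(n) * (psi - sgn(psi)/(2n)), a standard continuity correction.
    z = math.sqrt(n) * (psi - math.copysign(1.0 / (2.0 * n), psi))
    p = 0.5 * math.erfc(z / math.sqrt(2.0))   # upper tail of N(0, 1)
    return psi, z, p
```

An excess of same-direction matches (say 70 versus 30) gives a strongly significant rejection, while balanced counts do not.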
III. PERFORMANCE ON VARIOUS DATA SETS
The quality of any statistical test is governed by two is-
sues: how close the actual distribution matches the assumed
null distribution with data from the null class, and how well
the test is able to detect violations of that null. In particular,
the null hypothesis of the time-symmetry test is flagrantly
composite, encompassing a wide variety of reversible sym-
bol streams. The justification for the test procedure is intu-
itively appealing—that compression automatically yields in-
dependent segments—but admittedly not rigorously proven.
The success of this assertion is tested empirically by computing the statistic on ensembles of data sets taken from inputs known to be statistically reversible. Take an ensemble of M data sets from a reversible data class and compute ψ̂_k and p_k = p(z_k) for k = 1, ..., M. If the data are reversible and the test assumptions are fulfilled, the p_k ought to be as if drawn from the uniform distribution on [0,1]; equivalently, the empirical cumulative distribution of the p_k, C(p_k), ought to converge with increasing M to a straight line when plotting C(p_k) versus p_k. Similarly, over ensembles the standard deviation of z ought to tend towards one in the null class.
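The uniformity check can be quantified with a Kolmogorov-Smirnov distance between the empirical distribution of the p_k and the uniform distribution, in the spirit of the KS p values reported in Table I. A small standard-library helper (ours) computes that distance:

```python
def ks_uniform_statistic(pvalues):
    """One-sample Kolmogorov-Smirnov distance between the empirical CDF
    of the p values and the uniform CDF on [0, 1].  A well-calibrated
    null test yields small distances (an ECDF close to a straight line)."""
    xs = sorted(pvalues)
    m = len(xs)
    # Compare the uniform CDF (the identity) against the ECDF on both
    # sides of each jump.
    return max(max((k + 1) / m - x, x - k / m) for k, x in enumerate(xs))
```

An evenly spread set of p values gives a tiny distance, while p values piled up near one point (a strong central tendency) give a large one.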
We first demonstrate on seemingly trivial data, white independent symbols. Figure 1 shows results of Monte Carlo simulations on these data. As expected, there is no indication of time asymmetry in ψ̂ or z, and the standard deviation of z under the null is close to unity.
Next, we consider time-symmetrical dynamical data. These were generated from samples of the logistic map, x_{n+1} = 1 − a x_n^2, in a generic chaotic regime (a = 1.8). By itself, x_i is certainly time-asymmetrical chaotic dynamics. We take two independent samples of length N from the map, x_{i;1} and x_{i;2}, and form the mixture
FIG. 1. (Color online) Summary statistics for white equiprobable symbols. There were 200 data sets drawn for each data set size, N = 200, 2500, 25 000 (red circle, blue diamond, black square), and the reversibility statistics ψ̂ and z were evaluated for each. Top: ψ̂, the ensemble average (arb. units), and its standard deviation. Bottom: z (arb. units), and its standard deviation.
FIG. 2. (Color online) Summary statistics for a reversible mixture of logistic map dynamics. Symbolization was by equal-probability bins with A from 2 to 6. There were 200 data sets drawn for each data set size, N = 200, 2500, 25 000 (red circle, blue diamond, black square), and the reversibility statistics ψ̂ and z were evaluated for each. Top: ψ̂, the ensemble average (arb. units), and its standard deviation. Bottom: z (arb. units) and its standard deviation. The x axis is the size of the alphabet.

    y_i = x_{i;1} + λ x_{N−i;2}.   (5)

When λ = 1 the time series y_i is statistically reversible by construction; lower values of λ give increasingly irreversible data. Figure 2 shows results over ensembles of M = 200 samples of the reversible time series, each symbolized with varying small alphabets with equal-probability histograms. The statistic shows no time asymmetry, and the distribution of the p_k is statistically close to uniform (see Table I), which is desirable for a correct null test.
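The reversible mixture of Eq. (5) is straightforward to generate. In this sketch (ours), the name `lam` stands for the mixing coefficient, and the random seeding is an illustrative choice:

```python
import numpy as np

def logistic_mixture(N, lam, a=1.8, seed=0):
    """Sum of one logistic-map orbit and the time reversal of a second,
    independent orbit: y_i = x_{i;1} + lam * x_{N-i;2}.  With lam = 1
    the series is statistically reversible by construction."""
    rng = np.random.default_rng(seed)

    def orbit(x0):
        xs = np.empty(N)
        x = x0
        for n in range(N):
            x = 1.0 - a * x * x   # logistic map x_{n+1} = 1 - a x_n^2
            xs[n] = x
        return xs

    x1 = orbit(rng.uniform(-1.0, 1.0))
    x2 = orbit(rng.uniform(-1.0, 1.0))
    return x1 + lam * x2[::-1]    # second orbit enters time-reversed
```

Symbolizing such series with small equal-probability alphabets and applying the test reproduces the kind of null-calibration experiment summarized in Fig. 2.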
Figure 3 shows a sample of a time series and its power
spectrum from an arbitrarily constructed linear, Gaussian,
and hence time-symmetrical [10], stochastic process. The top
panel of Fig. 4 shows summary results on ensembles mea-
suring reversibility on sample time series of varying size,
analogously to Fig. 2. For the larger data sets the standard deviation of z is near unity and the distribution of p_k is uniform, but for the shortest data sets, N = 250, the standard deviation of z is less than 1, i.e., there is somewhat of a central tendency in the p_k. What is happening here is that the training sets are so short (each 125 symbols) that the dictionary built from observations is not sufficiently good to remove visible correlation. This is not unexpected, as dictionary compression learns with increasing data. The total number of phrase matches n = n_s + n_d used in the statistic is very small, even being as low as 10–20 for some of the samples. Nevertheless, the test is only slightly conservative, and data from this system would not be characterized incorrectly as irreversible.
The lower panel shows results on the square of the same process. The stochastic time series, which has mean zero, is squared, yielding a nonmonotonic static nonlinear transformation of a reversible process; the reversibility test correctly recognizes these data as remaining in the null class.
FIG. 3. (Color online) Top: sample time series from a discrete linear Gaussian process, constructed by a bandpass filter of an independent random Gaussian process. The y axis is signal value (arb. units); the x axis is sample number in integer-valued time. Bottom: power spectral density vs frequency (in units of the sampling frequency).

FIG. 4. (Color online) Summary statistics for a linear Gaussian process, and the square of that process. Top: z (arb. units) ± standard deviation for N = 250, 2500, 25 000 on the linear process. Bottom: z (arb. units) ± standard deviation for the square of that process, i.e., a nonmonotonic static nonlinear transformation of a reversible process.
FIG. 5. (Color online) Time-asymmetry statistic z on M = 200 sets of points from a mixture of logistic map time series. The x axis shows the mixing coefficient λ (λ = 1 is reversible) and the y axis is z (arb. units), with bars displaying the sample standard deviation on the ensemble. Curves from bottom to top show N = 250, N = 2500, N = 25 000. Each data set was partitioned at A = 3 with equal-probability histograms.
TABLE I. For the ensembles in Fig. 2: Kolmogorov-Smirnov test p values comparing the observed distribution of p_k to the uniform distribution on [0,1]. Only the values for A = 3 and N = 250, 2500 appear to be significant. These apparent rejections are spurious and disappear in a different ensemble, being 0.175 and 0.713, respectively. There is no significant evidence that the p_k are distributed nonuniformly, showing a good calibration of the statistic under this instantiation of the null hypothesis.

Alphabet   N=250     N=2500    N=25 000
2          0.218     0.457     0.0569
3          0.00335   0.0103    0.645
4          0.0326    0.522     0.303
5          0.332     0.349     0.386
6          0.0383    0.148     0.407

References

T. M. Cover and J. A. Thomas, Elements of Information Theory.
J. Ziv and A. Lempel, Compression of individual sequences via variable-rate coding.
J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, and J. D. Farmer, Testing for nonlinearity in time series: the method of surrogate data.
D. Lind and B. Marcus, An Introduction to Symbolic Dynamics and Coding.
Frequently Asked Questions
Q1. What is the main explanation for the cycle-to-cycle variability?

Despite a large amount of noise, some form of deterministic nonlinear dynamics is a plausible explanation for the cycle-to-cycle variability. 

Despite the topological equivalence, the presence or absence of probabilistic reversibility in the symbolic sequences becomes reversed by the change in presentation. 

For the spark-ignition engine, the input air-fuel ratio was maintained in stoichiometric conditions, but the proportion of exhaust gas recirculation (EGR) was altered for various runs and was the principal experimental parameter. 

The implicit-state version is a sofic shift with an associated graph and labeling: a distinct binary symbol is emitted depending on which edge is taken on the transition. 

For the Diesel data, the fraction of residual gas remaining from one combustion cycle to the next was estimated with changes in experimental parameters and is the effective experimental parameter. 

In the language of theoretical symbolic dynamics [19], the explicit-state version is a presentation of a “vertex shift,” as a symbol is emitted corresponding to each new vertex of the transition graph which is visited, and hence explicitly a shift of finite type (with memory 1) on a three-symbol alphabet. 

Physically what is most likely is that this dynamics is dominated by sufficiently high-dimensional turbulent fluctuations that globally averaged quantities such as the one considered here are effectively indistinguishable from linear processes by some kind of central limit theorem effect. 

The algorithm appends only one symbol at a time to each dictionary entry to form new dictionary entries, thus the phrases it finds are not sufficiently long to have excellent compression. 

As squaring is a nonmonotonic transformation, these data would reject the null with this sort of surrogate data method, but here the reversibility test correctly recognizes the data as being in the null class. 

In that case, the data set ought to be broken up into more, shorter, interleaved training and test sets, accumulated and repeated. 

Figure 6 shows the effect of changing alphabets: with significant irreversibility, increasing alphabet size improved detecting it, but if irreversibility were minimal, the alphabet size was unimportant. 

The Shannon entropy rates [h_S(M_1) ≈ 0.5623 bits/symbol, h_S(M_2) ≈ 0.7602 bits/symbol] of the two representations are identical, as there is the same amount of uncertainty about the next state and the same invariant density.