
Testing time symmetry in time series using data compression dictionaries.

Matthew B. Kennel
Physical Review E, Vol. 69, Iss. 5, p. 056208 (01 May 2004)

UC San Diego Previously Published Works

Title: Testing time symmetry in time series using data compression dictionaries
Permalink: https://escholarship.org/uc/item/031564f1
Journal: Physical Review E, 69(5)
ISSN: 1063-651X
Author: Kennel, Matthew B
Publication Date: 2004-05-01
Peer reviewed

eScholarship.org, Powered by the California Digital Library, University of California

Testing time symmetry in time series using data compression dictionaries

Matthew B. Kennel*
Institute For Nonlinear Science, University of California, San Diego, La Jolla, California 92093-0402, USA
(Received 21 July 2003; published 14 May 2004; publisher error corrected 20 May 2004)
Time symmetry, often called statistical time reversibility, in a dynamical process means that any segment of time-series output has the same probability of occurrence in the process as its time reversal. A technique, based on symbolic dynamics, is proposed to distinguish such symmetrical processes from asymmetrical ones, given a time-series observation of the otherwise unknown process. Because linear stochastic Gaussian processes, and static nonlinear transformations of them, are statistically reversible, but nonlinear dynamics such as dissipative chaos are usually statistically irreversible, a test will separate large classes of hypotheses for the data. A general-purpose and robust statistical test procedure requires adapting to arbitrary dynamics which may have significant time correlation of undetermined form. Given a symbolization of the observed time series, the technology behind adaptive dictionary data compression algorithms offers a suitable estimate of reversibility, as well as a statistical likelihood test. The data compression methods create approximately independent segments permitting a simple and direct null test without resampling or surrogate data. We demonstrate the results on various time-series-reversible and irreversible systems.
DOI: 10.1103/PhysRevE.69.056208 PACS number(s): 05.45.Tp
I. INTRODUCTION
A well-known issue in the analysis of observed data is to distinguish colored noise produced by a Gaussian linear process from data produced by nonlinear sources. The tools of traditional linear signal processing and time-series statistics (power spectra, transfer functions, autoregressive modeling, etc.) often fail in such cases when their assumptions are violated; but when these assumptions are fulfilled they are often provably optimal.
The technique [1–3] most commonly employed for this task is to generate Monte Carlo simulations of "surrogate data": linear Gaussian noisy data sets with characteristics similar to the original data (e.g., power spectrum, autocorrelation, or autoregressive coefficients), comparing the original and surrogates on some statistic of the user's choice which is sensitive to various nonlinear features. This method is quite general, but there are a number of subtle and tricky technical issues [4–7] which are not always appreciated, and it may be computationally intensive.
Testing for time asymmetry (e.g., Refs. [8,9] and their references) is a useful alternative to surrogate methods for distinguishing linear noise and static nonlinear transformations
thereof from nonlinear dynamics. This idea relies on the fact
that a stationary linear Gaussian stochastic process is statis-
tically time symmetrical, also often called time reversible: the
literal time reverse of the observed series would have the
same probability to be emitted from the source as the ob-
served one [10]. Any fixed static nonlinear transformation of
such a process—including nonmonotonic transformations
which have proven to be problematic in the surrogate-data
method [7]—stays time reversible. Importantly for this work,
one such transformation is the symbolization or discretization
of a continuous state space to a coarse alphabet of a small
number of symbols. Dissipative chaos, by contrast, will pro-
duce a statistically time-irreversible signal as the creation of
information via instability in the time-forward direction is
distinct from the destruction of past state information via
dissipation. The meaning of statistical “irreversibility” used
herein is not exactly the same as the “irreversibility” of
physical processes in the traditional thermodynamic sense.
Herein, we assume that the measured process is already in its statistically stationary condition and use "reversibility" in its statistical sense; the word "reversible" is thus a synonym for "time symmetrical."
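Such a symbolization is easy to realize in practice. The sketch below is illustrative only (the function name, the alphabet size, and the use of NumPy quantiles are our choices, not the paper's); it discretizes a continuous series into a small alphabet with equal-probability bins, a static transformation of the kind just described:

```python
import numpy as np

def symbolize(x, n_symbols):
    """Map a continuous-valued series to integer symbols 0..n_symbols-1
    using equal-probability (quantile) bins.  Being a fixed static
    transformation, it preserves statistical time reversibility."""
    # Interior bin edges at empirical quantiles give ~equal counts per symbol.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_symbols + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")

rng = np.random.default_rng(0)
series = rng.normal(size=10000)
symbols = symbolize(series, 4)   # each of the 4 symbols occurs ~2500 times
```

Equal-probability bins are the symbolization used in the paper's numerical examples (see the Fig. 2 caption).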
This work does not give an explicit description of the
“null hypothesis” (e.g., a linear Gaussian process) as would
be done with a parametric estimate for the entire process, i.e.,
it is not feasible to directly evaluate the two likelihoods for
seeing the observed set in its original orientation and its
time-reversed orientation. With the usual requirements of sta-
tionarity and the absence of very long time dependence, one
may empirically estimate the likelihood of reversible dynam-
ics by looking at statistics of short-term segments from the
data set, using ergodicity in the usual way so that a single
long observed data set provides an ensemble. Our goal in-
cludes not merely a number quantifying the amount of time
asymmetry, but a statistical test procedure with a null hy-
pothesis and p value for rejection of the null. The generic
complication is that general dynamics, linear or nonlinear,
can possess rather arbitrary serial dependence. Additionally, we want a general procedure which requires as few assumptions about the structure of the dynamics as possible. The
common theme is to try to construct a test out of sufficiently
independent elements so that the assumptions of classical
statistical test procedures hold.
Daw et al. [9] suggested using the observed frequency of
symbolic words formed from nearby symbols as seen in the
forward and reverse directions. For instance, in a binary al-
phabet, if a word length of 5 and a time delay of 1 were
chosen, then one would accumulate the observed frequency
of 11001, and its time reverse 10011, as the word window
*Electronic address: mkennel@ucsd.edu
PHYSICAL REVIEW E 69, 056208 (2004)
1539-3755/2004/69(5)/056208(9)/$22.50 ©2004 The American Physical Society

slid incrementally over the symbolized observed data. The
assumption under the null hypothesis of time symmetry is
that the observed frequencies came from an equiprobable
distribution which could be tested with a simple binomial
test. This would be done for all nonpalindromic pairs of
words of a fixed length, and the results of tests on all words
combined. The difficulty comes in serial correlation which
can make the assumption of independent observations in the
binomial test incorrect, and the statistical dependence in the
combination of results from many pairs. The first was ame-
liorated with a decorrelation window and additional correla-
tion test, but the second does not have a clear solution. The
appropriate word length is also an undesirable free param-
eter. As usual, short words provide a more accurate estima-
tion of probabilities (high counts) but may improperly aver-
age over different dynamics which would be more visible
with longer words. This work proposes a different method,
adapting techniques from data compression, to rectify all
these issues. It provides approximately independent quanti-
ties for a statistical test as well as automatic word-length
selection.
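For concreteness, a minimal sketch of the word-counting idea follows (our own illustrative code, not the implementation of Ref. [9]); note that it applies the binomial test naively and therefore inherits exactly the serial-correlation and multiple-comparison difficulties just described:

```python
import math
from collections import Counter

def word_asymmetry_pvalues(symbols, word_len=5):
    """For each nonpalindromic word/reversed-word pair, test the null
    hypothesis that both orientations are equally probable, using an
    exact two-sided binomial test with p = 1/2 on the observed counts."""
    words = ["".join(map(str, symbols[i:i + word_len]))
             for i in range(len(symbols) - word_len + 1)]
    freq = Counter(words)
    # Consider every observed word and its reversal (which may be unseen).
    candidates = set(freq) | {w[::-1] for w in freq}
    pvals = {}
    for w in sorted(candidates):
        r = w[::-1]
        if r <= w:            # skip palindromes; visit each pair only once
            continue
        k, m = freq[w], freq[w] + freq[r]
        # Two-sided exact binomial tail probability at p = 1/2.
        tail = sum(math.comb(m, j) for j in range(min(k, m - k) + 1)) / 2.0**m
        pvals[(w, r)] = min(1.0, 2.0 * tail)
    return pvals
```

On a strongly irreversible sequence such as the repeating ramp 0,1,2,0,1,2,..., the pair ("012","210") receives a tiny p value, while a strictly alternating 0,1 sequence yields only palindromic words and hence no testable pairs.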
II. ADAPTIVE DICTIONARY-BASED TIME-SYMMETRY TESTING
The Lempel-Ziv [11] dictionary compression algorithm
sequentially parses the input symbol sequence from left to
right, at each step finding the longest segment in the remain-
ing input which already exists in a dictionary of codewords
[21]. Then a new codeword, consisting of the longest exist-
ing match concatenated with the next subsequent symbol in
the input, is added to the dictionary [12]. An index for the
codeword which was originally located and the subsequent
symbol are emitted. The input pointer is advanced by the
length of the codeword just added plus one. The compressed
output is a sequence of pairs of codeword indices and the additional symbol: (w_1, s_1)(w_2, s_2) ... (w_n, s_n). The dictionary is
initialized with A length-one strings, one for each unique symbol in the alphabet of size A. Absent a priori bounds on the maximum size of the integers, the length, in bits, of the compressed stream is proportional to n log_2 n, with n the number of phrases. This compression is universal:
the length of the compressed sequence divided by the length
of the input will asymptotically approach the Shannon en-
tropy rate (the best possible compression rate) for almost any
source, meaning that the method is guaranteed to learn char-
acteristics of the source. Frequently occurring sequences
generate longer dictionary entries whose codeword indices
(represented as integers) may be transmitted more compactly
than their plaintexts.
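The construction just described can be sketched in a few lines (illustrative code; representing the dictionary as a map from symbol tuples to integer indices is an implementation choice of ours):

```python
def lz78_dictionary(sequence):
    """Incremental Lempel-Ziv parsing: at each step find the longest
    dictionary phrase matching the remaining input, emit (index, next
    symbol), and add the extended phrase as a new dictionary entry."""
    alphabet = sorted(set(sequence))
    # Seed the dictionary with the A length-one strings of the alphabet.
    dictionary = {(a,): i for i, a in enumerate(alphabet)}
    phrases = []
    i = 0
    while i < len(sequence):
        j = i + 1
        # Greedily extend the match while the extension is still a codeword.
        while j < len(sequence) and tuple(sequence[i:j + 1]) in dictionary:
            j += 1
        match = tuple(sequence[i:j])
        nxt = sequence[j] if j < len(sequence) else None
        phrases.append((dictionary[match], nxt))
        if nxt is not None:
            dictionary[match + (nxt,)] = len(dictionary)
        i = j + 1   # advance past the match and the appended symbol
    return dictionary, phrases
```

For example, parsing "ababab" emits the phrases (a)(b), (ab)(a), (b), adding the codewords "ab" and "aba" along the way.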
One may parse a new sequence relative to a given fixed
dictionary, for instance, that obtained after compressing an-
other sequence as previously discussed. The longest code-
word in the dictionary which is a prefix of the remaining
input is identified and emitted. The input pointer is advanced
by the length of this codeword. This is like compressing the
latter half of a sequence except that the adaptation (adding
new phrases to the dictionary) is not performed. Fundamen-
tal results in information theory [13,14] imply that when the
parsed sequence arises from the same information source
which produced the sequence used to train the dictionary, it
will nearly always take fewer bits (and phrases) than a pars-
ing using a dictionary trained on a different source. This
statement is technically only true asymptotically but in prac-
tice exceptions grow exponentially unlikely for mixing
sources. This property concerning the relative entropy was
recently used to distinguish and categorize natural languages
from only representative samples of their texts [15], although there a slightly different algorithm was used and adaptation to the second sequence continued during its parsing, lowering the discrimination power somewhat.
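Parsing against a fixed dictionary can be sketched as follows; the hand-built dictionaries in the usage below merely stand in for ones trained on a "same" and a "different" source, and the code assumes every single symbol appears in the dictionary:

```python
def parse_fixed(sequence, dictionary):
    """Parse `sequence` against a fixed (non-adapting) dictionary of
    tuple-keyed phrases: repeatedly take the longest codeword that is a
    prefix of the remaining input, and advance by its length."""
    phrases = []
    i = 0
    while i < len(sequence):
        j = i + 1
        while j < len(sequence) and tuple(sequence[i:j + 1]) in dictionary:
            j += 1
        phrases.append(tuple(sequence[i:j]))   # no new entries are added
        i = j
    return phrases
```

A dictionary trained on the same kind of input covers the sequence in far fewer phrases than one trained on a different source, which is the fewer-bits property the text describes.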
We use this fact to test for time symmetry by comparing
the compression performance using dictionaries which were
trained on normal, and time-reversed, examples. There are
many possible specific ways one could consider using compression to see if there is a difference: for example, parse a test sequence completely with the two different dictionaries and see which one emits the fewest phrases, or perhaps look at the statistical distribution of the lengths of the words emitted. The following statistic and test, though, proved powerful in detecting irreversibility, the relatively easy task, while also having a good calibration of the null hypothesis under various diverse instantiations of reversible dynamics, which is the more difficult requirement.
Consider for a moment the generic problem of sequentially parsing a sequence S with respect to two dictionaries D_1 and D_2 simultaneously. At each step, there is a longest matching codeword for each individual dictionary. Of those two, either the first dictionary provides the longest match, or the second does, or the lengths are tied (both dictionaries provide the same codeword). The input is advanced by the length of the longest match. We define our notation as follows: n_1 = C_1(S; D_1, D_2) is the count of the number of times the first dictionary (D_1) provided the codeword, and similarly n_2 = C_2(S; D_1, D_2) accumulates the counts where the second was the best match. The number of ties is discarded. The two counts C_1 and C_2 are computed simultaneously for identical
The key notion is that since dictionary-based universal com-
pression attempts to make approximately independent code-
words, the “observation” of a parsed phrase is as if it were
nearly an independent event in a renewal-type process. This
assumption of independence (which will be tested empiri-
cally) justifies simple classical statistical tests.
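A sketch of this simultaneous two-dictionary parse, accumulating only the counts C_1 and C_2 (the function name and tuple-keyed dictionary representation are our own choices):

```python
def competitive_counts(sequence, dict1, dict2):
    """Parse `sequence` against two tuple-keyed dictionaries at once.
    At each step count which dictionary supplies the strictly longer
    match (ties are discarded) and advance by the longer match length."""
    def longest_match(i, dictionary):
        j = i
        while j < len(sequence) and tuple(sequence[i:j + 1]) in dictionary:
            j += 1
        return j - i

    n1 = n2 = 0
    i = 0
    while i < len(sequence):
        m1, m2 = longest_match(i, dict1), longest_match(i, dict2)
        if m1 > m2:
            n1 += 1
        elif m2 > m1:
            n2 += 1          # ties (m1 == m2) are discarded
        i += max(m1, m2, 1)  # always advance at least one symbol
    return n1, n2
```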
Specializing to the problem at hand, the key idea is to
parse a test sequence with respect to dictionaries which were
constructed on either forward or backward versions of a dif-
ferent training sequence. If the data are reversible, then either of those dictionaries is as good as the other, statistically, in providing longest matches, and hence on average gives as good compression. Moreover, the assumption is
that in time symmetry the distribution of “which dictionary
provides a superior match here” is an independent Bernoulli
binary random variable with equal probability, and thus the
accumulated counts would be distributed like Poisson ran-
dom variables.
Divide the input sequence S into its two contiguous halves S_1 and S_2, create literal time-reversed versions of them, R_1 and R_2, and create four dictionaries D_{S1}, D_{S2}, D_{R1}, and D_{R2} using the Lempel-Ziv construction as before. Parse each of the four sequences with respect to the two dictionaries trained on the other half of the data. Accumulate the total number of same-direction matches (n_s),

    n_s = C_1(S_2; D_{S1}, D_{R1}) + C_1(R_2; D_{R1}, D_{S1}) + C_1(S_1; D_{S2}, D_{R2}) + C_1(R_1; D_{R2}, D_{S2}),   (1)
and of different-direction matches (n_d),

    n_d = C_2(S_2; D_{S1}, D_{R1}) + C_2(R_2; D_{R1}, D_{S1}) + C_2(S_1; D_{S2}, D_{R2}) + C_2(R_1; D_{R2}, D_{S2}).   (2)
With n = n_s + n_d, define the time-symmetry statistic

    ψ̂ = (n_s − n_d)/n.   (3)

Under the null hypothesis, ψ̂ → 0 as n → ∞. For n ≳ 25, the null distribution of

    z(ψ̂, n) = n^{1/2} [ψ̂ − sgn(ψ̂)/(2n)]   (4)

is well approximated by a zero-mean unit-variance Gaussian [16], with an associated upper tail probability p(z) = (1/2) erfc(z/√2). For smaller n the exact binomial tail probability should be used. When the sequence comes from an irreversible source, there will typically be a larger fraction of same-direction matches, hence positive ψ̂. Observing ψ̂ > 0 with a sufficiently small corresponding p(z) implies a rejection of time symmetry at the given level of significance. This test is one sided since irreversibility should [17] increase n_s relative to n_d.
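In code, the asymmetry statistic, its continuity-corrected z score, and the one-sided tail probability follow directly from the two counts. This sketch (ours) uses the Gaussian approximation, which the text recommends only for n ≳ 25; for smaller n one would substitute the exact binomial tail:

```python
import math

def reversibility_test(n_s, n_d):
    """From the counts of same-direction (n_s) and different-direction
    (n_d) winning matches, return the asymmetry statistic, its
    continuity-corrected z score, and the one-sided upper-tail p value."""
    n = n_s + n_d
    psi = (n_s - n_d) / n
    # z = sqrt(n) * (psi - sgn(psi)/(2n)), a standard continuity correction.
    z = math.sqrt(n) * (psi - math.copysign(1.0 / (2.0 * n), psi))
    p = 0.5 * math.erfc(z / math.sqrt(2.0))   # upper tail of N(0, 1)
    return psi, z, p
```

An excess of same-direction matches (say 70 versus 30) gives a strongly significant rejection, while balanced counts do not.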
III. PERFORMANCE ON VARIOUS DATA SETS
The quality of any statistical test is governed by two is-
sues: how close the actual distribution matches the assumed
null distribution with data from the null class, and how well
the test is able to detect violations of that null. In particular,
the null hypothesis of the time-symmetry test is flagrantly
composite, encompassing a wide variety of reversible sym-
bol streams. The justification for the test procedure is intu-
itively appealing—that compression automatically yields in-
dependent segments—but admittedly not rigorously proven.
The success of this assertion is tested empirically by computing the statistic on ensembles of data sets taken from inputs known to be statistically reversible. Take an ensemble of M data sets from a reversible data class and compute ψ̂_k and p_k = p(z_k) for k = 1, ..., M. If the data are reversible and the test assumptions are fulfilled, the p_k ought to be as if drawn from the uniform distribution on [0,1]; equivalently, the empirical cumulative distribution of the p_k, C(p_k), ought to converge with increasing M to a straight line when plotting C(p_k) versus p_k. Similarly, over ensembles the standard deviation of z ought to tend towards one in the null class.
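The uniformity check can be quantified with a Kolmogorov-Smirnov distance between the empirical distribution of the p_k and the uniform distribution, in the spirit of the KS p values reported in Table I. A small standard-library helper (ours) computes that distance:

```python
def ks_uniform_statistic(pvalues):
    """One-sample Kolmogorov-Smirnov distance between the empirical CDF
    of the p values and the uniform CDF on [0, 1].  A well-calibrated
    null test yields small distances (an ECDF close to a straight line)."""
    xs = sorted(pvalues)
    m = len(xs)
    # Compare the uniform CDF (the identity) against the ECDF on both
    # sides of each jump.
    return max(max((k + 1) / m - x, x - k / m) for k, x in enumerate(xs))
```

An evenly spread set of p values gives a tiny distance, while p values piled up near one point (a strong central tendency) give a large one.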
We first demonstrate on seemingly trivial data, white independent symbols. Figure 1 shows results of Monte Carlo simulations on these data. As expected, there is no indication of time asymmetry in ψ̂ or z, and the standard deviation of z under the null is close to unity.
Next, we consider time-symmetrical dynamical data. These were generated from samples of the logistic map, x_{n+1} = 1 − a x_n^2, in a generic chaotic regime (a = 1.8). By itself, x_i is certainly time-asymmetrical chaotic dynamics. We take two independent samples of length N from the map, x_{i;1} and x_{i;2}, and form the mixture
FIG. 1. (Color online) Summary statistics for white equiprobable symbols. There were 200 data sets drawn for each data set size, N = 200, 2500, 25 000 (red circle, blue diamond, black square), and the reversibility statistics ψ̂ and z were evaluated for each. Top: ψ̂, the ensemble average (arb. units), and its standard deviation. Bottom: z (arb. units), and its standard deviation.
FIG. 2. (Color online) Summary statistics for a reversible mixture of logistic map dynamics. Symbolization was by equal-probability bins with A from 2 to 6. There were 200 data sets drawn for each data set size, N = 200, 2500, 25 000 (red circle, blue diamond, black square), and the reversibility statistics ψ̂ and z were evaluated for each. Top: ψ̂, the ensemble average (arb. units), and its standard deviation. Bottom: z (arb. units) and its standard deviation. The x axis is the size of the alphabet.

    y_i = x_{i;1} + λ x_{N−i;2}.   (5)

When λ = 1 the time series y_i is statistically reversible by construction; lower values of λ give increasingly irreversible data. Figure 2 shows results over ensembles of M = 200 samples of the reversible time series, each symbolized with varying small alphabets with equal-probability histograms. The statistic shows no time asymmetry, and the distribution of the p_k is statistically close to uniform (see Table I), which is desirable for a correct null test.
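The reversible mixture of Eq. (5) is straightforward to generate. In this sketch (ours), the name `lam` stands for the mixing coefficient, and the random seeding is an illustrative choice:

```python
import numpy as np

def logistic_mixture(N, lam, a=1.8, seed=0):
    """Sum of one logistic-map orbit and the time reversal of a second,
    independent orbit: y_i = x_{i;1} + lam * x_{N-i;2}.  With lam = 1
    the series is statistically reversible by construction."""
    rng = np.random.default_rng(seed)

    def orbit(x0):
        xs = np.empty(N)
        x = x0
        for n in range(N):
            x = 1.0 - a * x * x   # logistic map x_{n+1} = 1 - a x_n^2
            xs[n] = x
        return xs

    x1 = orbit(rng.uniform(-1.0, 1.0))
    x2 = orbit(rng.uniform(-1.0, 1.0))
    return x1 + lam * x2[::-1]    # second orbit enters time-reversed
```

Symbolizing such series with small equal-probability alphabets and applying the test reproduces the kind of null-calibration experiment summarized in Fig. 2.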
Figure 3 shows a sample of a time series and its power
spectrum from an arbitrarily constructed linear, Gaussian,
and hence time-symmetrical [10], stochastic process. The top
panel of Fig. 4 shows summary results on ensembles mea-
suring reversibility on sample time series of varying size,
analogously to Fig. 2. For the larger data sets the standard deviation of z is near unity and the distribution of p_k is uniform, but for the shortest data sets, N = 250, the standard deviation of z is less than 1, i.e., there is somewhat of a central tendency in the p_k. What is happening here is that the training sets are so short (each 125 symbols) that the dictionary built from observations is not sufficiently good to remove visible correlation. This is not unexpected, as dictionary compression learns with increasing data. The total number of phrase matches n = n_s + n_d used in the statistic is very small, even being as low as 10–20 for some of the samples. Nevertheless, the test is only slightly conservative, and data from this system would not be characterized incorrectly as irreversible.
The lower panel shows results on the square of the same process. The stochastic time series, which has mean zero, is squared, yielding a nonmonotonic static nonlinear transformation of a reversible process; the reversibility test correctly recognizes these data as remaining in the null class.
FIG. 3. (Color online) Top: sample time series from a discrete linear Gaussian process, constructed by a bandpass filter of an independent random Gaussian process. The y axis is signal value (arb. units); the x axis is sample number in integer-valued time. Bottom: power spectral density vs frequency (in units of the sampling frequency).

FIG. 4. (Color online) Summary statistics for a linear Gaussian process, and the square of that process. Top: z (arb. units) ± standard deviation for N = 250, 2500, 25 000 on the linear process. Bottom: z (arb. units) ± standard deviation for the square of that process, i.e., a nonmonotonic static nonlinear transformation of a reversible process.
FIG. 5. (Color online) Time-asymmetry statistic z on M = 200 sets of points from a mixture of logistic map time series. The x axis shows the mixing coefficient λ (λ = 1 is reversible) and the y axis is z (arb. units), with bars displaying the sample standard deviation on the ensemble. Curves from bottom to top show N = 250, N = 2500, N = 25 000. Each data set was partitioned at A = 3 with equal-probability histograms.
TABLE I. For the ensembles in Fig. 2: Kolmogorov-Smirnov test p values comparing the observed distribution of p_k to the uniform distribution on [0,1]. Only the values for A = 3 and N = 250, 2500 appear to be significant. These apparent rejections are spurious and disappear in a different ensemble, being 0.175 and 0.713, respectively. There is no significant evidence that the p_k are distributed nonuniformly, showing a good calibration of the statistic under this instantiation of the null hypothesis.

Alphabet   N=250     N=2500    N=25 000
2          0.218     0.457     0.0569
3          0.00335   0.0103    0.645
4          0.0326    0.522     0.303
5          0.332     0.349     0.386
6          0.0383    0.148     0.407

References

T. M. Cover and J. A. Thomas, Elements of Information Theory.
J. Ziv and A. Lempel, Compression of individual sequences via variable-rate coding.
J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, and J. D. Farmer, Testing for nonlinearity in time series: the method of surrogate data.
D. Lind and B. Marcus, An Introduction to Symbolic Dynamics and Coding.
Frequently Asked Questions
Q1. What is the main explanation for the cycle-to-cycle variability?

Despite a large amount of noise, some form of deterministic nonlinear dynamics is a plausible explanation for the cycle-to-cycle variability. 

Despite the topological equivalence, the presence or absence of probabilistic reversibility in the symbolic sequences becomes reversed by the change in presentation. 

For the spark-ignition engine, the input air-fuel ratio was maintained in stoichiometric conditions, but the proportion of exhaust gas recirculation (EGR) was altered for various runs and was the principal experimental parameter. 

The implicit-state version is a sofic shift with an associated graph and labeling: a distinct binary symbol is emitted depending on which edge is taken on the transition. 

For the Diesel data, the fraction of residual gas remaining from one combustion cycle to the next was estimated with changes in experimental parameters and is the effective experimental parameter. 

In the language of theoretical symbolic dynamics [19], the explicit-state version is a presentation of a “vertex shift,” as a symbol is emitted corresponding to each new vertex of the transition graph which is visited, and hence explicitly a shift of finite type (with memory 1) on a three-symbol alphabet. 

Physically what is most likely is that this dynamics is dominated by sufficiently high-dimensional turbulent fluctuations that globally averaged quantities such as the one considered here are effectively indistinguishable from linear processes by some kind of central limit theorem effect. 

The algorithm appends only one symbol at a time to each dictionary entry to form new dictionary entries, thus the phrases it finds are not sufficiently long to have excellent compression. 

As squaring is a nonmonotonic transformation, these data would reject the null with this sort of surrogate data method, but here the reversibility test correctly recognizes the data as being in the null class. 

In that case, the data set ought to be broken up into more, shorter, interleaved training and test sets, accumulated and repeated. 

Figure 6 shows the effect of changing alphabets: with significant irreversibility, increasing alphabet size improved detecting it, but if irreversibility were minimal, the alphabet size was unimportant. 

The Shannon entropy rates [h_S(M_1) ≈ 0.5623 bits/symbol, h_S(M_2) ≈ 0.7602 bits/symbol] of the two representations are identical, as there is the same amount of uncertainty about the next state and the same invariant density.