
Some equivalences between Shannon entropy and Kolmogorov complexity

S. Leung-Yan-Cheong, +1 more
- 01 May 1978 - 
- Vol. 24, Iss: 3, pp 331-338
TLDR
It is shown that, for all computable probability distributions, the universal prefix codes associated with the conditional Chaitin complexity have expected codeword length within a constant of the Shannon entropy.
Abstract
It is known that the expected codeword length L_{UD} of the best uniquely decodable (UD) code satisfies H(X) \leq L_{UD} \leq H(X)+1. Let X be a random variable which can take on n values. Then it is shown that the average codeword length L_{1:1} for the best one-to-one (not necessarily uniquely decodable) code for X is shorter than the average codeword length L_{UD} for the best uniquely decodable code by no more than (\log_{2} \log_{2} n) + 3. Let Y be a random variable taking on a finite or countable number of values and having entropy H. Then it is proved that L_{1:1} \geq H - \log_{2}(H+1) - \log_{2}\log_{2}(H+1) - \cdots - 6. Some relations are established among the Kolmogorov, Chaitin, and extension complexities. Finally it is shown that, for all computable probability distributions, the universal prefix codes associated with the conditional Chaitin complexity have expected codeword length within a constant of the Shannon entropy.


Some Equivalences Between Shannon Entropy
and Kolmogorov Complexity
SIK K. LEUNG-YAN-CHEONG,
MEMBER, IEEE,
AND THOMAS M. COVER,
FELLOW, IEEE
Abstract—It is known that the expected codeword length $L_{UD}$ of the best uniquely decodable (UD) code satisfies $H(X) \leq L_{UD} \leq H(X) + 1$. Let $X$ be a random variable which can take on $n$ values. Then it is shown that the average codeword length $L_{1:1}$ for the best one-to-one (not necessarily uniquely decodable) code for $X$ is shorter than the average codeword length $L_{UD}$ for the best uniquely decodable code by no more than $(\log_2 \log_2 n) + 3$. Let $Y$ be a random variable taking on a finite or countable number of values and having entropy $H$. Then it is proved that $L_{1:1} \geq H - \log_2(H+1) - \log_2\log_2(H+1) - \cdots - 6$. Some relations are established among the Kolmogorov, Chaitin, and extension complexities. Finally it is shown that, for all computable probability distributions, the universal prefix codes associated with the conditional Chaitin complexity have expected codeword length within a constant of the Shannon entropy.
Manuscript received September 16, 1975; revised September 6, 1977. This work was supported in part by the National Science Foundation under Grants GK-3250, ENG-10173, and ENG 76-03684, and in part by the Air Force Office of Scientific Research under Contract F44620-74-C-0068. This paper was previously presented at the IEEE International Symposium on Information Theory, Ithaca, NY, October 1977.
S. K. Leung-Yan-Cheong was with Stanford University. He is now with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA.
T. M. Cover is with the Department of Electrical Engineering and Statistics, Stanford University, Stanford, CA.

I. INTRODUCTION

SHANNON has shown that the minimal expected length $L$ of a prefix code for a random variable $X$ satisfies
$$H(X) \leq L < H(X) + 1 \qquad (1)$$
where $H$ is the entropy of the random variable. Shannon's restriction of the encoding or description of $X$ to prefix codes is highly motivated by the implicit assumption that the descriptions will be concatenated and thus must be uniquely decodable. Since the set of allowed codeword lengths is the same for the uniquely decodable and instantaneous codes [1], [2], the expected codeword length $L$ is the same for both sets of codes. Shannon's result follows by assigning codeword length $l_i = \lceil \log 1/p_i \rceil$ to the $i$th outcome of the random variable, where $p_i$ is the probability of the $i$th outcome.
a fundamental role and may be interpreted as the minimal
expected length of the description of X. The intuition
behind the entropy H is so compelling that it would be
disconcerting if H did not figure prominently in a descrip-
tion of the most efficient coding with respect to other less
constrained coding schemes. In particular we have in
mind one-to-one (1: 1) codes, i.e., codes which assign a
distinct binary codeword to each outcome of the random
variable, without regard to the constraint that concatena-
tions of these descriptions be uniquely decodable. It will
be shown here that H is also a first order approximation
to the minimal expected length of one-to-one codes.
Throughout this paper we use $L_{1:1}$ and $L_{UD}$ to denote the average codeword lengths for the best 1:1 code and uniquely decodable code, respectively. Since the class of 1:1 codes contains the class of uniquely decodable codes, it follows that $L_{1:1} \leq L_{UD}$. We show that $L_{1:1} \geq H - \log\log n - 3$, where $n$ is the number of values that the random variable $X$ can take on. Perhaps more to the point, we also show that $L_{1:1} \geq H - \log(H+1) - O(\log\log(H+1))$. Thus, to first order, a 1:1 code allows no more compression than a uniquely decodable or prefix code.
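To make the comparison concrete, here is a small illustration (not from the paper; the Zipf-like distribution is an arbitrary choice) that builds the best one-to-one code by giving the shortest unused binary strings to the most probable outcomes and compares its average length with the entropy.

```python
import math

def best_one_to_one_lengths(n):
    """Lengths of the n shortest binary codewords 0, 1, 00, 01, 10, 11, 000, ...
    assigned to outcomes listed in order of decreasing probability."""
    lengths, k, remaining = [], 1, 2
    for _ in range(n):
        if remaining == 0:          # move on to the next codeword length
            k += 1
            remaining = 2 ** k
        lengths.append(k)
        remaining -= 1
    return lengths

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Illustrative distribution: Zipf-like over n outcomes, sorted decreasing.
n = 1 << 12
w = [1.0 / (i + 1) for i in range(n)]
Z = sum(w)
p = sorted((wi / Z for wi in w), reverse=True)

L11 = sum(pi * li for pi, li in zip(p, best_one_to_one_lengths(n)))
H = entropy(p)
print(f"H(X)          = {H:.3f} bits")
print(f"best 1:1 code = {L11:.3f} bits")
print(f"H - L_1:1     = {H - L11:.3f}  (the bound claimed above allows at most "
      f"log2 log2 n + 3 = {math.log2(math.log2(n)) + 3:.3f})")
```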
As a consequence of the work of Kolmogorov and
Chaitin, a notion of the intrinsic descriptive complexity of
a finite object has been developed. This is closely related
to the work of Shannon in which the complexity of a class
of objects is defined in terms of the probability distribu-
tion over that class. The complexity measures of
Kolmogorov and Chaitin, together with a new complexity
measure which we call the extension complexity, have
associated with them universal coding schemes. We shall
establish that the universal encoding associated with the
complexity of Chaitin [3] and Willis [6] has an expected
codeword length with respect to any computable probabil-
ity distribution on the set of possible outcomes which is
within a constant of the Shannon entropy, thus connect-
ing the individual complexity measure of Chaitin and Kolmogorov with the average statistical complexity measure of Shannon.

In Section II, we consider a random variable which can take on only a finite number of values, and we maximize $(L_{UD} - L_{1:1})$. In Section III we derive lower bounds on $L_{1:1}$ in terms of the entropy of a random variable taking values in a countable set. In Section IV we recall the definitions of the Kolmogorov and Chaitin complexities of binary sequences and introduce the notion of an extension complexity. We then derive some relationships among these quantities. Finally, in Section V we show that, for all computable probability distributions, the universal prefix codes associated with the conditional Chaitin complexity have expected codeword length within a constant of the Shannon entropy.

II. MAXIMIZATION OF $(L_{UD} - L_{1:1})$

Let $X$ be a random variable (RV) taking on a finite number of values, i.e.,
$$\Pr\{X = x_i\} = p_i, \qquad i = 1, 2, \cdots, n. \qquad (2)$$
With no loss of generality, assume $p_1 \geq p_2 \geq \cdots \geq p_n$. Let $l_i$, $i = 1, 2, \cdots, n$, be the lengths of the codewords in the best 1:1 code for encoding the RV $X$, where $l_i$ is the length of the codeword assigned to $x_i$.

Remark: Unless otherwise stated, all logarithms are to the base 2. The set of available codewords is $\{0, 1, 00, 01, 10, 11, 000, 001, \cdots\}$.

It is clear that the best 1:1 code must have $l_1 \leq l_2 \leq l_3 \leq \cdots$. Thus, by inspection, we have precisely $l_1 = 1$, $l_2 = 1$, $l_3 = 2, \cdots$, and
$$L_{1:1} = \sum_{i=1}^{n} p_i l_i = \sum_{i=1}^{n} p_i \left\lceil \log\left(\frac{i}{2}+1\right) \right\rceil. \qquad (3)$$

We now prove the following theorem, which gives an upper bound on $(L_{UD} - L_{1:1})$.

Theorem 1:
$$L_{1:1} \geq L_{UD} - \log\log n - 3. \qquad (4)$$

Proof: From (1) we have $L_{UD} \leq H(X) + 1$. Therefore
$$\max(L_{UD} - L_{1:1}) \leq 1 + \max(H(X) - L_{1:1}). \qquad (5)$$
Noting from (3) that
$$\left\lceil \log\left(\frac{i}{2}+1\right) \right\rceil \geq \log\left(\frac{i}{2}+1\right), \qquad (6)$$
we can write
$$H(X) - L_{1:1} \leq \sum_{i=1}^{n} p_i\left(\log\frac{1}{p_i} - \log\left(\frac{i}{2}+1\right)\right). \qquad (7)$$
We then use the method of Lagrange multipliers to maximize the right side of (7). The proof is completed by using (5). Details of the proof are given in Appendix A.

III. LOWER BOUNDS ON $L_{1:1}$ IN TERMS OF THE ENTROPY $H$

The objective in this section is to obtain lower bounds on $L_{1:1}$ in terms of the entropy $H$ of the random variable. As a first step, we consider transformations of 1:1 to UD codes. The random variables considered may take on a countable number of values.

Some Possible Transformations from 1:1 to UD Codes

The aim here is to find efficient means of transforming 1:1 codes to UD codes. Let $l_1, l_2, \cdots$ be the lengths of the codewords for the best 1:1 code; assume $l_1 \leq l_2 \leq \cdots$. Let $f$ be any function such that $\sum_i 2^{-f(l_i)} \leq 1$. Then from Kraft's inequality, the set of lengths $\{\lceil f(l_i) \rceil\}$ yields acceptable word lengths for a prefix (or UD) code. If $f$ is integer-valued and $\sum_i 2^{-f(l_i)} > 1$, $\{f(l_i)\}$ cannot yield a prefix code.

Theorem 2: The following functions represent possible transformations from 1:1 to UD codes:
i) $f(l_i) = l_i + a\lceil \log l_i \rceil + \log\frac{2^a - 1}{2^a - 2}$, where $a > 1$; (8)
ii) $f(l_i) = l_i + 2\lceil \log(l_i + 1) \rceil$; (9)
iii) $f(l_i) = l_i + \lceil \log l_i + \log(\log l_i) + \cdots \rceil + 4$. (10)

The proof of Theorem 2 follows from verification of the Kraft inequality for $f(l_i)$ and is given in Appendix B.
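The three transformations in Theorem 2 work because their Kraft sums stay at most one. Since the best 1:1 code has exactly $2^l$ codewords of each length $l$, the relevant Kraft sum is $\sum_l 2^l \, 2^{-f(l)}$; the sketch below (an illustration, not part of the paper) evaluates truncated versions of these sums numerically.

```python
import math

def log_star(x):
    """log x + log log x + ..., base 2, stopping at the last positive term."""
    s, t = 0.0, math.log2(x)
    while t > 0:
        s += t
        t = math.log2(t)
    return s

a = 1.5                                    # any a > 1
c = math.log2((2**a - 1) / (2**a - 2))     # additive constant in transformation i)

def f1(l): return l + a * math.ceil(math.log2(l)) + c
def f2(l): return l + 2 * math.ceil(math.log2(l + 1))
def f3(l): return l + math.ceil(log_star(l)) + 4

def kraft_sum(f, L=2000):
    # 2^l codewords of each length l in the best 1:1 code, so the
    # Kraft sum is sum_l 2^l * 2^{-f(l)} = sum_l 2^{l - f(l)}.
    return sum(2 ** (l - f(l)) for l in range(1, L + 1))

for name, f in (("i)", f1), ("ii)", f2), ("iii)", f3)):
    print(f"transformation {name:4s} truncated Kraft sum = {kraft_sum(f):.6f}  (<= 1)")
```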

We now make use of Theorem 2 to prove some lower bounds on $L_{1:1}$ in terms of the entropy $H$.

Theorem 3: The expected length $L_{1:1}$ of the best 1:1 code satisfies the following lower bounds:
i) $L_{1:1} \geq H - a(1 + \log(H+1)) - \log\frac{2^a - 1}{2^a - 2}$, where $a > 1$; (11)
ii) $L_{1:1} \geq H - 2\log(H+2)$; (12)
iii) $L_{1:1} \geq H - \log(H+1) - \log\log(H+1) - \cdots - 6$. (13)

Proof: i) From Theorem 2 i) and the fact that the expected length for a UD code is at least $H(X)$, we can write
$$E(l + a\lceil \log l \rceil + c) \geq H,$$
where $a > 1$ and $c = \log\frac{2^a - 1}{2^a - 2}$. Therefore $El + a(1 + E\log l) + c \geq H$, where $El = L_{1:1}$. From Jensen's inequality and the concavity of $\log l$, we have $El + a + a\log El + c \geq H$. But $El \leq H + 1$, since $l$ corresponds to the best 1:1 code, which is certainly better than the best prefix code, and we know that the expected length for the best prefix code is less than $H + 1$. Thus
$$El \geq H - a(1 + \log(H+1)) - \log\frac{2^a - 1}{2^a - 2}.$$

ii) From Theorem 2 ii) and the fact that $L_{UD} \geq H$, we have
$$E(l + 2\lceil \log(l+1) \rceil) \geq H,$$
$$El + 2E\log(l+1) \geq H.$$
By Jensen's inequality, $El + 2\log(El + 1) \geq H$. But $El \leq H + 1$ as before. Thus
$$El + 2\log(H+2) \geq H,$$
$$L_{1:1} \geq H - 2\log(H+2).$$

iii) From Theorem 2 iii) and the fact that $L_{UD} \geq H$, we have
$$E(l + \lceil \log l + \log(\log l) + \cdots \rceil + 4) \geq H. \qquad (14)$$
Thus
$$E(l + \log l + \log(\log l) + \cdots + 4) \geq H. \qquad (15)$$

Definition: For convenience we will define the function $\log^* n$ by
$$\log^* n \triangleq \log n + \log\log n + \cdots, \qquad (16)$$
stopping at the last positive term. Then
$$E(l + \log^* l + 4) \geq H. \qquad (17)$$
Although $\log^* l$ is not concave, we prove in Appendix C that there exists a (piecewise-linear) concave function $F^*(l)$ such that $F^*(l) \leq \log^* l \leq F^*(l) + 2$. Thus $E\log^* l \leq EF^*(l) + 2 \leq F^*(El) + 2 \leq \log^*(El) + 2$, yielding, from (17),
$$El + \log^*(El) + 6 \geq H. \qquad (18)$$
But $El \leq H + 1$ as before. Therefore
$$L_{1:1} \geq H - \log(H+1) - \log\log(H+1) - \cdots - 6. \qquad (19)$$

IV. SOME RELATIONS BETWEEN KOLMOGOROV, CHAITIN, AND EXTENSION COMPLEXITIES

Let $\{0,1\}^*$ denote the set of all binary finite length sequences, including the empty sequence. For any $x = (x_1, x_2, \cdots) \in \{0,1\}^* \cup \{0,1\}^\infty$, let $x(n) = (x_1, x_2, \cdots, x_n)$ denote the first $n$ bits of $x$.

Definition: A subset $S$ of $\{0,1\}^*$ is said to have the prefix property if and only if no sequence in $S$ is the proper prefix of any other sequence in $S$. For example, $\{00, 100\}$ has the prefix property, but $\{00, 001\}$ does not.

Definition: The Kolmogorov complexity of a binary sequence $x(n) \in \{0,1\}^n$ with respect to a partial recursive function $A: \{0,1\}^* \times N \to \{0,1\}^*$ is defined to be
$$K_A(x(n)|n) = \min_{p:\, A(p, n) = x(n)} l(p) \qquad (20)$$
where $l(p)$ is the length of the sequence $p$, and $N$ denotes the set of natural numbers.

Here $A$ may be considered to be a computer, $p$ its program, and $x$ its output. We shall use interchangeably the recursive function theoretic terminology and computer terminology. (See, for example, Chaitin [3] for a discussion of the equivalence of the two.)

Definition: Let $U: \{0,1\}^* \to \{0,1\}^*$ be a partial recursive function with a prefix domain. Then the Chaitin complexity of a binary sequence $x$ with respect to $U$ is given by
$$C_U(x) = \min_{p:\, U(p) = x} l(p). \qquad (21)$$

We now introduce a new complexity measure that is useful in prediction and inference.

Definition: Let $U: \{0,1\}^* \to \{0,1\}^*$ be a partial recursive function with a prefix domain. Then the extension complexity of a binary sequence $x$ with respect to $U$ is defined by
$$E_U(x) = \min_{p:\, U(p) \supseteq x} l(p) \qquad (22)$$
where $U(p) \supseteq x$ means that $U(p)$ is an extension of $x$, or equivalently that $x$ is a prefix of $U(p)$.

Definition: Given a complexity measure $C^*_B: \Omega \to N$, where $\Omega$ is countable and $B$ is a partial recursive function, we say that $C^*$ is universal if there exists a partial recursive function $U$ such that for any other partial recursive function $A$ there exists a constant $c$ such that for all $\omega \in \Omega$,
$$C^*_U(\omega) \leq C^*_A(\omega) + c. \qquad (23)$$

It has been shown [3], [4] that the Kolmogorov and Chaitin complexity measures are universal. The same result can be shown to hold for the extension complexity measure. Thus from now on we will assume that the complexities are measured with respect to some fixed appropriate universal function, and the subscripts will be dropped. We shall denote the Chaitin, Kolmogorov, and extension complexities of a binary sequence $x \in \{0,1\}^*$ by $C(x)$, $K(x|l(x))$, and $E(x)$, respectively.

Theorem 4: There exist constants $c_0$ and $c_1$ such that for all $x \in \{0,1\}^*$,
$$E(x) + c_0 \leq C(x) \leq E(x) + \log l(x) + \log\log l(x) + \cdots + c_1 = E(x) + \log^* l(x) + c_1. \qquad (24)$$

Proof: The first inequality follows directly from the definitions of $E(x)$ and $C(x)$. To prove the second inequality, note that a Chaitin complexity program can be constructed from the extension complexity program $p$ as follows. Let $s$ be the shortest program (from a set having the prefix property) for calculating $l(x)$. Then the Chaitin program is the concatenation $qsp$, where $q$ consists of a few bits to tell the computer to expect two programs and interpret them appropriately. So we have
$$C(x) \leq E(x) + C(l(x)) + c_2. \qquad (25)$$
From Theorem 2 iii),
$$C(l(x)) \leq \log l(x) + \log\log l(x) + \cdots + c_3. \qquad (26)$$
Combining (25) and (26) yields Theorem 4.
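The proofs above repeatedly pay about $\log l + \log\log l + \cdots$ extra bits for a prefix-free description of the length $l(x)$. The Elias gamma and delta codes are standard constructions of this kind (they are not the construction used in the paper; this sketch is only an illustration of the idea): gamma spends about $2\log n$ bits and delta about $\log n + 2\log\log n$ bits on the integer $n$, in the spirit of transformations ii) and iii) of Theorem 2.

```python
import math

def elias_gamma(n):
    """Prefix-free code for an integer n >= 1: floor(log n) zeros, then n in binary."""
    b = bin(n)[2:]                     # binary representation, leading bit is 1
    return "0" * (len(b) - 1) + b

def elias_delta(n):
    """Prefix-free code for n >= 1: gamma(number of bits of n), then the
    remaining bits of n.  Codeword length is about log n + 2 log log n."""
    b = bin(n)[2:]
    return elias_gamma(len(b)) + b[1:]

def is_prefix_free(codes):
    codes = sorted(codes)
    return all(not codes[i + 1].startswith(codes[i]) for i in range(len(codes) - 1))

print("prefix-free:", is_prefix_free([elias_delta(n) for n in range(1, 2000)]))
for n in (5, 100, 10**6):
    approx = math.log2(n) + 2 * math.log2(math.log2(n))
    print(f"n = {n:>7}: |delta(n)| = {len(elias_delta(n)):2d} bits,"
          f"  log n + 2 log log n = {approx:5.2f}")
```

The zero-run announces how many bits follow, which is exactly the self-delimiting property a prefix-domain computer needs when two programs are concatenated.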

Let
$$C(x(n)|n^*) = \min_{p:\, U(p,\, n^*) = x(n)} l(p) \qquad (27)$$
be the (conditional) Chaitin complexity of $x(n)$ given $n$, where $n^*$ is the shortest length binary program for $n$ (see Chaitin [3] for definitions of conditional complexities). As before, the domain of $U(\cdot, n^*)$ has the prefix property for each $n$.

The conditional Chaitin complexity of $x$ given its length $l(x)$ and the unconditional Chaitin complexity of $x$ are closely related in the following sense.

Theorem 5: There exist constants $c_0$ and $c_1$ such that for all $x \in \{0,1\}^*$,
$$C(x|l(x)) + c_0 \leq C(x) \leq C(x|l(x)) + \log^* l(x) + c_1. \qquad (28)$$

Proof: The lower bound follows from Chaitin [3, Theorem 3.1.e]. The upper bound follows from Chaitin [3, Theorems 3.1.d, 3.1.f], where it is shown that
$$C(x) \leq C(x, l(x)) + O(1) \leq C(x|l(x)) + C(l(x)) + O(1).$$
But from Theorem 2 iii), $C(l(x)) \leq \log^* l(x) + O(1)$. Hence the theorem is proved.

Theorem 6: There exist constants $c_0$ and $c_1$ such that for all $x \in \{0,1\}^*$,
$$K(x|l(x)) + c_0 \leq C(x) \leq K(x|l(x)) + \log K(x|l(x)) + \cdots + \log l(x) + \log\log l(x) + \cdots + c_1. \qquad (29)$$

Proof: The first inequality is a direct consequence of the definitions. To prove the second inequality, we first note that the Chaitin complexity measure is defined with respect to a computer whose programs belong to a set with the prefix property. From Theorem 2 iii), we know that we can transform the domain of a Kolmogorov complexity measure computer into one which has the prefix property by extending the length of the Kolmogorov complexity program from $K(x|l(x))$ to $K(x|l(x)) + \log K(x|l(x)) + \cdots + c_2$. Let us denote this extended program by $p$. From the proof of Theorem 4, we also know that a program $s$ (belonging to a set with the prefix property) which describes the length of $x$ need not be longer than $\log l(x) + \log\log l(x) + \cdots + c_3$. The Chaitin complexity program can be the concatenation $qsp$, where $q$ consists of a few bits to tell the computer to expect two programs and interpret them appropriately. So
$$C(x) \leq K(x|l(x)) + \log K(x|l(x)) + \cdots + \log l(x) + \log\log l(x) + \cdots + c.$$
This completes the proof of Theorem 6.

V. RELATION OF CHAITIN CODE LENGTH TO SHANNON CODE LENGTH

Let $\{X_i\}_1^\infty$ be a stationary binary stochastic process with marginals $p(x(n))$, $x(n) \in \{0,1\}^n$, $n = 1, 2, \cdots$, and Shannon entropy
$$H(X) = \lim_{n \to \infty} H(X_1, X_2, \cdots, X_n)/n. \qquad (30)$$
The Shannon entropy $H(X_1, \cdots, X_n)$ is a real number, while the Chaitin complexity $C(X_1, \cdots, X_n|n)$ is a random variable equal to the length of the shortest codeword (program) assigned to $(X_1, \cdots, X_n)$ by $U$. The prefix set of codewords so defined may be thought of as a universal prefix encoding of $n$-sequences for each $n$. Note in particular that the prefix encoding induced by $U$ is completely oblivious to the true underlying statistics $p(x_1, \cdots, x_n)$. We shall show, however, that this universal encoding has an expected word length equal to first order to the optimal Shannon bound $H(X_1, \cdots, X_n)$.

First we remark that Levin [7] has asserted (the proof does not appear) that for any finite alphabet ergodic process (with computable probability distribution) $(1/n)K(X_1 \cdots X_n|n) \to H(X)$ with probability one. Thus from Theorem 5 it follows that $(1/n)C(X_1, X_2, \cdots, X_n|n) \to H(X)$ with probability one. We shall show that the behavior of $C$ is good for finite $n$, for all $n$.

Theorem 7: For every computable probability measure $p: \{0,1\}^* \to [0,1]$ for a stochastic process, there exists a constant $c$ such that for all $n$
$$H(X_1, \cdots, X_n) \leq E_p C(X_1, \cdots, X_n|n) \leq H(X_1, \cdots, X_n) + c. \qquad (31)$$

Proof: For each $n$, $C(x(n)|n)$, $x(n) \in \{0,1\}^n$, must satisfy the Kraft inequality. So we have
$$H(X_1, \cdots, X_n) \leq E_p C(X_1, \cdots, X_n|n). \qquad (32)$$
For the right half of the inequality, we must use a theorem of Chaitin and Willis relating $C$ and a certain universal probability measure $P^*$. We then relate $P^*$ to the true distribution $p$ to achieve the desired proof. We define, for some universal computer $U$,
$$P^*(x(n)|n) = \sum_{p:\, U(p,\, n^*) = x(n)} 2^{-l(p)}. \qquad (33)$$
Chaitin has shown [3, Theorem 3.5] (see also Willis [6, Theorem 16]) that there exists a constant $c$ such that
$$C(x(n)|n) \leq \log\frac{1}{P^*(x(n)|n)} + c \qquad (34)$$
for all $n$. In addition, he has shown that for any other prefix domain computer $A$, there exists a constant $c_1$ such that
$$P^*(x(n)|n) \geq c_1 P_A(x(n)|n) \qquad (35)$$
for all $n$, where $P_A(\cdot)$ is defined as in (33).

In Lemma 1 below we show that, for the given computable probability mass function $p: \{0,1\}^* \to [0,1]$ for a stochastic process, there exists a prefix domain computer $A$ such that $P_A(x(n)|n) = p(x(n))$ for all $n$. The proof can then be completed as follows:
$$E_p C(x(n)|n) = \sum_{x(n) \in \{0,1\}^n} p(x(n))\, C(x(n)|n) \qquad (36)$$
$$\leq \sum_{x(n) \in \{0,1\}^n} p(x(n)) \left( \log\frac{1}{P^*(x(n)|n)} + c \right), \quad \text{using (34),} \qquad (37)$$
$$\leq \sum_{x(n) \in \{0,1\}^n} p(x(n)) \left( \log\frac{1}{c_1 P_A(x(n)|n)} + c \right), \quad \text{using (35),} \qquad (38)$$
$$= \sum_{x(n) \in \{0,1\}^n} p(x(n)) \log\frac{1}{p(x(n))} + c', \quad \text{using Lemma 1,} \qquad (39)$$
$$= H(X_1, \cdots, X_n) + c', \quad \text{for all } n, \qquad (40)$$
where $c' = c + \log(1/c_1)$.
Q.E.D.
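Theorem 7's upper bound ultimately rests on exhibiting a prefix code whose induced measure matches $p$, so that the expected codeword length exceeds the entropy by only a constant. The sketch below is illustrative only: Shannon-Fano-Elias coding, a standard textbook construction, is used here as a stand-in for the prefix-domain computer $A$ of Lemma 1. It builds such a code for a small computable distribution on $\{0,1\}^3$ and checks that $H \leq E[\text{length}] < H + 2$.

```python
import math
from fractions import Fraction
from itertools import product

def sfe_code(pmf):
    """Shannon-Fano-Elias code: the codeword for x is the first
    ceil(log2(1/p(x))) + 1 bits of Fbar(x) = sum_{y < x} p(y) + p(x)/2."""
    code, F = {}, Fraction(0)
    for x in sorted(pmf):                  # lexicographic order of sequences
        p = pmf[x]
        Fbar, L = F + p / 2, math.ceil(math.log2(1 / p)) + 1
        bits, v = "", Fbar
        for _ in range(L):                 # first L bits of the binary expansion
            v *= 2
            bits += "1" if v >= 1 else "0"
            v -= int(v)
        code[x] = bits
        F += p
    return code

# A computable distribution on {0,1}^3: an i.i.d. Bernoulli(1/3) source.
n, q = 3, Fraction(1, 3)
pmf = {x: q**sum(x) * (1 - q)**(n - sum(x)) for x in product((0, 1), repeat=n)}

code = sfe_code(pmf)
words = sorted(code.values())
assert all(not words[i + 1].startswith(words[i]) for i in range(len(words) - 1))
H = -sum(float(p) * math.log2(float(p)) for p in pmf.values())
EL = sum(float(p) * len(code[x]) for x, p in pmf.items())
print(f"H = {H:.3f} bits, expected codeword length = {EL:.3f} bits (< H + 2)")
```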

Lemma 1: For any computable probability mass function $p: \{0,1\}^* \to [0,1]$ for a stochastic process, there exists a prefix domain computer $A$ such that $P_A(x(n)|n) = p(x(n))$ for all $n$.

Remark 1: Willis [6, Theorem 12] has proved a similar lemma under the constraint that $p(\cdot)$ be $r$-computable, i.e., that $p(x_1, \cdots, x_n)$ have a finite base-$r$ expansion for every $x_1, x_2, \cdots, x_n$.

Remark 2: Here we define a number to be computable if we can calculate its $n$th bit in finite time for all finite $n$. An analogous result can be proved if by a computable number we mean instead a number which we can approximate arbitrarily closely.
Proof: Let $p^{(k)}(x(n))$ denote $p(x(n))$ truncated after $k$ bits. For example, if $p(x(n)) = 0.001011001\cdots$, then $p^{(5)}(x(n)) = 0.00101$. Define
$$F^{(k)}(x(n)) = \sum_{x'(n) < x(n)} p^{(k)}(x'(n)) \qquad (41)$$
where $x'(n) < x(n)$ means $x'(n)$ precedes $x(n)$ in a lexicographic ordering of the $n$-sequences. Note that $p(x(n))$ being computable does not guarantee that $F(x(n))$ is computable.

Let $A$ be a computer that has $n^*$ on its work tape. It also has at its disposal for inspection a random program $p = p_1 p_2 p_3 p_4 \cdots \in \{0,1\}^\infty$. We now describe how $A$ operates.

Step 1: Calculate $n$.
Step 2: Set $m = 1$.
Step 3: Compute $F^{(m)}(x(n))$ for all $x(n) \in \{0,1\}^n$.
Step 4: The error in summing $2^n$ binary terms, each in $[0,1]$ and each truncated after $m$ places, is bounded above by $2^{n-m}$. Using this crude bound on the difference between $F^{(m)}(x(n))$ and the true distribution function $F(x(n)) \triangleq \sum_{x'(n) < x(n)} p(x'(n))$, and between $.p^{(m)} = .p_1 p_2 \cdots p_m$ and $.p$, decide if at this stage it can be guaranteed that
$$.p \in \left( F(x^*(n)),\; F(x^*(n) + 00\cdots01) \right] \qquad (42)$$
for some $x^*(n) \in \{0,1\}^n$. Here $x(n) + 00\cdots01$ means the sequence obtained by adding $.x(n)$ and $(1/2)^n$ and reinterpreting the result as a sequence. If (42) can be decided, proceed to Step 6.
Step 5: Increment $m$ by 1. Go back to Step 3.
Step 6: Print out $x^*(n)$ and stop.

It is easily seen that
$$\Pr\left\{ .p \in \left( F(x(n)),\; F(x(n) + 00\cdots01) \right] \right\} = p(x(n)) \qquad (43)$$
for all $x(n) \in \{0,1\}^n$. Since $\lim_{m \to \infty} .p^{(m)} = .p$ and $\lim_{m \to \infty} F^{(m)}(x(n)) = F(x(n))$, $A$ will fail to halt only if $.p = F(x(n))$ for some $x(n) \in \{0,1\}^n$. This event has probability zero. Thus there exists a computer $A$ such that a Bernoulli random program $p$ will induce the stochastic process $\{X_i\}$ as its output. Q.E.D.
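For concreteness, here is a sketch of the computer $A$ of Lemma 1 in Python. It is a simplification, not the paper's exact procedure: the probabilities are assumed to be available exactly as rationals, so the truncation bookkeeping of Steps 3-5 is replaced by exact interval arithmetic, and reading one more program bit plays the role of incrementing $m$. Given $n$ and a stream of fair program bits $p_1 p_2 \cdots$, it halts with the unique $x^*(n)$ whose interval $(F(x^*(n)), F(x^*(n)) + p(x^*(n))]$ is guaranteed to contain $.p$, so the output is distributed according to $p$.

```python
import random
from fractions import Fraction
from itertools import product

def computer_A(pmf, program_bits):
    """Map a random bit 'program' to an n-sequence distributed according to pmf."""
    xs = sorted(pmf)                          # lexicographic ordering of sequences
    F, cum = {}, Fraction(0)
    for x in xs:                              # F(x) = sum of p over predecessors of x
        F[x] = cum
        cum += pmf[x]
    lo, width = Fraction(0), Fraction(1)      # .p is known to lie in [lo, lo + width)
    while True:
        for x in xs:                          # halt once the interval for .p fits
            if F[x] < lo and lo + width <= F[x] + pmf[x]:
                return x                      # inside (F(x), F(x) + p(x)]
        width /= 2                            # otherwise read one more program bit
        if next(program_bits) == 1:
            lo += width

# An i.i.d. Bernoulli(1/3) source on {0,1}^3 (illustrative choice).
n, q = 3, Fraction(1, 3)
pmf = {x: q**sum(x) * (1 - q)**(n - sum(x)) for x in product((0, 1), repeat=n)}

random.seed(0)
def fair_bits():
    while True:
        yield random.getrandbits(1)

trials, counts = 20000, {}
for _ in range(trials):
    x = computer_A(pmf, fair_bits())
    counts[x] = counts.get(x, 0) + 1
for x in sorted(pmf):
    print(x, f"empirical {counts.get(x, 0) / trials:.3f}", f"p(x) {float(pmf[x]):.3f}")
```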
VI. CONCLUSIONS
This study can be perceived in three parts. First, the
minimal average code length with respect to a known
distribution has been shown to be equal to the Shannon
entropy H to first order under different coding con-
straints. Second, the individual complexity measures of
Kolmogorov, Chaitin, and others have been shown to be
equivalent to one another, also to first order. Finally, the
expected code length of the individual algorithmic code
has been shown to be equal to first order to the Shannon
entropy, thus identifying the statistical and the logical
definitions of entropy.
ACKNOWLEDGMENT
The authors would like to thank Professor John T. Gill
for suggesting the method used for lower bounding $L_{1:1}$ in
Section III. They also wish to thank both referees for aid
in improving the proofs and making the concepts more
precise.
APPENDIX A: PROOF OF THEOREM 1

Theorem 1:
$$L_{1:1} \geq L_{UD} - \log\log n - 3.$$

Proof: From (1), $L_{UD} \leq H(X) + 1$. We now proceed to find $\max(H(X) - L_{1:1})$. Let $\Delta \triangleq H(X) - L_{1:1}$. Then
$$\Delta = \sum_{i=1}^{n} p_i \log\frac{1}{p_i} - \sum_{i=1}^{n} p_i \left\lceil \log\left(\frac{i}{2}+1\right) \right\rceil \qquad (A3)$$
$$\leq \sum_{i=1}^{n} p_i \left( \log\frac{1}{p_i} - \log\left(\frac{i}{2}+1\right) \right),$$
$$\max \Delta \leq \max \sum_{i=1}^{n} p_i \left( \log\frac{1}{p_i} - \log\left(\frac{i}{2}+1\right) \right). \qquad (A4)$$
Let $c_i = \ln\left(\frac{i}{2}+1\right)$ and let
$$J(p_1, \cdots, p_n) = \sum_{i=1}^{n} p_i \left( \ln\frac{1}{p_i} - c_i \right) + \lambda \sum_{i=1}^{n} p_i. \qquad (A6)$$
Differentiating $J(p_1, \cdots, p_n)$ with respect to $p_i$, we obtain
$$\frac{\partial J}{\partial p_i} = -c_i + \lambda - 1 + \ln\frac{1}{p_i}. \qquad (A7)$$
Setting $\partial J/\partial p_i = 0$, we obtain
$$\ln p_i = \lambda - (c_i + 1), \qquad (A8)$$
i.e.,
$$p_i = e^{\lambda - (c_i + 1)} = \alpha e^{-c_i} \qquad (A9)$$
where $\alpha$ is some constant.

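As a numerical sanity check (not from the paper), the stationarity condition $p_i = \alpha e^{-c_i}$ above suggests trying distributions of the form $p_i \propto 1/(i/2 + 1)$; the sketch below computes $H - L_{1:1}$ for this family and confirms that it stays below the $\log_2 \log_2 n + 3$ ceiling that Theorem 1 (together with $L_{UD} \geq H$) guarantees.

```python
import math

def gap(n):
    """H - L_{1:1} for p_i proportional to 1/(i/2 + 1), i = 1, ..., n
    (the form suggested by the Lagrange condition p_i = alpha * e^{-c_i})."""
    w = [1.0 / (i / 2 + 1) for i in range(1, n + 1)]
    Z = sum(w)
    p = [wi / Z for wi in w]
    H = -sum(pi * math.log2(pi) for pi in p)
    L11 = sum(pi * math.ceil(math.log2(i / 2 + 1)) for i, pi in enumerate(p, start=1))
    return H - L11

for n in (2**8, 2**12, 2**16):
    print(f"n = {n:>6}: H - L_1:1 = {gap(n):4.2f}  <=  "
          f"log2 log2 n + 3 = {math.log2(math.log2(n)) + 3:4.2f}")
```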
Citations
Journal ArticleDOI

A universal prior for integers and estimation by minimum description length

TL;DR: In this article, the minimum description length (MDL) criterion is used to estimate the total number of binary digits required to rewrite the observed data, when each observation is given with some precision.
Journal ArticleDOI

Universal coding, information, prediction, and estimation

TL;DR: A connection between universal codes and the problems of prediction and statistical estimation is established, and a known lower bound for the mean length of universal codes is sharpened and generalized, and optimum universal codes constructed.
Journal ArticleDOI

Universal modeling and coding

TL;DR: A general class of so-called first-in first-out (FIFO) arithmetic codes is described which require no alphabet extension devices and which therefore can be used in conjunction with the best models.
Posted Content

Shannon Information and Kolmogorov Complexity

TL;DR: The basic notions of both theories are discussed, and the relation of both to universal coding, Shannon mutual information versus Kolmogorov (`algorithmic') mutual information, probabilistic sufficient statistic versus algorithmic sufficient statistic (related to lossy compression in the Shannon theory versus meaningful information in the Kolmogsorov theory) are related.
Journal ArticleDOI

Zero-error network coding for acyclic networks

TL;DR: The results in this paper can be regarded as zero-error network coding theorems for acyclic communication networks, and inner and outer bounds on the zero-error admissible coding rate region are obtained in terms of the regions Γ*_N and Γ̄*_N, which are fundamental regions in the entropy space defined by Yeung.
References
Journal ArticleDOI

A Theory of Program Size Formally Identical to Information Theory

TL;DR: A new definition of program-size complexity is made, which has precisely the formal properties of the entropy concept of information theory.
Journal ArticleDOI

The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms

TL;DR: The present article is a survey of the fundamental results connected with the concept of complexity as the minimum number of binary signs containing all the information about a given object that are sufficient for its recovery (decoding).
Journal ArticleDOI

Computational Complexity and Probability Constructions

TL;DR: Using any universal Turing machine as a basis, it is possible to construct an infinite number of increasingly accurate computable probability measures which are independent of any probability assumptions.
Frequently Asked Questions (1)
Q1. What have the authors contributed in "Some equivalences between shannon entropy and kolmogorov complexity" ?

The authors show that the best one-to-one code for a random variable taking n values is shorter on average than the best uniquely decodable code by at most (log2 log2 n) + 3, and that L_1:1 ≥ H − log2(H+1) − log2 log2(H+1) − ⋯ − 6. They establish relations among the Kolmogorov, Chaitin, and extension complexities, and show that, for all computable probability distributions, the universal prefix codes associated with the conditional Chaitin complexity have expected codeword length within a constant of the Shannon entropy.