Journal ArticleDOI

Lossy Joint Source-Channel Coding in the Finite Blocklength Regime

01 May 2013-IEEE Transactions on Information Theory (IEEE)-Vol. 59, Iss: 5, pp 2545-2575
TL;DR: It is shown that even when this condition is not satisfied, symbol-by-symbol transmission is, in some cases, the best known strategy in the nonasymptotic regime.
Abstract: This paper finds new tight finite-blocklength bounds for the best achievable lossy joint source-channel code rate, and demonstrates that joint source-channel code design brings considerable performance advantage over a separate one in the nonasymptotic regime. A joint source-channel code maps a block of k source symbols onto a length-n channel codeword, and the fidelity of reproduction at the receiver end is measured by the probability ε that the distortion exceeds a given threshold d. For memoryless sources and channels, it is demonstrated that the parameters of the best joint source-channel code must satisfy nC − kR(d) ≈ √(nV + kV(d)) Q⁻¹(ε), where C and V are the channel capacity and channel dispersion, respectively; R(d) and V(d) are the source rate-distortion and rate-dispersion functions; and Q is the standard Gaussian complementary cumulative distribution function. Symbol-by-symbol (uncoded) transmission is known to achieve the Shannon limit when the source and channel satisfy a certain probabilistic matching condition. In this paper, we show that even when this condition is not satisfied, symbol-by-symbol transmission is, in some cases, the best known strategy in the nonasymptotic regime.
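The approximate law above lends itself to a quick numerical estimate. The sketch below is not from the paper; it merely solves nC − kR(d) = √(nV + kV(d)) Q⁻¹(ε) for the largest integer k, with the channel and source figures (C, V, R(d), V(d)) supplied by the user (all names illustrative):

    from math import sqrt
    from scipy.stats import norm  # Q^{-1}(eps) = norm.isf(eps)

    def max_source_symbols(n, eps, C, V, R_d, V_d):
        """Largest k with n*C - k*R(d) >= sqrt(n*V + k*V(d)) * Q^{-1}(eps),
        i.e. the Gaussian-approximation estimate of the best blocklength-n code."""
        q = norm.isf(eps)
        k = 0
        while n * C - (k + 1) * R_d >= q * sqrt(n * V + (k + 1) * V_d):
            k += 1
        return k

    # Example with made-up numbers (everything in bits):
    print(max_source_symbols(n=1000, eps=1e-2, C=0.5, V=0.5, R_d=0.5, V_d=0.6))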

Summary (5 min read)

I. INTRODUCTION

  • In the limit of infinite blocklengths, the optimal achievable coding rates in channel coding and lossy data compression are characterized by the channel capacity C and the source rate-distortion function R(d), respectively [3].
  • While computable formulas for the channel capacity and the source rate-distortion function are available for a wide class of channels and sources, the luxury of being able to compute exactly the non-asymptotic fundamental limit of interest is rarely affordable.
  • Such bounds were shown in [8] for the channel coding problem and in [9] for the source coding problem.
  • The error exponent approximation and the Gaussian approximation to the non-asymptotic fundamental limit are tight in different operational regimes.
  • Section II summarizes basic definitions and notation.

II. DEFINITIONS

  • If the source is equiprobable on an alphabet of cardinality M and d = 0, a (0, ε, α) code corresponds to an (M, ε, α) channel code (i.e., a code with M codewords, average error probability ε and cost α).
  • The dispersion, which serves to quantify the penalty on the rate of the best JSCC code induced by the finite blocklength, is defined as follows.

Definition 4. Fix α and d ≥ d_min. The rate-dispersion function of joint source-channel coding (source samples squared per channel use) is defined as

  • V(d, α) = lim_{ε→0} limsup_{n→∞} n (C(α)/R(d) − R(n, d, ε, α))² / (2 log_e (1/ε)), where C(α) and R(d) are the channel capacity-cost and source rate-distortion functions, respectively.
  • The following properties of d-tilted information, proven in [19], are used in the sequel.
  • The authors use the same notation ı_{S;Z} for that more general function.
  • All results in those sections generalize to the case of a maximal cost constraint by considering X whose distribution is supported on the subset of allowable channel inputs, F(α) = {x ∈ X : c(x) ≤ α}, rather than the entire channel input alphabet X.

Theorem 1 (Converse)

  • The authors write summations over alphabets for simplicity.
  • Unless stated otherwise, all their results hold for abstract probability spaces.
  • To obtain a code-independent converse, the authors simply choose the P_{X|S} that gives the weakest bound, and (23) follows.

B. Converses via hypothesis testing and list decoding

  • While traditionally list decoding has only been considered in the context of finite alphabet sources, the authors generalize the setting to sources with abstract alphabets.
  • Even though the authors keep the standard "list" terminology, the decoder output need not be a finite or countably infinite set.

Definition 7 (List code)

  • Any converse for list decoding implies a converse for conventional decoding.
  • The hypothesis testing converse for channel coding [8, Theorem 27] can be generalized to joint source-channel coding with list decoding as follows.
  • Note that this is a hypothetical test, which has access to both the source outcome and the decoder output.
  • The Neyman-Pearson lemma (e.g. [20]) implies that the outcome of the optimum binary hypothesis test between P and Q depends on the observation only through dP/dQ.
  • In the case of finite channel input and output alphabets, the channel symmetry assumption of Theorem 5 holds, in particular, if the rows of the channel transition probability matrix are permutations of each other and P_{Ȳⁿ} is the equiprobable distribution on the (n-dimensional) channel output alphabet, which, coincidentally, is also the capacity-achieving output distribution (see the sketch below).
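As an illustration of this symmetry condition (mine, not the paper's), the following sketch checks that for a BSC with crossover probability δ and Ȳ equiprobable, the distribution of ı_{X;Ȳ}(x; Y) under P_{Y|X=x} is the same for x = 0 and x = 1, so the single-letter assumption behind Theorem 2 (and, blockwise, Theorem 5) holds with this choice of Ȳ:

    from math import log2

    def info_density_distribution(x, delta):
        """Distribution of i_{X;Ybar}(x;Y) for a BSC(delta) with Ybar equiprobable:
        i(x;y) = log2(P(y|x) / 0.5), evaluated under Y ~ P_{Y|X=x}."""
        dist = {}
        for y in (0, 1):
            p_y_given_x = 1 - delta if y == x else delta
            value = log2(p_y_given_x / 0.5)
            dist[value] = dist.get(value, 0.0) + p_y_given_x
        return dist

    delta = 0.11
    print(info_density_distribution(0, delta))
    print(info_density_distribution(1, delta))  # identical: the law does not depend on x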

IV. ACHIEVABILITY

  • If both the source code and the channel code are chosen optimally for their given sizes, the separation principle guarantees that under certain quite general conditions (which encompass the memoryless setting, see [21]) the asymptotic fundamental limit of joint source-channel coding is achievable.
  • In the finite blocklength regime, however, such SSCC construction is, in general, only suboptimal.
  • The dispersion achieved by the conventional SSCC approach is in fact suboptimal.
  • At finite n, the output of the optimum source encoder need not be nearly equiprobable, so there is no reason to expect that a separated scheme employing a maximum-likelihood channel decoder, which does not exploit unequal message probabilities, would achieve near-optimal non-asymptotic performance.
  • The following achievability result, obtained using independent random source codes and random channel codes within the paradigm of Definition 8, capitalizes on this intuition.

Theorem 7 (Achievability). There exists a (d, ε) source-channel code with

  • The authors will construct a code with separate encoders for source and channel and separate decoders for source and channel as in Definition 8.
  • The authors now proceed to analyze the performance of the code described above.
  • The authors now average (90) over the source and channel codebooks.
  • The code size M that leads to tight achievability bounds following from Theorem 7 is in general much larger than the size that achieves the minimum in (81).

Theorem 8 (Achievability)

  • Theorem 9 (Achievability, almost-lossless JSCC [17]).
  • The technical condition (iv) ensures applicability of the Gaussian approximation in the following result.

VI. LOSSY TRANSMISSION OF A BMS OVER A BSC

  • The rate-distortion function of the source and the channel capacity are given by R(d) = h(p) − h(d) and C = 1 − h(δ), respectively, where h(·) denotes the binary entropy function (see the numerical sketch after this list).
  • For convenience, the authors define the discrete random variable U_{α,β} by EQUATION. Furthermore, the binomial sum is denoted by EQUATION. A straightforward particularization of the d-tilted information converse in Theorem 2 leads to the following result.
  • The hypothesis-testing converse in Theorem 4 particularizes to the following result: Theorem 12 (Converse, BMS-BSC).
  • If the source is equiprobable, the bound in Theorem 12 becomes particularly simple, as the following result details.
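For concreteness, the sketch below (not part of the paper) evaluates R(d) = h(p) − h(d), C = 1 − h(δ), and the Gaussian approximation nC − kR(d) ≈ √(nV + kV(d)) Q⁻¹(ε) for the BMS-BSC pair. It assumes the standard dispersion expressions V = δ(1−δ) log₂²((1−δ)/δ) for the BSC and V(d) = p(1−p) log₂²((1−p)/p) for the BMS, quoted here from the channel and source dispersion results of [8], [9] rather than derived:

    from math import log2, sqrt
    from scipy.stats import norm

    def h2(x):
        """Binary entropy in bits."""
        return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

    def bms_bsc_gaussian_approx(n, eps, p, delta, d):
        """Estimate of the max number k of BMS(p) symbols transmissible over n uses
        of a BSC(delta) with excess-distortion probability eps at threshold d, from
        n*C - k*R(d) = sqrt(n*V + k*V(d)) * Q^{-1}(eps) (remainder term neglected)."""
        C, R = 1 - h2(delta), h2(p) - h2(d)
        V = delta * (1 - delta) * log2((1 - delta) / delta) ** 2   # assumed BSC dispersion
        Vd = p * (1 - p) * log2((1 - p) / p) ** 2                  # assumed BMS rate-dispersion
        q = norm.isf(eps)
        k = 0
        while n * C - (k + 1) * R >= q * sqrt(n * V + (k + 1) * Vd):
            k += 1
        return k

    print(bms_bsc_gaussian_approx(n=1000, eps=0.01, p=0.4, delta=0.11, d=0.05))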

Theorem 14 (Achievability, BMS-BSC).

  • A source of fair coin flips has zero dispersion, and as anticipated in Remark 8, JSCC does not afford much gain in the finite blocklength regime (Fig. 5).
  • Moreover, in that case the JSCC achievability bound in Theorem 8 is worse than the SSCC achievability bound.
  • The situation is different if the source is biased, with JSCC showing significant gain over SSCC.

VIII. TO CODE OR NOT TO CODE

  • The authors' goal in this section is to compare the excess distortion performance of the optimal code of rate 1 at channel blocklength n with that of the optimal symbol-by-symbol code, evaluated after n channel uses, leveraging the bounds in Sections III and IV and the approximation in Section V.
  • The authors show certain examples in which symbol-by-symbol coding is, in fact, either optimal or very close to being optimal.
  • A general conclusion drawn from this section is that even when uncoded transmission is asymptotically suboptimal, it can be a very attractive choice at short blocklengths [2].

Definition 10. The distortion-dispersion function of symbol-by-symbol joint source-channel coding is defined as

  • EQUATION, where D is the distortion-rate function of the source.
  • Condition (v) ensures that symbol-by-symbol transmission attains the minimum average (over source realizations) distortion achievable among all codes of any blocklength.
  • The following results pertain to the full distribution of the distortion incurred at the receiver output and not just its mean.

Theorem 20 (Achievability, symbol-by-symbol code). Under restrictions

  • If (v) holds, then there exist a symbol-by-symbol encoder and decoder such that the conditional distribution of the decoder output given the source outcome coincides with the distribution P_{Z⋆|S}, so the excess-distortion probability of this symbol-by-symbol code is given by the left side of (189).
  • EQUATION where EQUATION Moreover, if there is no power constraint, EQUATION EQUATION where θ(n) is that in Theorem 10.
  • In other words, not only do such symbol-by-symbol codes attain the minimum average distortion but also the variance of distortions at the decoder's output is the minimum achievable among all codes operating at that average distortion.

B. Uncoded transmission of a BMS over a BSC

  • If the encoder and the decoder are both identity mappings (uncoded transmission), the resulting joint distribution satisfies condition (v).
  • As is well known, regardless of the blocklength, the uncoded symbol-by-symbol scheme achieves the minimum bit error rate (averaged over source and channel).
  • Here, the authors are interested instead in examining the excess distortion probability criterion.
  • Consider an application where, if the fraction of erroneously received bits exceeds a certain threshold, then the entire output packet is useless.
  • Moreover, uncoded transmission attains the minimum bit error rate threshold D(n, n, ε) achievable among all codes operating at blocklength n, regardless of the allowed ε, as the following result demonstrates (a numerical sketch follows this list).
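A minimal sketch (mine, not the paper's) of the excess-distortion probability of uncoded transmission of a binary source over a BSC(δ): every transmitted bit is flipped independently with probability δ, so the fraction of erroneous bits exceeds d exactly when a Binomial(n, δ) variable exceeds nd, irrespective of the source statistics:

    from scipy.stats import binom

    def uncoded_excess_distortion(n, delta, d):
        """P[fraction of bit errors > d] for uncoded transmission over a BSC(delta);
        the number of flipped bits is Binomial(n, delta)."""
        return binom.sf(n * d, n, delta)   # P[Binomial(n, delta) > n*d]

    # Example: n = 100 channel uses, delta = 0.11, excess-distortion threshold d = 0.15.
    print(uncoded_excess_distortion(100, 0.11, 0.15))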

It achieves, at blocklength n and excess distortion probability

  • For the transmission of the fair binary source over a BSC, Fig. 8 shows the distortion achieved by the uncoded scheme, the separated scheme, and the JSCC scheme of Theorem 14 versus n for a fixed excess-distortion probability ε = 0.01.
  • The no coding / converse curve in Fig. 8 depicts one of those singular cases where the non-asymptotic fundamental limit can be computed precisely.
  • As the blocklength increases, the performance of the separated scheme approaches that of the no-coding scheme, but according to Theorem 23 it can never outperform it.
  • Had the authors allowed the excess distortion probability to vanish sufficiently slowly, the JSCC curve would have approached the Shannon limit as n → ∞.
  • Nevertheless, uncoded transmission performs remarkably well in the displayed range of blocklengths, achieving the converse almost exactly at blocklengths less than 100, and outperforming the JSCC achievability result in Theorem 14 at blocklengths as long as 700.

C. Symbol-by-symbol coding for lossy transmission of a GMS over an AWGN channel

  • The next result characterizes the distribution of the distortion incurred by the symbol-by-symbol scheme that attains the minimum average distortion.
  • On the other hand, using (130), the authors compute EQUATION.
  • Indeed, in the range of blocklengths displayed in Fig. 11, the symbol-by-symbol code even outperforms the converse for codes operating under a maximal power constraint (see the sketch after this list).
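To make the symbol-by-symbol GMS-AWGN discussion concrete, here is a sketch of the excess-distortion probability of the classical uncoded scheme, amplify-and-forward followed by MMSE estimation; this is my illustration under the usual assumptions (unit-variance noise, jointly Gaussian source and observation), not the paper's computation. Each per-symbol estimation error is N(0, σ²/(1 + SNR)) and independent across symbols, so the n-letter mean-square distortion is a scaled chi-square variable with n degrees of freedom:

    from scipy.stats import chi2

    def uncoded_gms_awgn_excess_distortion(n, snr, d, var=1.0):
        """P[(1/n) * sum_i (S_i - Shat_i)^2 > d] for symbol-by-symbol
        amplify-and-forward + MMSE estimation of a GMS with variance `var`
        over an AWGN channel at the given SNR; per-symbol errors are
        i.i.d. N(0, var/(1+snr))."""
        mmse = var / (1.0 + snr)
        return chi2.sf(n * d / mmse, df=n)

    # Example: n = 200, SNR = 10, threshold d = 0.12, unit-variance source.
    print(uncoded_gms_awgn_excess_distortion(200, 10.0, 0.12))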

E. Symbol-by-symbol transmission of a DMS over a DEC under logarithmic loss

  • Curiously, for any 0 ≤ d ≤ H(S), the rate-distortion function and the d-tilted information are given respectively by (213) and (214), even if the source is not equiprobable.
  • In fact, the rate-distortion function is achieved by EQUATION, and the channel that is matched to the equiprobable source under logarithmic loss is exactly the DEC in (215).
  • Finally, it is easy to verify that the distortion-dispersion function of symbol-by-symbol coding under logarithmic loss is the same as that under erasure distortion and is given by (216); a small numerical check follows this list.
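The following small check (my own, under the natural reading that the "erased" output is the soft reproduction P_S under logarithmic loss) verifies that symbol-by-symbol transmission over an erasure channel with erasure probability d/H(S) attains expected distortion d while I(S; Z) = H(S) − d, consistent with R(d) = H(S) − d:

    from math import log2

    def check_log_loss_match(p_source, d):
        """Erasure channel with erasure probability e = d/H(S): a non-erased symbol
        is reproduced perfectly (log-loss 0); an erased symbol is reproduced by the
        soft output P_S, incurring log-loss log2(1/P_S(s)).
        Returns (E[distortion], I(S;Z))."""
        H = sum(-p * log2(p) for p in p_source if p > 0)   # source entropy H(S)
        e = d / H                                          # erasure probability
        avg_distortion = sum(p * e * log2(1 / p) for p in p_source if p > 0)
        # H(S|Z) = e*H(S): S is known given a non-erasure, S ~ P_S given an erasure.
        mutual_info = H - e * H
        return avg_distortion, mutual_info

    print(check_log_loss_match([0.5, 0.25, 0.25], d=0.6))   # -> (0.6, 0.9)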

IX. CONCLUSION

  • The authors derived new tight bounds on the non-asymptotic fundamental limit of lossy joint source-channel coding, which hold in wide generality and are tight enough to determine the dispersion of joint source-channel coding for the transmission of an abstract memoryless source over either a DMC or a Gaussian channel, under an arbitrary fidelity measure.
  • The major results and conclusions are the following.
  • 6) For the transmission of a stationary memoryless source over a stationary memoryless channel, the Gaussian approximation in Theorem 10 (neglecting the remainder θ(n)) provides a simple estimate of the maximal nonasymptotically achievable joint source-channel coding rate.
  • 8) Even in the absence of a probabilistic match between the source and the channel, symbol-by-symbol transmission, though asymptotically suboptimal, might outperform separate source-channel coding and joint source-channel random coding in the finite blocklength regime.
  • The authors are grateful to Dr. Oliver Kosut for offering numerous comments, and, in particular, suggesting the simplification of the achievability bound in [1] with the tighter version in Theorem 8.


arXiv:1209.1317v2 [cs.IT] 4 Feb 2014
Lossy joint source-channel coding
in the finite blocklength regime
Victoria Kostina, Student Member, IEEE, Sergio Verdú, Fellow, IEEE
Abstract—This paper finds new tight finite-blocklength bounds for the best achievable lossy joint source-channel code rate, and demonstrates that joint source-channel code design brings considerable performance advantage over a separate one in the non-asymptotic regime. A joint source-channel code maps a block of k source symbols onto a length-n channel codeword, and the fidelity of reproduction at the receiver end is measured by the probability ε that the distortion exceeds a given threshold d. For memoryless sources and channels, it is demonstrated that the parameters of the best joint source-channel code must satisfy

nC - kR(d) \approx \sqrt{nV + k\mathcal{V}(d)}\, Q^{-1}(\epsilon)

where C and V are the channel capacity and channel dispersion, respectively; R(d) and V(d) are the source rate-distortion and rate-dispersion functions; and Q is the standard Gaussian complementary cdf. Symbol-by-symbol (uncoded) transmission is known to achieve the Shannon limit when the source and channel satisfy a certain probabilistic matching condition. In this paper we show that even when this condition is not satisfied, symbol-by-symbol transmission is, in some cases, the best known strategy in the non-asymptotic regime.

Index Terms—Achievability, converse, finite blocklength regime, joint source-channel coding, lossy source coding, memoryless sources, rate-distortion theory, Shannon theory.
I. INTRODUCTION

In the limit of infinite blocklengths, the optimal achievable coding rates in channel coding and lossy data compression are characterized by the channel capacity C and the source rate-distortion function R(d), respectively [3]. For a large class of sources and channels, in the limit of large blocklength, the maximum achievable joint source-channel coding (JSCC) rate compatible with vanishing excess distortion probability is characterized by the ratio C/R(d) [4]. A perennial question in information theory is how relevant the asymptotic fundamental limits are when the communication system is forced to operate at a given fixed blocklength. The finite blocklength (delay) constraint is inherent to all communication scenarios. In fact, in many systems of current interest, such as real-time multimedia communication, delays are strictly constrained, while in packetized data communication, packets are frequently on the order of 1000 bits. While computable formulas for the channel capacity and the source rate-distortion function are available for a wide class of channels and sources, the luxury of being able to compute exactly (in polynomial time) the non-asymptotic fundamental limit of interest is rarely affordable. Notable exceptions where the non-asymptotic fundamental limit is indeed computable are almost lossless source coding [5], [6], and JSCC over matched source-channel pairs [7]. In general, however, one can at most hope to obtain bounds and approximations to the information-theoretic non-asymptotic fundamental limits.

(This work was supported in part by the National Science Foundation (NSF) under Grant CCF-1016625 and by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under Grant CCF-0939370. The work of V. Kostina was supported in part by the Natural Sciences and Engineering Research Council of Canada. Portions of this paper were presented at the 2012 IEEE International Symposium on Information Theory [1] and at the 2012 IEEE Information Theory Workshop [2]. The authors are with the Department of Electrical Engineering, Princeton University, NJ 08544 USA (e-mail: vkostina@princeton.edu; verdu@princeton.edu).)

Although non-asymptotic bounds can be distilled from classical proofs of coding theorems, these bounds are rarely satisfyingly tight in the non-asymptotic regime, as studied in [8], [9] in the contexts of channel coding and lossy source coding, respectively. For the JSCC problem, the classical converse is based on the mutual information data processing inequality, while the classical achievability scheme uses separate source/channel coding (SSCC), in which the channel coding block and the source coding block are optimized separately without knowledge of each other. These conventional approaches lead to disappointingly weak non-asymptotic bounds. In particular, SSCC can be rather suboptimal non-asymptotically. An accurate finite blocklength analysis therefore calls for novel upper and lower bounds that sandwich tightly the non-asymptotic fundamental limit. Such bounds were shown in [8] for the channel coding problem and in [9] for the source coding problem. In this paper, we derive new tight bounds for the JSCC problem, which hold in full generality, without any assumptions on the source alphabet, stationarity or memorylessness.

While numerical evaluation of the non-asymptotic upper and lower bounds bears great practical interest (for example, to decide how suboptimal with respect to the information-theoretic limit a given blocklength-n code is), such bounds usually involve cumbersome expressions that offer scant conceptual insight. Somewhat ironically, to get an elegant, insightful approximation of the non-asymptotic fundamental limit, one must resort to an asymptotic analysis of these non-asymptotic bounds. Such asymptotic analysis must be finer than that based on the law of large numbers, which suffices to obtain the asymptotic fundamental limit but fails to provide any estimate of the speed of convergence to that limit. There are two complementary approaches to a finer asymptotic analysis: the large deviations analysis which leads to error exponents, and the Gaussian approximation analysis which leads to dispersion. The error exponent approximation and the Gaussian approximation to the non-asymptotic fundamental limit are tight in different operational regimes. In the former, a rate which is strictly suboptimal with respect to the asymptotic fundamental limit is fixed, and the error exponent measures the exponential decay of the error probability to 0 as the blocklength increases. The error exponent approximation is tight if the error probability a system can tolerate is extremely small. However, already for probability of error as low as 10^{-6} to 10^{-1}, which is the operational regime for many high data rate applications, the Gaussian approximation, which gives the optimal rate achievable at a given error probability as a function of blocklength, is tight [8], [9]. In the channel coding problem, the Gaussian approximation of R*(n, ε), the maximum achievable finite blocklength coding rate at blocklength n and error probability ε, is given by, for finite alphabet stationary memoryless channels [8],

nR^\star(n, \epsilon) = nC - \sqrt{nV}\, Q^{-1}(\epsilon) + O(\log n)    (1)

where C and V are the channel capacity and dispersion, respectively. In the lossy source coding problem, the Gaussian approximation of R*(k, d, ε), the minimum achievable finite blocklength coding rate at blocklength k and probability ε of exceeding fidelity d, is given by, for stationary memoryless sources [9],

kR^\star(k, d, \epsilon) = kR(d) + \sqrt{k\mathcal{V}(d)}\, Q^{-1}(\epsilon) + O(\log k)    (2)

where R(d) and V(d) are the rate-distortion and the rate-dispersion functions, respectively.

For a given code, the excess distortion constraint, which is the figure of merit in this paper as well as in [9], is, in a way, more fundamental than the average distortion constraint, because varying d over its entire range and evaluating the probability of exceeding d gives full information about the distribution (and not just its mean) of the distortion incurred at the decoder output. Following the philosophy of [8], [9], in this paper we perform the Gaussian approximation analysis of our new bounds to show that k, the maximum number of source symbols transmissible using a given channel blocklength n, must satisfy

nC - kR(d) = \sqrt{nV + k\mathcal{V}(d)}\, Q^{-1}(\epsilon) + O(\log n)    (3)

under the fidelity constraint of exceeding a given distortion level d with probability ε. In contrast, if, following the SSCC paradigm, we just concatenate the channel code in (1) and the source code in (2), we obtain

nC - kR(d) = \min_{\eta + \zeta \leq \epsilon} \left\{ \sqrt{nV}\, Q^{-1}(\eta) + \sqrt{k\mathcal{V}(d)}\, Q^{-1}(\zeta) \right\} + O(\log n)    (4)

which is usually strictly suboptimal with respect to (3).
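To see how much (4) gives up relative to (3), the following sketch (illustrative only; remainder terms neglected, parameter values made up) compares the joint-design dispersion term √(nV + kV(d)) Q⁻¹(ε) of (3) with the best separate-design term min over η + ζ ≤ ε of √(nV) Q⁻¹(η) + √(kV(d)) Q⁻¹(ζ) appearing in (4):

    import numpy as np
    from scipy.stats import norm

    def jscc_penalty(n, k, V, Vd, eps):
        """Dispersion penalty sqrt(n*V + k*V(d)) * Q^{-1}(eps) in (3)."""
        return np.sqrt(n * V + k * Vd) * norm.isf(eps)

    def sscc_penalty(n, k, V, Vd, eps, grid=100000):
        """min over eta + zeta <= eps of sqrt(n*V)*Q^{-1}(eta) + sqrt(k*V(d))*Q^{-1}(zeta),
        the penalty incurred by concatenating the codes in (1) and (2), cf. (4)."""
        eta = np.linspace(eps / grid, eps * (1 - 1.0 / grid), grid)
        zeta = eps - eta
        return np.min(np.sqrt(n * V) * norm.isf(eta) + np.sqrt(k * Vd) * norm.isf(zeta))

    # Made-up parameters: n = 1000 channel uses, k = 950 source symbols, eps = 1e-2.
    n, k, V, Vd, eps = 1000, 950, 0.5, 0.6, 1e-2
    print(jscc_penalty(n, k, V, Vd, eps), sscc_penalty(n, k, V, Vd, eps))
    # The separate-design penalty is strictly larger, consistent with (4) vs (3).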
In addition to deriving new general achievability and converse bounds for JSCC and performing their Gaussian approximation analysis, in this paper we revisit the dilemma of whether one should or should not code when operating under delay constraints. Gastpar et al. [7] gave a set of necessary and sufficient conditions on the source, its distortion measure, the channel and its cost function in order for symbol-by-symbol transmission to attain the minimum average distortion. In these curious cases, the source and the channel are probabilistically matched. In the absence of channel cost constraints, we show that whenever the source and the channel are probabilistically matched so that symbol-by-symbol coding achieves the minimum average distortion, it also achieves the dispersion of joint source-channel coding. Moreover, even in the absence of such a match between the source and the channel, symbol-by-symbol transmission, though asymptotically suboptimal, might outperform in the non-asymptotic regime not only separate source-channel coding but also our random-coding achievability bound.

Prior research relating to finite blocklength analysis of JSCC includes the work of Csiszár [10], [11], who demonstrated that the error exponent of joint source-channel coding outperforms that of separate source-channel coding. For discrete source-channel pairs with average distortion criterion, Pilc's achievability bound [12], [13] applies. For the transmission of a Gaussian source over a discrete channel under the average mean square error constraint, Wyner's achievability bound [14], [15] applies. Non-asymptotic achievability and converse bounds for a graph-theoretic model of JSCC have been obtained by Csiszár [16]. Most recently, Tauste Campo et al. [17] showed a number of finite-blocklength random-coding bounds applicable to the almost-lossless JSCC setup, while Wang et al. [18] found the dispersion of JSCC for sources and channels with finite alphabets.

The rest of the paper is organized as follows. Section II summarizes basic definitions and notation. Sections III and IV introduce the new converse and achievability bounds to the maximum achievable coding rate, respectively. A Gaussian approximation analysis of the new bounds is presented in Section V. The evaluation of the bounds and the approximation is performed for two important special cases: the transmission of a binary memoryless source (BMS) over a binary symmetric channel (BSC) with bit error rate distortion (Section VI) and the transmission of a Gaussian memoryless source (GMS) with mean-square error distortion over an AWGN channel with a total power constraint (Section VII). Section VIII focuses on symbol-by-symbol transmission.
II. DEFINITIONS

A lossy source-channel code is a pair of (possibly randomized) mappings f: M → X and g: Y → M̂. A distortion measure d: M × M̂ → [0, +∞] is used to quantify the performance of the lossy code. A cost function c: X → [0, +∞] may be imposed on the channel inputs. The channel is used without feedback.

Definition 1. The pair (f, g) is a (d, ε, α) lossy source-channel code for {M, X, Y, M̂, P_S, d, P_{Y|X}, c} if P[d(S, g(Y)) > d] ≤ ε and either E[c(X)] ≤ α (average cost constraint) or c(X) ≤ α a.s. (maximal cost constraint), where f(S) = X (see Fig. 1). In the absence of an input cost constraint we simplify the terminology and refer to the code as a (d, ε) lossy source-channel code.

The special case d = 0 and d(s, z) = 1{s ≠ z} corresponds to almost-lossless compression. If, in addition, P_S is equiprobable on an alphabet of cardinality |M| = |M̂| = M, a (0, ε, α) code in Definition 1 corresponds to an (M, ε, α) channel code (i.e., a code with M codewords, average error probability ε and cost α).
Fig. 1. A (d, ε) joint source-channel code: S → (encoder f) → X → P_{Y|X} → Y → (decoder g) → Z, with P[d(S, Z) > d] ≤ ε.
On the other hand, if P_{Y|X} is an identity mapping on an alphabet of cardinality M without cost constraints, a (d, ε) code in Definition 1 corresponds to an (M, d, ε) lossy compression code (as e.g. defined in [9]).

As our bounds in Sections III and IV do not foist a Cartesian structure on the underlying alphabets, we state them in the one-shot paradigm of Definition 1. When we apply those bounds to the block coding setting, transmitted objects indeed become vectors, and Definition 2 below comes into play.

Definition 2. In the conventional fixed-to-fixed (or block) setting in which X and Y are the n-fold Cartesian products of alphabets A and B, M and M̂ are the k-fold Cartesian products of alphabets S and Ŝ, and d_k: S^k × Ŝ^k → [0, +∞], c_n: A^n → [0, +∞], a (d, ε, α) code for {S^k, A^n, B^n, Ŝ^k, P_{S^k}, d_k, P_{Y^n|X^n}, c_n} is called a (k, n, d, ε, α) code (or a (k, n, d, ε) code if there is no cost constraint).
Definition 3. Fix ε, d, α and the channel blocklength n. The maximum achievable source blocklength and coding rate (source symbols per channel use) are defined by, respectively,

k^\star(n, d, \epsilon, \alpha) = \sup \{ k : \exists\ (k, n, d, \epsilon, \alpha)\ \text{code} \}    (5)

R(n, d, \epsilon, \alpha) = \frac{1}{n} k^\star(n, d, \epsilon, \alpha)    (6)

Alternatively, fix ε, α, source blocklength k and channel blocklength n. The minimum achievable excess distortion is defined by

D(k, n, \epsilon, \alpha) = \inf \{ d : \exists\ (k, n, d, \epsilon, \alpha)\ \text{code} \}    (7)

Denote, for a given P_{Y|X} and a cost function c: X → [0, +∞],

C(\alpha) = \sup_{P_X :\, \mathbb{E}[c(X)] \leq \alpha} I(X; Y)    (8)

and, for a given P_S and a distortion measure d: M × M̂ → [0, +∞],

R_S(d) = \inf_{P_{Z|S} :\, \mathbb{E}[d(S, Z)] \leq d} I(S; Z)    (9)
We impose the following basic restrictions on P_{Y|X}, P_S, the input-cost function and the distortion measure:

(a) R_S(d) is finite for some d, i.e. d_min < ∞, where

d_{\min} = \inf \{ d : R_S(d) < \infty \}    (10)

(b) The infimum in (9) is achieved by a unique P_{Z^\star|S};

(c) The supremum in (8) is achieved by a unique P_{X^\star}.
The dispersion, which serves to quantify the penalty on the rate of the best JSCC code induced by the finite blocklength, is defined as follows.

Definition 4. Fix α and d ≥ d_min. The rate-dispersion function of joint source-channel coding (source samples squared per channel use) is defined as

\mathcal{V}(d, \alpha) = \lim_{\epsilon \to 0} \limsup_{n \to \infty} \frac{n \left( \frac{C(\alpha)}{R(d)} - R(n, d, \epsilon, \alpha) \right)^2}{2 \log_e \frac{1}{\epsilon}}    (11)

where C(α) and R(d) are the channel capacity-cost and source rate-distortion functions, respectively. (Footnote 1: While for memoryless sources and channels the capacity-cost and rate-distortion functions coincide with (8) and (9) evaluated with single-letter distributions, it is important to distinguish between the operational definitions and the extremal mutual information quantities, since the core results in this paper allow for memory.)

The distortion-dispersion function of joint source-channel coding is defined as

\mathcal{W}(R, \alpha) = \lim_{\epsilon \to 0} \limsup_{n \to \infty} \frac{n \left( D\!\left( \frac{C(\alpha)}{R} \right) - D(nR, n, \epsilon, \alpha) \right)^2}{2 \log_e \frac{1}{\epsilon}}    (12)

where D(·) is the distortion-rate function of the source.

If there is no cost constraint, we will simplify notation by dropping α from (5), (6), (7), (8), (11) and (12).
Definition 5 (d-tilted information [9]). For d > d_min, the d-tilted information in s is defined as

\jmath_S(s, d) = \log \frac{1}{\mathbb{E}\left[ \exp\left( \lambda^\star d - \lambda^\star d(s, Z^\star) \right) \right]}    (13)

where the expectation is with respect to P_{Z^\star}, i.e. the unconditional distribution of the reproduction random variable that achieves the infimum in (9), and

\lambda^\star = - R_S'(d)    (14)

(Footnote 2: All logs and exps are in an arbitrary common base.)

The following properties of d-tilted information, proven in [19], are used in the sequel:

\jmath_S(s, d) = \imath_{S;Z^\star}(s; z) + \lambda^\star d(s, z) - \lambda^\star d    (15)

\mathbb{E}\left[ \jmath_S(S, d) \right] = R_S(d)    (16)

\mathbb{E}\left[ \exp\left( \lambda^\star d - \lambda^\star d(S, z) + \jmath_S(S, d) \right) \right] \leq 1    (17)

where (15) holds for P_{Z^\star}-almost every z, while (17) holds for all z ∈ M̂, and

\imath_{S;Z}(s; z) = \log \frac{dP_{Z|S=s}}{dP_Z}(z)    (18)

denotes the information density of the joint distribution P_{SZ} at (s, z). We can define the right side of (18) for a given (P_{Z|S}, P_Z) even if there is no P_S such that the marginal of P_S P_{Z|S} is P_Z. We use the same notation ı_{S;Z} for that more general function. To extend Definition 5 to the lossless case, for discrete random variables we define 0-tilted information as

\jmath_S(s, 0) = \imath_S(s)    (19)

where

\imath_S(s) = \log \frac{1}{P_S(s)}    (20)

is the information in outcome s ∈ M.

The distortion d-ball centered at s ∈ M is denoted by

B_d(s) = \{ z \in \hat{M} : d(s, z) \leq d \}    (21)

Given (P_X, P_{Y|X}), we write P_X → P_{Y|X} → P_Y to indicate that P_Y is the marginal of P_X P_{Y|X}, i.e. P_Y(y) = \sum_{x \in X} P_{Y|X}(y|x) P_X(x). (Footnote 3: We write summations over alphabets for simplicity. Unless stated otherwise, all our results hold for abstract probability spaces.)

So as not to clutter notation, in Sections III and IV we assume that there are no cost constraints. However, all results in those sections generalize to the case of a maximal cost constraint by considering X whose distribution is supported on the subset of allowable channel inputs,

F(\alpha) = \{ x \in X : c(x) \leq \alpha \}    (22)

rather than the entire channel input alphabet X.
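As a concrete illustration of Definition 5 and property (16) (this example is mine, not the paper's): for a Bernoulli(p) source with bit error rate distortion and d < p ≤ 1/2, the optimizers of (9) are available in closed form, P_{Z⋆}(1) = (p − d)/(1 − 2d) and λ⋆ = log₂((1 − d)/d), so (13) can be evaluated directly and averaged to recover R_S(d) = h(p) − h(d):

    from math import log2

    def h2(x):
        return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

    def d_tilted_information(p, d):
        """j_S(s, d) of (13), in bits, for a Bernoulli(p) source with Hamming
        distortion and d < p <= 1/2, using the known optimizers
        P_{Z*}(1) = (p - d)/(1 - 2d) and lambda* = log2((1 - d)/d)."""
        pz1 = (p - d) / (1 - 2 * d)
        lam = log2((1 - d) / d)
        j = {}
        for s in (0, 1):
            # E[ 2^{lambda*(d - d(s, Z*))} ], with d(s, z) = 1{s != z}
            expect = sum((pz1 if z == 1 else 1 - pz1) * 2 ** (lam * (d - (s != z)))
                         for z in (0, 1))
            j[s] = -log2(expect)
        return j

    p, d = 0.4, 0.1
    j = d_tilted_information(p, d)
    print(j, p * j[1] + (1 - p) * j[0], h2(p) - h2(d))  # (16): E[j_S(S,d)] = R_S(d)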
III. CONVERSES

A. Converses via d-tilted information

Our first result is a general converse bound.

Theorem 1 (Converse). The existence of a (d, ε) code for S and P_{Y|X} requires that

\epsilon \geq \inf_{P_{X|S}} \sup_{\gamma > 0} \sup_{P_{\bar{Y}}} \left\{ \mathbb{P}\left[ \jmath_S(S, d) \geq \imath_{X;\bar{Y}}(X; Y) + \gamma \right] - \exp(-\gamma) \right\}    (23)

\geq \sup_{\gamma > 0} \sup_{P_{\bar{Y}}} \left\{ \mathbb{E}\left[ \inf_{x \in X} \mathbb{P}\left[ \jmath_S(S, d) \geq \imath_{X;\bar{Y}}(x; Y) + \gamma \mid S \right] \right] - \exp(-\gamma) \right\}    (24)

where in (23), S → X → Y, and the conditional probability in (24) is with respect to Y distributed according to P_{Y|X=x} (independent of S), and

\imath_{X;\bar{Y}}(x; y) = \log \frac{dP_{Y|X=x}}{dP_{\bar{Y}}}(y)    (25)
Proof: Fix γ and the (d, ε) code (P_{X|S}, P_{Z|Y}). Fix an arbitrary probability measure P_{\bar{Y}} on Y. Let P_{\bar{Y}} → P_{Z|Y} → P_{\bar{Z}}. We can write the probability in the right side of (23) as

\mathbb{P}\left[ \jmath_S(S, d) \geq \imath_{X;\bar{Y}}(X; Y) + \gamma \right]
= \mathbb{P}\left[ \jmath_S(S, d) \geq \imath_{X;\bar{Y}}(X; Y) + \gamma,\ d(S, Z) > d \right] + \mathbb{P}\left[ \jmath_S(S, d) \geq \imath_{X;\bar{Y}}(X; Y) + \gamma,\ d(S, Z) \leq d \right]    (26)

\leq \epsilon + \sum_{s \in M} P_S(s) \sum_{x \in X} P_{X|S}(x|s) \sum_{y \in Y} \sum_{z \in B_d(s)} P_{Z|Y}(z|y)\, P_{Y|X}(y|x)\, 1\left\{ P_{Y|X}(y|x) \leq P_{\bar{Y}}(y) \exp\left( \jmath_S(s, d) - \gamma \right) \right\}    (27)

\leq \epsilon + \exp(-\gamma) \sum_{s \in M} P_S(s) \exp\left( \jmath_S(s, d) \right) \sum_{y \in Y} P_{\bar{Y}}(y) \sum_{z \in B_d(s)} P_{Z|Y}(z|y) \sum_{x \in X} P_{X|S}(x|s)    (28)

= \epsilon + \exp(-\gamma) \sum_{s \in M} P_S(s) \exp\left( \jmath_S(s, d) \right) \sum_{y \in Y} P_{\bar{Y}}(y) \sum_{z \in B_d(s)} P_{Z|Y}(z|y)    (29)

= \epsilon + \exp(-\gamma) \sum_{s \in M} P_S(s) \exp\left( \jmath_S(s, d) \right) P_{\bar{Z}}\left( B_d(s) \right)    (30)

\leq \epsilon + \exp(-\gamma) \sum_{z \in \hat{M}} P_{\bar{Z}}(z) \sum_{s \in M} P_S(s) \exp\left( \jmath_S(s, d) + \lambda^\star d - \lambda^\star d(s, z) \right)    (31)

\leq \epsilon + \exp(-\gamma)    (32)

where (32) is due to (17). Optimizing over γ > 0 and P_{\bar{Y}}, we get the best possible bound for a given encoder P_{X|S}. To obtain a code-independent converse, we simply choose P_{X|S} that gives the weakest bound, and (23) follows. To show (24), we weaken (23) as

\epsilon \geq \sup_{\gamma > 0} \sup_{P_{\bar{Y}}} \left\{ \inf_{P_{X|S}} \mathbb{P}\left[ \jmath_S(S, d) \geq \imath_{X;\bar{Y}}(X; Y) + \gamma \right] - \exp(-\gamma) \right\}    (33)

and observe that for any P_{\bar{Y}},

\inf_{P_{X|S}} \mathbb{P}\left[ \jmath_S(S, d) \geq \imath_{X;\bar{Y}}(X; Y) + \gamma \right]
= \sum_{s \in M} P_S(s) \inf_{P_{X|S=s}} \sum_{x \in X} P_{X|S}(x|s) \sum_{y \in Y} P_{Y|X}(y|x)\, 1\left\{ \jmath_S(s, d) \geq \imath_{X;\bar{Y}}(x; y) + \gamma \right\}    (34)

= \sum_{s \in M} P_S(s) \inf_{x \in X} \sum_{y \in Y} P_{Y|X}(y|x)\, 1\left\{ \jmath_S(s, d) \geq \imath_{X;\bar{Y}}(x; y) + \gamma \right\}    (35)

= \mathbb{E}\left[ \inf_{x \in X} \mathbb{P}\left[ \jmath_S(S, d) \geq \imath_{X;\bar{Y}}(x; Y) + \gamma \mid S \right] \right]    (36)

An immediate corollary to Theorem 1 is the following result.

Theorem 2 (Converse). Assume that there exists a distribution P_{\bar{Y}} such that the distribution of ı_{X;Ȳ}(x; Y) (according to P_{Y|X=x}) does not depend on the choice of x ∈ X. If a (d, ε) code for S and P_{Y|X} exists, then

\epsilon \geq \sup_{\gamma > 0} \left\{ \mathbb{P}\left[ \jmath_S(S, d) \geq \imath_{X;\bar{Y}}(x; Y) + \gamma \right] - \exp(-\gamma) \right\}    (37)

for an arbitrary x ∈ X. The probability measure P in (37) is generated by P_S P_{Y|X=x}.

Proof: Under the assumption, the conditional probability in the right side of (24) is the same regardless of the choice of x ∈ X.
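As a sanity check of how a bound like (37) is evaluated, the sketch below particularizes Theorem 2 (my own particularization, offered as an illustration; cf. the BMS-BSC converses of Section VI) to k equiprobable bits sent over n uses of a BSC(δ) with bit error rate distortion, taking P_Ȳ equiprobable on {0,1}^n, which satisfies the symmetry assumption. Working in bits, ȷ_{S^k}(S^k, d) = k(1 − h(d)) deterministically, ı_{X;Ȳ}(x; Y) = n + E log₂ δ + (n − E) log₂(1 − δ) with E ~ Binomial(n, δ), and exp(−γ) becomes 2^(−γ):

    from math import log2
    from scipy.stats import binom

    def h2(x):
        return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

    def converse_eps_lower_bound(k, n, d, delta, gammas):
        """Lower bound (37) on the excess-distortion probability of any code mapping
        k equiprobable bits to n uses of a BSC(delta), bit error rate distortion d."""
        j = k * (1 - h2(d))                  # d-tilted information (deterministic here)
        best = 0.0
        for gamma in gammas:
            prob = 0.0
            for e in range(n + 1):           # e = number of channel crossovers
                i_dens = n + e * log2(delta) + (n - e) * log2(1 - delta)
                if j >= i_dens + gamma:
                    prob += binom.pmf(e, n, delta)
            best = max(best, prob - 2.0 ** (-gamma))
        return best

    print(converse_eps_lower_bound(k=100, n=100, d=0.15, delta=0.11,
                                   gammas=[0.5 * g for g in range(1, 41)]))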
The next result generalizes Theorem 1. When we apply
Theorem 3 in Section V to find the dispersion of JSCC, we
will let T be the number of channel input types, and we will let
W be the type of the channel input block. If T = 1, Theorem
3 reduces to Theorem 1.
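Since the penalty term in Theorem 3 below grows only linearly in T, letting W be the channel-input type is affordable: the number of types of length-n strings over an input alphabet of size |A| is C(n + |A| − 1, |A| − 1), which is polynomial in n. A quick sketch of this standard counting argument (mine, not a statement from the paper):

    from math import comb, log

    def num_types(n, alphabet_size):
        """Number of empirical distributions (types) of length-n strings over an
        alphabet of the given size: C(n + |A| - 1, |A| - 1)."""
        return comb(n + alphabet_size - 1, alphabet_size - 1)

    # Choosing gamma = log(T) + c makes the penalty T*exp(-gamma) in (38) equal exp(-c),
    # at a cost of only log(T) = O(|A| log n) added inside the probability.
    for n in (10, 100, 1000):
        T = num_types(n, 2)
        print(n, T, log(T))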
Theorem 3 (Converse). The existence of a (d, ε) code for S and P_{Y|X} requires that

\epsilon \geq \inf_{P_{X|S}} \max_{\gamma > 0,\, T} \left\{ -T \exp(-\gamma) + \sup_{\bar{Y}, W :\, S \to (X, W) \to Y} \mathbb{P}\left[ \jmath_S(S, d) \geq \imath_{X;\bar{Y}|W}(X; Y|W) + \gamma \right] \right\}    (38)

\geq \max_{\gamma > 0,\, T} \left\{ -T \exp(-\gamma) + \sup_{\bar{Y}, W} \mathbb{E}\left[ \inf_{x \in X} \mathbb{P}\left[ \jmath_S(S, d) \geq \imath_{X;\bar{Y}|W}(x; Y|W) + \gamma \mid S \right] \right] \right\}    (39)

where T is a positive integer, the random variable W takes values in {1, ..., T}, and

\imath_{X;\bar{Y}|W}(x; y|t) = \log \frac{dP_{Y|X=x, W=t}}{dP_{\bar{Y}|W=t}}(y)    (40)

and in (39), the probability measure is generated by P_S P_{W|X=x} P_{Y|X=x,W}.
Proof: Fix a possibly randomized (d, ε) code {P_{X|S}, P_{Z|Y}}, a positive scalar γ, a positive integer T, an auxiliary random variable W that satisfies S → (X, W) → Y, and a conditional probability distribution P_{\bar{Y}|W}: {1, ..., T} → Y. Let P_{\bar{Y}|W=t} → P_{Z|Y} → P_{\bar{Z}|W=t}, i.e. P_{\bar{Z}|W=t}(z) = \sum_{y \in Y} P_{Z|Y}(z|y) P_{\bar{Y}|W=t}(y), for all t. Write

\mathbb{P}\left[ \jmath_S(S, d) \geq \imath_{X;\bar{Y}|W}(X; Y|W) + \gamma \right]
\leq \epsilon + \sum_{s \in M} P_S(s) \sum_{t=1}^{T} P_{W|S}(t|s) \sum_{x \in X} P_{X|S,W}(x|s, t) \sum_{y \in Y} P_{Y|X,W}(y|x, t) \sum_{z \in B_d(s)} P_{Z|Y}(z|y)\, 1\left\{ P_{Y|X,W}(y|x, t) \leq P_{\bar{Y}|W=t}(y) \exp\left( \jmath_S(s, d) - \gamma \right) \right\}    (41)

\leq \epsilon + \exp(-\gamma) \sum_{s \in M} P_S(s) \exp\left( \jmath_S(s, d) \right) \sum_{t=1}^{T} P_{W|S}(t|s) \sum_{y \in Y} P_{\bar{Y}|W}(y|t) \sum_{z \in B_d(s)} P_{Z|Y}(z|y) \sum_{x \in X} P_{X|S,W}(x|s, t)    (42)

\leq \epsilon + \exp(-\gamma) \sum_{t=1}^{T} \sum_{s \in M} P_S(s) \exp\left( \jmath_S(s, d) \right) \sum_{y \in Y} P_{\bar{Y}|W}(y|t) \sum_{z \in B_d(s)} P_{Z|Y}(z|y)    (43)

\leq \epsilon + \exp(-\gamma) \sum_{t=1}^{T} \sum_{s \in M} P_S(s) \exp\left( \jmath_S(s, d) \right) P_{\bar{Z}|W=t}\left( B_d(s) \right)    (44)

\leq \epsilon + \exp(-\gamma) \sum_{t=1}^{T} \sum_{s \in M} P_S(s) \sum_{z \in \hat{M}} P_{\bar{Z}|W=t}(z) \exp\left( \jmath_S(s, d) + \lambda^\star d - \lambda^\star d(s, z) \right)    (45)

\leq \epsilon + T \exp(-\gamma)    (46)

where (46) is due to (17). Optimizing over γ, T and the distributions of the auxiliary random variables Ȳ and W, we obtain the best possible bound for a given encoder P_{X|S}. To obtain a code-independent converse, we simply choose P_{X|S} that gives the weakest bound, and (38) follows. To show (39), we weaken (38) by restricting the sup to W satisfying S → X → W and changing the order of inf and sup as follows:

\max_{\gamma > 0,\, T} \ \sup_{\bar{Y}, W :\, S \to (X, W) \to Y,\ S \to X \to W} \ \inf_{P_{X|S}}    (47)

Observe that for any legitimate choice of Ȳ and W,

\inf_{P_{X|S}} \mathbb{P}\left[ \jmath_S(S, d) \geq \imath_{X;\bar{Y}|W}(X; Y|W) + \gamma \right]    (48)
= \sum_{s \in M} P_S(s) \inf_{P_{X|S=s}} \sum_{x \in X} P_{X|S}(x|s) \sum_{t=1}^{T} P_{W|X}(t|x) \sum_{y \in Y} P_{Y|X,W}(y|x, t)\, 1\left\{ \jmath_S(s, d) \geq \imath_{X;\bar{Y}|W}(x; y|t) + \gamma \right\}    (49)
= \sum_{s \in M} P_S(s) \inf_{x \in X} \sum_{t=1}^{T} P_{W|X}(t|x) \sum_{y \in Y} P_{Y|X,W}(y|x, t)\, 1\left\{ \jmath_S(s, d) \geq \imath_{X;\bar{Y}|W}(x; y|t) + \gamma \right\}    (50)

which is equal to the expectation on the right side of (39).

Citations
Journal ArticleDOI
TL;DR: New achievability and converse bounds are derived, which are uniformly tighter than existing bounds, and lead to the tightest bounds on the second-order coding rate for discrete memoryless and Gaussian wiretap channels.
Abstract: This paper investigates the maximal secret communication rate over a wiretap channel subject to reliability and secrecy constraints at a given blocklength. New achievability and converse bounds are derived, which are uniformly tighter than existing bounds, and lead to the tightest bounds on the second-order coding rate for discrete memoryless and Gaussian wiretap channels. The exact second-order coding rate is established for semi-deterministic wiretap channels, which characterizes the optimal tradeoff between reliability and secrecy in the finite-blocklength regime. Underlying our achievability bounds are two new privacy amplification results, which not only refine the classic privacy amplification results, but also achieve secrecy under the stronger semantic-security metric.

106 citations

Posted Content
TL;DR: In this article, it was shown that symbol-by-symbol (uncoded) transmission is not always the best strategy in the nonasymptotic regime, even when the source and channel satisfy a certain probabilistic matching condition.
Abstract: This paper finds new tight finite-blocklength bounds for the best achievable lossy joint source-channel code rate, and demonstrates that joint source-channel code design brings considerable performance advantage over a separate one in the non-asymptotic regime. A joint source-channel code maps a block of $k$ source symbols onto a length-$n$ channel codeword, and the fidelity of reproduction at the receiver end is measured by the probability $\epsilon$ that the distortion exceeds a given threshold $d$. For memoryless sources and channels, it is demonstrated that the parameters of the best joint source-channel code must satisfy $nC - kR(d) \approx \sqrt{nV + k \mathcal V(d)} Q^{-1}(\epsilon)$, where $C$ and $V$ are the channel capacity and channel dispersion, respectively; $R(d)$ and $\mathcal V(d)$ are the source rate-distortion and rate-dispersion functions; and $Q$ is the standard Gaussian complementary cdf. Symbol-by-symbol (uncoded) transmission is known to achieve the Shannon limit when the source and channel satisfy a certain probabilistic matching condition. In this paper we show that even when this condition is not satisfied, symbol-by-symbol transmission is, in some cases, the best known strategy in the non-asymptotic regime.

101 citations

Proceedings ArticleDOI
07 Jul 2013
TL;DR: A novel technique to prove a one-shot version of achievability results in network information theory is proposed, which uses a stochastic encoder and decoder with a particular structure for coding that resembles both the ML and the joint-typicality coders.
Abstract: This paper proposes a novel technique to prove a one-shot version of achievability results in network information theory. The technique is not based on covering and packing lemmas. In this technique, we use a stochastic encoder and decoder with a particular structure for coding that resembles both the ML and the joint-typicality coders. Although stochastic encoders and decoders do not usually enhance the capacity region, their use simplifies the analysis. The Jensen inequality lies at the heart of error analysis, which enables us to deal with the expectation of many terms coming from stochastic encoders and decoders at once. The technique is illustrated via four examples: point-to-point channel coding, Gelfand-Pinsker, broadcast channel and the Berger-Tung problem of distributed lossy compression. Applying the one-shot result for the memoryless broadcast channel in the asymptotic case, we get the entire region of Marton's inner bound without any need for time-sharing. Also, these results are employed in conjunction with the multi-dimensional Berry-Esseen CLT to derive new regions for the finite-blocklength regime of Gelfand-Pinsker.

98 citations



Book
04 Sep 2014
TL;DR: This monograph presents a unified treatment of single- and multi-user problems in Shannon's information theory, departing from the requirement that the error probability decays asymptotically in the blocklength.
Abstract: This monograph presents a unified treatment of single- and multi-user problems in Shannon's information theory where we depart from the requirement that the error probability decays asymptotically in the blocklength. Instead, the error probabilities for various problems are bounded above by a non-vanishing constant and the spotlight is shone on achievable coding rates as functions of the growing blocklengths. This represents the study of asymptotic estimates with non-vanishing error probabilities. In Part I, after reviewing the fundamentals of information theory, we discuss Strassen's seminal result for binary hypothesis testing where the type-I error probability is non-vanishing and the rate of decay of the type-II error probability with growing number of independent observations is characterized. In Part II, we use this basic hypothesis testing result to develop second- and sometimes even third-order asymptotic expansions for point-to-point communication. Finally in Part III, we consider network information theory problems for which the second-order asymptotics are known. These problems include some classes of channels with random state, the multiple-encoder distributed lossless source coding (Slepian-Wolf) problem and special cases of the Gaussian interference and multiple-access channels. Finally, we discuss avenues for further research.

96 citations

Proceedings ArticleDOI
01 Oct 2012
TL;DR: Invoking random coding, but not typical sequences, non-asymptotic achievability results for the major setups in multiuser information theory are given.
Abstract: Invoking random coding, but not typical sequences, we give non-asymptotic achievability results for the major setups in multiuser information theory. No limitations, such as memorylessness or discreteness, on sources/channels are imposed. All the bounds given are powerful enough to yield the constructive side of the (asymptotic) capacity regions in the memoryless case. The approach relies on simple non-asymptotic counterparts of the packing and covering lemmas conventionally used in conjunction with the typical sequence approach.

83 citations

References
Journal ArticleDOI
TL;DR: This final installment of the paper considers the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now.
Abstract: In this final installment of the paper we consider the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now. To a considerable extent the continuous case can be obtained through a limiting process from the discrete case by dividing the continuum of messages and signals into a large but finite number of small regions and calculating the various parameters involved on a discrete basis. As the size of the regions is decreased these parameters in general approach as limits the proper values for the continuous case. There are, however, a few new effects that appear and also a general change of emphasis in the direction of specialization of the general results to particular cases.

65,425 citations

Journal ArticleDOI
TL;DR: It is shown analytically that the maximal rate achievable with error probability ε is closely approximated by C − √(V/n) Q⁻¹(ε), where C is the capacity, V is a characteristic of the channel referred to as channel dispersion, and Q is the complementary Gaussian cumulative distribution function.
Abstract: This paper investigates the maximal channel coding rate achievable at a given blocklength and error probability. For general classes of channels new achievability and converse bounds are given, which are tighter than existing bounds for wide ranges of parameters of interest, and lead to tight approximations of the maximal achievable rate for blocklengths n as short as 100. It is also shown analytically that the maximal rate achievable with error probability ε is closely approximated by C − √(V/n) Q⁻¹(ε), where C is the capacity, V is a characteristic of the channel referred to as channel dispersion, and Q is the complementary Gaussian cumulative distribution function.

3,242 citations

Journal ArticleDOI
TL;DR: It is shown that in order to achieve an optimal cost-distortion tradeoff, the source and the channel have to be matched in a probabilistic sense, which leads to a result on optimal single-source broadcast communication.
Abstract: What makes a source-channel communication system optimal? It is shown that in order to achieve an optimal cost-distortion tradeoff, the source and the channel have to be matched in a probabilistic sense. The match (or lack of it) involves the source distribution, the distortion measure, the channel conditional distribution, and the channel input cost function. Closed-form necessary and sufficient expressions relating the above entities are given. This generalizes both the separation-based approach as well as the two well-known examples of optimal uncoded communication. The condition of probabilistic matching is extended to certain nonergodic and multiuser scenarios. This leads to a result on optimal single-source broadcast communication.

588 citations

MonographDOI
01 Jan 2011

512 citations

Frequently Asked Questions (13)
Q1. What are the contributions in "Lossy joint source-channel coding in the finite blocklength regime" ?

This paper finds new tight finite-blocklength bounds for the best achievable lossy joint source-channel code rate, and demonstrates that joint source-channel code design brings considerable performance advantage over a separate one in the non-asymptotic regime. In this paper the authors show that even when this condition is not satisfied, symbol-by-symbol transmission is, in some cases, the best known strategy in the non-asymptotic regime. 

The reason for possibly lower achievable dispersion in this case is that the authors have the freedom to map the unlikely source realizations leading to high probability of failure to those codewords resulting in the maximum variance so as to increase the probability that the channel output escapes the decoding failure region. 

The rate-dispersion function of joint source-channel coding (source samples squared per channel use) is defined as V(d, α) = lim_{ε→0} limsup_{n→∞} n (C(α)/R(d) − R(n, d, ε, α))² / (2 log_e (1/ε))  (11), where C(α) and R(d) are the channel capacity-cost and source rate-distortion functions, respectively.

The distortion-dispersion function of joint source-channel coding is defined as W(R, α) = lim_{ε→0} limsup_{n→∞} n (D(C(α)/R) − D(nR, n, ε, α))² / (2 log_e (1/ε))  (12), where D(·) is the distortion-rate function of the source.

Optimizing over γ, T and the distributions of the auxiliary random variables Ȳ and W, the authors obtain the best possible bound for a given encoder P_{X|S}.

The output of the optimum source encoder is, for large k, approximately equiprobable over a set of messages. As the maximal (over source outputs) error probability cannot be lower than the average error probability, the maximal error probability achievability bounds of [8] apply to bound ε⋆(M).

For a discrete source, the single-letter erasure distortion measure is defined as the following mapping d: S × (S ∪ {e}) → [0, ∞]: d(s, z) = 0 if z = s, d(s, z) = H(S) if z = e, and d(s, z) = ∞ otherwise (211). For any 0 ≤ d ≤ H(S), the rate-distortion function of the equiprobable source is achieved by P_{Z⋆|S=s}(z) = 1 − d/H(S) if z = s and d/H(S) if z = e (212). The rate-distortion function and the d-tilted information for the equiprobable source with the erasure distortion measure are given by, respectively, R(d) = H(S) − d (213) and ȷ_S(s, d) = ı_S(s) − d (214). Note that, trivially, ȷ_S(S, d) = R(d) = log |S| − d a.s.

Since the distortion cdf of any single code does not majorize the cdfs of all possible codes, the converse bound on the average distortion obtained through this approach, although asymptotically tight, may be loose at short blocklengths. 

3) As evidenced by their numerical results, the converse result in Theorem 5, which applies to those channels satisfying a certain symmetry condition and which is a consequence of the hypothesis testing converse in Theorem 4, can outperform the d-tilted information converse in Theorem 3. 

From the channel coding theorem the authors know that there exists a channel code that is capable of distinguishing, with high probability, M = exp (kR(d)) < exp (nC) messages when equipped with the maximum likelihood decoder. 

In the absence of an input cost constraint the authors simplify the terminology and refer to the code as (d, ǫ) lossy source-channel code. 

The error probability with this type of list decoding is the probability that the source outcome S does not belong to the decoder output list for Y: 1 − Σ_{x∈X} Σ_{y∈Y} Σ_{s̃∈M(L)} Σ_{s∈s̃} P_{S̃|Y}(s̃|y) P_{Y|X}(y|x) P_{X|S}(x|s) P_S(s)  (54), where M(L) is the set of all Q_S-measurable subsets of M with Q_S-measure not exceeding L. Definition 7 (List code).

In particular, the optimum binary hypothesis test W⋆ for deciding between P_S P_{X|S} P_{Y|X} and Q_S P_{X|S} P_Ȳ satisfies W⋆ − (S, ı_{X;Ȳ}(X; Y)) −