Lossy Joint Source-Channel Coding in the Finite Blocklength Regime
Summary
I. INTRODUCTION
- In the limit of infinite blocklengths, the optimal achievable coding rates in channel coding and lossy data compression are characterized by the channel capacity C and the source rate-distortion function R(d), respectively [3].
- While computable formulas for these asymptotic fundamental limits are available for wide classes of sources and channels, they do not indicate how closely the limits can be approached at a given finite blocklength; this motivates non-asymptotic achievability and converse bounds.
- Such bounds were shown in [8] for the channel coding problem and in [9] for the source coding problem.
- The error exponent approximation and the Gaussian approximation to the non-asymptotic fundamental limit are tight in different operational regimes.
- Section II summarizes basic definitions and notation.
II. DEFINITIONS
- An (M, ε, α) code (i.e. a code with M codewords, average error probability ε, and cost α).
- The dispersion, which quantifies the rate penalty incurred by the best JSCC code at finite blocklengths, is defined as follows.
Definition 4. Fix α and d ≥ d_min. The rate-dispersion function of joint source-channel coding (source samples squared per channel use) is defined as

$$ V(d, \alpha) = \lim_{\epsilon \to 0} \limsup_{n \to \infty} \frac{n \left( \frac{C(\alpha)}{R(d)} - R(n, d, \epsilon, \alpha) \right)^2}{2 \log_e \frac{1}{\epsilon}} \qquad (11) $$

where C(α) and R(d) are the channel capacity-cost and source rate-distortion functions, respectively.
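Rearranging this definition exhibits the approximation it encodes. Since $(Q^{-1}(\epsilon))^2 \sim 2 \log_e \frac{1}{\epsilon}$ as ε → 0, where $Q^{-1}$ is the inverse of the Gaussian complementary cdf, the definition corresponds to the second-order sketch

$$ R(n, d, \epsilon, \alpha) \approx \frac{C(\alpha)}{R(d)} - \sqrt{\frac{V(d, \alpha)}{n}}\, Q^{-1}(\epsilon), $$

which is the form made precise by the Gaussian approximation of Theorem 10 referenced below.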
- The following properties of d-tilted information, proven in [19], are used in the sequel.
- The authors use the same notation ı S;Z for that more general function.
- All results in those sections generalize to the case of a maximal cost constraint by considering X whose distribution is supported on the subset of allowable channel inputs, $\{ x \in \mathcal{X} \colon \mathrm{b}(x) \le \alpha \}$ (where b(·) denotes the channel input cost function), rather than on the entire channel input alphabet $\mathcal{X}$.
Theorem 1 (Converse).
- The authors write summations over alphabets for simplicity.
- Unless stated otherwise, all their results hold for abstract probability spaces.
- To obtain a code-independent converse, the authors simply choose the $P_{X|S}$ that gives the weakest bound, and (23) follows.
B. Converses via hypothesis testing and list decoding
- While traditionally list decoding has only been considered in the context of finite alphabet sources, the authors generalize the setting to sources with abstract alphabets.
- Even though the authors keep the standard "list" terminology, the decoder output need not be a finite or countably infinite set.
Definition 7 (List code).
- Any converse for list decoding implies a converse for conventional decoding.
- The hypothesis testing converse for channel coding [8, Theorem 27] can be generalized to joint source-channel coding with list decoding as follows.
- Note that this is a hypothetical test, which has access to both the source outcome and the decoder output.
- The Neyman-Pearson lemma (e.g. [20]) implies that the outcome of the optimum binary hypothesis test between P and Q depends on the observation only through the likelihood ratio dP/dQ.
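For finite alphabets, the optimum test can be computed explicitly by sorting outcomes according to the likelihood ratio and randomizing at the boundary. The sketch below (the helper name and interface are ours, not the paper's) computes the minimal type-II error β_α(P, Q) that underlies hypothesis-testing converses:

```python
import numpy as np

def beta_alpha(P, Q, alpha):
    """Neyman-Pearson: minimal Q-probability of deciding 'P' over all
    (randomized) tests whose P-probability of deciding 'P' is >= alpha.
    P, Q: 1-D arrays of probabilities on a common finite alphabet."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    # Sort outcomes by decreasing likelihood ratio P/Q (infinite where Q = 0).
    ratio = np.where(Q > 0, P / np.maximum(Q, 1e-300), np.inf)
    order = np.argsort(-ratio)
    p_acc = q_acc = 0.0
    for i in order:
        if p_acc + P[i] < alpha:
            p_acc += P[i]   # include this outcome in the 'decide P' region
            q_acc += Q[i]
        else:
            # Randomize on the boundary outcome so P[decide P] hits alpha.
            frac = (alpha - p_acc) / P[i] if P[i] > 0 else 0.0
            return q_acc + frac * Q[i]
    return q_acc

# Example on a binary alphabet:
print(beta_alpha([0.9, 0.1], [0.5, 0.5], alpha=0.99))  # -> 0.95
```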
- In the case of finite channel input and output alphabets, the channel symmetry assumption of Theorem 5 holds, in particular, if the rows of the channel transition probability matrix are permutations of each other and $P_{\bar{Y}^n}$ is the equiprobable distribution on the (n-dimensional) channel output alphabet, which, coincidentally, is also the capacity-achieving output distribution.
IV. ACHIEVABILITY
- If both the source code and the channel code are chosen optimally for their given sizes, the separation principle guarantees that, under certain quite general conditions (which encompass the memoryless setting, see [21]), the asymptotic fundamental limit of joint source-channel coding is achievable.
- In the finite blocklength regime, however, such a separate source-channel coding (SSCC) construction is, in general, suboptimal.
- The dispersion achieved by the conventional SSCC approach is in fact suboptimal.
- At finite n, the output of the optimum source encoder need not be nearly equiprobable, so there is no reason to expect that a separated scheme employing a maximum-likelihood channel decoder, which does not exploit unequal message probabilities, would achieve near-optimal non-asymptotic performance.
- The following achievability result, obtained using independent random source codes and random channel codes within the paradigm of Definition 8, capitalizes on this intuition.
Theorem 7 (Achievability). There exists a (d, ε) source-channel code satisfying a bound whose explicit form is not reproduced in this summary.
- The authors will construct a code with separate encoders for source and channel and separate decoders for source and channel as in Definition 8.
- The authors now proceed to analyze the performance of the code described above.
- The authors now average (90) over the source and channel codebooks.
- The code size M that leads to tight achievability bounds following from Theorem 7 is in general much larger than the size that achieves the minimum in (81).
Theorem 8 (Achievability).
- Theorem 9 (Achievability, almost-lossless JSCC [17]).
- The technical condition (iv) ensures applicability of the Gaussian approximation in the following result.
VI. LOSSY TRANSMISSION OF A BMS OVER A BSC
- The rate-distortion function of the source and the channel capacity are given by, respectively, R(d) = h(p) − h(d) for 0 ≤ d ≤ min(p, 1 − p), and C = log 2 − h(δ), where h(·) is the binary entropy function, p is the source bias, and δ is the channel crossover probability.
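A quick numerical evaluation of these two formulas, in bits (the helper names are ours):

```python
import math

def h(x):
    """Binary entropy function in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def rate_distortion_bms(p, d):
    """R(d) = h(p) - h(d) for a BMS with bias p under bit-error-rate
    distortion, valid for 0 <= d <= min(p, 1 - p)."""
    return h(p) - h(d)

def capacity_bsc(delta):
    """C = 1 - h(delta) bits per channel use for a BSC(delta)."""
    return 1.0 - h(delta)

# Illustrative numbers only: a biased source over a noisy channel.
print(rate_distortion_bms(p=0.11, d=0.05), capacity_bsc(delta=0.11))
```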
- For convenience, the authors define a discrete random variable $U_{\alpha,\beta}$ and a binomial-sum notation; the defining equations are not reproduced in this summary. A straightforward particularization of the d-tilted information converse in Theorem 2 leads to the following result.
- The hypothesis-testing converse in Theorem 4 particularizes to the following result: Theorem 12 (Converse, BMS-BSC).
- If the source is equiprobable, the bound in Theorem 12 becomes particularly simple, as the following result details.
Theorem 14 (Achievability, BMS-BSC).
- A source of fair coin flips has zero dispersion (a short sketch of why follows this group of bullets), and, as anticipated in Remark 8, JSCC does not afford much gain in the finite blocklength regime (Fig. 5).
- Moreover, in that case the JSCC achievability bound in Theorem 8 is worse than the SSCC achievability bound.
- The situation is different if the source is biased, with JSCC showing significant gain over SSCC.
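A short sketch of the zero-dispersion claim for the fair coin, using the BMS d-tilted information formula $\jmath_S(s, d) = \imath_S(s) - h(d)$: for the equiprobable source, $\imath_S(s) = \log 2$ for every outcome s, so $\jmath_S(S, d)$ is the constant $\log 2 - h(d) = R(d)$ and

$$ \mathrm{Var}\left[\jmath_S(S, d)\right] = \mathrm{Var}\left[\imath_S(S)\right] = 0 . $$

The source therefore contributes nothing to the dispersion, and the finite-blocklength penalty is governed by the channel alone.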
VIII. TO CODE OR NOT TO CODE
- The authors' goal in this section is to compare the excess-distortion performance of the optimal code of rate 1 at channel blocklength n with that of the optimal symbol-by-symbol code, evaluated after n channel uses, leveraging the bounds in Sections III and IV and the approximation in Section V.
- The authors show certain examples in which symbol-by-symbol coding is, in fact, either optimal or very close to being optimal.
- A general conclusion drawn from this section is that even when uncoded transmission is asymptotically suboptimal, it can be a very attractive choice at short blocklengths [2].
Definition 10. The distortion-dispersion function of symbol-by-symbol joint source-channel coding is defined as

$$ W(\alpha) = \lim_{\epsilon \to 0} \limsup_{n \to \infty} \frac{n \left( D(n, n, \epsilon, \alpha) - D(C(\alpha)) \right)^2}{2 \log_e \frac{1}{\epsilon}} $$

where D(·) is the distortion-rate function of the source.
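As with the rate-dispersion function, rearranging this definition (again using $(Q^{-1}(\epsilon))^2 \sim 2 \log_e \frac{1}{\epsilon}$ as ε → 0) gives the distortion-domain sketch it encodes for rate-1 codes:

$$ D(n, n, \epsilon, \alpha) \approx D(C(\alpha)) + \sqrt{\frac{W(\alpha)}{n}}\, Q^{-1}(\epsilon). $$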
- Condition (v) ensures that symbol-by-symbol transmission attains the minimum average (over source realizations) distortion achievable among all codes of any blocklength.
- The following results pertain to the full distribution of the distortion incurred at the receiver output and not just its mean.
Theorem 20 (Achievability, symbol-by-symbol code). Under the stated restrictions:
- If (v) holds, then there exist a symbol-by-symbol encoder and decoder such that the conditional distribution of the decoder output given the source outcome coincides with the distribution $P_{Z^\star|S}$, so the excess-distortion probability of this symbol-by-symbol code is given by the left side of (189).
- (The explicit expressions of the theorem, and their simpler form in the absence of a power constraint, are not reproduced in this summary; θ(n) is the remainder term of Theorem 10.)
- In other words, not only do such symbol-by-symbol codes attain the minimum average distortion but also the variance of distortions at the decoder's output is the minimum achievable among all codes operating at that average distortion.
B. Uncoded transmission of a BMS over a BSC
- If the encoder and the decoder are both identity mappings (uncoded transmission), the resulting joint distribution satisfies condition (v).
- As is well known, regardless of the blocklength, the uncoded symbol-by-symbol scheme achieves the minimum bit error rate (averaged over source and channel).
- Here, the authors are interested instead in examining the excess distortion probability criterion.
- Consider an application where, if the fraction of erroneously received bits exceeds a certain threshold, then the entire output packet is useless.
- Moreover, uncoded transmission attains the minimum bit error rate threshold D(n, n, ε) achievable among all codes operating at blocklength n, regardless of the allowed ε, as the following result demonstrates.
It achieves, at blocklength n and excess-distortion probability ε, the minimum distortion threshold D(n, n, ε).
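Under uncoded transmission of a binary source over a BSC(δ), each of the n bits is flipped independently with probability δ, so the excess-distortion probability at threshold d is simply a binomial tail. A small sketch (helper name ours):

```python
from math import comb

def uncoded_excess_distortion(n, delta, d):
    """P[fraction of flipped bits > d] for uncoded transmission of a
    binary source over a BSC(delta): a Binomial(n, delta) upper tail."""
    t = int(n * d)  # largest acceptable number of flipped bits
    return 1.0 - sum(comb(n, j) * delta**j * (1.0 - delta)**(n - j)
                     for j in range(t + 1))

# Example: n = 100 channel uses, crossover 0.11, distortion threshold 0.15.
print(uncoded_excess_distortion(n=100, delta=0.11, d=0.15))
```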
- For the transmission of the fair binary source over a BSC, Fig. 8 shows the distortion achieved by the uncoded scheme, the separated scheme, and the JSCC scheme of Theorem 14 versus n for a fixed excess-distortion probability ε = 0.01.
- The no coding / converse curve in Fig. 8 depicts one of those singular cases where the non-asymptotic fundamental limit can be computed precisely.
- As the blocklength increases, the performance of the separated scheme approaches that of the no-coding scheme, but according to Theorem 23 it can never outperform it.
- Had the authors allowed the excess distortion probability to vanish sufficiently slowly, the JSCC curve would have approached the Shannon limit as n → ∞.
- Nevertheless, uncoded transmission performs remarkably well in the displayed range of blocklengths, achieving the converse almost exactly at blocklengths less than 100, and outperforming the JSCC achievability result in Theorem 14 at blocklengths as long as 700.
C. Symbol-by-symbol coding for lossy transmission of a GMS over an AWGN channel
- The next result characterizes the distribution of the distortion incurred by the symbol-by-symbol scheme that attains the minimum average distortion.
- On the other hand, using (130), the authors compute the corresponding quantity in closed form (the expression is not reproduced in this summary).
- Indeed, in the range of blocklengths displayed in Fig. 11, the symbol-by-symbol code even outperforms the converse for codes operating under a maximal power constraint.
E. Symbol-by-symbol transmission of a DMS over a DEC under logarithmic loss
- Curiously, for any 0 ≤ d ≤ H(S), the rate-distortion function and the d-tilted information are given respectively by (213) and (214), even if the source is not equiprobable.
- In fact, the rate-distortion function is achieved by an explicit conditional distribution (not reproduced in this summary), and the channel matched to the equiprobable source under logarithmic loss is exactly the DEC in (215).
- Finally, it is easy to verify that the distortion-dispersion function of symbol-by-symbol coding under logarithmic loss is the same as that under erasure distortion and is given by (216).
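A brief reasoning sketch (a standard argument, not reproduced from the paper) of why R(d) = H(S) − d under logarithmic loss, where a reproduction z is a probability distribution on the source alphabet and d(s, z) = log(1/z(s)): for any $P_{Z|S}$ with E[d(S, Z)] ≤ d,

$$ I(S; Z) = H(S) - H(S|Z) \ge H(S) - \mathbb{E}\left[ \log \frac{1}{Z(S)} \right] \ge H(S) - d, $$

with equality when the reproduction equals the posterior distribution of S given Z, so R(d) = H(S) − d.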
IX. CONCLUSION
- The paper presented new achievability and converse bounds, which hold in wide generality and are tight enough to determine the dispersion of joint source-channel coding for the transmission of an abstract memoryless source over either a DMC or a Gaussian channel, under an arbitrary fidelity measure.
- The major results and conclusions are the following.
- 6) For the transmission of a stationary memoryless source over a stationary memoryless channel, the Gaussian approximation in Theorem 10 (neglecting the remainder θ(n)) provides a simple estimate of the maximal non-asymptotically achievable joint source-channel coding rate (a numerical sketch follows this list).
- 8) Even in the absence of a probabilistic match between the source and the channel, symbol-by-symbol transmission, though asymptotically suboptimal, might outperform separate source-channel coding and joint source-channel random coding in the finite blocklength regime.
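As promised in item 6), a numerical sketch of that Gaussian-approximation estimate; the function name, the use of SciPy, and the example numbers are ours, not the paper's:

```python
from scipy.stats import norm

def jscc_rate_estimate(n, C, Rd, V, eps):
    """Gaussian-approximation estimate of the maximal JSCC rate
    (source samples per channel use), neglecting the remainder theta(n):
    R(n, d, eps) ~ C/R(d) - sqrt(V/n) * Qinv(eps),
    with V the rate-dispersion function of Definition 4."""
    return C / Rd - (V / n) ** 0.5 * norm.isf(eps)  # norm.isf = Q^{-1}

# Illustrative numbers only: C = 0.5 bit/ch. use, R(d) = 0.25 bit/sample,
# V = 0.3, excess-distortion probability eps = 0.01.
for n in (100, 500, 2000):
    print(n, round(jscc_rate_estimate(n, C=0.5, Rd=0.25, V=0.3, eps=0.01), 4))
```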
- The authors are grateful to Dr. Oliver Kosut for offering numerous comments, and, in particular, suggesting the simplification of the achievability bound in [1] with the tighter version in Theorem 8.
Frequently Asked Questions
Q2. What is the reason for the lower achievable dispersion in this case?
The reason for the possibly lower achievable dispersion in this case is that the encoder is free to map the unlikely source realizations that lead to a high probability of failure to the codewords with maximum variance, so as to increase the probability that the channel output escapes the decoding failure region.
Q3. What is the rate-dispersion function of a joint source-channel coding?
The rate-dispersion function of joint source-channel coding (source samples squared per channel use) is defined as

$$ V(d, \alpha) = \lim_{\epsilon \to 0} \limsup_{n \to \infty} \frac{n \left( \frac{C(\alpha)}{R(d)} - R(n, d, \epsilon, \alpha) \right)^2}{2 \log_e \frac{1}{\epsilon}} \qquad (11) $$

where C(α) and R(d) are the channel capacity-cost and source rate-distortion functions, respectively.
Q4. What is the distortion-dispersion function of a joint source-channel coding?
The distortion-dispersion function of joint source-channel coding is defined as

$$ W(R, \alpha) = \lim_{\epsilon \to 0} \limsup_{n \to \infty} \frac{n \left( D\!\left( \frac{C(\alpha)}{R} \right) - D(nR, n, \epsilon, \alpha) \right)^2}{2 \log_e \frac{1}{\epsilon}} \qquad (12) $$

where D(·) is the distortion-rate function of the source.
Q5. What is the possible bound for a given encoder?
Optimizing over γ, T, and the distributions of the auxiliary random variables Ȳ and W, the authors obtain the best possible bound for a given encoder $P_{X|S}$.
Q6. What is the maximum error probability of a source-channel code?
The output of the optimum source encoder is, for large k, approximately equiprobable over a set of roughly exp(kR(d)) elements. As the average error probability cannot exceed the maximal (over source outputs) error probability, the maximal error probability achievability bounds of [8] apply to bound ε⋆(M).
Q7. What is the erasure distortion measure for a discrete source?
For a discrete source, the single-letter erasure distortion measure is defined as the mapping $d \colon \mathcal{S} \times (\mathcal{S} \cup \{e\}) \to [0, \infty]$:

$$ d(s, z) = \begin{cases} 0, & z = s \\ H(S), & z = e \\ \infty, & \text{otherwise} \end{cases} \qquad (211) $$

For any 0 ≤ d ≤ H(S), the rate-distortion function of the equiprobable source is achieved by

$$ P_{Z^\star|S=s}(z) = \begin{cases} 1 - \frac{d}{H(S)}, & z = s \\ \frac{d}{H(S)}, & z = e \end{cases} \qquad (212) $$

The rate-distortion function and the d-tilted information for the equiprobable source with the erasure distortion measure are given by, respectively,

$$ R(d) = H(S) - d \qquad (213) $$

$$ \jmath_S(s, d) = \imath_S(s) - d \qquad (214) $$

Note that, trivially, $\jmath_S(S, d) = R(d) = \log |\mathcal{S}| - d$ a.s.
Q8. What is the achievability bound for a single code?
Since the distortion cdf of any single code does not majorize the cdfs of all possible codes, the converse bound on the average distortion obtained through this approach, although asymptotically tight, may be loose at short blocklengths.
Q9. What is the effect of the converse result in Theorem 5?
3) As evidenced by their numerical results, the converse result in Theorem 5, which applies to those channels satisfying a certain symmetry condition and which is a consequence of the hypothesis testing converse in Theorem 4, can outperform the d-tilted information converse in Theorem 3.
Q10. What is the probability of a channel code being distinguished?
From the channel coding theorem the authors know that there exists a channel code that is capable of distinguishing, with high probability, M = exp (kR(d)) < exp (nC) messages when equipped with the maximum likelihood decoder.
Q11. What is the definition of a lossy source-channel code?
In the absence of an input cost constraint, the authors simplify the terminology and refer to the code as a (d, ε) lossy source-channel code.
Q12. What is the error probability with a list decoder?
The error probability with this type of list decoding is the probability that the source outcome S does not belong to the decoder output list for Y:

$$ 1 - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} \sum_{\tilde{s} \in \mathcal{M}(L)} \sum_{s \in \tilde{s}} P_{\tilde{S}|Y}(\tilde{s}\,|\,y)\, P_{Y|X}(y\,|\,x)\, P_{X|S}(x\,|\,s)\, P_S(s) \qquad (54) $$

where $\mathcal{M}(L)$ is the set of all $Q_S$-measurable subsets of $\mathcal{M}$ with $Q_S$-measure not exceeding L.
Q13. what is the optimum binary hypothesis test?
In particular, the optimum binary hypothesis test $W^\star$ for deciding between $P_S P_{X|S} P_{Y|X}$ and $Q_S P_{X|S} P_{\bar{Y}}$ depends on the observation only through the pair $(S, \imath_{X;\bar{Y}}(X; Y))$, i.e., it satisfies the Markov chain $W^\star - (S, \imath_{X;\bar{Y}}(X; Y)) - (S, X, Y)$.