
Bit-interleaved coded modulation

TL;DR: The paper presents in a comprehensive fashion the theory underlying bit-interleaved coded modulation, provides tools for evaluating its performance, and gives guidelines for its design.
Abstract: It has been recognized by Zehavi (1992) that the performance of coded modulation over a Rayleigh fading channel can be improved by bit-wise interleaving at the encoder output, and by using an appropriate soft-decision metric as an input to a Viterbi (1990) decoder. The paper presents in a comprehensive fashion the theory underlying bit-interleaved coded modulation, provides tools for evaluating its performance, and gives guidelines for its design.

Summary


2.1.1 Gaussian Channel Model

  • A particularly interesting, yet simple, case is that of complex-plane signal sets (X ⊂ C, Y = C) in AWGN with fully interleaved fading, y_k = √snr h_k x_k + z_k, where h_k are fading coefficients with unit variance, z_k are zero-mean, unit-variance, circularly symmetric complex Gaussian samples, and snr is the signal-to-noise ratio (SNR).
  • Examples of input set X are unit energy PSK or QAM signal sets.
  • From the decoder viewpoint, the channel transition probability is decomposed as P_{Y,H|X}(y, h|x) = P_{Y|X,H}(y|x, h) P_H(h), with P_{Y|X,H}(y|x, h) = (1/π) exp(−|y − √snr h x|²). Under this assumption, the phase of the fading coefficient becomes irrelevant and the authors can assume that the fading coefficients are real-valued.
  • For simplicity of notation, the authors shall denote the channel transition probability simply as P_{Y|X}(y|x), where the possible conditioning on h, or on any other related channel state information, is implicitly understood and will be clear from the context.
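As a concrete illustration, here is a minimal Python sketch of this channel model, assuming unit-energy QPSK; the function names are ours and the snippet is a simulation aid, not part of the monograph:

```python
import numpy as np

rng = np.random.default_rng(0)

def qpsk(bits):
    # Gray-labeled, unit-energy QPSK: one bit per I/Q component
    b = bits.reshape(-1, 2)
    return ((1 - 2 * b[:, 0]) + 1j * (1 - 2 * b[:, 1])) / np.sqrt(2)

def fading_awgn(x, snr):
    # y_k = sqrt(snr) h_k x_k + z_k, with unit-variance Rayleigh fading h_k
    # and zero-mean, unit-variance, circularly symmetric Gaussian noise z_k
    n = len(x)
    h = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return np.sqrt(snr) * h * x + z, h

bits = rng.integers(0, 2, 2000)
y, h = fading_awgn(qpsk(bits), snr=10.0)   # snr is linear: 10 dB -> 10**(10/10)
```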

2.3.2 BICM Classical Model

  • With infinite-length interleaving, the m parallel channels were assumed to be independent in [29, 140], or in other words, the correlations among the different subchannels are neglected.
  • The authors will see later that this "classical" representation of BICM as a set of parallel channels gives a good model, even though it can sometimes be optimistic.
  • The alternative model which uses the symbol mismatched decoding metric achieves a higher accuracy at a comparable modeling complexity.
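The difference between the exact and max-log bit metrics (introduced later as (2.15) and (2.17)) can be illustrated with a toy demapper. A sketch under our own naming, assuming a unit-energy 4-PAM constellation with Gray labels on the plain AWGN channel:

```python
import numpy as np

def bit_llrs(y, const, labels, snr, maxlog=False):
    """LLRs for the m label bits of one received sample y.

    const:  array of constellation points
    labels: (|X|, m) binary array, labels[i] = label of const[i]
    The exact metric marginalizes over the symbols in each bit subset;
    the max-log variant keeps only the dominant term.
    """
    # log p(y|x) up to a constant, real AWGN with unit noise variance
    logp = -(y - np.sqrt(snr) * const) ** 2 / 2
    m = labels.shape[1]
    llr = np.empty(m)
    for j in range(m):
        l0, l1 = logp[labels[:, j] == 0], logp[labels[:, j] == 1]
        if maxlog:
            llr[j] = l0.max() - l1.max()
        else:
            # numerically stable log-sum-exp
            llr[j] = np.logaddexp.reduce(l0) - np.logaddexp.reduce(l1)
    return llr

# 4-PAM with Gray labels 00, 01, 11, 10
const = np.array([-3, -1, 1, 3]) / np.sqrt(5)
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
print(bit_llrs(0.4, const, labels, snr=4.0))
print(bit_llrs(0.4, const, labels, snr=4.0, maxlog=True))
```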

2.3.3 Labeling Rules

  • For the decoder presented in previous sections, it was conjectured [29] that binary reflected Gray mapping was optimum, in the sense of having the largest BICM capacity.
  • This conjecture was supported by some numerical evidence, and was further refined in [2, 109] to possibly hold only for moderate-to-large values of SNR.
  • Indeed, Stierstorfer and Fischer [110] have shown that a different labeling, strictly regular set partitioning, is significantly better for small values of SNR.
  • Throughout the monograph, the authors use for their simulations the labeling rules depicted in Figure 2.5, namely binary reflected Gray labeling [95] and set partitioning labeling [130].
  • For QAM modulations, the symbol mapping is the Cartesian product of Gray mappings over the in-phase and quadrature components.
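A short sketch of how these labelings can be generated programmatically; `brgc` and `gray_qam` are our names, and the construction mirrors the Cartesian-product rule stated above:

```python
import numpy as np

def brgc(m):
    """Binary reflected Gray code: index i -> i XOR (i >> 1), as m-bit labels."""
    idx = np.arange(2 ** m)
    gray = idx ^ (idx >> 1)
    return (gray[:, None] >> np.arange(m - 1, -1, -1)) & 1

def gray_qam(m):
    """Square 2^m-QAM as the Cartesian product of two Gray-labeled PAM
    constellations on the in-phase and quadrature components."""
    assert m % 2 == 0
    half = m // 2
    pam = np.arange(-(2 ** half - 1), 2 ** half, 2)   # e.g. [-3, -1, 1, 3]
    labels_1d = brgc(half)
    points, labels = [], []
    for i, a in enumerate(pam):
        for q, b in enumerate(pam):
            points.append(a + 1j * b)
            labels.append(np.concatenate([labels_1d[i], labels_1d[q]]))
    points = np.array(points)
    points /= np.sqrt(np.mean(np.abs(points) ** 2))   # normalize to unit energy
    return points, np.array(labels)

const16, labels16 = gray_qam(4)
```

For 16-QAM this reproduces a Gray mapping of the kind shown in Figure 2.5, up to the ordering of the two axes.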

2.A Continuous-and Discrete-Time Gaussian Channels

  • The authors follow closely the review paper by Forney and Ungerböck [36].
  • In the linear Gaussian channel, the input x(t), additive Gaussian noise component z(t), and output y(t) are related as y(t) = ∫ h(t; τ) x(t − τ) dτ + z(t), where h(t; τ) is a (possibly time-varying) channel impulse response.
  • Since all functions are real, their Fourier transforms are Hermitian and the authors need consider only the positive-frequency components.
  • The authors complement their analysis with a derivation of the error exponents of other variants of coded modulation, namely multi-level coding with successive decoding [140] and with independent decoding of all the levels.
  • The authors also give a simple expression for the first derivative of the BICM mutual information with respect to the signal-to-noise ratio, in terms of the minimum mean-squared error (MMSE) for estimating the input of the channel from its output, and they relate this to the findings of [51, 67].

3.1.1 Channel Capacity

  • For memoryless channels, Shannon's theorem (Theorem 3.1, Shannon 1948) yields the capacity formula C = max_{P_X} I(X; Y).
  • Specifically, he proved that the error probability decreases exponentially with the block length according to a parameter called the reliability function.
  • While a uniform distribution is only optimal for large snr, it is simpler to implement and usually leads to more manageable analytical expressions.
  • In general, the probability distribution P X (x) that maximizes the mutual information for a given signal constellation depends on snr and on the specific constellation geometry.
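The constellation-constrained capacity for a uniform input can be estimated numerically. A minimal Monte Carlo sketch for the complex AWGN channel without fading; the names are ours, and the estimator follows directly from the definition of mutual information with uniform P_X:

```python
import numpy as np

rng = np.random.default_rng(1)

def cm_mi(const, snr, n=200_000):
    """Monte Carlo estimate of I(X;Y) in bits for a uniform input over
    `const` on the complex AWGN channel y = sqrt(snr)*x + z."""
    x = rng.choice(const, size=n)
    z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    y = np.sqrt(snr) * x + z
    # log p(y|x') for all candidate symbols, up to a common constant
    d = -np.abs(y[:, None] - np.sqrt(snr) * const[None, :]) ** 2
    num = -np.abs(z) ** 2                      # log p(y|x) for the true symbol
    return np.log2(len(const)) - np.mean(
        np.log2(np.sum(np.exp(d - num[:, None]), axis=1)))

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
print(cm_mi(qpsk, snr=10 ** (5 / 10)))   # QPSK constrained capacity at 5 dB
```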

3.1.2 Error Probability with Random Codes

  • Following in the footsteps of Gallager [39, Chapter 5], this section provides an achievability theorem for a general decoding metric q(x, y) using random coding arguments.
  • The final result, concerning the error probability, can be found in [59].
  • The authors denote by Pe(m) the average error probability over the code ensemble when message m is transmitted and by Pe the error probability averaged over the message choices.
  • The authors define the mismatched random coding exponent as E_r^q(R) = max_{0≤ρ≤1, s>0} { E_0^q(ρ, s) − ρR }. Further improvements are possible by optimizing over the input distribution P_X(x).
  • According to (3.15), the average error probability decays exponentially in the block length with this exponent. Later, the authors will use the following data-processing inequality, which shows that the generalized Gallager function of any mismatched decoder is upper-bounded by the Gallager function of a maximum-likelihood decoder.

3.2 Bit-Interleaved Coded Modulation

  • The authors study the BICM decoder and determine the generalized mutual information and a lower bound to the reliability function.
  • Special attention is given to the comparison with the classical analysis of BICM as a set of m independent parallel channels (see Section 2.3).

3.2.1 Achievable Rates

  • The authors start with a brief review of the classical results on the achievable rates for BICM.
  • Under the assumption of an infinite-length interleaver, capacity and cutoff rate were studied in [29].
  • This assumption (see Section 2.3) yields a set of m independent parallel binary-input channels, for which the mutual information and cutoff rate are the sums of the corresponding rates of the subchannels, I^bicm = Σ_{j=1}^m I(B_j; Y) and R_0^bicm = Σ_{j=1}^m R_0^(j), respectively.
  • An underlying assumption behind Eq. (3.30) is that the m independent channels are used the same number of times.
  • The authors now relate this BICM capacity with the generalized mutual information introduced in the previous section.
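Under this parallel-channel model, the BICM capacity is the sum of the m bit-level mutual informations. A Monte Carlo sketch for Gray-labeled 16-QAM on the AWGN channel, assuming the exact bit metric; all names are ours:

```python
import numpy as np

rng = np.random.default_rng(2)

# Gray-labeled 16-QAM as a Cartesian product of two Gray-labeled 4-PAMs
g = np.array([0, 1, 3, 2])                      # 2-bit binary reflected Gray order
bits2 = (g[:, None] >> np.arange(1, -1, -1)) & 1
pam = np.array([-3, -1, 1, 3]) / np.sqrt(10)    # unit total symbol energy
const = (pam[:, None] + 1j * pam[None, :]).ravel()
labels = np.hstack([np.repeat(bits2, 4, axis=0), np.tile(bits2, (4, 1))])

def bicm_mi(const, labels, snr, n=100_000):
    """C_bicm = sum_j I(B_j; Y), Monte Carlo, exact bit metric, no fading."""
    M, m = labels.shape
    idx = rng.integers(0, M, n)
    x, b = const[idx], labels[idx]
    z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    y = np.sqrt(snr) * x + z
    logp = -np.abs(y[:, None] - np.sqrt(snr) * const[None, :]) ** 2
    total = 0.0
    for j in range(m):
        l0 = np.logaddexp.reduce(np.where(labels[:, j] == 0, logp, -np.inf), axis=1)
        l1 = np.logaddexp.reduce(np.where(labels[:, j] == 1, logp, -np.inf), axis=1)
        signed = (1 - 2 * b[:, j]) * (l0 - l1)   # LLR "seen" by the true bit
        total += 1 - np.mean(np.logaddexp(0, -signed)) / np.log(2)
    return total

print(bicm_mi(const, labels, snr=10 ** (10 / 10)))   # bits/channel use at 10 dB
```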

Corollary 3.1 ([77])

  • For the metric in Eq. (2.15), the generalized mutual information is given by expression (3.35), which coincides with the BICM capacity above, even though the authors have lifted the assumption of infinite interleaving.
  • When the suboptimal metrics (2.17) are used, the authors have the following.

Corollary 3.2 ([77])

  • The fundamental difference between their result and the generalized mutual information given in (3.36) is the optimization over s.
  • Since both expressions are equal when s = 1, the optimization over s may induce a larger achievable rate.
  • For simplicity of illustration, the authors have only plotted the information rate for the Gray and set partitioning binary labeling rules of Figure 2.5.
  • Observe that binary reflected Gray labeling pays a negligible penalty in information rate, being close to the coded modulation capacity.

3.2.2 Error Exponents

  • In their analysis of multilevel coding and successive decoding, Wachsmann et al. provided the error exponents of BICM modeled as a set of independent parallel channels [140].
  • As mentioned in the previous section, both BICM models yield the same capacity.
  • In most cases, BICM with a max-log metric (2.17) incurs a marginal loss in the exponent for mid-to-large SNR.
  • (Figure: error exponents for coded modulation, for BICM with independent parallel channels, and for BICM with metrics (2.15) and (2.17), for 16-QAM with Gray labeling, Rayleigh fading, and snr = −25 dB.)
  • An interesting question is whether the error exponent of the parallel-channel model is always larger than that of the mismatched-decoding model.

3.3 Comparison with Multilevel Coding

  • Multilevel codes (MLC) combined with multistage decoding (MSD) have been proposed [56, 140] as an efficient method to attain the channel capacity by using binary codes.
  • The multi-stage decoder operates by decoding the m levels separately.
  • Since the alternatives are not disjoint, the union bound can be applied to the conditional error probability Pr{error | x}. While BICM suffers from a non-zero, yet small, capacity loss compared to CM and MLC/MSD, BICM attains a larger error exponent, whose loss with respect to CM is small.

3.4 Mutual Information Analysis

  • The authors focus on AWGN channels with and without fading and study some properties of the mutual information as a function of snr.
  • Building on work by Guo, Shamai and Verdú [51] , the authors first provide a simple expression for the first derivative of the mutual information with respect to snr.
  • This expression is of interest for the optimization of power allocation across parallel channels, as discussed by Lozano et al. [67] in the context of coded modulation systems.
  • Then, the authors study the BICM mutual information at low snr, that is in the wideband regime recently popularised by Verdú [134] .
  • For a given rate, BICM with Gray labeling loses at most 1.25 dB in received power.

3.4.1 Derivative of Mutual Information

  • A fundamental relationship between the input-output mutual information and the minimum mean-squared error (MMSE) in estimating the input from the output in additive Gaussian channels was discovered by Guo, Shamai and Verdú in [51].
  • It is worth noting that, beyond its own intrinsic theoretical interest, this relationship has proved instrumental in optimizing the power allocation for parallel channels with arbitrary input distributions and in obtaining the minimum bit-energy-to-noise-spectral-density ratio for reliable communication [67].
  • For BICM, obtaining a direct relationship between the BICM capacity and the MMSE in estimating the coded bits given the output is a challenging problem.
  • The combination of Eqs. (3.33) and (3.61) yields a simple relationship between the first derivative of the BICM mutual information and the MMSE of coded modulation (Theorem 3.3, [49]).
  • As for low snr, the authors notice a small loss, whose value is determined analytically from the analysis in the next section.
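As a numerical illustration of the underlying relationship of [51] (dI/dsnr = mmse(snr), in nats), here is a sketch for real-valued BPSK; it checks the general I-MMSE relation rather than the BICM-specific Theorem 3.3, and uses common random numbers so the finite difference is stable:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
x = rng.choice([-1.0, 1.0], n)          # BPSK input
z = rng.standard_normal(n)              # real AWGN, unit variance

def mi(snr):
    # I(X;Y) in nats for y = sqrt(snr) x + z, uniform BPSK input
    y = np.sqrt(snr) * x + z
    return np.log(2) - np.mean(np.logaddexp(0.0, -2 * np.sqrt(snr) * x * y))

def mmse(snr):
    # E[(X - E[X|Y])^2]; the conditional mean is tanh(sqrt(snr) y)
    y = np.sqrt(snr) * x + z
    return np.mean((x - np.tanh(np.sqrt(snr) * y)) ** 2)

snr, d = 1.0, 0.05
print((mi(snr + d) - mi(snr - d)) / (2 * d))   # numerical dI/dsnr
print(mmse(snr))                               # should agree closely
```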

3.4.2 Wideband Regime

  • At very low signal-to-noise ratio snr, the energy of a single bit is spread over many channel degrees of freedom, leading to the wideband regime recently discussed at length by Verdú [134] .
  • Rather than studying the exact expression of the channel capacity, one considers a second-order Taylor series in snr, C(snr) = c1·snr + c2·snr² + o(snr²), where c1 and c2 depend on the modulation format, the receiver design, and the fading distribution.
  • It is somewhat surprising that the loss incurred by binary reflected Gray labeling with respect to coded modulation is bounded at low snr.

4.1 Error Probability and the Union Bound

  • Since some of these bit indices might be modulated in the same constellation symbol, the authors have m classes of symbol scores, each characterized by a different number of wrong bits (that is, the Hamming weight of the binary labels).
  • These scores are random variables whose density function depends on all the random elements in the channel, as well as the transmitted bits, their position in the symbol and the bit pattern.
  • In order to avoid this dependence, the authors will use the random coset code method used in [60] to analyze LDPC codes for the Inter-Symbol Interference (ISI) channel and in [10] to analyze nonbinary LDPC codes.
  • Clearly, the error probability computed this way gives an average over all possible scrambling sequences.
  • In [29], the scrambler role was played by randomly choosing between a mapping rule μ and its complement μ̄ with probability 1/2 at every channel use.

4.1.1 Linear Codes

  • If the underlying binary code C is linear and the channel is symmetric, the pairwise error probability depends on the transmitted and competing codewords only through their Hamming distance d [139].
  • The union bound accurately characterizes the error probability in the region above the cutoff rate [139].
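A minimal sketch of the union bound for a concrete linear code. We use the standard rate-1/2, 4-state (5,7) convolutional code with BPSK on the AWGN channel, whose bit multiplicities w_d = (d − 4)·2^(d−5) follow from its transfer function; for BICM, the Gaussian pairwise error probability below would be replaced by the BICM PEP of Section 4.2:

```python
from math import erfc, sqrt

def q(x):
    # Gaussian tail function Q(x)
    return 0.5 * erfc(x / sqrt(2))

def union_bound_ber(ebn0_db, r=0.5, dmax=40):
    """Union bound on BER: P_b <= sum_d w_d * PEP(d), with
    PEP(d) = Q(sqrt(2 d r Eb/N0)) for BPSK on the AWGN channel."""
    ebn0 = 10 ** (ebn0_db / 10)
    total = 0.0
    for d in range(5, dmax + 1):          # free distance of the (5,7) code is 5
        w = (d - 4) * 2.0 ** (d - 5)      # bit multiplicity at Hamming weight d
        total += w * q(sqrt(2 * d * r * ebn0))
    return total

for db in (4, 5, 6, 7):
    print(db, "dB:", union_bound_ber(db))
```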

4.1.2 Cumulant Transforms of Symbol Scores

  • The authors introduce the definition of the cumulant transform and apply it to the symbol scores.
  • The binary labels of the reference and competing symbols in the symbol score differ only by a single bit, and all d different bits of the pairwise error between the reference and competing codewords are mapped onto different modulation symbols.
  • As the authors will see in the next sections, this will significantly simplify the analysis.
  • As the authors will see, only the cumulant transform is needed to accurately approximate the error probability.
  • Fortunately, this effect shows up at error probability values of interest only for short interleaver lengths, and assuming symbol scores of weight 1 is sufficient for most practical purposes.

4.2 Pairwise Error Probability for Infinite Interleaving

  • The authors study the pairwise error probability assuming infinite-length interleaving [29].
  • As the authors saw in the previous chapter, this channel model does not fully characterize the fundamental limits of BICM.
  • While the model yields the same capacity, the error exponent is in general different.
  • The authors shall see that this model characterizes fairly accurately the error probability for medium-to-large signal-to-noise ratios when the union bound is employed.
  • Infinite-length interleaving implies that all d different bits in a pairwise error event are mapped onto d different symbols, i.e., there are no symbols with label Hamming weight larger than 1.

4.2.1 Exact Formulas, Bounds, and Approximations

  • The authors denote the pairwise error probability for infinite interleaving as PEP_1(d).
  • The Chernoff bound gives a true bound and is moreover easy to compute.
  • It is further known to correctly give the asymptotic exponential decay of the error probability for large d and snr [74, 95].
  • The saddlepoint approximation may also be seen as an approximation of the complex-plane integration of Proposition 4.2.
  • For the sake of completeness, the authors also mention two additional approximations.
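A sketch of the saddlepoint machinery under our own naming: estimate the cumulant transform κ(s) = log E[exp(sΛ)] of the bit score from samples, locate the saddlepoint ŝ where κ′(ŝ) = 0 (the minimum of κ), and evaluate PEP_1(d) ≈ exp(dκ(ŝ))/(ŝ√(2π d κ″(ŝ))). Shown for BPSK over fully interleaved Rayleigh fading with perfect CSI; for the MAP bit metric one expects ŝ ≈ 1/2:

```python
import numpy as np

rng = np.random.default_rng(4)

def bit_scores(snr, n=500_000):
    # Pairwise bit score Lambda = log q(xbar, y) - log q(x, y) for BPSK
    # (x = +1 sent, xbar = -1), Rayleigh fading with perfect CSI, and
    # q(x, y) = exp(-(y - sqrt(snr) h x)^2 / 2) with real noise ~ N(0, 1)
    h = np.abs(rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    z = rng.standard_normal(n)
    y = np.sqrt(snr) * h + z
    return -2 * np.sqrt(snr) * h * y

def saddlepoint_pep(lam, d):
    # PEP_1(d) = Pr{sum of d i.i.d. scores >= 0}
    s = np.linspace(0.02, 0.98, 97)
    kappa = np.array([np.log(np.mean(np.exp(si * lam))) for si in s])
    i = int(np.clip(np.argmin(kappa), 1, len(s) - 2))  # kappa'(s_hat) = 0 here
    ds = s[1] - s[0]
    k2 = (kappa[i + 1] - 2 * kappa[i] + kappa[i - 1]) / ds ** 2
    chernoff = np.exp(d * kappa[i])                    # Chernoff bound
    sp = chernoff / (s[i] * np.sqrt(2 * np.pi * d * k2))
    return s[i], chernoff, sp

lam = bit_scores(snr=10 ** (6 / 10))
print(saddlepoint_pep(lam, d=5))   # (s_hat, Chernoff bound, saddlepoint approx.)
```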

4.2.2 MAP Demodulator

  • Then, a channel with binary input is said to be output-symmetric [96] if the densities of the posterior log-likelihood ratios, seen as a function of the channel input, satisfy p_Λ(λ | b = 0) = p_Λ(−λ | b = 1) (Proposition 4.5).
  • The inaccuracy of the expurgated bound was also remarked upon by Sethuraman [104] and by Yeh et al. [141], who noticed that this "bound" is actually not a bound in general.
  • The authors next show some examples to illustrate the accuracy of these bounds and approximations for convolutional and repeat-accumulate (RA) codes [32].
  • In particular, the authors show the Chernoff/Bhattacharyya union bound (dash-dotted lines), the union bound with the saddlepoint approximation (4.35) (solid lines), the union bound with the Gaussian approximation (dashed lines), and the simulations.

4.2.3 Cumulant transform asymptotic analysis

  • Inspection of Figures 4.2 and 4.3 suggests that the bounds and approximations considered in the previous section yield the same asymptotic behavior of the error probability for large snr.
  • Here K may depend on the mapping. (Figure: the cumulant transform κ(ŝ) as a function of snr for 16-QAM with Gray and set partitioning mappings in the AWGN channel.)
  • The above result depends on neither the modulation nor the binary labeling, and confirms that BICM indeed behaves as a binary modulation; the asymptotic performance thus depends on the Hamming distance of the binary code C rather than on the Euclidean distance.
  • As expected, the limit value is 8, which does not depend on the modulation.
  • A finer approximation to the exponent of the error probability is given by the following result.

4.3 Pairwise Error Probability for Finite Interleaving

4.3.1 Motivation

  • For this case, all symbol scores have Hamming weight 1 and are thus bit scores.
  • Moreover, since the channel is memoryless, the bit scores with infinite interleaving are independent.
  • Since the task of determining the exact distribution of the d pairwise different bits onto the N symbols can be hard, the authors follow the results of [141] and compute an average pairwise error probability by averaging over all possible distributions of d bits onto N symbols, equivalent to uniform interleaving for turbo-codes [8, 9].
  • In Section 4.3.2 the authors present a general expression for the pairwise error probability, as well as its corresponding saddlepoint approximation.
  • In Section 4.3.3 the authors apply the theory to what is arguably the simplest case of BICM, QPSK under Nakagami fading.

4.3.2 A General Formula for the Pairwise Error Probability

  • For finite interleaving, every possible pattern corresponds to a different conditional pairwise error probability, denoted by PEP(d, ρ_N).
  • Taking into account all possible patterns ρ_N yields the average pairwise error probability. Again, the authors can use the saddlepoint approximation to obtain a result similar to that shown in Theorem 4.1.
  • The conditional pairwise error probability can be approximated to first order by a saddlepoint expression analogous to (4.35).
  • This is possibly the simplest case of dependency between the bit sub-channels, with symbol scores of Hamming weight 2.

Theorem 4.3 ([73])

  • The slope of the pairwise error probability changes at sufficiently large signal-tonoise ratio.
  • In the next section the authors generalize this result to higher order modulations.
  • The error probability at the crossing rapidly becomes small, at values typically below the operating point of common communication systems.

4.3.4 High-order Modulations: Asymptotic Analysis

  • The authors closely follow the analysis in [73] for general constellations and mappings, and estimate the signal-to-noise ratio at which the slope of the error probability changes.
  • As with the bit score and PEP_1(d), Eq. (4.73) may be used in the saddlepoint approximation to obtain a heuristic approximation to the pairwise error probability for large snr, namely Eq. (4.75). The authors use Eq. (4.75) to estimate the threshold SNR.
  • The authors observe that the main flattening of the error probability takes place at high snr.

4.4 Bounds and Approximations Above the Cutoff Rate

  • Spurred by the appearance of turbo-codes [11] and the rediscovery of LDPC codes [69], there has been renewed interest in the past decade in the derivation of improved bounds for a region above the cutoff rate.
  • The authors briefly discuss such bounds for BICM.
  • The tangential sphere bound (TSB) is known to be the tightest bound for binary-input AWGN channels, and will be combined with the Gaussian approximation introduced in Section 4.2.
  • The error probability was analyzed in Section 3.1.2 for an ensemble of random codes.

5.1 Factor Graph Representation and Belief Propagation

  • The optimal decoding rule that minimizes the bit-error probability P_b is the bit-wise MAP rule ĉ_i = arg max_{c∈{0,1}} P_{C_i|Y}(c | y). Given a factorization, the corresponding Factor Graph (FG) is a bipartite graph with two sets of nodes, the variable nodes V and the function nodes F.
  • The general BP computation rules are given in [62].
  • For the FG of the joint APP given in (5.2), all messages are marginal probabilities, or proportional to marginal probabilities.
  • The authors hence drop the superscript b in the bit scores to simplify the notation.

5.2 Density Evolution

  • This method, named density evolution, describes how the densities of the messages passed along the graph evolve through the iterations.
  • For a given interleaver and scrambling sequence, the message vectors Ξ_dem→dec and Ξ_dec→dem are random vectors, functions of the channel noise and fading and of the transmitted information message.
  • This algorithm produces the extrinsic information output for c_i by operating over a finite trellis window centered around the symbol position i.
  • Averaged over all possible interleavers, the probability that the neighborhood has cycles is bounded as in [22, 60].
  • Figure 5.5 shows that the positive tail of the message densities does not change much with the iterations, resulting in a nearly equal error probability.

5.3 EXIT Charts

  • In the previous section, the authors have described density evolution as a method that characterizes the message-passing process exactly in the limit of infinite interleavers.
  • Unfortunately, density evolution does not yield simple criteria to optimize the BICM-ID scheme.
  • EXIT charts represent the extrinsic information as a function of the a priori information, y = exit_dem(x) (5.19) and x = exit_dec(y) (5.20), and thus capture the transfer of extrinsic information in the demapper and decoder blocks.
  • The fixed point of BICM-ID, where further decoding iterations do not improve the performance, is the leftmost intersection of the EXIT curves.
  • Notice that the AWGN and BEC EXIT curves are close to each other.
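To make this concrete, here is a Monte Carlo sketch of a demapper EXIT curve point, with a priori LLRs drawn from the usual consistent Gaussian model; Gray-labeled 4-PAM on the real AWGN channel is used for brevity, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(5)

const = np.array([-3, -1, 1, 3]) / np.sqrt(5)        # unit-energy 4-PAM
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])  # Gray labels
M, m = labels.shape

def mi_from_llrs(llr, bits):
    # mutual information between a bit and its LLR (positive LLR favors 0)
    s = np.where(bits == 0, llr, -llr)
    return 1 - np.mean(np.logaddexp(0, -s)) / np.log(2)

def exit_dem(ia_sigma, snr, n=50_000):
    """One point of the demapper EXIT curve: feed Gaussian a priori LLRs
    of reliability sigma, return (I_A, I_E) averaged over the bit positions."""
    idx = rng.integers(0, M, n)
    x, b = const[idx], labels[idx]
    y = np.sqrt(snr) * x + rng.standard_normal(n)
    # consistent Gaussian a priori model: L_A ~ N(+-sigma^2/2, sigma^2)
    la = (1 - 2 * b) * ia_sigma ** 2 / 2 + ia_sigma * rng.standard_normal((n, m))
    logp = -(y[:, None] - np.sqrt(snr) * const[None, :]) ** 2 / 2
    ia, ie = 0.0, 0.0
    for j in range(m):
        other = [k for k in range(m) if k != j]
        # prior contribution of the other bits to each candidate symbol
        pri = sum(np.where(labels[None, :, k] == 0, 0.0, -la[:, k:k + 1])
                  for k in other)
        met = logp + pri
        l0 = np.logaddexp.reduce(np.where(labels[:, j] == 0, met, -np.inf), axis=1)
        l1 = np.logaddexp.reduce(np.where(labels[:, j] == 1, met, -np.inf), axis=1)
        ia += mi_from_llrs(la[:, j], b[:, j]) / m
        ie += mi_from_llrs(l0 - l1, b[:, j]) / m
    return ia, ie

for sig in (0.1, 1.0, 2.0, 4.0, 8.0):
    print(exit_dem(sig, snr=10 ** (6 / 10)))
```

Sweeping the a priori reliability traces the curve y = exit_dem(x); for Gray labelings the curve is nearly flat, consistent with the remark in Section 5.5.1 that Gray mapping gains little from demapper iterations.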

5.4 The Area Theorem

  • EXIT charts exhibit a fundamental property when the extrinsic channel is a BEC [6]: the area under the EXIT curve with a BEC extrinsic channel is related to the rate of the iterative-decoding element whose EXIT chart is being plotted.
  • Applying Theorem 5.2 to the demapper case yields the following result.

5.4.1 Code Design Considerations

  • The practical significance of the area theorem lies in the following observation.
  • As long as the demapper EXIT curve lies above the decoder EXIT curve, BICM-ID will improve its performance with iterations.
  • Furthermore, this result suggests that any area gap between the two EXIT curves translates into a rate loss with respect to the capacity C_X^cm [6].
  • This matching condition was illustrated in [82, 83, 96] for binary LDPC codes.
  • As discussed earlier, this implies that, even with perfect extrinsic side information, the demapper is not able to correctly infer the value of the bit, i.e., the demapper EXIT curve cannot reach the point (1, 1).

5.5 Improved Schemes

  • The authors discuss some BICM-ID schemes whose demapper EXIT curve does not treat the bits as uncoded.
  • In particular, the authors consider LDPC and RA-based constructions and show that significant gains can be achieved.
  • These constructions use in one way or another coded mappers, i.e., mappers with memory.
  • The EXIT curve of a code mixture is the sum of the individual EXIT curves, appropriately weighted by the length fractions corresponding to each code.
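A one-screen illustration of this weighted-sum property, using the BEC, where variable-node EXIT curves have the simple closed form I_E = 1 − ε(1 − I_A)^(d−1); the degrees and fractions below are arbitrary stand-ins, not a design from the monograph:

```python
import numpy as np

def exit_vnd(ia, eps, dv):
    # variable-node EXIT curve on the BEC with channel erasure prob. eps
    return 1 - eps * (1 - ia) ** (dv - 1)

def exit_mixture(ia, eps, degrees, fractions):
    """EXIT curve of a code mixture: the weighted sum of the individual
    curves, with `fractions` the proportion assigned to each component."""
    return sum(f * exit_vnd(ia, eps, d) for d, f in zip(degrees, fractions))

ia = np.linspace(0, 1, 6)
print(exit_mixture(ia, eps=0.4, degrees=[2, 3, 6], fractions=[0.5, 0.3, 0.2]))
```

Curve-fitting designs exploit exactly this linearity: the mixture weights are tuned so the combined curve matches the facing EXIT curve as closely as possible.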

5.5.1 Standard concatenated codes with BICM-ID

  • The corresponding block diagram for this scheme is shown in Figure 5.12.
  • This typically results in 3-dimensional EXIT charts and multiple scheduling algorithms, i.e., rules for how decoding iterations of each type are scheduled [25, 27, 55].
  • Hence the design of such schemes based on curve fitting is potentially complicated.
  • As Gray mapping does not improve its EXIT transfer characteristic much through iterations, the authors could simplify the scheme, and reduce the decoding complexity, by not performing iterations at the demapper, e.g., using the BICM decoding described in Chapter 3.
  • Observe that the simulated error probability in Figure 5.15 (N = 125000 and 100 decoding iterations) matches the EXIT analysis predictions well, especially the threshold, which the EXIT analysis locates at 0.2157 dB from the BICM capacity.

5.5.2 LDPC BICM-ID

  • Let us first consider the concatenation of an LDPC code with the binary labeling.
  • The average variable and check node degrees are given by d̄_v and d̄_c, respectively.
  • The design rate of this LDPC code ensemble is thus r = 1 − d̄_v/d̄_c, and the overall rate of the construction is R = mr.
  • The interest of this improved BICM-ID construction is the lack of an interleaver between the LDPC code and the mapper.
  • The corresponding EXIT area result for this construction is given in the following.

Corollary 5.3 ([6])

  • Consider the joint EXIT curve of the demapper and LDPC variable nodes, y = exit_dem,v^ldpc(x), in a BICM-ID scheme with an LDPC code with left and right edge degree distributions given by λ(z) and ρ(z), respectively.
  • That is, again the authors recover a matching condition, namely, that the overall design spectral efficiency should be less than the channel capacity.
  • The messages passed along the edges of the graph correspond to M -ary variables.
  • The random coset technique described in Section 4.1 enabled them to consider i.i.d. messages and define density evolution and EXIT charts.

5.5.3 RA BICM-ID

  • Similarly to the LDPC BICM-ID construction, the authors can design an improved BICM-ID scheme based on RA codes [93, 120, 127].
  • Following [6, Example 26], the authors have the following area property.
  • The corresponding error probability simulation is shown in Figure 5.21.
  • Also, the code designed with a BEC extrinsic channel is 0.1 dB away from that designed for the AWGN channel.

5.A Density Evolution Algorithm for BICM-ID

  • First, initialize the pdf of the messages Ξ dec→dem (the authors drop the time index for simplicity of notation) to a single mass-point at zero.
  • This represents the fact that, at the beginning of the BICM-ID process, no information is available from the decoder to the demapper.
  • A common criterion is to exit if the resulting error probability does not change significantly through the iterations.
  • Generally, it is much more convenient to work with cumulative distribution functions rather than with pdfs (the authors refer to pdfs here since the algorithm is called density evolution).
  • In particular, the authors review current work and outline how to extend the results they presented throughout the monograph to non-coherent detection, block-fading, multiple-input multiple-output (MIMO) channels and non-standard channels such as the exponential-noise channel.

6.1 Non-Coherent Demodulation

  • Orthogonal modulation with non-coherent detection is a practical choice for situations where the received signal phase cannot be reliably estimated and/or tracked.
  • Common choices of implementation for the modulator are pulse-position modulation (PPM) or frequency-shift keying (FSK) [95].
  • The application of their main results to orthogonal modulation is straightforward.
  • Also, all integrals over y are now M -dimensional integrals.
  • As an example, Figures 6.1 and 6.2 show the coded modulation and BICM capacities for the AWGN channel and for the fully interleaved Rayleigh fading channel with non-coherent detection, respectively.

6.2 Block-Fading

  • The block-fading channel [16, 87] is a useful channel model for a class of time- and/or frequency-varying fading channels, where the duration of a block-fading period is determined by the product of the channel coherence bandwidth and the channel coherence time [95].
  • Within a block-fading period, the channel fading gain remains constant.
  • The corresponding information-theoretic limit is the outage probability, and the design of efficient coded modulation schemes for the block-fading channel is based on approaching the outage probability.
  • This is due to the fact that, while Gaussian inputs yield slope 4, the Singleton bound gives d_sb = 3. In [46], the family of blockwise concatenated codes based on BICM was introduced.
  • In order to apply their results on error exponents and error probability, the authors need to follow Malkamäki and Leib's approach [70] and derive the error exponent for a particular channel realization.
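The outage probability is straightforward to estimate by Monte Carlo. A sketch for Rayleigh block-fading with Gaussian inputs (the discrete-input and Singleton-bound refinements discussed above are not modeled); the names are ours:

```python
import numpy as np

rng = np.random.default_rng(6)

def outage_prob(snr_db, rate, nblocks, n=200_000):
    """Monte Carlo outage probability for Rayleigh block-fading:
    P_out = Pr{ (1/B) sum_b log2(1 + snr |h_b|^2) < R }."""
    snr = 10 ** (snr_db / 10)
    h2 = rng.exponential(1.0, size=(n, nblocks))     # |h_b|^2, unit mean
    i_acc = np.mean(np.log2(1 + snr * h2), axis=1)   # accumulated rate
    return np.mean(i_acc < rate)

for b in (1, 2, 4):
    print(b, "blocks:", outage_prob(snr_db=10, rate=1.0, nblocks=b))
```

Increasing the number of fading blocks B adds diversity and drives the outage probability down at a fixed rate, which is the behavior coded modulation designs for this channel try to approach.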

6.3 MIMO

  • Multiple antenna or MIMO channels model transmission systems where either the transmitter, the receiver or both, have multiple antennas available for transmission/reception.
  • An important feature of the design of BICM for MIMO channels is decoding complexity.
  • Note, however, that the size of the set X_b^{j,t} is exponential in the number of transmit antennas, which can make decoding very complex.
  • As illustrated in [20], sphere decoding techniques can also be employed to accurately estimate the coded modulation and BICM capacities.

6.4 Optical Communication: Discrete-Time Poisson Channel

  • The channel models the authors have mainly considered so far are variations of the additive Gaussian noise channel, which provide an accurate characterization for communication channels operating at radio and microwave frequencies.
  • For optical frequencies, however, the family of Poisson channels is commonly considered a more accurate channel model.
  • In particular, the so-called discrete-time Poisson (DTP) channel with pulse-energy modulations (PEM) constitutes a natural counterpart to the PSK and QAM modulations considered throughout the monograph [72].
  • This spacing between constellation points was proved in [33] to minimize the pairwise error probability in the DTP channel at high signal energy levels.
  • As in the Gaussian channel, BICM performs close to coded modulation in the DTP channel when Gray labeling is used [72].

6.5 Additive Exponential Noise Channel

  • Channels with additive exponential noise (AEN) have been considered in the context of queueing systems [5], and then in their own right [133], because their analytical characterization closely follows that of the Gaussian channel.
  • The capacity of such channels with average signal-to-noise ratio snr is given by log(1 + snr) [133].

Conclusions

  • Coding in the signal space is dictated directly by Shannon's capacity formula and suggested by the random coding achievability proof.
  • This paradigm changed radically with the advent of Coded Modulation.
  • The authors have presented a comprehensive review of the foundations of BICM in terms of information-theoretic, error probability and iterative decoding analysis.
  • Using this decoder, the authors have presented the derivation of the average error probability of the random coding ensemble and obtained the resulting error exponents, generalized mutual information and cutoff rate.
  • The authors have reviewed the density evolution analysis of BICM-ID and the application of the area theorem to BICM.


Bit-Interleaved Coded Modulation

Albert Guillén i Fàbregas (Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, CB2 1PZ, United Kingdom, guillen@ieee.org), Alfonso Martinez (Centrum Wiskunde & Informatica (CWI), Kruislaan 413, Amsterdam, 1098 SJ, The Netherlands, alfonso.martinez@ieee.org), and Giuseppe Caire (Electrical Engineering Department, University of Southern California, 3740 McClintock Av., Los Angeles, 90080 CA, USA, caire@usc.edu)
Abstract

The principle of coding in the signal space follows directly from Shannon's analysis of waveform Gaussian channels subject to an input constraint. The early design of communication systems focused separately on modulation, namely signal design and detection, and error correcting codes, which deal with errors introduced at the demodulator of the underlying waveform channel. The correct perspective of signal-space coding, although never out of sight of information theorists, was brought back into the focus of coding theorists and system designers by Imai's and Ungerböck's pioneering work on coded modulation. More recently, powerful families of binary codes with a good tradeoff between performance and decoding complexity have been (re-)discovered. Bit-Interleaved Coded Modulation (BICM) is a pragmatic approach combining the best out of both worlds: it takes advantage of the signal-space coding perspective, whilst allowing for the use of powerful families of binary codes with virtually any modulation format.

BICM avoids the need for the complicated and somewhat less flexible design typical of coded modulation. As a matter of fact, most of today's systems that achieve high spectral efficiency, such as DSL, Wireless LANs, WiMax and evolutions thereof, as well as systems based on low spectral efficiency orthogonal modulation, feature BICM, making BICM the de-facto general coding technique for waveform channels. The theoretical characterization of BICM is at the basis of efficient coding design techniques and also of improved BICM decoders, e.g., those based on the belief propagation iterative algorithm and approximations thereof. In this monograph, we review the theoretical foundations of BICM under the unified framework of error exponents for mismatched decoding. This framework allows an accurate analysis without any particular assumptions on the length of the interleaver or independence between the multiple bits in a symbol. We further consider the sensitivity of the BICM capacity with respect to the signal-to-noise ratio (SNR), and obtain a wideband regime (or low-SNR regime) characterization. We review efficient tools for the error probability analysis of BICM that go beyond the standard approach of considering infinite interleaving and take into consideration the dependency of the coded bit observations introduced by the modulation. We also present bounds that improve upon the union bound in the region beyond the cutoff rate, and are essential to characterize the performance of modern randomlike codes used in concatenation with BICM. Finally, we turn our attention to BICM with iterative decoding; we review extrinsic information transfer charts, the area theorem and code design via curve fitting. We conclude with an overview of some applications of BICM beyond the classical coherent Gaussian channel.

Contents

List of Abbreviations, Acronyms and Symbols
1 Introduction
2 Channel Model and Code Ensembles
2.1 Channel Model: Encoding and Decoding
2.2 Coded Modulation
2.3 Bit-Interleaved Coded Modulation
2.A Continuous- and Discrete-Time Gaussian Channels
3 Information-Theoretic Foundations
3.1 Coded Modulation
3.2 Bit-Interleaved Coded Modulation
3.3 Comparison with Multilevel Coding
3.4 Mutual Information Analysis
3.5 Concluding Remarks and Related Work
4 Error Probability Analysis
4.1 Error Probability and the Union Bound
4.2 Pairwise Error Probability for Infinite Interleaving
4.3 Pairwise Error Probability for Finite Interleaving
4.4 Bounds and Approximations Above the Cutoff Rate
4.5 Concluding Remarks and Related Work
4.A Saddlepoint Location
4.B Asymptotic Analysis with Nakagami Fading
5 Iterative Decoding
5.1 Factor Graph Representation and Belief Propagation
5.2 Density Evolution
5.3 EXIT Charts
5.4 The Area Theorem
5.5 Improved Schemes
5.6 Concluding Remarks and Related Work
5.A Density Evolution Algorithm for BICM-ID
6 Applications
6.1 Non-Coherent Demodulation
6.2 Block-Fading
6.3 MIMO
6.4 Optical Communication: Discrete-Time Poisson Channel
6.5 Additive Exponential Noise Channel
7 Conclusions
References

List of Abbreviations, Acronyms and Symbols
APP A posteriori probability
AWGN Additive white Gaussian noise
BEC Binary erasure channel
BICM Bit-interleaved coded modulation
BICM-ID Bit-interleaved coded modulation with iterative decoding
BIOS Binary-input output-symmetric (channel)
BP Belief propagation
CM Coded modulation
EXIT Extrinsic information transfer
FG Factor graph
GMI Generalized mutual information
ISI Inter-symbol interference
LDPC Low-density parity-check (code)
MAP Maximum a posteriori
MIMO Multiple-input multiple-output
MLC Multi-level coding
MMSE Minimum mean-squared error
MSD Multi-stage decoding
PSK Phase-shift keying



Frequently Asked Questions
Q1. What contributions have the authors mentioned in the paper "Bit-interleaved coded modulation"?

The principle of coding in the signal space follows directly from Shannon's analysis of waveform Gaussian channels subject to an input constraint. The early design of communication systems focused separately on modulation, namely signal design and detection, and error correcting codes, which deal with errors introduced at the demodulator of the underlying waveform channel. In this monograph, the authors review the theoretical foundations of BICM under the unified framework of error exponents for mismatched decoding. This framework allows an accurate analysis without any particular assumptions on the length of the interleaver or independence between the multiple bits in a symbol. The authors further consider the sensitivity of the BICM capacity with respect to the signal-to-noise ratio (SNR), and obtain a wideband regime (or low-SNR regime) characterization. The authors review efficient tools for the error probability analysis of BICM that go beyond the standard approach of considering infinite interleaving and take into consideration the dependency of the coded bit observations introduced by the modulation. The authors also present bounds that improve upon the union bound in the region beyond the cutoff rate, and are essential to characterize the performance of modern randomlike codes used in concatenation with BICM. Finally, the authors turn their attention to BICM with iterative decoding; they review extrinsic information transfer charts, the area theorem and code design via curve fitting.

In particular, the authors review current work and outline how to extend the results they presented throughout the monograph to non-coherent detection, block-fading, multiple-input multiple-output ( MIMO ) channels and non-standard channels such as the exponential-noise channel. 

In order to limit the loss of information arising in this separated approach, soft information about the coded bits is propagated from the demodulator to the decoder in the form of bit-wise a posteriori probabilities or log-likelihood ratios. 

At the receiver, instead of optimal joint decoding of all the component binary codes, a suboptimal multi-stage decoding, alternatively termed successive interference cancellation, achieves good performance with limited complexity. 

The discovery of turbo codes [11] and the re-discovery of low-density parity-check (LDPC) codes [38, 69] with their corresponding iterative decoding algorithms marked a new era in Coding Theory. 

A practical way of constructing codes for the Gaussian channel consists of fixing the modulator signal set, and then considering codewords obtained as sequences over the fixed modulator signal set, or alphabet. 


Driven by Massey's view of coding and modulation as a single entity [79], Ungerböck in 1982 proposed Trellis-Coded Modulation (TCM), based on the combination of trellis codes and discrete signal constellations through set partitioning [130] (see also [15]).


The authors then turn their attention to iterative decoding of BICM; they also review the underlying design techniques and introduce improved BICM schemes in a unified framework.

Shannon’s promise that rates up to the channel capacity can be reliably transmitted over the channel comes together with the design challenge of effectively constructing coding schemes achieving these rates with limited encoding and decoding complexity. 
