
Bit-interleaved coded modulation

TL;DR: The paper presents in a comprehensive fashion the theory underlying bit-interleaved coded modulation, provides tools for evaluating its performance, and gives guidelines for its design.
Abstract: It has been recognized by Zehavi (1992) that the performance of coded modulation over a Rayleigh fading channel can be improved by bit-wise interleaving at the encoder output, and by using an appropriate soft-decision metric as an input to a Viterbi (1990) decoder. The paper presents in a comprehensive fashion the theory underlying bit-interleaved coded modulation, provides tools for evaluating its performance, and gives guidelines for its design.

Summary


2.1.1 Gaussian Channel Model

  • A particularly interesting, yet simple, case is that of complex-plane signal sets (X ⊂ C, Y = C) in AWGN with fully interleaved fading, y_k = √snr h_k x_k + z_k, where h_k are fading coefficients with unit variance, z_k are zero-mean, unit-variance, circularly symmetric complex Gaussian samples, and snr is the signal-to-noise ratio (SNR).
  • Examples of input set X are unit energy PSK or QAM signal sets.
  • From the decoder viewpoint, the channel transition probability is decomposed as P_{Y,H|X}(y, h|x) = P_{Y|X,H}(y|x, h) P_H(h), with P_{Y|X,H}(y|x, h) = (1/π) exp(−|y − √snr h x|²). Under this assumption, the phase of the fading coefficient becomes irrelevant and the authors can assume that the fading coefficients are real-valued.
  • For simplicity of notation, the authors shall denote the channel transition probability simply as P_{Y|X}(y|x), where the possible conditioning on h, or on any other related channel state information, is implicitly understood and will be clear from the context.
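As a concrete illustration, here is a minimal Python sketch of this channel model, assuming unit-energy QPSK; the function names are ours and the snippet is a simulation aid, not part of the monograph:

```python
import numpy as np

rng = np.random.default_rng(0)

def qpsk(bits):
    # Gray-labeled, unit-energy QPSK: one bit per I/Q component
    b = bits.reshape(-1, 2)
    return ((1 - 2 * b[:, 0]) + 1j * (1 - 2 * b[:, 1])) / np.sqrt(2)

def fading_awgn(x, snr):
    # y_k = sqrt(snr) h_k x_k + z_k, with unit-variance Rayleigh fading h_k
    # and zero-mean, unit-variance, circularly symmetric Gaussian noise z_k
    n = len(x)
    h = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return np.sqrt(snr) * h * x + z, h

bits = rng.integers(0, 2, 2000)
y, h = fading_awgn(qpsk(bits), snr=10.0)   # snr is linear: 10 dB -> 10**(10/10)
```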

2.3.2 BICM Classical Model

  • With infinite-length interleaving, the m parallel channels were assumed to be independent in [29, 140], or in other words, the correlations among the different subchannels are neglected.
  • The authors will see later that this "classical" representation of BICM as a set of parallel channels gives a good model, even though it can sometimes be optimistic.
  • The alternative model which uses the symbol mismatched decoding metric achieves a higher accuracy at a comparable modeling complexity.
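The difference between the exact and max-log bit metrics (introduced later as (2.15) and (2.17)) can be illustrated with a toy demapper. A sketch under our own naming, assuming a unit-energy 4-PAM constellation with Gray labels on the plain AWGN channel:

```python
import numpy as np

def bit_llrs(y, const, labels, snr, maxlog=False):
    """LLRs for the m label bits of one received sample y.

    const:  array of constellation points
    labels: (|X|, m) binary array, labels[i] = label of const[i]
    The exact metric marginalizes over the symbols in each bit subset;
    the max-log variant keeps only the dominant term.
    """
    # log p(y|x) up to a constant, real AWGN with unit noise variance
    logp = -(y - np.sqrt(snr) * const) ** 2 / 2
    m = labels.shape[1]
    llr = np.empty(m)
    for j in range(m):
        l0, l1 = logp[labels[:, j] == 0], logp[labels[:, j] == 1]
        if maxlog:
            llr[j] = l0.max() - l1.max()
        else:
            # numerically stable log-sum-exp
            llr[j] = np.logaddexp.reduce(l0) - np.logaddexp.reduce(l1)
    return llr

# 4-PAM with Gray labels 00, 01, 11, 10
const = np.array([-3, -1, 1, 3]) / np.sqrt(5)
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
print(bit_llrs(0.4, const, labels, snr=4.0))
print(bit_llrs(0.4, const, labels, snr=4.0, maxlog=True))
```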

2.3.3 Labeling Rules

  • For the decoder presented in previous sections, it was conjectured [29] that binary reflected Gray mapping was optimum, in the sense of having the largest BICM capacity.
  • This conjecture was supported by some numerical evidence, and was further refined in [2, 109] to possibly hold only for moderate-to-large values of SNR.
  • Indeed, Stierstorfer and Fischer [110] have shown that a different labeling, strictly regular set partitioning, is significantly better for small values of SNR.
  • Throughout the monograph, the authors use for their simulations the labeling rules depicted in Figure 2.5, namely binary reflected Gray labeling [95] and set partitioning labeling [130].
  • For QAM modulations, the symbol mapping is the Cartesian product of Gray mappings over the in-phase and quadrature components.
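A short sketch of how these labelings can be generated programmatically; `brgc` and `gray_qam` are our names, and the construction mirrors the Cartesian-product rule stated above:

```python
import numpy as np

def brgc(m):
    """Binary reflected Gray code: index i -> i XOR (i >> 1), as m-bit labels."""
    idx = np.arange(2 ** m)
    gray = idx ^ (idx >> 1)
    return (gray[:, None] >> np.arange(m - 1, -1, -1)) & 1

def gray_qam(m):
    """Square 2^m-QAM as the Cartesian product of two Gray-labeled PAM
    constellations on the in-phase and quadrature components."""
    assert m % 2 == 0
    half = m // 2
    pam = np.arange(-(2 ** half - 1), 2 ** half, 2)   # e.g. [-3, -1, 1, 3]
    labels_1d = brgc(half)
    points, labels = [], []
    for i, a in enumerate(pam):
        for q, b in enumerate(pam):
            points.append(a + 1j * b)
            labels.append(np.concatenate([labels_1d[i], labels_1d[q]]))
    points = np.array(points)
    points /= np.sqrt(np.mean(np.abs(points) ** 2))   # normalize to unit energy
    return points, np.array(labels)

const16, labels16 = gray_qam(4)
```

For 16-QAM this reproduces a Gray mapping of the kind shown in Figure 2.5, up to the ordering of the two axes.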

2.A Continuous-and Discrete-Time Gaussian Channels

  • The authors follow closely the review paper by Forney and Ungerböck [36].
  • In the linear Gaussian channel, the input x(t), additive Gaussian noise component z(t), and output y(t) are related as y(t) = ∫ h(t; τ) x(t − τ) dτ + z(t), where h(t; τ) is a (possibly time-varying) channel impulse response.
  • Since all functions are real, their Fourier transforms are Hermitian and the authors need consider only the positive-frequency components.
  • The authors complement their analysis with a derivation of the error exponents of other variants of coded modulation, namely multi-level coding with successive decoding [140] and with independent decoding of all the levels.
  • The authors also give a simple expression for the first derivative of the BICM mutual information with respect to the signal-to-noise ratio, in terms of the minimum mean-squared error (MMSE) for estimating the input of the channel from its output, and they relate this to the findings of [51, 67].

3.1.1 Channel Capacity

  • For memoryless channels, Shannon's theorem (Theorem 3.1, Shannon 1948) yields the capacity formula C = max_{P_X} I(X; Y).
  • Specifically, he proved that the error probability decreases exponentially with the block length according to a parameter called the reliability function.
  • While a uniform distribution is only optimal for large snr, it is simpler to implement and usually leads to more manageable analytical expressions.
  • In general, the probability distribution P X (x) that maximizes the mutual information for a given signal constellation depends on snr and on the specific constellation geometry.
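The constellation-constrained capacity for a uniform input can be estimated numerically. A minimal Monte Carlo sketch for the complex AWGN channel without fading; the names are ours, and the estimator follows directly from the definition of mutual information with uniform P_X:

```python
import numpy as np

rng = np.random.default_rng(1)

def cm_mi(const, snr, n=200_000):
    """Monte Carlo estimate of I(X;Y) in bits for a uniform input over
    `const` on the complex AWGN channel y = sqrt(snr)*x + z."""
    x = rng.choice(const, size=n)
    z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    y = np.sqrt(snr) * x + z
    # log p(y|x') for all candidate symbols, up to a common constant
    d = -np.abs(y[:, None] - np.sqrt(snr) * const[None, :]) ** 2
    num = -np.abs(z) ** 2                      # log p(y|x) for the true symbol
    return np.log2(len(const)) - np.mean(
        np.log2(np.sum(np.exp(d - num[:, None]), axis=1)))

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
print(cm_mi(qpsk, snr=10 ** (5 / 10)))   # QPSK constrained capacity at 5 dB
```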

3.1.2 Error Probability with Random Codes

  • Following in the footsteps of Gallager [39, Chapter 5], this section provides an achievability theorem for a general decoding metric q(x, y) using random coding arguments.
  • The final result, concerning the error probability, can be found in [59].
  • The authors denote by Pe(m) the average error probability over the code ensemble when message m is transmitted and by Pe the error probability averaged over the message choices.
  • The authors define the mismatched random coding exponent as E_r^q(R) = max_{0≤ρ≤1, s>0} { E_0^q(ρ, s) − ρR }. Further improvements are possible by optimizing over the input distribution P_X(x).
  • According to (3.15), the average error probability decays exponentially in the block length with this exponent. Later, the authors will use the following data-processing inequality, which shows that the generalized Gallager function of any mismatched decoder is upper-bounded by the Gallager function of a maximum-likelihood decoder.

3.2 Bit-Interleaved Coded Modulation

  • The authors study the BICM decoder and determine the generalized mutual information and a lower bound to the reliability function.
  • Special attention is given to the comparison with the classical analysis of BICM as a set of m independent parallel channels (see Section 2.3).

3.2.1 Achievable Rates

  • The authors start with a brief review of the classical results on the achievable rates for BICM.
  • Under the assumption of an infinite-length interleaver, capacity and cutoff rate were studied in [29].
  • This assumption (see Section 2.3) yields a set of m independent parallel binary-input channels, for which the mutual information and cutoff rate are the sums of the corresponding rates of the subchannels, I^bicm = Σ_{j=1}^m I(B_j; Y) and R_0^bicm = Σ_{j=1}^m R_0^(j), respectively.
  • An underlying assumption behind Eq. (3.30) is that the m independent channels are used the same number of times.
  • The authors now relate this BICM capacity with the generalized mutual information introduced in the previous section.
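Under this parallel-channel model, the BICM capacity is the sum of the m bit-level mutual informations. A Monte Carlo sketch for Gray-labeled 16-QAM on the AWGN channel, assuming the exact bit metric; all names are ours:

```python
import numpy as np

rng = np.random.default_rng(2)

# Gray-labeled 16-QAM as a Cartesian product of two Gray-labeled 4-PAMs
g = np.array([0, 1, 3, 2])                      # 2-bit binary reflected Gray order
bits2 = (g[:, None] >> np.arange(1, -1, -1)) & 1
pam = np.array([-3, -1, 1, 3]) / np.sqrt(10)    # unit total symbol energy
const = (pam[:, None] + 1j * pam[None, :]).ravel()
labels = np.hstack([np.repeat(bits2, 4, axis=0), np.tile(bits2, (4, 1))])

def bicm_mi(const, labels, snr, n=100_000):
    """C_bicm = sum_j I(B_j; Y), Monte Carlo, exact bit metric, no fading."""
    M, m = labels.shape
    idx = rng.integers(0, M, n)
    x, b = const[idx], labels[idx]
    z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    y = np.sqrt(snr) * x + z
    logp = -np.abs(y[:, None] - np.sqrt(snr) * const[None, :]) ** 2
    total = 0.0
    for j in range(m):
        l0 = np.logaddexp.reduce(np.where(labels[:, j] == 0, logp, -np.inf), axis=1)
        l1 = np.logaddexp.reduce(np.where(labels[:, j] == 1, logp, -np.inf), axis=1)
        signed = (1 - 2 * b[:, j]) * (l0 - l1)   # LLR "seen" by the true bit
        total += 1 - np.mean(np.logaddexp(0, -signed)) / np.log(2)
    return total

print(bicm_mi(const, labels, snr=10 ** (10 / 10)))   # bits/channel use at 10 dB
```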

Corollary 3.1 ([77])

  • For the metric in Eq. (2.15), the generalized mutual information is given by expression (3.35), which coincides with the BICM capacity above, even though the authors have lifted the assumption of infinite interleaving.
  • When the suboptimal metrics (2.17) are used, the authors have the following.

Corollary 3.2 ([77])

  • The fundamental difference between their result and the generalized mutual information given in (3.36) is the optimization over s.
  • Since both expressions are equal when s = 1, the optimization over s may induce a larger achievable rate.
  • For simplicity of illustration, the authors have only plotted the information rate for the Gray and set partitioning binary labeling rules of Figure 2.5.
  • Observe that binary reflected Gray labeling pays a negligible penalty in information rate, being close to the coded modulation capacity.

3.2.2 Error Exponents

  • In their analysis of multilevel coding and successive decoding, Wachsmann et al. provided the error exponents of BICM modeled as a set of independent parallel channels [140].
  • As mentioned in the previous section, both BICM models yield the same capacity.
  • In most cases, BICM with a max-log metric (2.17) incurs a marginal loss in the exponent for mid-to-large SNR.
  • (Figure: error exponents for coded modulation, for BICM with independent parallel channels, and for BICM with metrics (2.15) and (2.17), for 16-QAM with Gray labeling, Rayleigh fading, and snr = −25 dB.)
  • An interesting question is whether the error exponent of the parallel-channel model is always larger than that of the mismatched-decoding model.

3.3 Comparison with Multilevel Coding

  • Multilevel codes (MLC) combined with multistage decoding (MSD) have been proposed [56, 140] as an efficient method to attain the channel capacity by using binary codes.
  • The multi-stage decoder operates by decoding the m levels separately.
  • Since the alternatives are not disjoint, the union bound can be applied to the conditional error probability Pr{error | x}. While BICM suffers from a non-zero, yet small, capacity loss compared to CM and MLC/MSD, BICM attains a larger error exponent, whose loss with respect to CM is small.

3.4 Mutual Information Analysis

  • The authors focus on AWGN channels with and without fading and study some properties of the mutual information as a function of snr.
  • Building on work by Guo, Shamai and Verdú [51] , the authors first provide a simple expression for the first derivative of the mutual information with respect to snr.
  • This expression is of interest for the optimization of power allocation across parallel channels, as discussed by Lozano et al. [67] in the context of coded modulation systems.
  • Then, the authors study the BICM mutual information at low snr, that is in the wideband regime recently popularised by Verdú [134] .
  • For a given rate, BICM with Gray labeling loses at most 1.25 dB in received power.

3.4.1 Derivative of Mutual Information

  • A fundamental relationship between the input-output mutual information and the minimum mean-squared error (MMSE) in estimating the input from the output in additive Gaussian channels was discovered by Guo, Shamai and Verdú in [51].
  • It is worth noting that, beyond its own intrinsic theoretical interest, this relationship has proved instrumental in optimizing the power allocation for parallel channels with arbitrary input distributions and in obtaining the minimum bit-energy-to-noise-spectral-density ratio for reliable communication [67].
  • For BICM, obtaining a direct relationship between the BICM capacity and the MMSE in estimating the coded bits given the output is a challenging problem.
  • The combination of Eqs. (3.33) and (3.61) yields a simple relationship between the first derivative of the BICM mutual information and the MMSE of coded modulation (Theorem 3.3, [49]).
  • As for low snr, the authors notice a small loss, whose value is determined analytically from the analysis in the next section.
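As a numerical illustration of the underlying relationship of [51] (dI/dsnr = mmse(snr), in nats), here is a sketch for real-valued BPSK; it checks the general I-MMSE relation rather than the BICM-specific Theorem 3.3, and uses common random numbers so the finite difference is stable:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
x = rng.choice([-1.0, 1.0], n)          # BPSK input
z = rng.standard_normal(n)              # real AWGN, unit variance

def mi(snr):
    # I(X;Y) in nats for y = sqrt(snr) x + z, uniform BPSK input
    y = np.sqrt(snr) * x + z
    return np.log(2) - np.mean(np.logaddexp(0.0, -2 * np.sqrt(snr) * x * y))

def mmse(snr):
    # E[(X - E[X|Y])^2]; the conditional mean is tanh(sqrt(snr) y)
    y = np.sqrt(snr) * x + z
    return np.mean((x - np.tanh(np.sqrt(snr) * y)) ** 2)

snr, d = 1.0, 0.05
print((mi(snr + d) - mi(snr - d)) / (2 * d))   # numerical dI/dsnr
print(mmse(snr))                               # should agree closely
```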

3.4.2 Wideband Regime

  • At very low signal-to-noise ratio snr, the energy of a single bit is spread over many channel degrees of freedom, leading to the wideband regime recently discussed at length by Verdú [134] .
  • Rather than studying the exact expression of the channel capacity, one considers a second-order Taylor series in snr, C(snr) = c1·snr + c2·snr² + o(snr²), where c1 and c2 depend on the modulation format, the receiver design, and the fading distribution.
  • It is somewhat surprising that the loss incurred by binary reflected Gray labeling with respect to coded modulation is bounded at low snr.

4.1 Error Probability and the Union Bound

  • Since some of these bit indices might be modulated in the same constellation symbol, the authors have m classes of symbol scores, each characterized by a different number of wrong bits (that is, the Hamming weight of the binary labels).
  • These scores are random variables whose density function depends on all the random elements in the channel, as well as the transmitted bits, their position in the symbol and the bit pattern.
  • In order to avoid this dependence, the authors will use the random coset code method used in [60] to analyze LDPC codes for the Inter-Symbol Interference (ISI) channel and in [10] to analyze nonbinary LDPC codes.
  • Clearly, the error probability computed this way gives an average over all possible scrambling sequences.
  • In [29], the scrambler role was played by randomly choosing between a mapping rule μ and its complement μ̄ with probability 1/2 at every channel use.

4.1.1 Linear Codes

  • If the underlying binary code C is linear and the channel is symmetric, the pairwise error probability depends on the transmitted and competing codewords only through their Hamming distance d [139].
  • The union bound accurately characterizes the error probability in the region above the cutoff rate [139].
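A minimal sketch of the union bound for a concrete linear code. We use the standard rate-1/2, 4-state (5,7) convolutional code with BPSK on the AWGN channel, whose bit multiplicities w_d = (d − 4)·2^(d−5) follow from its transfer function; for BICM, the Gaussian pairwise error probability below would be replaced by the BICM PEP of Section 4.2:

```python
from math import erfc, sqrt

def q(x):
    # Gaussian tail function Q(x)
    return 0.5 * erfc(x / sqrt(2))

def union_bound_ber(ebn0_db, r=0.5, dmax=40):
    """Union bound on BER: P_b <= sum_d w_d * PEP(d), with
    PEP(d) = Q(sqrt(2 d r Eb/N0)) for BPSK on the AWGN channel."""
    ebn0 = 10 ** (ebn0_db / 10)
    total = 0.0
    for d in range(5, dmax + 1):          # free distance of the (5,7) code is 5
        w = (d - 4) * 2.0 ** (d - 5)      # bit multiplicity at Hamming weight d
        total += w * q(sqrt(2 * d * r * ebn0))
    return total

for db in (4, 5, 6, 7):
    print(db, "dB:", union_bound_ber(db))
```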

4.1.2 Cumulant Transforms of Symbol Scores

  • The authors introduce the definition of the cumulant transform and apply it to the symbol scores.
  • The binary labels of the reference and competing symbols in the symbol score differ only by a single bit, and all d different bits of the pairwise error between the reference and competing codewords are mapped onto different modulation symbols.
  • As the authors will see in the next sections, this will significantly simplify the analysis.
  • As the authors will see, only the cumulant transform is needed to accurately approximate the error probability.
  • Fortunately, this effect shows up at error probability values of interest only for short interleaver lengths, and assuming symbol scores of weight 1 is sufficient for most practical purposes.

4.2 Pairwise Error Probability for Infinite Interleaving

  • The authors study the pairwise error probability assuming infinite-length interleaving [29].
  • As the authors saw in the previous chapter, this channel model does not fully characterize the fundamental limits of BICM.
  • While the model yields the same capacity, the error exponent is in general different.
  • The authors shall see that this model characterizes fairly accurately the error probability for medium-to-large signal-to-noise ratios when the union bound is employed.
  • Infinite-length interleaving implies that all d different bits in a pairwise error event are mapped onto d different symbols, i.e., there are no symbols with label Hamming weight larger than 1.

4.2.1 Exact Formulas, Bounds, and Approximations

  • The authors denote the pairwise error probability for infinite interleaving as PEP_1(d).
  • The Chernoff bound gives a true bound and is moreover easy to compute.
  • It is further known to correctly give the asymptotic exponential decay of the error probability for large d and snr [74, 95].
  • The saddlepoint approximation may also be seen as an approximation of the complex-plane integration of Proposition 4.2.
  • For the sake of completeness, the authors also mention two additional approximations.
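A sketch of the saddlepoint machinery under our own naming: estimate the cumulant transform κ(s) = log E[exp(sΛ)] of the bit score from samples, locate the saddlepoint ŝ where κ′(ŝ) = 0 (the minimum of κ), and evaluate PEP_1(d) ≈ exp(dκ(ŝ))/(ŝ√(2π d κ″(ŝ))). Shown for BPSK over fully interleaved Rayleigh fading with perfect CSI; for the MAP bit metric one expects ŝ ≈ 1/2:

```python
import numpy as np

rng = np.random.default_rng(4)

def bit_scores(snr, n=500_000):
    # Pairwise bit score Lambda = log q(xbar, y) - log q(x, y) for BPSK
    # (x = +1 sent, xbar = -1), Rayleigh fading with perfect CSI, and
    # q(x, y) = exp(-(y - sqrt(snr) h x)^2 / 2) with real noise ~ N(0, 1)
    h = np.abs(rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    z = rng.standard_normal(n)
    y = np.sqrt(snr) * h + z
    return -2 * np.sqrt(snr) * h * y

def saddlepoint_pep(lam, d):
    # PEP_1(d) = Pr{sum of d i.i.d. scores >= 0}
    s = np.linspace(0.02, 0.98, 97)
    kappa = np.array([np.log(np.mean(np.exp(si * lam))) for si in s])
    i = int(np.clip(np.argmin(kappa), 1, len(s) - 2))  # kappa'(s_hat) = 0 here
    ds = s[1] - s[0]
    k2 = (kappa[i + 1] - 2 * kappa[i] + kappa[i - 1]) / ds ** 2
    chernoff = np.exp(d * kappa[i])                    # Chernoff bound
    sp = chernoff / (s[i] * np.sqrt(2 * np.pi * d * k2))
    return s[i], chernoff, sp

lam = bit_scores(snr=10 ** (6 / 10))
print(saddlepoint_pep(lam, d=5))   # (s_hat, Chernoff bound, saddlepoint approx.)
```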

4.2.2 MAP Demodulator

  • Then, a channel with binary input is said to be output-symmetric [96] if the densities of the posterior log-likelihood ratios, seen as a function of the channel input, satisfy p_Λ(λ | b = 0) = p_Λ(−λ | b = 1) (Proposition 4.5).
  • The inaccuracy of the expurgated bound was also remarked upon by Sethuraman [104] and by Yeh et al. [141], who noticed that this "bound" is actually not a bound in general.
  • The authors next show some examples to illustrate the accuracy of these bounds and approximations for convolutional and repeat-accumulate (RA) codes [32].
  • In particular, the authors show the Chernoff/Bhattacharyya union bound (dash-dotted lines), the union bound with the saddlepoint approximation (4.35) (solid lines), the union bound with the Gaussian approximation (dashed lines), and the simulations.

4.2.3 Cumulant transform asymptotic analysis

  • Inspection of Figures 4.2 and 4.3 suggests that the bounds and approximations considered in the previous section yield the same asymptotic behavior of the error probability for large snr.
  • Here K may depend on the mapping. (Figure: the cumulant transform κ(ŝ) as a function of snr for 16-QAM with Gray and set partitioning mappings in the AWGN channel.)
  • The above result depends on neither the modulation nor the binary labeling, and confirms that BICM indeed behaves as a binary modulation; the asymptotic performance thus depends on the Hamming distance of the binary code C rather than on the Euclidean distance.
  • As expected, the limit value is 8, which does not depend on the modulation.
  • A finer approximation to the exponent of the error probability is given by the following result.

4.3 Pairwise Error Probability for Finite Interleaving

4.3.1 Motivation

  • For this case, all symbol scores have Hamming weight 1 and are thus bit scores.
  • Moreover, since the channel is memoryless, the bit scores with infinite interleaving are independent.
  • Since the task of determining the exact distribution of the d pairwise different bits onto the N symbols can be hard, the authors follow the results of [141] and compute an average pairwise error probability by averaging over all possible distributions of d bits onto N symbols, equivalent to uniform interleaving for turbo-codes [8, 9].
  • In Section 4.3.2 the authors present a general expression for the pairwise error probability, as well as its corresponding saddlepoint approximation.
  • In Section 4.3.3 the authors apply the theory to what is arguably the simplest case of BICM, QPSK under Nakagami fading.

4.3.2 A General Formula for the Pairwise Error Probability

  • For finite interleaving, every possible pattern corresponds to a different conditional pairwise error probability, denoted by PEP(d, ρ_N).
  • Taking into account all possible patterns ρ_N yields the average pairwise error probability. Again, the authors can use the saddlepoint approximation to obtain a result similar to that shown in Theorem 4.1.
  • The conditional pairwise error probability can be approximated to first order by a saddlepoint expression analogous to (4.35).
  • This is possibly the simplest case of dependency between the bit sub-channels, with symbol scores of Hamming weight 2.

Theorem 4.3 ([73])

  • The slope of the pairwise error probability changes at sufficiently large signal-tonoise ratio.
  • In the next section the authors generalize this result to higher order modulations.
  • The error probability at the crossing rapidly becomes small, at values typically below the operating point of common communication systems.

4.3.4 High-order Modulations: Asymptotic Analysis

  • The authors closely follow the analysis in [73] for general constellations and mappings, and estimate the signal-to-noise ratio at which the slope of the error probability changes.
  • As with the bit score and PEP_1(d), Eq. (4.73) may be used in the saddlepoint approximation to obtain a heuristic approximation to the pairwise error probability for large snr, namely Eq. (4.75). The authors use Eq. (4.75) to estimate the threshold SNR.
  • The authors observe that the main flattening of the error probability takes place at high snr.

4.4 Bounds and Approximations Above the Cutoff Rate

  • Spurred by the appearance of turbo-codes [11] and the rediscovery of LDPC codes [69], there has been renewed interest in the past decade in the derivation of improved bounds for a region above the cutoff rate.
  • The authors briefly discuss such bounds for BICM.
  • The tangential sphere bound (TSB) is known to be the tightest bound for binary-input AWGN channels, and will be combined with the Gaussian approximation introduced in Section 4.2.
  • The error probability was analyzed in Section 3.1.2 for an ensemble of random codes.

5.1 Factor Graph Representation and Belief Propagation

  • The optimal decoding rule that minimizes the bit-error probability P_b is the bit-wise MAP rule ĉ_i = arg max_{c∈{0,1}} P_{C_i|Y}(c | y). Given a factorization, the corresponding Factor Graph (FG) is a bipartite graph with two sets of nodes, the variable nodes V and the function nodes F.
  • The general BP computation rules are given in [62].
  • For the FG of the joint APP given in (5.2), all messages are marginal probabilities, or proportional to marginal probabilities.
  • The authors hence drop the superscript b in the bit scores to simplify the notation.

5.2 Density Evolution

  • This method, named density evolution, describes how the densities of the messages passed along the graph evolve through the iterations.
  • For a given interleaver and scrambling sequence, the message vectors Ξ_dem→dec and Ξ_dec→dem are random vectors, functions of the channel noise and fading and of the transmitted information message.
  • This algorithm produces the extrinsic information output for c_i by operating over a finite trellis window centered around the symbol position i.
  • Averaged over all possible interleavers, the probability that the neighborhood has cycles is bounded as in [22, 60].
  • Figure 5.5 shows that the positive tail of the message densities does not change much with the iterations, resulting in a nearly equal error probability.

5.3 EXIT Charts

  • In the previous section, the authors have described density evolution as a method that characterizes the message-passing process exactly in the limit of infinite interleavers.
  • Unfortunately, density evolution does not yield simple criteria to optimize the BICM-ID scheme.
  • EXIT charts represent the extrinsic information as a function of the a priori information, y = exit_dem(x) (5.19) and x = exit_dec(y) (5.20), and thus capture the transfer of extrinsic information in the demapper and decoder blocks.
  • The fixed point of BICM-ID, where further decoding iterations do not improve the performance, is the leftmost intersection of the EXIT curves.
  • Notice that the AWGN and BEC EXIT curves are close to each other.
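To make this concrete, here is a Monte Carlo sketch of a demapper EXIT curve point, with a priori LLRs drawn from the usual consistent Gaussian model; Gray-labeled 4-PAM on the real AWGN channel is used for brevity, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(5)

const = np.array([-3, -1, 1, 3]) / np.sqrt(5)        # unit-energy 4-PAM
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])  # Gray labels
M, m = labels.shape

def mi_from_llrs(llr, bits):
    # mutual information between a bit and its LLR (positive LLR favors 0)
    s = np.where(bits == 0, llr, -llr)
    return 1 - np.mean(np.logaddexp(0, -s)) / np.log(2)

def exit_dem(ia_sigma, snr, n=50_000):
    """One point of the demapper EXIT curve: feed Gaussian a priori LLRs
    of reliability sigma, return (I_A, I_E) averaged over the bit positions."""
    idx = rng.integers(0, M, n)
    x, b = const[idx], labels[idx]
    y = np.sqrt(snr) * x + rng.standard_normal(n)
    # consistent Gaussian a priori model: L_A ~ N(+-sigma^2/2, sigma^2)
    la = (1 - 2 * b) * ia_sigma ** 2 / 2 + ia_sigma * rng.standard_normal((n, m))
    logp = -(y[:, None] - np.sqrt(snr) * const[None, :]) ** 2 / 2
    ia, ie = 0.0, 0.0
    for j in range(m):
        other = [k for k in range(m) if k != j]
        # prior contribution of the other bits to each candidate symbol
        pri = sum(np.where(labels[None, :, k] == 0, 0.0, -la[:, k:k + 1])
                  for k in other)
        met = logp + pri
        l0 = np.logaddexp.reduce(np.where(labels[:, j] == 0, met, -np.inf), axis=1)
        l1 = np.logaddexp.reduce(np.where(labels[:, j] == 1, met, -np.inf), axis=1)
        ia += mi_from_llrs(la[:, j], b[:, j]) / m
        ie += mi_from_llrs(l0 - l1, b[:, j]) / m
    return ia, ie

for sig in (0.1, 1.0, 2.0, 4.0, 8.0):
    print(exit_dem(sig, snr=10 ** (6 / 10)))
```

Sweeping the a priori reliability traces the curve y = exit_dem(x); for Gray labelings the curve is nearly flat, consistent with the remark in Section 5.5.1 that Gray mapping gains little from demapper iterations.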

5.4 The Area Theorem

  • EXIT charts exhibit a fundamental property when the extrinsic channel is a BEC [6]: the area under the EXIT curve with a BEC extrinsic channel is related to the rate of the iterative-decoding element whose EXIT chart is being plotted.
  • Applying Theorem 5.2 to the demapper case yields the following result.

5.4.1 Code Design Considerations

  • The practical significance of the area theorem lies in the following observation.
  • As long as the demapper EXIT curve lies above the decoder EXIT curve, BICM-ID will improve its performance with iterations.
  • Furthermore, this result suggests that any area gap between the two EXIT curves translates into a rate loss with respect to the capacity C_X^cm [6].
  • This matching condition was illustrated in [82, 83, 96] for binary LDPC codes.
  • As discussed earlier, this implies that, even with perfect extrinsic side information, the demapper is not able to correctly infer the value of the bit, i.e., the demapper EXIT curve cannot reach the point (1, 1).

5.5 Improved Schemes

  • The authors discuss some BICM-ID schemes whose demapper EXIT curve does not treat the bits as uncoded.
  • In particular, the authors consider LDPC and RA-based constructions and show that significant gains can be achieved.
  • These constructions use in one way or another coded mappers, i.e., mappers with memory.
  • The EXIT curve of a code mixture is the sum of the individual EXIT curves, appropriately weighted by the length fractions corresponding to each code.
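A one-screen illustration of this weighted-sum property, using the BEC, where variable-node EXIT curves have the simple closed form I_E = 1 − ε(1 − I_A)^(d−1); the degrees and fractions below are arbitrary stand-ins, not a design from the monograph:

```python
import numpy as np

def exit_vnd(ia, eps, dv):
    # variable-node EXIT curve on the BEC with channel erasure prob. eps
    return 1 - eps * (1 - ia) ** (dv - 1)

def exit_mixture(ia, eps, degrees, fractions):
    """EXIT curve of a code mixture: the weighted sum of the individual
    curves, with `fractions` the proportion assigned to each component."""
    return sum(f * exit_vnd(ia, eps, d) for d, f in zip(degrees, fractions))

ia = np.linspace(0, 1, 6)
print(exit_mixture(ia, eps=0.4, degrees=[2, 3, 6], fractions=[0.5, 0.3, 0.2]))
```

Curve-fitting designs exploit exactly this linearity: the mixture weights are tuned so the combined curve matches the facing EXIT curve as closely as possible.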

5.5.1 Standard concatenated codes with BICM-ID

  • The corresponding block diagram for this scheme is shown in Figure 5.12.
  • This typically results in 3-dimensional EXIT charts and multiple scheduling algorithms, i.e., rules for how decoding iterations of each type are scheduled [25, 27, 55].
  • Hence the design of such schemes based on curve fitting is potentially complicated.
  • As Gray mapping does not improve its EXIT transfer characteristic much through iterations, the authors could simplify the scheme, and reduce the decoding complexity, by not performing iterations at the demapper, e.g., using the BICM decoding described in Chapter 3.
  • Observe that the simulated error probability in Figure 5.15 (N = 125000 and 100 decoding iterations) matches the EXIT analysis predictions well, especially the threshold, which the EXIT analysis locates at 0.2157 dB from the BICM capacity.

5.5.2 LDPC BICM-ID

  • Let us first consider the concatenation of an LDPC code with the binary labeling.
  • The average variable and check node degrees are given by d̄_v and d̄_c, respectively.
  • The design rate of this LDPC code ensemble is thus r = 1 − d̄_v/d̄_c, and the overall rate of the construction is R = mr.
  • The interest of this improved BICM-ID construction is the lack of an interleaver between the LDPC code and the mapper.
  • The corresponding EXIT area result for this construction is given in the following.

Corollary 5.3 ([6])

  • Consider the joint EXIT curve of the demapper and LDPC variable nodes, y = exit_dem,v^ldpc(x), in a BICM-ID scheme with an LDPC code with left and right edge degree distributions given by λ(z) and ρ(z), respectively.
  • That is, again the authors recover a matching condition, namely, that the overall design spectral efficiency should be less than the channel capacity.
  • The messages passed along the edges of the graph correspond to M -ary variables.
  • The random coset technique described in Section 4.1 enabled them to consider i.i.d. messages and define density evolution and EXIT charts.

5.5.3 RA BICM-ID

  • Similarly to the LDPC BICM-ID construction, the authors can design an improved BICM-ID scheme based on RA codes [93, 120, 127].
  • Following [6, Example 26], the authors have the following area property.
  • The corresponding error probability simulation is shown in Figure 5.21.
  • Also, the code designed with a BEC extrinsic channel is 0.1 dB away from that designed for the AWGN channel.

5.A Density Evolution Algorithm for BICM-ID

  • First, initialize the pdf of the messages Ξ dec→dem (the authors drop the time index for simplicity of notation) to a single mass-point at zero.
  • This represents the fact that, at the beginning of the BICM-ID process, no information is available from the decoder to the demapper.
  • A common criterion is to exit if the resulting error probability does not change significantly through the iterations.
  • Generally, it is much more convenient to work with cumulative distribution functions rather than with pdfs (the authors refer to pdfs here since the algorithm is called density evolution).
  • In particular, the authors review current work and outline how to extend the results they presented throughout the monograph to non-coherent detection, block-fading, multiple-input multiple-output (MIMO) channels and non-standard channels such as the exponential-noise channel.

6.1 Non-Coherent Demodulation

  • Orthogonal modulation with non-coherent detection is a practical choice for situations where the received signal phase cannot be reliably estimated and/or tracked.
  • Common choices of implementation for the modulator are pulse-position modulation (PPM) or frequency-shift keying (FSK) [95].
  • The application of their main results to orthogonal modulation is straightforward.
  • Also, all integrals over y are now M -dimensional integrals.
  • As an example, Figures 6.1 and 6.2 show the coded modulation and BICM capacities for the AWGN channel and for the fully interleaved Rayleigh fading channel with non-coherent detection, respectively.

6.2 Block-Fading

  • The block-fading channel [16, 87] is a useful channel model for a class of time- and/or frequency-varying fading channels, where the duration of a block-fading period is determined by the product of the channel coherence bandwidth and the channel coherence time [95].
  • Within a block-fading period, the channel fading gain remains constant.
  • The corresponding information-theoretic limit is the outage probability, and the design of efficient coded modulation schemes for the block-fading channel is based on approaching the outage probability.
  • This is due to the fact that, while Gaussian inputs yield slope 4, the Singleton bound gives d_sb = 3. In [46], the family of blockwise concatenated codes based on BICM was introduced.
  • In order to apply their results on error exponents and error probability, the authors need to follow Malkamäki and Leib's approach [70] and derive the error exponent for a particular channel realization.
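The outage probability is straightforward to estimate by Monte Carlo. A sketch for Rayleigh block-fading with Gaussian inputs (the discrete-input and Singleton-bound refinements discussed above are not modeled); the names are ours:

```python
import numpy as np

rng = np.random.default_rng(6)

def outage_prob(snr_db, rate, nblocks, n=200_000):
    """Monte Carlo outage probability for Rayleigh block-fading:
    P_out = Pr{ (1/B) sum_b log2(1 + snr |h_b|^2) < R }."""
    snr = 10 ** (snr_db / 10)
    h2 = rng.exponential(1.0, size=(n, nblocks))     # |h_b|^2, unit mean
    i_acc = np.mean(np.log2(1 + snr * h2), axis=1)   # accumulated rate
    return np.mean(i_acc < rate)

for b in (1, 2, 4):
    print(b, "blocks:", outage_prob(snr_db=10, rate=1.0, nblocks=b))
```

Increasing the number of fading blocks B adds diversity and drives the outage probability down at a fixed rate, which is the behavior coded modulation designs for this channel try to approach.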

6.3 MIMO

  • Multiple antenna or MIMO channels model transmission systems where either the transmitter, the receiver or both, have multiple antennas available for transmission/reception.
  • An important feature of the design of BICM for MIMO channels is decoding complexity.
  • Note, however, that the size of the set X_b^{j,t} is exponential in the number of transmit antennas, which can make decoding very complex.
  • As illustrated in [20], sphere decoding techniques can also be employed to accurately estimate the coded modulation and BICM capacities.

6.4 Optical Communication: Discrete-Time Poisson Channel

  • The channel models the authors have mainly considered so far are variations of the additive Gaussian noise channel, which provide an accurate characterization for communication channels operating at radio and microwave frequencies.
  • For optical frequencies, however, the family of Poisson channels is commonly considered a more accurate channel model.
  • In particular, the so-called discrete-time Poisson (DTP) channel with pulse-energy modulations (PEM) constitutes a natural counterpart to the PSK and QAM modulations considered throughout the monograph [72].
  • This spacing between constellation points was proved in [33] to minimize the pairwise error probability in the DTP channel at high signal energy levels.
  • As in the Gaussian channel, BICM performs close to coded modulation in the DTP channel when Gray labeling is used [72].

6.5 Additive Exponential Noise Channel

  • Channels with additive exponential noise (AEN) have been considered in the context of queueing systems [5], and then in their own right [133], because their analytical characterization closely follows that of the Gaussian channel.
  • The capacity of such channels with average signal-to-noise ratio snr is given by log(1 + snr) [133].

Conclusions

  • Coding in the signal space is dictated directly by Shannon's capacity formula and suggested by the random coding achievability proof.
  • This paradigm changed radically with the advent of Coded Modulation.
  • The authors have presented a comprehensive review of the foundations of BICM in terms of information-theoretic, error probability and iterative decoding analysis.
  • Using this decoder, the authors have presented the derivation of the average error probability of the random coding ensemble and obtained the resulting error exponents, generalized mutual information and cutoff rate.
  • The authors have reviewed the density evolution analysis of BICM-ID and the application of the area theorem to BICM.


Bit-Interleaved Coded Modulation

Albert Guillén i Fàbregas (Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, CB2 1PZ, United Kingdom, guillen@ieee.org), Alfonso Martinez (Centrum Wiskunde & Informatica (CWI), Kruislaan 413, Amsterdam, 1098 SJ, The Netherlands, alfonso.martinez@ieee.org), and Giuseppe Caire (Electrical Engineering Department, University of Southern California, 3740 McClintock Av., Los Angeles, 90080 CA, USA, caire@usc.edu)
Abstract

The principle of coding in the signal space follows directly from Shannon's analysis of waveform Gaussian channels subject to an input constraint. The early design of communication systems focused separately on modulation, namely signal design and detection, and error correcting codes, which deal with errors introduced at the demodulator of the underlying waveform channel. The correct perspective of signal-space coding, although never out of sight of information theorists, was brought back into the focus of coding theorists and system designers by Imai's and Ungerböck's pioneering work on coded modulation. More recently, powerful families of binary codes with a good tradeoff between performance and decoding complexity have been (re-)discovered. Bit-Interleaved Coded Modulation (BICM) is a pragmatic approach combining the best out of both worlds: it takes advantage of the signal-space coding perspective, whilst allowing for the use of powerful families of binary codes with virtually any modulation format.

BICM avoids the need for the complicated and somewhat less flexible design typical of coded modulation. As a matter of fact, most of today's systems that achieve high spectral efficiency, such as DSL, Wireless LANs, WiMax and evolutions thereof, as well as systems based on low spectral efficiency orthogonal modulation, feature BICM, making BICM the de-facto general coding technique for waveform channels. The theoretical characterization of BICM is at the basis of efficient coding design techniques and also of improved BICM decoders, e.g., those based on the belief propagation iterative algorithm and approximations thereof. In this monograph, we review the theoretical foundations of BICM under the unified framework of error exponents for mismatched decoding. This framework allows an accurate analysis without any particular assumptions on the length of the interleaver or independence between the multiple bits in a symbol. We further consider the sensitivity of the BICM capacity with respect to the signal-to-noise ratio (SNR), and obtain a wideband regime (or low-SNR regime) characterization. We review efficient tools for the error probability analysis of BICM that go beyond the standard approach of considering infinite interleaving and take into consideration the dependency of the coded bit observations introduced by the modulation. We also present bounds that improve upon the union bound in the region beyond the cutoff rate, and are essential to characterize the performance of modern randomlike codes used in concatenation with BICM. Finally, we turn our attention to BICM with iterative decoding; we review extrinsic information transfer charts, the area theorem and code design via curve fitting. We conclude with an overview of some applications of BICM beyond the classical coherent Gaussian channel.

Contents

List of Abbreviations, Acronyms and Symbols
1 Introduction
2 Channel Model and Code Ensembles
2.1 Channel Model: Encoding and Decoding
2.2 Coded Modulation
2.3 Bit-Interleaved Coded Modulation
2.A Continuous- and Discrete-Time Gaussian Channels
3 Information-Theoretic Foundations
3.1 Coded Modulation
3.2 Bit-Interleaved Coded Modulation
3.3 Comparison with Multilevel Coding
3.4 Mutual Information Analysis
3.5 Concluding Remarks and Related Work
4 Error Probability Analysis
4.1 Error Probability and the Union Bound
4.2 Pairwise Error Probability for Infinite Interleaving
4.3 Pairwise Error Probability for Finite Interleaving
4.4 Bounds and Approximations Above the Cutoff Rate
4.5 Concluding Remarks and Related Work
4.A Saddlepoint Location
4.B Asymptotic Analysis with Nakagami Fading
5 Iterative Decoding
5.1 Factor Graph Representation and Belief Propagation
5.2 Density Evolution
5.3 EXIT Charts
5.4 The Area Theorem
5.5 Improved Schemes
5.6 Concluding Remarks and Related Work
5.A Density Evolution Algorithm for BICM-ID
6 Applications
6.1 Non-Coherent Demodulation
6.2 Block-Fading
6.3 MIMO
6.4 Optical Communication: Discrete-Time Poisson Channel
6.5 Additive Exponential Noise Channel
7 Conclusions
References

List of Abbreviations, Acronyms and Symbols
APP A posteriori probability
AWGN Additive white Gaussian noise
BEC Binary erasure channel
BICM Bit-interleaved coded modulation
BICM-ID Bit-interleaved coded modulation with iterative decoding
BIOS Binary-input output-symmetric (channel)
BP Belief propagation
CM Coded modulation
EXIT Extrinsic information transfer
FG Factor graph
GMI Generalized mutual information
ISI Inter-symbol interference
LDPC Low-density parity-check (code)
MAP Maximum a posteriori
MIMO Multiple-input multiple-output
MLC Multi-level coding
MMSE Minimum mean-squared error
MSD Multi-stage decoding
PSK Phase-shift keying



Frequently Asked Questions
Q1. What contributions have the authors mentioned in the paper "Bit-interleaved coded modulation"?

The principle of coding in the signal space follows directly from Shannon's analysis of waveform Gaussian channels subject to an input constraint. The early design of communication systems focused separately on modulation, namely signal design and detection, and error correcting codes, which deal with errors introduced at the demodulator of the underlying waveform channel. In this monograph, the authors review the theoretical foundations of BICM under the unified framework of error exponents for mismatched decoding. This framework allows an accurate analysis without any particular assumptions on the length of the interleaver or independence between the multiple bits in a symbol. The authors further consider the sensitivity of the BICM capacity with respect to the signal-to-noise ratio (SNR), and obtain a wideband regime (or low-SNR regime) characterization. The authors review efficient tools for the error probability analysis of BICM that go beyond the standard approach of considering infinite interleaving and take into consideration the dependency of the coded bit observations introduced by the modulation. The authors also present bounds that improve upon the union bound in the region beyond the cutoff rate, and are essential to characterize the performance of modern randomlike codes used in concatenation with BICM. Finally, the authors turn their attention to BICM with iterative decoding; they review extrinsic information transfer charts, the area theorem and code design via curve fitting.

In particular, the authors review current work and outline how to extend the results they presented throughout the monograph to non-coherent detection, block-fading, multiple-input multiple-output ( MIMO ) channels and non-standard channels such as the exponential-noise channel. 

In order to limit the loss of information arising in this separated approach, soft information about the coded bits is propagated from the demodulator to the decoder in the form of bit-wise a posteriori probabilities or log-likelihood ratios. 

At the receiver, instead of optimal joint decoding of all the component binary codes, a suboptimal multi-stage decoding, alternatively termed successive interference cancellation, achieves good performance with limited complexity. 

The discovery of turbo codes [11] and the re-discovery of low-density parity-check (LDPC) codes [38, 69] with their corresponding iterative decoding algorithms marked a new era in Coding Theory. 

A practical way of constructing codes for the Gaussian channel consists of fixing the modulator signal set, and then considering codewords obtained as sequences over the fixed modulator signal set, or alphabet. 


Driven by Massey's view of coding and modulation as a single entity [79], Ungerböck in 1982 proposed Trellis-Coded Modulation (TCM), based on the combination of trellis codes and discrete signal constellations through set partitioning [130] (see also [15]).


The authors then turn their attention to iterative decoding of BICM; they also review the underlying design techniques and introduce improved BICM schemes in a unified framework.

Shannon’s promise that rates up to the channel capacity can be reliably transmitted over the channel comes together with the design challenge of effectively constructing coding schemes achieving these rates with limited encoding and decoding complexity. 
