
Showing papers in "IEEE Transactions on Information Theory" in 1996


Journal ArticleDOI
TL;DR: Using log-likelihood algebra, it is shown that any decoder can be used which accepts soft inputs, including a priori values, and delivers soft outputs that can be split into three terms: the soft channel and a priori inputs, and the extrinsic value.
Abstract: Iterative decoding of two-dimensional systematic convolutional codes has been termed "turbo" (de)coding. Using log-likelihood algebra, we show that any decoder can be used which accepts soft inputs, including a priori values, and delivers soft outputs that can be split into three terms: the soft channel and a priori inputs, and the extrinsic value. The extrinsic value is used as an a priori value for the next iteration. Decoding algorithms in the log-likelihood domain are given not only for convolutional codes but also for any linear binary systematic block code. The iteration is controlled by a stop criterion derived from cross entropy, which results in a minimal number of iterations. Optimal and suboptimal decoders with reduced complexity are presented. Simulation results show that very simple component codes are sufficient, block codes are appropriate for high rates and convolutional codes for lower rates (less than 2/3). Any combination of block and convolutional component codes is possible. Several interleaving techniques are described. At a bit error rate (BER) of 10^-4 the performance is slightly above or around the bounds given by the cutoff rate for reasonably simple block/convolutional component codes, interleaver sizes less than 1000 and for three to six iterations.
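
A compact way to state the soft-output decomposition described here, written in the log-likelihood-ratio notation commonly used for turbo decoding (a sketch; the symbols L_c, y_k, L_a and L_e follow convention rather than being quoted from the paper):

    L(\hat u_k) = L_c y_k + L_a(u_k) + L_e(\hat u_k),

where L_c y_k is the soft channel input, L_a the a priori value, and L_e the extrinsic value; the extrinsic term produced by one component decoder is passed to the other component decoder as its a priori value for the next iteration.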

2,632 citations


Journal ArticleDOI
TL;DR: A method to evaluate an upper bound to the bit error probability of a parallel concatenated coding scheme averaged over all interleavers of a given length is proposed and used to shed some light on some crucial questions which have been floating around in the communications community since the proposal of turbo codes.
Abstract: A parallel concatenated coding scheme consists of two simple constituent systematic encoders linked by an interleaver. The input bits to the first encoder are scrambled by the interleaver before entering the second encoder. The codeword of the parallel concatenated code consists of the input bits to the first encoder followed by the parity check bits of both encoders. This construction can be generalized to any number of constituent codes. Parallel concatenated schemes employing two convolutional codes as constituent codes, in connection with an iterative decoding algorithm of complexity comparable to that of the constituent codes, have been previously shown to yield remarkable coding gains close to theoretical limits. They have been named, and are known as, "turbo codes". We propose a method to evaluate an upper bound to the bit error probability of a parallel concatenated coding scheme averaged over all interleavers of a given length. The analytical bounding technique is then used to shed some light on some crucial questions, which have been floating around in the communications community since the proposal of turbo codes.
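
A minimal structural sketch of such an encoder. The toy parity function below is a hypothetical stand-in for the actual recursive systematic convolutional constituent encoders; only the systematic-plus-two-parity codeword layout and the interleaver are taken from the abstract.

import random

def toy_parity(bits):
    # Stand-in for a constituent encoder: a running-XOR (accumulator) parity stream.
    # A real turbo encoder would use a recursive systematic convolutional code here.
    acc, out = 0, []
    for b in bits:
        acc ^= b
        out.append(acc)
    return out

def parallel_concatenated_encode(u, interleaver):
    p1 = toy_parity(u)                            # parity from encoder 1 on the input order
    p2 = toy_parity([u[i] for i in interleaver])  # parity from encoder 2 on the interleaved input
    return u + p1 + p2                            # systematic bits followed by both parity streams

u = [random.randint(0, 1) for _ in range(8)]
pi = random.sample(range(8), 8)                   # a random interleaver of length 8
print(parallel_concatenated_encode(u, pi))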

1,224 citations


Journal ArticleDOI
TL;DR: Using expander graphs, a new family of asymptotically good, linear error-correcting codes are constructed that have linear time sequential decoding algorithms and logarithmic time parallel decoding algorithms that use a linear number of processors.
Abstract: Using expander graphs, we construct a new family of asymptotically good, linear error-correcting codes. These codes have linear time sequential decoding algorithms and logarithmic time parallel decoding algorithms that use a linear number of processors. We present both randomized and explicit constructions of these codes. Experimental results demonstrate the good performance of the randomly chosen codes.

949 citations


Journal ArticleDOI
Jorma Rissanen
TL;DR: A sharper code length is obtained as the stochastic complexity and the associated universal process are derived for a class of parametric processes by taking into account the Fisher information and removing an inherent redundancy in earlier two-part codes.
Abstract: By taking into account the Fisher information and removing an inherent redundancy in earlier two-part codes, a sharper code length, defined as the stochastic complexity, and the associated universal process are derived for a class of parametric processes. The main condition required is that the maximum-likelihood estimates satisfy the central limit theorem. The same code length is also obtained from the so-called maximum-likelihood code.
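
In the notation usually associated with this result (a sketch, not a quotation from the paper), the sharper code length for a k-parameter model class takes the form

    -\log f(x^n; \hat\theta(x^n)) + \frac{k}{2} \log \frac{n}{2\pi} + \log \int \sqrt{\det I(\theta)} \, d\theta + o(1),

where \hat\theta is the maximum-likelihood estimate and I(\theta) is the Fisher information matrix; the Fisher-information integral is the term that removes the redundancy left in earlier two-part codes.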

906 citations


Journal ArticleDOI
TL;DR: An information-theoretic lower bound shows that for any set of priorities the total length of the encoding packets must be at least the girth, so the system introduced is optimal in terms of the total encoding length.
Abstract: We introduce a new method, called priority encoding transmission, for sending messages over lossy packet-based networks. When a message is to be transmitted, the user specifies a priority value for each part of the message. Based on the priorities, the system encodes the message into packets for transmission and sends them to (possibly multiple) receivers. The priority value of each part of the message determines the fraction of encoding packets sufficient to recover that part. Thus even if some of the encoding packets are lost en route, each receiver is still able to recover the parts of the message for which a sufficient fraction of the encoding packets are received. For any set of priorities for a message, we define a natural quantity called the girth of the priorities. We develop systems for implementing any given set of priorities such that the total length of the encoding packets is equal to the girth. On the other hand, we give an information-theoretic lower bound that shows that for any set of priorities the total length of the encoding packets must be at least the girth. Thus the system we introduce is optimal in terms of the total encoding length. This work has immediate applications to multimedia and high-speed networks, especially those with bursty sources and multiple receivers with heterogeneous capabilities. Implementations of the system show promise of being practical.

648 citations


Journal ArticleDOI
TL;DR: The interleaver in the turbo encoder is shown to reduce the number of low-weight codewords through a process called "spectral thinning," which results in the free distance asymptote being the dominant performance parameter for low and moderate signal-to-noise ratios.
Abstract: The performance of turbo codes is addressed by examining the code's distance spectrum. The "error floor" that occurs at moderate signal-to-noise ratios is shown to be a consequence of the relatively low free distance of the code. It is also shown that the "error floor" can be lowered by increasing the size of the interleaver without changing the free distance of the code. Alternatively, the free distance of the code may be increased by using primitive feedback polynomials. The excellent performance of turbo codes at low signal-to-noise ratios is explained in terms of the distance spectrum. The interleaver in the turbo encoder is shown to reduce the number of low-weight codewords through a process called "spectral thinning." This thinned distance spectrum results in the free distance asymptote being the dominant performance parameter for low and moderate signal-to-noise ratios.
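
The "free distance asymptote" referred to here is, in the notation commonly used for turbo-code union bounds (a sketch; the multiplicity and weight symbols are assumptions rather than quotations from the paper), the dominant term

    P_b \approx \frac{N_{free} \bar w_{free}}{N} \, Q\!\left( \sqrt{ \frac{2 d_{free} R E_b}{N_0} } \right),

where N is the interleaver size, d_free the free distance, N_free its multiplicity, and \bar w_free the average information weight of the free-distance codewords. Spectral thinning keeps N_free small, and the 1/N factor is what lowers the error floor as the interleaver grows.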

497 citations


Journal ArticleDOI
TL;DR: There does not exist a finite value of R for which even infinitely many agents can make D arbitrarily small, and in this isolated-agents case the asymptotic behavior of the minimal error frequency in the limit as L and then R tend to infinity is determined.
Abstract: We consider a new problem in multiterminal source coding motivated by the following decentralized communication/estimation task. A firm's Chief Executive Officer (CEO) is interested in the data sequence {X(t)}_{t=1}^∞ which cannot be observed directly, perhaps because it represents tactical decisions by a competing firm. The CEO deploys a team of L agents who observe independently corrupted versions of {X(t)}_{t=1}^∞. Because {X(t)} is only one among many pressing matters to which the CEO must attend, the combined data rate at which the agents may communicate information about their observations to the CEO is limited to, say, R bits per second. If the agents were permitted to confer and pool their data, then in the limit as L→∞ they usually would be able to smooth out their independent observation noises entirely. Then they could use their R bits per second to provide the CEO with a representation of {X(t)} with fidelity D(R), where D(·) is the distortion-rate function of {X(t)}. In particular, with such data pooling D can be made arbitrarily small if R exceeds the entropy rate H of {X(t)}. Suppose, however, that the agents are not permitted to convene, Agent i having to send data based solely on his own noisy observations {Y_i(t)}. We show that then there does not exist a finite value of R for which even infinitely many agents can make D arbitrarily small. Furthermore, in this isolated-agents case we determine the asymptotic behavior of the minimal error frequency in the limit as L and then R tend to infinity.

468 citations


Journal ArticleDOI
TL;DR: It is shown that any point in the capacity region of a Gaussian multiple-access channel is achievable by single-user coding without requiring synchronization among users, provided that each user "splits" data and signal into two parts.
Abstract: It is shown that any point in the capacity region of a Gaussian multiple-access channel is achievable by single-user coding without requiring synchronization among users, provided that each user "splits" data and signal into two parts. Based on this result, a new multiple-access technique called rate-splitting multiple accessing (RSMA) is proposed. RSMA is a code-division multiple-access scheme for the M-user Gaussian multiple-access channel for which the effort of finding the codes for the M users, of encoding, and of decoding is that of at most 2M-1 independent point-to-point Gaussian channels. The effects of bursty sources, multipath fading, and inter-cell interference are discussed and directions for further research are indicated.
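
A small numerical sketch of the rate-splitting idea for two users. The power and noise values below are arbitrary illustrations; the point is that splitting user 1 into two virtual single-user channels, decoded around user 2, reaches a sum-rate-optimal point using only single-user codes and successive decoding.

from math import log2

def c(snr):
    # capacity of a point-to-point Gaussian channel, in bits per channel use
    return 0.5 * log2(1 + snr)

P1, P2, N = 4.0, 2.0, 1.0     # illustrative powers and noise level
delta = 1.5                   # portion of user 1's power assigned to virtual user 1a

# successive single-user decoding order: 1a (all else treated as noise), then 2, then 1b
R1a = c(delta / (N + (P1 - delta) + P2))
R2  = c(P2 / (N + (P1 - delta)))
R1b = c((P1 - delta) / N)

R1 = R1a + R1b
print(R1, R2, R1 + R2, c((P1 + P2) / N))   # the last two numbers agree: the sum rate is optimal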

468 citations


Journal ArticleDOI
TL;DR: This is an individual sequence result which shows that the difference between the exponential growth rate of wealth of the best state-constant rebalanced portfolio and that of the universal portfolio with side information is uniformly less than (d/(2n))log (n+1)+(k/n)log 2 for every stock market and side-information sequence and for all time n.
Abstract: We present a sequential investment algorithm, the μ-weighted universal portfolio with side information, which achieves, to first order in the exponent, the same wealth as the best side-information-dependent investment strategy (the best state-constant rebalanced portfolio) determined in hindsight from observed market and side-information outcomes. This is an individual sequence result which shows that the difference between the exponential growth rate of wealth of the best state-constant rebalanced portfolio and that of the universal portfolio with side information is uniformly less than (d/(2n))log (n+1)+(k/n)log 2 for every stock market and side-information sequence and for all time n. Here d=k(m-1) is the number of degrees of freedom in the state-constant rebalanced portfolio with k states of side information and m stocks. The proof of this result establishes a close connection between universal investment and universal data compression.
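
A minimal sketch of what a "state-constant rebalanced portfolio" computes. The market vectors, side-information states, and per-state portfolio weights below are made-up illustrations; the universal algorithm itself, which averages over all such portfolios, is not reproduced here.

import numpy as np

# price relatives x_t (end price / start price for each stock) and side-information state s_t
x = np.array([[1.03, 0.98],
              [0.97, 1.05],
              [1.01, 1.02],
              [0.95, 1.04]])
s = [0, 1, 0, 1]                      # k = 2 side-information states

# one fixed portfolio per state (nonnegative weights summing to 1), rebalanced to daily
b = {0: np.array([0.7, 0.3]),
     1: np.array([0.2, 0.8])}

wealth = 1.0
for xt, st in zip(x, s):
    wealth *= float(b[st] @ xt)       # each day's wealth factor is the portfolio-weighted price relative
print(wealth)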

380 citations


Journal ArticleDOI
TL;DR: A new family of lattice constellations, based on complex algebraic number fields, which have good performance on Rayleigh fading channels is presented, and some of these lattices also present a reasonable packing density and thus may be used at the same time over a Gaussian channel.
Abstract: Recent work on lattices matched to the Rayleigh fading channel has shown how to construct good signal constellations with high spectral efficiency. We present a new family of lattice constellations, based on complex algebraic number fields, which have good performance on Rayleigh fading channels. Some of these lattices also present a reasonable packing density and thus may be used at the same time over a Gaussian channel. Conversely, we show that particular versions of the best lattice packings (D_4, E_6, E_8, K_12, Λ_16, Λ_24), constructed from totally complex algebraic cyclotomic fields, present better performance over the Rayleigh fading channel. The practical interest in such signal constellations rises from the need to transmit information at high rates over both terrestrial and satellite links. Some further results in algebraic number theory related to ideals and their factorization are presented, and the decoding algorithm used with these lattice constellations is illustrated together with practical results.

366 citations


Journal ArticleDOI
TL;DR: A new class of asymptotically good, linear error-correcting codes can be both encoded and decoded in linear time and both randomized and explicit constructions of these codes are presented.
Abstract: We present a new class of asymptotically good, linear error-correcting codes. These codes can be both encoded and decoded in linear time. They can also be encoded by logarithmic-depth circuits of linear size and decoded by logarithmic-depth circuits of size O(n log n). We present both randomized and explicit constructions of these codes.

Journal ArticleDOI
TL;DR: It is shown that the achievable rates depend on the noise distribution only via its power and thus coincide with the capacity region of a white Gaussian noise channel with signal and noise power equal to those of the original channel.
Abstract: We study the performance of a transmission scheme employing random Gaussian codebooks and nearest neighbor decoding over a power limited additive non-Gaussian noise channel. We show that the achievable rates depend on the noise distribution only via its power and thus coincide with the capacity region of a white Gaussian noise channel with signal and noise power equal to those of the original channel. The results are presented for single-user channels as well as multiple-access channels, and are extended to fading channels with side information at the receiver.

Journal ArticleDOI
TL;DR: It is shown that, among all trellises that represent a given code, the original trellis introduced by Bahl, Cocke, Jelinek, and Raviv in 1974, and later rediscovered by Wolf, Massey, and Forney, uniquely minimizes the edge count.
Abstract: In this semi-tutorial paper, we will investigate the computational complexity of an abstract version of the Viterbi algorithm on a trellis, and show that if the trellis has e edges, the complexity of the Viterbi algorithm is Θ(e). This result suggests that the "best" trellis representation for a given linear block code is the one with the fewest edges. We will then show that, among all trellises that represent a given code, the original trellis introduced by Bahl, Cocke, Jelinek, and Raviv in 1974, and later rediscovered by Wolf (1978), Massey (1978), and Forney (1988), uniquely minimizes the edge count, as well as several other figures of merit. Following Forney and Kschischang and Sorokine (1995), we will also discuss "trellis-oriented" or "minimal-span" generator matrices, which facilitate the calculation of the size of the BCJR trellis, as well as the actual construction of it.
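
A small sketch of the kind of calculation a trellis-oriented (minimal-span) generator matrix makes easy. The matrix below is a made-up (6,3) binary example whose rows happen to have distinct span starts and distinct span ends, which is what "trellis-oriented" requires; the state-space dimension at each trellis boundary is simply the number of rows whose span crosses that boundary.

# rows of a trellis-oriented generator matrix for a hypothetical (6,3) binary code
G = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [0, 0, 0, 1, 1, 1]]

def span(row):
    nz = [i for i, b in enumerate(row) if b]
    return nz[0], nz[-1]              # first and last nonzero position (0-indexed)

n = len(G[0])
spans = [span(r) for r in G]

# state dimension at boundary i (between symbol i-1 and symbol i), for i = 0..n:
# count the rows that are "active" across that boundary
profile = [sum(1 for s, e in spans if s < i <= e) for i in range(n + 1)]
print(profile)                        # prints [0, 1, 2, 2, 2, 1, 0]
print([2 ** d for d in profile])      # number of states at each boundary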

Journal ArticleDOI
TL;DR: It is shown that labeled samples are necessary to construct a classification rule and that they are exponentially more valuable than unlabeled samples.
Abstract: We observe a training set Q composed of l labeled samples {(X_1,θ_1),...,(X_l,θ_l)} and u unlabeled samples {X_1',...,X_u'}. The labels θ_i are independent random variables satisfying Pr{θ_i=1}=η, Pr{θ_i=2}=1-η. The labeled observations X_i are independently distributed with conditional density f_{θ_i}(·) given θ_i. Let (X_0,θ_0) be a new sample, independently distributed as the samples in the training set. We observe X_0 and we wish to infer the classification θ_0. In this paper we first assume that the distributions f_1(·) and f_2(·) are given and that the mixing parameter is unknown. We show that the relative value of labeled and unlabeled samples in reducing the risk of optimal classifiers is the ratio of the Fisher informations they carry about the parameter η. We then assume that two densities g_1(·) and g_2(·) are given, but we do not know whether g_1(·)=f_1(·) and g_2(·)=f_2(·) or if the opposite holds, nor do we know η. Thus the learning problem consists of both estimating the optimum partition of the observation space and assigning the classifications to the decision regions. Here, we show that labeled samples are necessary to construct a classification rule and that they are exponentially more valuable than unlabeled samples.
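
For intuition, the two Fisher informations being compared can be written down with standard formulas (a sketch; the notation is not quoted from the paper): a labeled sample carries the Bernoulli information about η through its label, while an unlabeled sample carries the information of the mixture density,

    I_labeled(\eta) = \frac{1}{\eta(1-\eta)}, \qquad I_unlabeled(\eta) = \int \frac{(f_1(x) - f_2(x))^2}{\eta f_1(x) + (1-\eta) f_2(x)} \, dx,

and the first result says the relative value of the two sample types for the risk of the optimal classifier is the ratio of these two quantities.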

Journal ArticleDOI
TL;DR: It is shown that imposing a unit-energy constraint results in a lower mean-square error at a comparable computational complexity, and a new characterization of the optimum delay is described together with a method for computing it.
Abstract: A unified approach for computing the optimum settings of a length-N_f input-aided equalizer that minimizes the mean-square error between the equalized channel impulse response and a target impulse response of a given length N_b is presented. This approach offers more insight into the problem, easily accommodates correlation in the input and noise sequences, leads to significant computational savings, and allows us to analyze a variety of constraints on the target impulse response besides the standard unit-tap constraint. In particular, we show that imposing a unit-energy constraint results in a lower mean-square error at a comparable computational complexity. Furthermore, we show that, under the assumed constraint of finite-length filters, the relative delay between the equalizer and the target impulse response plays a crucial role in optimizing performance. We describe a new characterization of the optimum delay and show how to compute it. Finally, we derive reduced-parameter pole-zero models of the equalizer that achieve the high performance of a long all-zero equalizer at a much lower implementation cost.

Journal ArticleDOI
TL;DR: The memoryless property of the exponential distribution is used to develop a new noniterative algorithm for obtaining the optimal quantizer design, which needs only a single sequence of solutions to one-dimensional nonlinear equations.
Abstract: This paper presents solutions to the entropy-constrained scalar quantizer (ECSQ) design problem for two sources commonly encountered in image and speech compression applications: sources having the exponential and Laplacian probability density functions. We use the memoryless property of the exponential distribution to develop a new noniterative algorithm for obtaining the optimal quantizer design. We show how to obtain the optimal ECSQ either with or without an additional constraint on the number of levels in the quantizer. In contrast to prior methods, which require a multidimensional iterative solution of a large number of nonlinear equations, the new method needs only a single sequence of solutions to one-dimensional nonlinear equations (in some Laplacian cases, one additional two-dimensional solution is needed). As a result, the new method is orders of magnitude faster than prior ones. We show that as the constraint on the number of levels in the quantizer is relaxed, the optimal ECSQ becomes a uniform threshold quantizer (UTQ) for exponential, but not for Laplacian sources. We then further examine the performance of the UTQ and optimal ECSQ, and also investigate some interesting alternatives to the UTQ, including a uniform-reconstruction quantizer (URQ) and a constant dead-zone ratio quantizer (CDZRQ).

Journal ArticleDOI
R. Zamir
TL;DR: It is shown that for difference (or balanced) distortion measures, this loss is bounded by a universal constant, which is the minimax capacity of a suitable additive-noise channel and is equal to the maximin redundancy over the rate-distortion function of the additive noise "test" channel.
Abstract: The rate-distortion function for source coding with side information at the decoder (the "Wyner-Ziv problem") is given in terms of an auxiliary random variable, which forms a Markov chain with the source and the side information. This Markov chain structure, typical to the solution of multiterminal source coding problems, corresponds to a loss in coding rate with respect to the conditional rate-distortion function, i.e., to the case where the encoder is fully informed. We show that for difference (or balanced) distortion measures, this loss is bounded by a universal constant, which is the minimax capacity of a suitable additive-noise channel. Furthermore, in the worst case, this loss is equal to the maximin redundancy over the rate-distortion function of the additive noise "test" channel. For example, the loss in the Wyner-Ziv problem is less than 0.5 bit/sample in the squared-error distortion case, and it is less than 0.22 bit for a binary source with Hamming distance. These results have implications also in universal quantization with side information, and in more general multiterminal source coding problems.

Journal ArticleDOI
TL;DR: A description of the relationship between Err(f) and Dist(f) is given which is nearly complete in all its aspects and entirely complete in some, and a new connection between linearity testing and Fourier analysis is obtained.
Abstract: Let Dist(f,g)=Pr_u[f(u)≠g(u)] denote the relative distance between functions f,g mapping from a group G to a group H, and let Dist(f) denote the minimum, over all linear functions (homomorphisms) g, of Dist(f,g). Given a function f:G→H we let Err(f)=Pr_{u,v}[f(u)+f(v)≠f(u+v)] denote the rejection probability of the Blum-Luby-Rubinfeld (1993) linearity test. Linearity testing is the study of the relationship between Err(f) and Dist(f), and in particular lower bounds on Err(f) in terms of Dist(f). We discuss the case where the underlying groups are G=GF(2)^n and H=GF(2). In this case, the collection of linear functions describes a Hadamard code of block length 2^n, and for an arbitrary function f mapping GF(2)^n to GF(2) the distance Dist(f) measures its distance to the Hadamard code. Err(f) is a parameter that is "easy to measure" and linearity testing studies the relationship of this parameter to the distance of f. The code and corresponding test are used in the construction of efficient probabilistically checkable proofs and thence in the derivation of hardness-of-approximation results. Improved analyses translate into better nonapproximability results. We present a description of the relationship between Err(f) and Dist(f) which is nearly complete in all its aspects, and entirely complete in some. We present functions L,U:[0,1]→[0,1] such that for all x ∈ [0,1] we have L(x)≤Err(f)≤U(x) whenever Dist(f)=x, with the upper bound being tight on the whole range, and the lower bound tight on a large part of the range and close on the rest. Part of our strengthening is obtained by showing a new connection between linearity testing and Fourier analysis.
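
A minimal sketch of the test itself, estimating Err(f) empirically for a function f: GF(2)^n → GF(2). The particular f and the sample size below are arbitrary illustrations.

import random

n = 8

def f(x):
    # an arbitrary test function: a linear form plus a small nonlinear perturbation
    return (x[0] ^ x[3] ^ x[5]) ^ (x[1] & x[2])

def xor(u, v):
    return tuple(a ^ b for a, b in zip(u, v))

trials, rejects = 20000, 0
for _ in range(trials):
    u = tuple(random.randint(0, 1) for _ in range(n))
    v = tuple(random.randint(0, 1) for _ in range(n))
    if f(u) ^ f(v) != f(xor(u, v)):      # the BLR check: f(u) + f(v) = f(u + v) over GF(2)
        rejects += 1
print(rejects / trials)                  # empirical estimate of Err(f)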

Journal ArticleDOI
TL;DR: It is shown that the class of two-layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the probably approximately correct (PAC) learning model.
Abstract: We show that the class of two-layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the probably approximately correct (PAC) learning model. In this model, a joint probability distribution is assumed to exist on the observations and the learner is required to approximate the neural network which minimizes the expected quadratic error. As special cases, the model allows learning real-valued functions with bounded noise, learning probabilistic concepts, and learning the best approximation to a target function that cannot be well approximated by the neural network. The networks we consider have real-valued inputs and outputs, an unlimited number of threshold hidden units with bounded fan-in, and a bound on the sum of the absolute values of the output weights. The number of computation steps of the learning algorithm is bounded by a polynomial in 1/ε, 1/δ, n, and B, where ε is the desired accuracy, δ is the probability that the algorithm fails, n is the input dimension, and B is the bound on both the absolute value of the target (which may be a random variable) and the sum of the absolute values of the output weights. In obtaining the result, we also extended some results on iterative approximation of functions in the closure of the convex hull of a function class and on the sample complexity of agnostic learning with the quadratic loss function.

Journal ArticleDOI
TL;DR: This paper proposes a new identification scheme, based on error-correcting codes, which is zero-knowledge and seems of practical value, and describes several variants, including one which has an identity-based character.
Abstract: The present paper investigates the possibility of designing zero-knowledge identification schemes based on hard problems from coding theory. Zero-knowledge proofs were introduced by Goldwasser, Micali, and Rackoff (1985). Their practical significance was soon demonstrated in the work of Fiat and Shamir [1986], who turned zero-knowledge proofs of quadratic residuosity into efficient means of establishing user identities. In the present paper, we propose a new identification scheme, based on error-correcting codes, which is zero-knowledge and seems of practical value. Furthermore, we describe several variants, including one which has an identity-based character. The security of our schemes depends on the hardness of finding a word of given syndrome and prescribed (small) weight with respect to some randomly generated binary linear error-correcting code. This is, of course, not the first attempt to design a cryptographic scheme using tools from coding theory. The difference is that identification protocols do not follow the public key paradigm based on trap-door functions and described in the seminal Diffie-Hellman paper [1976]. Rather, they only require one-way functions, which opens the way to using, in a rather direct manner, simple combinatorial problems of the kind provided by coding theory. The resulting schemes compare favorably to their number-theoretic analogs.

Journal ArticleDOI
Noga Alon, Alon Orlitsky
TL;DR: The expected number of bits the sender must transmit for one and for multiple instances in two communication scenarios is studied and related to the chromatic and Korner (1973) entropies of a naturally defined graph.
Abstract: A sender wants to accurately convey information to a receiver who has some, possibly related, data. We study the expected number of bits the sender must transmit for one and for multiple instances in two communication scenarios and relate this number to the chromatic and Korner (1973) entropies of a naturally defined graph.

Journal ArticleDOI
TL;DR: An efficient scheme that produces an encoding of a given message such that the message can be decoded from any portion of the encoding that is approximately equal to the length of the message is developed.
Abstract: We develop an efficient scheme that produces an encoding of a given message such that the message can be decoded from any portion of the encoding that is approximately equal to the length of the message. More precisely, an (n,c,l,r)-erasure-resilient code consists of an encoding algorithm and a decoding algorithm with the following properties. The encoding algorithm produces a set of l-bit packets of total length cn from an n-bit message. The decoding algorithm is able to recover the message from any set of packets whose total length is r, i.e., from any set of r/l packets. We describe erasure-resilient codes where both the encoding and decoding algorithms run in linear time and where r is only slightly larger than n.
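
The sketch below is not the linear-time construction of the paper; it is a compact Reed-Solomon-style stand-in over a small prime field, with hypothetical parameters, that shows the defining property: any k of the m encoding symbols suffice to recover a k-symbol message.

import random

p = 257                                    # a prime field large enough for byte-valued symbols

def encode(msg, m):
    # treat the k message symbols as coefficients of a degree-(k-1) polynomial,
    # and output its evaluations at the points 1..m (m > k gives redundancy)
    return [(i, sum(c * pow(i, j, p) for j, c in enumerate(msg)) % p) for i in range(1, m + 1)]

def decode(received, k):
    # Lagrange interpolation from any k surviving (point, value) pairs,
    # then read the polynomial's coefficients back off as the message
    pts = received[:k]
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(pts):
        num = [1]                          # coefficients of prod_{j != i} (x - xj), low degree first
        denom = 1
        for j, (xj, _) in enumerate(pts):
            if j != i:
                num = [(a - xj * b) % p for a, b in zip([0] + num, num + [0])]
                denom = denom * (xi - xj) % p
        scale = yi * pow(denom, p - 2, p) % p
        for d in range(k):
            coeffs[d] = (coeffs[d] + scale * num[d]) % p
    return coeffs

msg = [7, 13, 200, 5]                      # k = 4 message symbols
packets = encode(msg, 8)                   # m = 8 encoding symbols
random.shuffle(packets)                    # simulate loss/reordering: keep an arbitrary 4
print(decode(packets, 4) == msg)           # True: recovered from any 4 of the 8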

Journal ArticleDOI
TL;DR: In the case of Ornstein, Prohorov and other distances of the Kantorovich-Vasershtein type, it is shown that the finite-precision resolvability is equal to the rate-distortion function with a fidelity criterion derived from the accuracy measure, which leads to new results on nonstationary rate- Distortion theory.
Abstract: We study the randomness necessary for the simulation of a random process with given distributions, in terms of the finite-precision resolvability of the process. Finite-precision resolvability is defined as the minimal random-bit rate required by the simulator as a function of the accuracy with which the distributions are replicated. The accuracy is quantified by means of various measures: variational distance, divergence, Ornstein (1973), Prohorov (1956) and related measures of distance between the distributions of random processes. In the case of Ornstein, Prohorov and other distances of the Kantorovich-Vasershtein type, we show that the finite-precision resolvability is equal to the rate-distortion function with a fidelity criterion derived from the accuracy measure. This connection leads to new results on nonstationary rate-distortion theory. In the case of variational distance, the resolvability of stationary ergodic processes is shown to equal entropy rate regardless of the allowed accuracy. In the case of normalized divergence, explicit expressions for finite-precision resolvability are obtained in many cases of interest; and connections with data compression with minimum probability of block error are shown.

Journal ArticleDOI
TL;DR: This paper defines a convolutional code as the dual of a complete linear behavior in the sense of Willems (1979) and describes a set of generalized first-order descriptions for convolutional codes using ideas from systems theory.
Abstract: It is well known that a convolutional code is essentially a linear system defined over a finite field. In this paper we elaborate on this connection. We define a convolutional code as the dual of a complete linear behavior in the sense of Willems (1979). Using ideas from systems theory, we describe a set of generalized first-order descriptions for convolutional codes. As an application of these ideas, we present a new algebraic construction for convolutional codes.
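
The systems-theory connection can be made concrete with the familiar first-order (state-space) form over a finite field F_q; this is a sketch of the standard form that such generalized first-order descriptions refine, not the paper's own parametrization:

    x_{t+1} = A x_t + B u_t, \qquad y_t = C x_t + D u_t,

with A, B, C, D matrices over F_q, and the code being the set of trajectories that such a linear system can produce.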

Journal ArticleDOI
TL;DR: This work proposes abstract minimum complexity regression estimators for dependent observations, which may be adapted to a particular list of parametric models, and establishes upper bounds on the statistical risks of the proposed estimators in terms of certain deterministic indices of resolvability.
Abstract: The minimum complexity regression estimation framework (Barron, 1991; Barron and Cover, 1991 and Rissanen, 1989) is a general data-driven methodology for estimating a regression function from a given list of parametric models using independent and identically distributed (i.i.d.) observations. We extend Barron's regression estimation framework to m-dependent observations and to strongly mixing observations. In particular, we propose abstract minimum complexity regression estimators for dependent observations, which may be adapted to a particular list of parametric models, and establish upper bounds on the statistical risks of the proposed estimators in terms of certain deterministic indices of resolvability. Assuming that the regression function satisfies a certain Fourier-transform-type representation, we examine minimum complexity regression estimators adapted to a list of parametric models based on neural networks and by using the upper bounds for the abstract estimators, we establish rates of convergence for the statistical risks of these estimators. Also, as a key tool, we extend the classical Bernstein inequality from i.i.d. random variables to m-dependent processes and to strongly mixing processes.

Journal ArticleDOI
TL;DR: Several new and old lower bounds on the capacity are derived in a unified manner by assuming different front-end receiver filters, in particular the sampled whitened matched filter (SWMF) and the minimum mean-squared error-decision feedback equalizer (MMSE-DFE) filter.
Abstract: The discrete-time additive Gaussian intersymbol interference (ISI) channel with i.i.d. (not necessarily Gaussian) input signals is considered. Several new and old lower bounds on the capacity are derived in a unified manner by assuming different front-end receiver filters, in particular the sampled whitened matched filter (SWMF) and the minimum mean-squared error-decision feedback equalizer (MMSE-DFE) filter. The features of the bounds are demonstrated and compared in several examples with binary and quaternary input signals. It is also shown that the effect of an ideal post-cursor or tail cancellation, in an information-preserving context, depends primarily on the front-end filter. While, as is well known, ideal post-cursor cancellation at the output of the SWMF decreases the information, the opposite trend is seen when an MMSE-DFE front filter is considered. This observation reflects the basic theoretical obstacles in precoding, i.e., ideal post-cursor cancellation in the presence of a pre-cursor. It is used to assess the inherent loss (in terms of information rates as compared to the rates achievable with the hypothetical ideal post-cursor cancellation) associated with any post-cursor cancellation technique such as precoding, DFE, or other variants, when operating in synergy with the MMSE-DFE front-end filter. The effect of the front-end filter on an ideally interleaved, precoded coded-modulation system is also addressed.

Journal ArticleDOI
TL;DR: Two weighting procedures are presented for compaction of output sequences generated by binary independent sources whose unknown parameter may occasionally change and it is proved that additional-transition redundancy is not more than 3/2 log T bits per transition.
Abstract: Two weighting procedures are presented for compaction of output sequences generated by binary independent sources whose unknown parameter may occasionally change. The resulting codes need no knowledge of the sequence length T, i.e., they are strongly sequential, and the number of parameter changes is also unrestricted. The additional-transition redundancy of the first method is shown to achieve the Merhav lower bound, i.e., log T bits per transition. For the second method we could prove only that the additional-transition redundancy is not more than 3/2 log T bits per transition, which is more than the Merhav bound; however, the storage and computational complexity of this method are more attractive than those of the first method. Simulations show that the difference in redundancy performance between the two methods is negligible.
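
Weighting methods of this kind are typically built on a simple sequential estimator for a memoryless binary source, such as the Krichevsky-Trofimov estimator sketched below (its use here is an assumption for illustration; the transition-weighting machinery layered on top of it is not reproduced):

from math import log2

def kt_codelength(bits):
    # sequential Krichevsky-Trofimov probability assignment:
    # P(next bit = 1 | a zeros and b ones seen so far) = (b + 1/2) / (a + b + 1)
    a = b = 0
    codelength = 0.0
    for bit in bits:
        p_one = (b + 0.5) / (a + b + 1)
        codelength += -log2(p_one if bit else 1 - p_one)
        if bit:
            b += 1
        else:
            a += 1
    return codelength

print(kt_codelength([0, 0, 1, 0, 1, 1, 1, 1]))   # ideal code length in bits for this sequence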

Journal ArticleDOI
TL;DR: A new Reed-Solomon decoding algorithm is presented, which embodies several refinements of an earlier algorithm, and applies very naturally to RS codes which have been doubly extended by the inclusion of 0 and ∞ as additional locations.
Abstract: We present a new Reed-Solomon decoding algorithm, which embodies several refinements of an earlier algorithm. Some portions of this new decoding algorithm operate on symbols of length lg q bits; other portions operate on somewhat longer symbols. In the worst case, the total number of calculations required by the new decoding algorithm is proportional to nr, where n is the code's block length and r is its redundancy. This worst-case workload is very similar to prior algorithms. But in many applications, average-case workload and error-correcting performance are both much better. The input to the new algorithm consists of n received symbols from GF(q), and n nonnegative real numbers, each of which is the reliability of the corresponding received symbol. Any conceivable errata pattern has a "score" equal to the sum of the reliabilities of its locations with nonzero errata values. A max-likelihood decoder would find the minimum score over all possible errata patterns. Our new decoding algorithm finds the minimum score only over a subset of these possible errata patterns. The errata within any candidate errata pattern may be partitioned into "errors" and "erasures," depending on whether the corresponding reliabilities are above or below an "erasure threshold." Different candidate errata patterns may have different thresholds, each chosen to minimize its corresponding ERRATA COUNT, which is defined as 2·(number of errors)+(number of erasures). The new algorithm finds an errata pattern with minimum score among all errata patterns for which ERRATA COUNT ≤ r+1, where r is the redundancy of the RS code. Conventional algorithms also require that the erasure threshold be set a priori; the new algorithm obtains the best answer over all possible settings of the erasure threshold. Conventional cyclic RS codes have length n=q-1, and their locations correspond to the nonzero elements of GF(q). The new algorithm also applies very naturally to RS codes which have been doubly extended by the inclusion of 0 and ∞ as additional locations.

Journal ArticleDOI
TL;DR: It is shown that the asymptotic variances of the estimates are close to the Cramer-Rao bound (CRB) for high SNR; however, the ratio of the asymptotic variance and the CRB has a polynomial growth in the noise variance.
Abstract: The high-order ambiguity function (HAF) is a nonlinear operator designed to detect, estimate, and classify complex signals whose phase is a polynomial function of time. The HAF algorithm, introduced by Peleg and Porat (1991), estimates the phase parameters of polynomial-phase signals measured in noise. The purpose of this correspondence is to analyze the asymptotic accuracy of the HAF algorithm in the case of additive white Gaussian noise. It is shown that the asymptotic variances of the estimates are close to the Cramer-Rao bound (CRB) for high SNR. However, the ratio of the asymptotic variance and the CRB has a polynomial growth in the noise variance.

Journal ArticleDOI
R. Kotter
TL;DR: An efficient general GMD decoding scheme for linear block codes in the framework of error-correcting pairs is derived, and it is shown that it can find all relevant error-erasure-locating functions with complexity O(o_1 nd), where o_1 is the size of the first nongap in the function space associated with the code.
Abstract: Generalized minimum-distance (GMD) decoding is a standard soft-decoding method for block codes. We derive an efficient general GMD decoding scheme for linear block codes in the framework of error-correcting pairs. Special attention is paid to Reed-Solomon (RS) codes and one-point algebraic-geometry (AG) codes. For RS codes of length n and minimum Hamming distance d the GMD decoding complexity turns out to be in the order O(nd), where the complexity is counted as the number of multiplications in the field of concern. For AG codes the GMD decoding complexity is highly dependent on the curve in consideration. It is shown that we can find all relevant error-erasure-locating functions with complexity O(o_1 nd), where o_1 is the size of the first nongap in the function space associated with the code. A full GMD decoding procedure for a one-point AG code can be performed with complexity O(dn^2).