
Showing papers in "IEEE Transactions on Information Theory in 1972"


Journal ArticleDOI
TL;DR: In this paper, a maximum likelihood sequence estimator for a digital pulse-amplitude-modulated sequence in the presence of finite intersymbol interference and white Gaussian noise is developed, which comprises a sampled linear filter, called a whitened matched filter, and a recursive nonlinear processor, called the Viterbi algorithm.
Abstract: A maximum-likelihood sequence estimator for a digital pulse-amplitude-modulated sequence in the presence of finite intersymbol interference and white Gaussian noise is developed. The structure comprises a sampled linear filter, called a whitened matched filter, and a recursive nonlinear processor, called the Viterbi algorithm. The outputs of the whitened matched filter, sampled once for each input symbol, are shown to form a set of sufficient statistics for estimation of the input sequence, a fact that makes obvious some earlier results on optimum linear processors. The Viterbi algorithm is easier to implement than earlier optimum nonlinear processors and its performance can be straightforwardly and accurately estimated. It is shown that performance (by whatever criterion) is effectively as good as could be attained by any receiver structure and in many cases is as good as if intersymbol interference were absent. Finally, a simplified but effectively optimum algorithm suitable for the most popular partial-response schemes is described.
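As a hedged illustration of the recursive processor (the Viterbi algorithm), here is a minimal Python sketch for binary (+1/-1) PAM through a known two-tap discrete channel. The whitened matched filter front end is assumed to have already produced the samples z_k, and the tap names h0, h1 are ours, not the paper's notation.

```python
def viterbi_mlse(z, h0, h1, syms=(-1.0, 1.0)):
    """Maximum-likelihood sequence estimation of x from observations
    z_k = h0*x_k + h1*x_{k-1} + noise, by minimizing squared error
    over a trellis whose state is the previous symbol."""
    cost = {s: 0.0 for s in syms}        # best metric ending in state s
    paths = {s: [] for s in syms}        # surviving path into state s
    for zk in z:
        new_cost, new_paths = {}, {}
        for x in syms:                   # hypothesized current symbol
            best = None
            for prev in syms:            # hypothesized previous symbol
                c = cost[prev] + (zk - h0 * x - h1 * prev) ** 2
                if best is None or c < best[0]:
                    best = (c, paths[prev] + [x])
            new_cost[x], new_paths[x] = best
        cost, paths = new_cost, new_paths
    return paths[min(cost, key=cost.get)]
```

For a channel memory of L symbols the state would be the last L symbols, giving 2^L states with constant per-symbol work, which is what makes the algorithm easier to implement than earlier optimum nonlinear processors.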

2,667 citations


Journal ArticleDOI
TL;DR: This work introduces the problem of a single source attempting to communicate information simultaneously to several receivers and determines the families of simultaneously achievable transmission rates for many extreme classes of channels; these results lead to a new approach to the compound channels problem.
Abstract: We introduce the problem of a single source attempting to communicate information simultaneously to several receivers. The intent is to model the situation of a broadcaster with multiple receivers or a lecturer with many listeners. Thus several different channels with a common input alphabet are specified. We shall determine the families of simultaneously achievable transmission rates for many extreme classes of channels. Upper and lower bounds on the capacity region will be found, and it will be shown that the family of theoretically achievable rates dominates the family of rates achievable by previously known time-sharing and maximin procedures. This improvement is gained by superimposing high-rate information on low-rate information. All of these results lead to a new approach to the compound channels problem.

1,781 citations


Journal ArticleDOI
D. Chu1
TL;DR: This correspondence describes the construction of complex codes of the form exp i \alpha_k whose discrete circular autocorrelations are zero for all nonzero lags.
Abstract: This correspondence describes the construction of complex codes of the form exp i \alpha_k whose discrete circular autocorrelations are zero for all nonzero lags. There is no restriction on code lengths.
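The construction uses quadratic phases (for length N and any M coprime to N). This small Python sketch is a plausible reading of the correspondence rather than its exact notation; it builds such a code and checks its circular autocorrelation:

```python
import cmath

def chu_sequence(N, M=1):
    """Length-N complex code exp(i*alpha_k) with quadratic phase;
    M must be coprime to N."""
    if N % 2 == 0:
        return [cmath.exp(1j * cmath.pi * M * k * k / N) for k in range(N)]
    return [cmath.exp(1j * cmath.pi * M * k * (k + 1) / N) for k in range(N)]

def circular_autocorr(a, lag):
    """Discrete circular autocorrelation at the given lag."""
    N = len(a)
    return sum(a[k] * a[(k + lag) % N].conjugate() for k in range(N))
```

The lag-0 value is N; every nonzero lag vanishes (up to floating-point error), for any code length.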

1,624 citations


Journal ArticleDOI
TL;DR: A simple algorithm for computing channel capacity is suggested that consists of a mapping from the set of channel input probability vectors into itself such that the sequence of probability vectors generated by successive applications of the mapping converges to the vector that achieves the capacity of the given channel.
Abstract: By defining mutual information as a maximum over an appropriate space, channel capacities can be defined as double maxima and rate-distortion functions as double minima. This approach yields valuable new insights regarding the computation of channel capacities and rate-distortion functions. In particular, it suggests a simple algorithm for computing channel capacity that consists of a mapping from the set of channel input probability vectors into itself such that the sequence of probability vectors generated by successive applications of the mapping converges to the vector that achieves the capacity of the given channel. Analogous algorithms then are provided for computing rate-distortion functions and constrained channel capacities. The algorithms apply both to discrete and to continuous alphabet channels or sources. In addition, a formalization of the theory of channel capacity in the presence of constraints is included. Among the examples is the calculation of close upper and lower bounds to the rate-distortion function of a binary symmetric Markov source.
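The mapping described here is what is now called the Blahut-Arimoto iteration. A minimal sketch for a discrete memoryless channel, with the channel given as a row-stochastic matrix W[x][y] = P(y|x) (variable names are ours):

```python
import math

def channel_capacity(W, iters=200):
    """Iterate the capacity-achieving map on input distributions p;
    returns channel capacity in bits. W[x][y] = P(y|x)."""
    n, m = len(W), len(W[0])
    p = [1.0 / n] * n
    Z = 1.0
    for _ in range(iters):
        # output distribution induced by p
        q = [sum(p[x] * W[x][y] for x in range(n)) for y in range(m)]
        # c[x] = exp of the divergence D( W(.|x) || q ), in nats
        c = [math.exp(sum(W[x][y] * math.log(W[x][y] / q[y])
                          for y in range(m) if W[x][y] > 0))
             for x in range(n)]
        Z = sum(p[x] * c[x] for x in range(n))
        p = [p[x] * c[x] / Z for x in range(n)]   # the self-map on p
    return math.log2(Z)
```

For a symmetric channel the uniform input is a fixed point, so one iteration already gives the exact capacity; in general convergence is monotone, as the companion paper by Arimoto in this issue shows.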

1,472 citations


Journal ArticleDOI
TL;DR: It is shown that as the signal-to-noise ratio (SNR) increases, the asymptotic behavior of these decoding algorithms cannot be improved, and computer simulations indicate that even for low SNR the performance of a correlation decoder can be approached by relatively simple decoding procedures.
Abstract: A class of decoding algorithms that utilizes channel measurement information, in addition to the conventional use of the algebraic properties of the code, is presented. The maximum number of errors that can, with high probability, be corrected is equal to one less than d , the minimum Hamming distance of the code. This two-fold increase over the error-correcting capability of a conventional binary decoder is achieved by using channel measurement (soft-decision) information to provide a measure of the relative reliability of each of the received binary digits. An upper bound on these decoding algorithms is derived, which is proportional to the probability of an error for d th order diversity, an expression that has been evaluated for a wide range of communication channels and modulation techniques. With the aid of a lower bound on these algorithms, which is also a lower bound on a correlation (maximum-likelihood) decoder, we show for both the Gaussian and Rayleigh fading channels, that as the signal-to-noise ratio (SNR) increases, the asymptotic behavior of these decoding algorithms cannot be improved. Computer simulations indicate that even for low SNR the performance of a correlation decoder can be approached by relatively simple decoding procedures. In addition, we study the effect on the performance of these decoding algorithms when a threshold is used to simplify the decoding process.
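The benchmark above, a correlation (maximum-likelihood) decoder, simply picks the codeword that best matches the soft received values. A toy sketch (a length-5 repetition code stands in for a real code, and the antipodal mapping bit b -> 1 - 2b is our assumption):

```python
def correlation_decode(r, codewords):
    """Soft-decision ML decoding: choose the codeword whose +/-1 image
    (bit b -> 1 - 2b) has the largest correlation with received r."""
    return max(codewords,
               key=lambda c: sum((1 - 2 * b) * x for b, x in zip(c, r)))

# Length-5 repetition code, minimum Hamming distance d = 5.
codebook = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]

# Four unreliable (near-zero) samples and one strongly correct one:
# hard decisions contain two bit errors, yet soft decoding recovers
# the transmitted codeword by weighting the reliable sample.
received = [0.01, -0.01, 0.01, -0.01, 2.0]
```

This illustrates how reliability information lets up to d - 1 marginal errors be overridden, the two-fold gain over conventional hard-decision decoding that the abstract describes.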

1,165 citations



Journal ArticleDOI
Suguru Arimoto1
TL;DR: A systematic and iterative method of computing the capacity of arbitrary discrete memoryless channels is presented and a few inequalities that give upper and lower bounds on the capacity are derived.
Abstract: A systematic and iterative method of computing the capacity of arbitrary discrete memoryless channels is presented. The algorithm is very simple and involves only logarithms and exponentials in addition to elementary arithmetical operations. It also has the property of monotonic convergence to the capacity. In general, the approximation error is at least inversely proportional to the number of iterations; in certain circumstances, it is exponentially decreasing. Finally, a few inequalities that give upper and lower bounds on the capacity are derived.

854 citations


Journal ArticleDOI
TL;DR: It is shown that matrices consisting of mutually orthogonal complementary sets of sequences can be used as operators so as to perform transformations and inverse transformations on a one- or two-dimensional array of real time or spatial functions.
Abstract: A set of equally long finite sequences, the elements of which are either + 1 or - 1, is said to be a complementary set of sequences if the sum of autocorrelation functions of the sequences in that set is zero except for a zero-shift term. A complementary set of sequences is said to be a mate of another set if the sum of the cross-correlation functions of the corresponding sequences in these two sets is zero everywhere. Complementary sets of sequences are said to be mutually orthogonal complementary sets if any two of them are mates to each other. In this paper we discuss the properties of such complementary sets of sequences. Algorithms for synthesizing new sets from a given set are given. Recursive formulas for constructing mutually orthogonal complementary sets are presented. It is shown that matrices consisting of mutually orthogonal complementary sets of sequences can be used as operators so as to perform transformations and inverse transformations on a one- or two-dimensional array of real time or spatial functions. The similarity between such new transformations and the Hadamard transformation suggests applications of such new transformations to signal processing and image coding.
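The simplest complementary set is a pair (a Golay pair). The concatenation recursion below, one of the standard synthesis algorithms and not necessarily the paper's, doubles the length while preserving the defining zero-sum property:

```python
def autocorr(s, shift):
    """Aperiodic autocorrelation of a +/-1 sequence at a given shift."""
    return sum(s[i] * s[i + shift] for i in range(len(s) - shift))

def extend(a, b):
    """If (a, b) is a complementary pair, so is (a|b, a|-b),
    at twice the length ('|' denotes concatenation)."""
    return a + b, a + [-x for x in b]

a, b = [1, 1], [1, -1]     # a length-2 complementary pair
for _ in range(3):         # grow to length 16
    a, b = extend(a, b)
```

At every nonzero shift the two autocorrelations cancel exactly; only the zero-shift term (the total energy) survives.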

626 citations


Journal ArticleDOI
TL;DR: This tutorial paper proves the Szegő theorem for the special case of finite-order Toeplitz matrices, which is both simple and intuitive and contains the important concepts involved in the most general case.
Abstract: Since covariance matrices of weakly stationary random processes are Toeplitz, much of the theory involving asymptotic results for such processes is simply the theory of the asymptotic behavior of Toeplitz forms. The fundamental theorem of this type is the Szegő theorem on the asymptotic eigenvalue distribution of Toeplitz matrices. This theorem is often quoted but relatively little understood in the engineering literature. In this tutorial paper we prove the Szegő theorem for the special case of finite-order Toeplitz matrices. In this setting the mathematical sophistication of the classical proofs is not required and the proof is both simple and intuitive, yet it contains the important concepts involved in the most general case.

575 citations


Journal ArticleDOI
TL;DR: A further modification to Cover and Hart's nearest neighbor decision rule, the reduced nearest neighbor rule, is introduced and experimental results demonstrate its accuracy and efficiency.
Abstract: A further modification to Cover and Hart's nearest neighbor decision rule, the reduced nearest neighbor rule, is introduced. Experimental results demonstrate its accuracy and efficiency.

515 citations


Journal ArticleDOI
TL;DR: A definition of discrete Markovian random fields is formulated analogously to a definition for the continuous case given by Levy, and this theory is applied to the problem of obtaining spectral estimates of a two-dimensional field, given observation over a limited aperture.
Abstract: A definition of discrete Markovian random fields is formulated analogously to a definition for the continuous case given by Levy. This definition in the homogeneous Gaussian case leads to a difference equation that sets forth the state of the field in terms of its values on a band of minimum width P , where P is the order of the process. The state of the field at position (i,j) is given by the set of values of the nearest neighbors within distance P of the point (i,j) . Conversely, given a difference equation satisfying certain conditions relating to stability, there corresponds a homogeneous discrete Markov random field. This theory is applied to the problem of obtaining spectral estimates of a two-dimensional field, given observation over a limited aperture.

Journal ArticleDOI
TL;DR: A decoding procedure is given that corrects all errors guaranteed correctable by the asymptotic lower bound on d .
Abstract: For any rate R, 0 < R < 1, a sequence of specific (n,k) binary codes with rate R_n > R and minimum distance d is constructed such that \begin{equation} \liminf_{n \rightarrow \infty} \frac{d}{n} \geq (1 - r ^{-1} R)H^{-1} (1 - r)> 0 \end{equation} (and hence the codes are asymptotically good), where r is the maximum of \frac{1}{2} and the solution of \begin{equation} R = \frac{r^2}{1 + \log_2 [1 - H^{-1}(1 - r)]}. \end{equation} The codes are extensions of the Reed-Solomon codes over GF(2^m) with a simple algebraic description of the added digits. Alternatively, the codes are the concatenation of a Reed-Solomon outer code of length N = 2^m - 1 with N distinct inner codes, namely all the codes in Wozencraft's ensemble of randomly shifted codes. A decoding procedure is given that corrects all errors guaranteed correctable by the asymptotic lower bound on d . This procedure can be carried out by a simple decoder which performs approximately n^2 \log n computations.

Journal ArticleDOI
D. Foley1
TL;DR: The design-set error rate for a two-class problem with multivariate normal distributions is derived as a function of the sample size per class (N) and dimensionality (L) and is demonstrated to be an extremely biased estimate of either the Bayes or test-set error rate.
Abstract: In many practical pattern-classification problems the underlying probability distributions are not completely known. Consequently, the classification logic must be determined on the basis of vector samples gathered for each class. Although it is common knowledge that the error rate on the design set is a biased estimate of the true error rate of the classifier, the amount of bias as a function of sample size per class and feature size has been an open question. In this paper, the design-set error rate for a two-class problem with multivariate normal distributions is derived as a function of the sample size per class (N) and dimensionality (L) . The design-set error rate is compared to both the corresponding Bayes error rate and the test-set error rate. It is demonstrated that the design-set error rate is an extremely biased estimate of either the Bayes or test-set error rate if the ratio of samples per class to dimensions (N/L) is less than three. Also the variance of the design-set error rate is approximated by a function that is bounded by 1/8N .

Journal ArticleDOI
TL;DR: The structure and performance of a class of nonlinear detectors for discrete-time signals in additive white noise are investigated and three general classes of symmetric, unimodal, univariate probability density functions are introduced that are generalizations of the Gaussian, Cauchy, and beta distributions.
Abstract: The structure and performance of a class of nonlinear detectors for discrete-time signals in additive white noise are investigated. The detectors considered consist of a zero-memory nonlinearity (ZNL) followed by a linear filter whose output is compared with a threshold. That this class of detectors is a reasonable one to study is apparent from the fact that both the Neyman-Pearson optimum and the locally optimum (i.e., weak-signal optimum) detectors for statistically independent noise samples can be put into this form. The measure of detector performance used is the asymptotic relative efficiency (ARE) of the nonlinear detector under study with respect to a linear detector appropriate for the same detection problem. A general expression for this ARE is given along with the result that the non-linearity maximizing this expression is any linear function of the nonlinearity in the appropriate constant-signal locally optimum detector. To illustrate the structure and performance of these nonlinear detectors for a wide range of non-Gaussian noise distributions, three general classes of symmetric, unimodal, univariate probability density functions are introduced that are generalizations of the Gaussian, Cauchy, and beta distributions.

Journal ArticleDOI
TL;DR: It is shown that the leading term in the variance computations is identical with the Cramer-Rao lower bound calculated using the Fisher information matrix.
Abstract: We are interested in estimating the moments of the spectral density of a complex Gaussian signal process \{ q^{(1)} (t) \} when the signal process is immersed in independent additive complex Gaussian noise \{q^{(2)} (t) \} . Using vector samples Q = \{ q(t_1),\cdots ,q(t_m)\} , where q(t) = q^{(1)}(t) + q^{(2)}(t) , estimators for determining the spectral moments or parameters of the signal-process power spectrum may be constructed. These estimators depend upon estimates of the covariance function R_1 (h) of the signal process at only one value of h \neq 0 . In particular, if m = 2 , these estimators are maximum-likelihood solutions. (The explicit solution of the likelihood equations for m > 2 is still an unsolved problem.) Using these solutions, asymptotic (with sample size) formulas for the means and variances of the spectral mean frequency and spectral width are derived. It is shown that the leading term in the variance computations is identical with the Cramer-Rao lower bound calculated using the Fisher information matrix. Also considered is the case where the data set consists of N samples of continuous data, each of finite duration. In this case asymptotic (with N ) formulas are also derived for the means and variances of the spectral mean frequency and spectral width.

Journal ArticleDOI
TL;DR: This work derives a simple algorithm for the ranking of binary sequences of length n and weight w and uses it for source encoding a memoryless binary source that generates 0's with probability q and 1's with probability p = 1 - q.
Abstract: We derive a simple algorithm for the ranking of binary sequences of length n and weight w . This algorithm is then used for source encoding a memoryless binary source that generates 0's with probability q and 1's with probability p = 1 - q .
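A ranking of this kind indexes each length-n, weight-w sequence by counting the sequences that precede it lexicographically. The sketch below is the standard lexicographic scheme, which may differ in detail from the authors' algorithm:

```python
from math import comb

def rank(x):
    """Lexicographic index (0 before 1) of binary tuple x among all
    sequences of the same length and weight."""
    n, w = len(x), sum(x)
    r = 0
    for i, bit in enumerate(x):
        if bit:
            # every sequence with a 0 here (and the same prefix) comes first:
            # w ones must then fit into the remaining n - i - 1 positions
            r += comb(n - i - 1, w)
            w -= 1
    return r
```

Transmitting (w, rank) instead of the raw block approaches the source entropy for large n, since the rank needs only about log2 C(n, w) bits.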

Journal ArticleDOI
TL;DR: It is shown that the causal minimum-mean-square-error estimate of the stochastic intensity is incorporated in the optimum Reiffen-Sherman detector in the same way as if it were known.
Abstract: Equations are derived that describe the time evolution of the posterior statistics of a general Markov process that modulates the intensity function of an observed inhomogeneous Poisson counting process. The basic equation is a stochastic differential equation for the conditional characteristic function of the Markov process. A separation theorem is established for the detection of a Poisson process having a stochastic intensity function. Specifically, it is shown that the causal minimum-mean-square-error estimate of the stochastic intensity is incorporated in the optimum Reiffen-Sherman detector in the same way as if it were known. Specialized results are obtained when a set of random variables modulate the intensity. These include equations for maximum a posteriori probability estimates of the variables and some accuracy equations based on the Cramer-Rao inequality. Procedures for approximating exact estimates of the Markov process are given. A comparison by simulation of exact and approximate estimates indicates that the approximations suggested can work well even under low count rate conditions.

Journal ArticleDOI
TL;DR: The weight distribution of all 2^26 cosets of the (32,6) first-order Reed-Muller code is presented, and this equivalent problem: how well are the 2^32 Boolean functions of five variables approximated by the2^5 linear functions and their complements?
Abstract: In this paper we present the weight distribution of all 2^26 cosets of the (32,6) first-order Reed-Muller code. The code is invariant under the complete affine group of order 32 \times 31 \times 30 \times 28 \times 24 \times 16. In the Appendix we show (by hand computations) that this group partitions the 2^26 cosets into only 48 equivalence classes, and we obtain the number of cosets in each class. A simple computer program then enumerated the weights of the 32 vectors in each of the 48 cosets. These coset enumerations also answer this equivalent problem: how well are the 2^32 Boolean functions of five variables approximated by the 2^5 linear functions and their complements?
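For any single function the equivalent problem can be checked by brute force: the distance from a 32-entry truth table to the 2^5 linear functions and their complements. A sketch (exhaustive rather than coset-based, and not the authors' hand computation):

```python
from itertools import product

def affine_functions(n=5):
    """Truth tables of all 2^n linear functions of n variables
    and their complements (2^(n+1) tables in total)."""
    pts = list(product((0, 1), repeat=n))
    for a in product((0, 1), repeat=n):
        lin = tuple(sum(ai * xi for ai, xi in zip(a, x)) % 2 for x in pts)
        yield lin
        yield tuple(1 - v for v in lin)

def nonlinearity(f, n=5):
    """Minimum Hamming distance from truth table f to any affine function."""
    return min(sum(fi != gi for fi, gi in zip(f, g))
               for g in affine_functions(n))
```

The largest value of this distance over all 2^32 functions is exactly the covering radius question that the coset weight distributions answer.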

Journal ArticleDOI
H. Ogura1
TL;DR: It is shown that any nonlinear functional of the Poisson process with finite variance can be developed in terms of these orthogonal functionals, corresponding to the Cameron-Martin theorem in the case of the Brownian-motion process.
Abstract: In analogy to the orthogonal functionals of the Brownian-motion process developed by Wiener, Itô, and others, a theory of the orthogonal functionals of the Poisson process is presented making use of the concept of multivariate orthogonal polynomials. Following a brief discussion of Charlier polynomials of a single variable, multivariate Charlier polynomials are introduced. An explicit representation as well as an orthogonality property are given. A multiple stochastic integral of a multivariate function with respect to the Poisson process, called the multiple Poisson-Wiener integral, is defined using the multivariate Charlier polynomials. A multiple Poisson-Wiener integral, which gives a polynomial functional of the Poisson process, is orthogonal to any other of different degree. Several explicit forms are given for the sake of application. It is shown that any nonlinear functional of the Poisson process with finite variance can be developed in terms of these orthogonal functionals, corresponding to the Cameron-Martin theorem in the case of the Brownian-motion process. Finally, some possible applications to nonlinear problems associated with the Poisson process are briefly discussed.

Journal ArticleDOI
TL;DR: It is shown in this paper that these seemingly different approaches to efficient digitization of analog data actually are equivalent in the sense that their optimum rate versus distortion performances are identical, which permits us to deduce that permutation codes provide a readily implementable block-coding alternative to buffer-instrumented variable-length codes.
Abstract: Amplitude quantization and permutation encoding are two of the many approaches to efficient digitization of analog data. It is shown in this paper that these seemingly different approaches actually are equivalent in the sense that their optimum rate versus distortion performances are identical. Although this equivalence becomes exact only when the quantizer output is perfectly entropy coded and the permutation code block length is infinite, it nonetheless has practical consequences both for quantization and for permutation encoding. In particular, this equivalence permits us to deduce that permutation codes provide a readily implementable block-coding alternative to buffer-instrumented variable-length codes. Moreover, the abundance of methods in the literature for optimizing quantizers with respect to various criteria can be translated directly into algorithms for generating source permutation codes that are optimum for the same purposes. The optimum performance attainable with quantizers (hence, permutation codes) of a fixed entropy rate is explored too. The investigation reveals that quantizers with uniformly spaced thresholds are quasi-optimum with considerable generality, and are truly optimum in the mean-squared sense for data having either an exponential or a Laplacian distribution. An attempt is made to provide some analytical insight into why simple uniform quantization is so good so generally.

Journal ArticleDOI
TL;DR: Digital data-processing problems such as the detection of multiple echoes, various methods of linear filtering the complex cepstrum, the picket-fence phenomenon, minimum-maximum phase situations, and amplitude- versus phase-smoothing for the additive-noise case are examined empirically and where possible theoretically, and are discussed.
Abstract: A technique for decomposing a composite signal of unknown multiple wavelets overlapping in time is described. The computation algorithm incorporates the power cepstrum and complex cepstrum techniques. It has been found that the power cepstrum is most efficient in recognizing wavelet arrival times and amplitudes while the complex cepstrum is invaluable in estimating the form of the basic wavelet and its echoes, even if the latter are distorted. Digital data-processing problems such as the detection of multiple echoes, various methods of linear filtering the complex cepstrum, the picket-fence phenomenon, minimum-maximum phase situations, and amplitude- versus phase-smoothing for the additive-noise case are examined empirically and where possible theoretically, and are discussed. A similar investigation is performed for some of the preceding problems when the echo or echoes are distorted versions of the wavelet, thereby giving some insight into the complex problem of separating a composite signal composed of several additive stochastic processes. The threshold results are still empirical and the results should be extended to multi-dimensional data. Applications are the decomposition or resolution of signals (e.g., echoes) in radar and sonar, seismology, speech, brain waves, and neuroelectric spike data. Examples of results are presented for decomposition in the absence and presence of noise for specified signals. Results are tendered for the decomposition of pulse-type data appropriate to many systems and for the decomposition of brain waves evoked by visual stimulation.

Journal ArticleDOI
TL;DR: The analysis shows further that the "natural" choice of bias in the metric is the code rate and gives insight into why the Fano metric has proved to be the best practical choice in sequential decoding.
Abstract: It is shown that the metric proposed originally by Fano for sequential decoding is precisely the required statistic for minimum-error-probability decoding of variable-length codes. The analysis shows further that the "natural" choice of bias in the metric is the code rate and gives insight into why the Fano metric has proved to be the best practical choice in sequential decoding. The recently devised Jelinek-Zigangirov "stack algorithm" is shown to be a natural consequence of this interpretation of the Fano metric. Finally, it is shown that the elimination of the bias in the "truncated" portion of the code tree gives a slight reduction in average computation at the sacrifice of increased error probability.
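For a BSC with crossover probability p and equiprobable channel symbols (so P(y) = 1/2), the Fano metric with the rate-R bias takes the simple per-symbol form sketched below (our notation, under those assumptions):

```python
import math

def fano_metric(received, branch, p, R):
    """Fano metric in bits for one code-tree branch on a BSC(p):
    sum over symbols of log2 P(y|x) - log2 P(y) - R, with P(y) = 1/2."""
    m = 0.0
    for y, x in zip(received, branch):
        pyx = 1 - p if y == x else p
        m += math.log2(pyx) + 1.0 - R    # -log2(1/2) = +1 is the bias term
    return m
```

Correct branches tend to accumulate positive metric and incorrect ones negative metric, which is exactly what a sequential decoder's search (or the Jelinek-Zigangirov stack) relies on.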

Journal ArticleDOI
TL;DR: It is shown that classical invariant theory permits a straightforward and systematic proof of Gleason's theorems and their generalizations.
Abstract: Gleason has recently shown that the weight enumerators of binary and ternary self-dual codes are polynomials in two given polynomials. In this paper it is shown that classical invariant theory permits a straightforward and systematic proof of Gleason's theorems and their generalizations. The joint weight enumerator of two codes (analogous to the joint density function of two random variables) is defined and shown to satisfy a MacWilliams theorem. Invariant theory is then applied to generalize Gleason's theorem to the complete weight enumerator of self-dual codes over GF(3) , the Lee metric enumerator over GF(5) (given by Klein in 1884!) and over GF(7) (given by Maschke in 1893!), the Hamming enumerator over GF(q) , and over GF(4) with all weights divisible by 2, the joint enumerator of two self-dual codes over GF(2) , and a number of other results.

Journal ArticleDOI
TL;DR: The optimal detection scheme for signals of the random-telegraph type with unknown transition intensities is derived and a general likelihood-ratio formula for the detection of compound RPP's is obtained.
Abstract: A class of point processes that possess intensity functions are studied. The processes of this class, which seem to include most point processes of practical interest, are called regular point processes (RPP's). Expressions for the evolution of these processes and especially for their joint occurrence statistics are derived. Compound RPP's, which are RPP's whose intensity functions are themselves stochastic processes, are shown to be RPP's whose intensity functions are given as the causal minimum mean-squared-error (MMSE) estimates of the given intensity functions. The superposition of two independent RPP's is shown to yield an RPP whose intensity is given as a causal least squares estimate of the appropriate combination of the two given intensity functions. A general likelihood-ratio formula for the detection of compound RPP's is obtained. Singular detection cases are characterized. Detection procedures that use only the total number of counts are discussed. As an example, the optimal detection scheme for signals of the random-telegraph type with unknown transition intensities is derived.

Journal ArticleDOI
TL;DR: In this paper constructions are given for combining two, three, or four codes to obtain new codes, many of which have more codewords than any comparable code previously known to us.
Abstract: In this paper constructions are given for combining two, three, or four codes to obtain new codes. The Andryanov-Saskovets construction is generalized. It is shown that the Preparata double-error-correcting codes may be extended by about (block length) ^{1/2} symbols, of which only one is a check symbol, and that e -error-correcting BCH codes may sometimes be extended by (block length) ^{1/e} symbols, of which only one is a check symbol. Several new families of linear and nonlinear double-error-correcting codes are obtained. Finally, an infinite family of linear codes is given with d/n = \frac{1}{3} , the first three being the (24,2^12, 8) Golay code, a (48,2^15, 16) code, and a (96,2^18, 32) code. Most of the codes given have more codewords than any comparable code previously known to us.

Journal ArticleDOI
TL;DR: Some asymptotic properties of the k_n nearest neighbor classification rule are studied including an expression for a consistent upper bound on the probability of misclassification.
Abstract: The k_n nearest neighbor classification rule is a nonparametric classification procedure that assigns a random vector Z to one of two populations \pi_1, \pi_2 . Samples of equal size n are taken from \pi_1 and \pi_2 and are ordered separately with respect to their distance from Z = z . The rule assigns Z to \pi_1 if the distance of the k_n th sample observation from \pi_1 to z is less than the distance of the k_n th sample observation from \pi_2 to z ; otherwise Z is assigned to \pi_2 . This rule is equivalent to the Fix and Hodges "majority rule" [4] or the nearest neighbor rule of Cover and Hart [3]. This paper studies some asymptotic properties of this rule including an expression for a consistent upper bound on the probability of misclassification.
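For scalar samples the rule reduces to comparing two order statistics of the distances; a minimal sketch (one-dimensional for brevity, with hypothetical data):

```python
def knn_two_sample_rule(z, sample1, sample2, k):
    """Assign z to population 1 iff the k-th nearest point of sample1
    is closer to z than the k-th nearest point of sample2."""
    d1 = sorted(abs(x - z) for x in sample1)
    d2 = sorted(abs(x - z) for x in sample2)
    return 1 if d1[k - 1] < d2[k - 1] else 2
```

With k = 1 this is exactly the Cover-Hart nearest neighbor rule; comparing the k-th order statistics of the two distance lists is equivalent to a majority vote among the pooled k nearest neighbors.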

Journal ArticleDOI
TL;DR: It is shown that optimum encoding of a source permutation code is easy to instrument even for very long block lengths, and that permutation codes are asymptotically ideal for small rates and perform as well as the best entropy-coded quantizers presently known for intermediate rates.
Abstract: Source encoding techniques based on permutation codes are investigated. For a broad class of distortion measures it is shown that optimum encoding of a source permutation code is easy to instrument even for very long block lengths. Also, the nonparametric nature of permutation encoding is well suited to situations involving unknown source statistics. For the squared-error distortion measure a procedure for generating good permutation codes of a given rate and block length is described. The performance of such codes for a memoryless Gaussian source is compared both with the rate-distortion function bound and with the performance of various quantization schemes. The comparison reveals that permutation codes are asymptotically ideal for small rates and perform as well as the best entropy-coded quantizers presently known for intermediate rates. They can be made to compare favorably at high rates, too, provided the coding delay associated with extremely long block lengths is tolerable.
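The "easy to instrument" encoder amounts to a sort: the i-th smallest source sample is replaced by the i-th smallest codeword value. A sketch for the squared-error measure (variable names ours), together with a brute-force check that sorting really is optimum over the whole permutation codebook:

```python
def permutation_encode(x, mu):
    """Encode block x with the permutation code whose codewords are all
    orderings of the sorted value list mu: the i-th smallest sample of x
    is mapped to mu[i]. For squared error this minimizes distortion
    over the codebook, and only a sort is needed."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    y = [0.0] * len(x)
    for r, i in enumerate(order):
        y[i] = mu[r]
    return y
```

Because the codebook has n!/(multiplicities) codewords but encoding costs only O(n log n), very long block lengths remain practical, which is the point of the comparison with entropy-coded quantizers.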

Journal ArticleDOI
TL;DR: It turns out that the optimum error exponents of variable-length-to-block coding are identical with those of block-to-variable-length coding and are related in an interesting way to Renyi's generalized entropy function.
Abstract: Variable-length-to-block codes are a generalization of run-length codes. A coding theorem is first proved. When the codes are used to transmit information from fixed-rate sources through fixed-rate noiseless channels, buffer overflow results. The latter phenomenon is an important consideration in the retrieval of compressed data from storage. The probability of buffer overflow decreases exponentially with buffer length and we determine the relation between rate and exponent size for memoryless sources. We obtain codes that maximize the overflow exponent for any given transmission rate exceeding the source entropy and present asymptotically optimal coding algorithms whose complexity grows linearly with codeword length. It turns out that the optimum error exponents of variable-length-to-block coding are identical with those of block-to-variable-length coding and are related in an interesting way to Renyi's generalized entropy function.
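Variable-length-to-block parsing of a memoryless source can be sketched with the classic Tunstall construction (a plausible baseline; the overflow-exponent-optimal codes of this paper refine it): repeatedly split the most probable dictionary word until the dictionary fills a fixed-length block index.

```python
import heapq

def tunstall(probs, codebook_size):
    """Tunstall parse dictionary for a memoryless source given as
    {symbol: probability}. Repeatedly split the most probable word;
    each resulting word maps to one fixed-length block index."""
    heap = [(-p, (s,)) for s, p in probs.items()]   # max-heap via negation
    heapq.heapify(heap)
    # each split removes one word and adds |alphabet| words
    while len(heap) + len(probs) - 1 <= codebook_size:
        negp, word = heapq.heappop(heap)
        for s, p in probs.items():
            heapq.heappush(heap, (negp * p, word + (s,)))
    return sorted(word for _, word in heap)
```

The resulting dictionary is prefix-free and exhaustive, so every source sequence parses uniquely into dictionary words, each emitted as one block.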

Journal ArticleDOI
TL;DR: A detailed analysis of the Gaussian case resulted in a type of correlation detector, which correlates the received data with the least square estimators of the possible signals in the absence of uncertainty, which can be interpreted as generalized maximum-likelihood estimators.
Abstract: Simultaneous detection and estimation under multiple hypotheses when data from only one observation interval are available, are treated on the basis of statistical decision theory. Estimation is carried out under the assumption that the signal of interest is not present with probability 1, which is necessary if detection is to be a meaningful operation. Also, we consider the case where the operations of detection and estimation are coupled. Specific detector and estimator structures are determined for the case of strong coupling when the cost of estimation error is given by a quadratic function. The detector structures are in general complex nonlinear functions of the received data. However, a detailed analysis of the Gaussian case resulted in a type of correlation detector, which correlates the received data with the least square estimators of the possible signals in the absence of uncertainty. The associated optimum estimator structure is found to be a weighted sum of least square estimators in the absence of uncertainty. Also, joint detection and estimation under multiple hypotheses is discussed for the case of a simple cost function. The estimators that result can be interpreted as generalized maximum-likelihood estimators. Finally, optimum prediction and filtering are briefly considered.

Journal ArticleDOI
TL;DR: The average distortion that is obtained with universal coding schemes that are independent of the source statistics is shown to be asymptotically identical to the smallest average distortion that can be achieved with the best individual coding scheme.
Abstract: The encoding of sources with unknown statistics is considered. The average distortion that is obtained with universal coding schemes that are independent of the source statistics is shown to be asymptotically identical to the smallest average distortion that can be achieved with the best individual coding scheme (i.e., a code that is based on the specific statistics of the source). This result is shown to hold for any stationary source, as well as for a class of nonstationary sources. The discussion is limited to a certain important class of metric spaces.