
Showing papers in "IEEE Transactions on Information Theory in 1970"


Journal ArticleDOI
TL;DR: An optimum rejection rule is described and a general relation between the error and reject probabilities and some simple properties of the tradeoff in the optimum recognition system are presented.
Abstract: The performance of a pattern recognition system is characterized by its error and reject tradeoff. This paper describes an optimum rejection rule and presents a general relation between the error and reject probabilities and some simple properties of the tradeoff in the optimum recognition system. The error rate can be directly evaluated from the reject function. Some practical implications of the results are discussed. Examples in normal distributions and uniform distributions are given.

841 citations
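
A minimal numerical sketch of the error-reject tradeoff described above, assuming two equally likely univariate normal classes and a rule that rejects whenever the largest posterior probability falls below a threshold; the densities, thresholds, and grid are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

# Two equally likely classes: N(-1, 1) and N(+1, 1) (illustrative choice).
xs, dx = np.linspace(-8, 8, 20001, retstep=True)
p0 = 0.5 * norm.pdf(xs, -1, 1)        # prior-weighted class densities
p1 = 0.5 * norm.pdf(xs, +1, 1)
mix = p0 + p1
post_max = np.maximum(p0, p1) / mix   # largest posterior probability

for t in (0.5, 0.6, 0.7, 0.8, 0.9):
    accept = post_max >= t            # reject whenever the max posterior < t
    error = np.sum(np.minimum(p0, p1)[accept]) * dx   # accepted but misclassified mass
    reject = np.sum(mix[~accept]) * dx                # rejected probability mass
    print(f"threshold {t:.1f}: error {error:.4f}, reject {reject:.4f}")
```

Raising the threshold trades a lower error rate for a higher reject rate, which is the monotone tradeoff the paper characterizes.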


Journal ArticleDOI
TL;DR: Minimal encoders are shown to be immune to catastrophic error propagation and, in fact, to lead in a certain sense to the shortest decoded error sequences possible per error event.
Abstract: A convolutional encoder is defined as any constant linear sequential circuit. The associated code is the set of all output sequences resulting from any set of input sequences beginning at any time. Encoders are called equivalent if they generate the same code. The invariant factor theorem is used to determine when a convolutional encoder has a feedback-free inverse, and the minimum delay of any inverse. All encoders are shown to be equivalent to minimal encoders, which are feedback-free encoders with feedback-free delay-free inverses, and which can be realized in the conventional manner with as few memory elements as any equivalent encoder. Minimal encoders are shown to be immune to catastrophic error propagation and, in fact, to lead in a certain sense to the shortest decoded error sequences possible per error event. In two appendices, we introduce dual codes and syndromes, and show that a minimal encoder for a dual code has exactly the complexity of the original encoder; we show that systematic encoders with feedback form a canonical class, and compare this class to the minimal class.

801 citations


Journal ArticleDOI
TL;DR: In this correspondence, a formula for the phase angles is derived that yields generally low peak factors, often comparable to that of a sinusoidal signal of equal power.
Abstract: This correspondence considers the problem of how to adjust the phase angles of a periodic signal with a given power spectrum to minimize its peak-to-peak amplitude. This "peak-factor problem" arises in radar, sonar, and numerous other applications. However, in spite of the widespread interest it has evoked, the peak-factor problem has so far defied solution except in cases where the number of spectral components is small enough to permit an effectively exhaustive search of all phase angle combinations. In this correspondence, a formula for the phase angles is derived that yields generally low peak factors, often comparable to that of a sinusoidal signal of equal power. A formula is also derived for the case in which the phase angles are restricted to 0 and \pi . The latter formula is applicable to the problem of constructing binary sequences of arbitrary length with low autocorrelation coefficients for nonzero shifts.

724 citations
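
The paper's general phase formula is not reproduced here; the sketch below uses the flat-spectrum special case commonly quoted as \phi_k = -\pi k(k-1)/N (an assumption on the exact form) and compares the resulting crest factor against random phases for an equal-power multisine.

```python
import numpy as np

N = 31                       # number of equal-power harmonics (illustrative)
t = np.linspace(0, 1, 4096, endpoint=False)
k = np.arange(1, N + 1)

def multisine(phases):
    return sum(np.cos(2 * np.pi * n * t + ph) for n, ph in zip(k, phases))

def crest_factor(x):
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))

rng = np.random.default_rng(0)
rand_ph = rng.uniform(0, 2 * np.pi, N)
schroeder_ph = -np.pi * k * (k - 1) / N   # flat-spectrum special case of the phase formula

# Typically ~3 for random phases versus under 2 for the formula;
# a single sinusoid has crest factor sqrt(2) ~ 1.41 for reference.
print("random phases:    crest factor =", round(crest_factor(multisine(rand_ph)), 2))
print("formula phases:   crest factor =", round(crest_factor(multisine(schroeder_ph)), 2))
```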


Journal ArticleDOI
TL;DR: Relationships between the probability of error, the equivocation, and the Chernoff bound are examined for the two-hypothesis decision problem and the results are extended to the case of any finite number of hypotheses.
Abstract: Relationships between the probability of error, the equivocation, and the Chernoff bound are examined for the two-hypothesis decision problem. The effect of rejections on these bounds is derived. Finally, the results are extended to the case of any finite number of hypotheses.

357 citations
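
A numerical illustration of how the Chernoff bound dominates the two-hypothesis error probability, for two equally likely unit-variance normal hypotheses (an assumed example, not from the paper):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

# Equally likely hypotheses: x ~ N(0,1) versus x ~ N(2,1).
xs, dx = np.linspace(-12, 14, 40001, retstep=True)
p0, p1 = norm.pdf(xs, 0, 1), norm.pdf(xs, 2, 1)

bayes_error = 0.5 * np.sum(np.minimum(p0, p1)) * dx   # exact error probability

# Chernoff bound: P_e <= (1/2) min over 0 < s < 1 of integral p0^s p1^(1-s) dx.
mu = lambda s: np.sum(p0 ** s * p1 ** (1 - s)) * dx
res = minimize_scalar(mu, bounds=(0.01, 0.99), method="bounded")

print("Bayes error   :", round(bayes_error, 4))    # Q(1) ~ 0.1587
print("Chernoff bound:", round(0.5 * res.fun, 4))  # larger, as the bound requires
```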


Journal ArticleDOI
TL;DR: The criterion for optimality is that the normalized average of the squared norm of the difference between the T -second undistorted source sample and the corresponding T -second sample delivered to the final destination be minimum.
Abstract: This paper is concerned with the transmission of information with a fidelity criterion where the source output may be distorted prior to encoding and, furthermore, where the output of the decoder may be distorted prior to its delivery to the final destination. The criterion for optimality is that the normalized average of the squared norm of the difference between the T -second undistorted source sample and the corresponding T -second sample delivered to the final destination be minimum. The optimal structure of the encoder and decoder is derived for any T .

235 citations


Journal ArticleDOI
TL;DR: The positive coding theorem is proved for the possibly nonstationary Gaussian autoregressive source with a constraint on the parameters, and it is shown that the rate distortion function of any time-discrete autoregressive source can be bounded below by the rate distortion function of the independent-letter identically distributed generating sequence with the same distortion measure.
Abstract: The rate distortion function R(D) is calculated for two time-discrete autoregressive sources--the time-discrete Gaussian autoregressive source with a mean-square-error fidelity criterion and the binary-symmetric first-order Markov source with an average probability-of-error per bit fidelity criterion. In both cases it is shown that R(D) is bounded below by the rate distortion function of the independent-letter identically distributed sequence that generates the autoregressive source. This lower bound is shown to hold with equality for a nonzero region of small average distortion. The positive coding theorem is proved for the possibly nonstationary Gaussian autoregressive source with a constraint on the parameters. Finally, it is shown that the rate distortion function of any time-discrete autoregressive source with a difference distortion measure can be bounded below by the rate distortion function of the independent-letter identically distributed generating sequence with the same distortion measure.

147 citations
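
Under the stated bound, R(D) for a Gaussian autoregressive source is at least R(D) of its i.i.d. Gaussian innovations, which for the mean-square-error criterion is the familiar \frac{1}{2} \ln (\sigma^2 / D) . A sketch of that lower bound ( \sigma^2 and the distortion grid are assumed values):

```python
import numpy as np

sigma2 = 1.0                   # innovation variance of the generating sequence (assumed)
D = np.array([0.05, 0.1, 0.25, 0.5, 1.0])

# R(D) of the i.i.d. Gaussian generating sequence, in nats:
# R(D) = (1/2) ln(sigma^2 / D) for 0 < D <= sigma^2, and 0 beyond.
R_iid = np.maximum(0.0, 0.5 * np.log(sigma2 / D))

for d, r in zip(D, R_iid):
    print(f"D = {d:4.2f}:  lower bound on R(D) = {r:.3f} nats/letter")
```

Per the abstract, this lower bound holds with equality for a nonzero region of small average distortion.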


Journal ArticleDOI
TL;DR: It is shown that any spectrum whatsoever can be recovered if \{t_n\} is a Poisson point process on the positive (or negative) half-axis, and that randomly jittered sampling at the Nyquist rate is also alias free.
Abstract: The notion of alias-free sampling is generalized to apply to random processes x(t) sampled at random times t_n ; sampling is said to be alias free relative to a family of spectra if any spectrum of the family can be recovered by a linear operation on the correlation sequence \{r(n)\} , where r(n) = E[x(t_{m+n}) \overline{x(t_m)}] . The actual sampling times t_n need not be known to effect recovery of the spectrum of x(t) . Various alternative criteria for verifying alias-free sampling are developed. It is then shown that any spectrum whatsoever can be recovered if \{t_n\} is a Poisson point process on the positive (or negative) half-axis. A second example of alias-free sampling is provided for spectra on a finite interval by periodic sampling (for t \leq t_o or t \geq t_o ) in which samples are randomly independently skipped (expunged), such that the average sampling rate is an arbitrarily small fraction of the Nyquist rate. A third example shows that randomly jittered sampling at the Nyquist rate is alias free. Certain related open questions are discussed. These concern the practical problems involved in estimating a spectrum from imperfectly known \{ r(n) \} .

137 citations
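
A small simulation consistent with the setup above: a stationary random-phase sinusoid sampled at Poisson times, with the correlation-sequence entry r(1) estimated from consecutive samples and checked against its closed form E[R(\tau)] for an exponentially distributed gap \tau . The rate, frequency, and trial counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
beta, f = 2.0, 1.0                 # Poisson sampling rate and sinusoid frequency (assumed)
omega = 2 * np.pi * f

# Stationary random-phase sinusoid: autocorrelation R(tau) = 0.5 * cos(omega * tau).
n_trials, n_samples = 2000, 200
r1 = []
for _ in range(n_trials):
    phase = rng.uniform(0, 2 * np.pi)
    t = np.cumsum(rng.exponential(1 / beta, n_samples))   # Poisson sampling times
    x = np.cos(omega * t + phase)
    r1.append(np.mean(x[1:] * x[:-1]))                    # estimate of r(1)

# Consecutive gaps are Exp(beta), so r(1) = E[R(tau)] = 0.5 * beta^2 / (beta^2 + omega^2).
print("empirical r(1):", round(float(np.mean(r1)), 4))
print("theoretical   :", round(0.5 * beta**2 / (beta**2 + omega**2), 4))
```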



Journal ArticleDOI
TL;DR: This paper suggests a learning scheme, "learning with a probabilistic teacher," which works with unclassified samples and is computationally feasible for many practical problems.
Abstract: The Bayesian learning scheme is computationally infeasible for most unsupervised learning problems. This paper suggests a learning scheme, "learning with a probabilistic teacher," which works with unclassified samples and is computationally feasible for many practical problems. In this scheme a sample is probabilistically assigned to a class, with the assignment probabilities computed using all the information available. The sample is then used to learn the parameter values given this class assignment. The convergence of the scheme is established and a comparison with the best linear estimator is presented.

121 citations
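
A sketch of the probabilistic-teacher idea under assumed conditions: a balanced two-component unit-variance Gaussian mixture in which only one mean is unknown. Each unclassified sample receives a random label drawn from its posterior, and is then used as if that label were known. This illustrates the flavor of the scheme, not the paper's exact recursion.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu, n = 2.0, 5000
# Mixture 0.5*N(0,1) + 0.5*N(true_mu,1); only true_mu is unknown (assumed setup).
labels = rng.integers(0, 2, n)
x = rng.normal(np.where(labels == 1, true_mu, 0.0), 1.0)

mu_hat, count = 0.5, 1.0           # crude initial estimate and pseudo-count
for xi in x:
    # Posterior probability that xi came from the unknown-mean class (equal priors).
    l1 = np.exp(-0.5 * (xi - mu_hat) ** 2)
    l0 = np.exp(-0.5 * xi ** 2)
    q = l1 / (l0 + l1)
    # "Probabilistic teacher": draw a label from the posterior...
    if rng.uniform() < q:
        # ...then learn as if the label were known (running mean update).
        count += 1.0
        mu_hat += (xi - mu_hat) / count

print("estimate:", round(float(mu_hat), 3), " true value:", true_mu)
```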


Journal ArticleDOI
TL;DR: This theorem completely characterizes the codewords of the u th-order Reed-Muller code whose weights are less than twice the minimum weight and leads to the weight enumerators for those codewords.
Abstract: The following theorem is proved. Let f(x_1,\cdots, x_m) be a binary nonzero polynomial of m variables of degree u . If the number of binary m -tuples (a_1,\cdots, a_m) with f(a_1, \cdots, a_m) = 1 is less than 2^{m-u+1} , then f can be reduced by an invertible affine transformation of its variables to one of the following forms. \begin{equation} f = y_1 \cdots y_{u-\mu} (y_{u-\mu+1} \cdots y_{u} + y_{u+1} \cdots y_{u+\mu}), \end{equation} where m \geq u+\mu and u \geq \mu \geq 3 . \begin{equation} f = y_1 \cdots y_{u-2}(y_{u-1} y_{u} + y_{u+1} y_{u+2} + \cdots + y_{u+2\mu-3} y_{u+2\mu-2}). \end{equation} This theorem completely characterizes the codewords of the u th-order Reed-Muller code whose weights are less than twice the minimum weight and leads to the weight enumerators for those codewords. These weight formulas are extensions of Berlekamp and Sloane's results.

110 citations
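
A brute-force check of one instance of form (1), taking m = 7, u = 4, \mu = 3 (an assumed instance): f = y_1(y_2 y_3 y_4 + y_5 y_6 y_7) should have weight below 2^{m-u+1} = 16 .

```python
import itertools

# Form (1) with m = 7, u = 4, mu = 3: f = y1 * (y2 y3 y4 + y5 y6 y7) over GF(2).
m, u = 7, 4
count = 0
for y in itertools.product((0, 1), repeat=m):
    f = y[0] * ((y[1] & y[2] & y[3]) ^ (y[4] & y[5] & y[6]))  # + over GF(2) is XOR
    count += f

print("weight of f          :", count)               # 14
print("minimum weight 2^(m-u):", 2 ** (m - u))        # 8
print("bound 2^(m-u+1)       :", 2 ** (m - u + 1))    # 16; theorem applies since 14 < 16
```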


Journal ArticleDOI
TL;DR: An interleaved fading channel whose state is known to the receiver is analyzed and the reliability function E(R) is obtained for rates R in the range R_c \leq R \leq C .
Abstract: An interleaved fading channel whose state is known to the receiver is analyzed. The reliability function E(R) is obtained for rates R in the range R_c \leq R \leq C . The capacity is shown to be C = E_A \{ \frac{1}{2} \ln (1 + A^2 \eta)\} where A is a factor describing the fading mechanism and \eta is the signal-to-noise ratio per dimension.
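
A Monte Carlo evaluation of the capacity expression, assuming for illustration a Rayleigh fading factor A normalized so that E[A^2] = 1 (the paper's fading law is not specified here):

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 10.0                            # signal-to-noise ratio per dimension (assumed)
# Rayleigh fading with E[A^2] = 2*scale^2 = 1 (illustrative fading law).
A = rng.rayleigh(scale=np.sqrt(0.5), size=1_000_000)

C = np.mean(0.5 * np.log(1 + A ** 2 * eta))    # C = E_A[ (1/2) ln(1 + A^2 eta) ]
print("fading capacity   ~", round(float(C), 4), "nats/dimension")
print("no-fading reference:", round(0.5 * np.log(1 + eta), 4))
```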


Journal ArticleDOI
TL;DR: This paper solves the classical two-armed-bandit problem under the finite-memory constraint described below and shows that the optimal value of r , over all m -state algorithms (f, e) , is l^{m-1} / (l^{m-1} + 1) .
Abstract: This paper solves the classical two-armed-bandit problem under the finite-memory constraint described below. Given are probability densities p_0 and p_1 , and two experiments A and B . It is not known which density is associated with which experiment. Thus the experimental outcome Y of experiment A is as likely to be distributed according to p_0 as it is to be distributed according to p_1 . It is desired to sequentially choose an experiment to be performed on the basis of past observations according to the algorithm T_n = f(T_{n-1}, e_n, Y_n), e_n = e(T_{n-1}) , where T_n \in \{1, 2, \cdots, m\} is the state of memory at time n, e_n \in \{A, B\} is the choice of experiment, and Y_n is the random variable observation. The goal is to maximize the asymptotic proportion r of uses of the experiment associated with density p_0 . Let l(y) = p_0 (y) / p_1 (y) , and let \bar{l} and \bar{\bar{l}} denote the almost everywhere greatest lower bound and least upper bound on l(y) . Let l = \max \{\bar{\bar{l}}, 1/\bar{l}\} . Then the optimal value of r , over all m -state algorithms (f, e) , will be shown to be l^{m-1} / (l^{m-1} + 1) . An \epsilon -optimal family of m -state algorithms will be demonstrated. In general, optimal algorithms do not exist, and \epsilon -optimal algorithms require artificial randomization.
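
A small worked instance of the optimal-proportion formula, for assumed two-point densities p_0 and p_1 on outcomes y \in \{0, 1\} :

```python
import numpy as np

# Bernoulli example: p0 puts mass (0.8, 0.2) on y in {0,1}; p1 puts (0.4, 0.6).
p0 = np.array([0.8, 0.2])
p1 = np.array([0.4, 0.6])

lr = p0 / p1                          # likelihood ratio l(y) on the two outcomes
l_sup, l_inf = lr.max(), lr.min()     # essential sup and inf of l(y)
l = max(l_sup, 1 / l_inf)             # here max(2, 3) = 3

for m in (2, 3, 5, 10):
    r = l ** (m - 1) / (l ** (m - 1) + 1)
    print(f"m = {m:2d} memory states: optimal proportion r = {r:.4f}")
```

Even a two-state memory achieves r = 3/4 in this example, and r approaches 1 geometrically as m grows.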

Journal ArticleDOI
TL;DR: It is shown how the Karhunen-Loeve (K-L) series representation for a finite sample of a discrete random sequence, stationary to the second order, may be further decomposed into a pair of series by utilizing certain symmetry properties of the covariance matrix of the sequence.
Abstract: It is shown how the Karhunen-Loeve (K-L) series representation for a finite sample of a discrete random sequence, stationary to the second order, may be further decomposed into a pair of series by utilizing certain symmetry properties of the covariance matrix of the sequence. The theory is applied to the particular example of a first-order Markov sequence, the series representation of which has not so far been reported in the literature. The generalization to the case of continuous random functions on a finite interval is similar and is therefore only briefly described.
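
A numerical illustration of the symmetry being exploited: the covariance matrix \rho^{|i-j|} of a stationary first-order Markov sequence is persymmetric, so (for distinct eigenvalues) every K-L basis vector is either symmetric or antisymmetric about its midpoint, splitting the representation into a pair of series. The matrix size and \rho are assumed values.

```python
import numpy as np

# Covariance of a stationary first-order Markov sequence: K[i, j] = rho^|i-j|.
N, rho = 8, 0.7
K = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

w, V = np.linalg.eigh(K)      # K-L basis = eigenvectors of the covariance
J = np.eye(N)[::-1]           # reversal (exchange) matrix; J K J = K (persymmetry)

for i in range(N):
    v = V[:, i]
    if np.allclose(J @ v, v):
        kind = "symmetric"
    elif np.allclose(J @ v, -v):
        kind = "antisymmetric"
    else:
        kind = "mixed"
    print(f"eigenvalue {w[i]:.3f}: {kind}")
```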

Journal ArticleDOI
T. Berger1
TL;DR: By incorporating a delta modulation scheme in the random coding argument, this work proves a source coding theorem that guarantees the R(D) curves are physically significant for information transmission purposes even though Wiener processes are nonstationary.
Abstract: Rate distortion functions are calculated for time discrete and time continuous Wiener processes with respect to the mean squared error criterion. In the time discrete case, we find the interesting result that, for 0 \leq D \leq \sigma^2 /4 , R(D) for the Wiener process is identical to R(D) for the sequence of zero mean independent normally distributed increments of variance \sigma^2 whose partial sums form the Wiener process. In the time continuous case, we derive the explicit formula R(D) = 2 \sigma^2 / ( \pi^2 D) , where \sigma^2 is the variance of the increment during a one-second interval. The resulting R(D) curves are compared with the performance of an optimum integrating delta modulation system. Finally, by incorporating a delta modulation scheme in the random coding argument, we prove a source coding theorem that guarantees our R(D) curves are physically significant for information transmission purposes even though Wiener processes are nonstationary.
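
The two closed forms evaluated side by side ( \sigma^2 and the distortion values are assumed):

```python
import numpy as np

sigma2 = 1.0   # variance of the increment over one step / one second (assumed)

# Time-discrete case: for 0 <= D <= sigma^2/4, R(D) equals R(D) of the i.i.d.
# N(0, sigma^2) increments, i.e. R(D) = (1/2) ln(sigma^2 / D) nats per sample.
for D in (0.05, 0.1, 0.25):
    print(f"discrete,   D = {D:.2f}: R = {0.5 * np.log(sigma2 / D):.3f} nats")

# Time-continuous case: R(D) = 2 sigma^2 / (pi^2 D) nats per second.
for D in (0.05, 0.1, 0.25):
    print(f"continuous, D = {D:.2f}: R = {2 * sigma2 / (np.pi ** 2 * D):.3f} nats/s")
```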

Journal ArticleDOI
TL;DR: Lower bounds on the average distortion produced by noise for a class of distortion functions are derived and relate the "smoothness" of modulation techniques to the minimum error that can be achieved with them.
Abstract: We consider the problem of transmission of analog data over a noisy channel. It is assumed that the channel input is of the form \sqrt{S} f(t, X) , where X is an n -dimensional source vector, and S is the allowable transmitted power. The performance of any given modulation scheme f(t, \cdot ) as a function of the transmitted power S is studied. Lower bounds on the average distortion produced by noise for a class of distortion functions are derived. These bounds relate the "smoothness" of modulation techniques to the minimum error that can be achieved with them. It is shown that when the analog source emits a sequence of mutually independent real random variables at a rate of R per second, the mean-square error that is associated with any practical modulation scheme f(t, \cdot) decays no faster than S^{-2} as the signal power S \rightarrow \infty . It follows that in the case of a band-limited additive white Gaussian channel no single modulation scheme f(t, \cdot ) can achieve the ideal rate-distortion bound on the mean-square error for all values of S , if the channel bandwidth is larger than the source rate R .

Journal ArticleDOI
TL;DR: The general problem of determining the photoelectron "counting" distribution resulting from an electromagnetic field impinging on a quantum detector is formulated and various limiting forms of this distribution are derived, including the necessary conditions for those commonly accepted.
Abstract: In this paper we formulate the general problem of determining the photoelectron "counting" distribution resulting from an electromagnetic field impinging on a quantum detector. Although the detector model used was derived quantum mechanically, our treatment is wholly classical and includes all results known to date. This combination is commonly referred to as the semiclassical approach. The emphasis, however, lies in directing the problem towards optical communication. The electromagnetic field is assumed to be the sum of a deterministic signal and a zero-mean narrow-band Gaussian random process, and is expanded in a Karhunen-Loeve series of orthogonal functions. Several examples are given. It is shown that all the results obtainable can be written explicitly in terms of the noise covariance function. Particular attention is given to the case of a signal plus white Gaussian noise, both of which are band-limited to \pm B Hz. Since the result is a fundamental one, to add some physical insight, we show four methods by which it can be obtained. Various limiting forms of this distribution are derived, including the necessary conditions for those commonly accepted. The likelihood functional is established and is shown to be the product of Laguerre polynomials. For the problem of continuous estimation, the Fisher information kernel is derived and an important limiting form is obtained. The maximum a posteriori (MAP) and maximum-likelihood (ML) estimation equations are also derived. In the latter case the results are also functions of Laguerre polynomials.

Journal ArticleDOI
TL;DR: It is shown that the asymptotic behavior of the tail of the sum distribution of a finite number of individually log-normal distributed variates displays the log-normal character of those variates with maximum logarithmic variance.
Abstract: It is shown that the asymptotic behavior of the tail of the sum distribution of a finite number of individually log-normal distributed variates displays the log-normal character of those variates with maximum logarithmic variance. A quantitative definition of the sum-distribution "tail" is established in terms of upper bounds on the relative error or deviation from the asymptotic log-normal property.
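
A Monte Carlo illustration of the tail result, assuming a sum of two independent log-normal variates with different logarithmic variances: as the threshold grows, the tail of the sum is increasingly accounted for by the maximum-variance term alone. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000
s1 = np.exp(rng.normal(0.0, 0.5, n))   # small logarithmic variance
s2 = np.exp(rng.normal(0.0, 1.5, n))   # large logarithmic variance
total = s1 + s2

for thresh in (20.0, 50.0, 100.0):
    p_sum = np.mean(total > thresh)
    p_big = np.mean(s2 > thresh)       # max-variance component alone
    print(f"P(sum > {thresh:5.1f}) = {p_sum:.2e},  "
          f"P(max-variance term > {thresh:5.1f}) = {p_big:.2e}")
```

The two tail probabilities converge toward each other as the threshold increases, consistent with the stated asymptotic log-normal character.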

Journal ArticleDOI
TL;DR: The important class of polynomial codes, introduced by Kasami, Lin, and Peterson, satisfy the conditions and several new results are obtained about some of these codes.
Abstract: Using a polynomial approach to linear codes, one derives necessary and sufficient conditions for cyclic q -ary codes to be invariant under the general linear group of substitutions over an extension field of GF(q) . The important class of polynomial codes, introduced by Kasami, Lin, and Peterson, satisfy the conditions and several new results are obtained about some of these codes.

Journal ArticleDOI
TL;DR: It is shown that nonsingular detection problems of this form can always be interpreted as problems of the apparently more special "signal-in-noise" type, where the cross-covariance function of the signal and noise must be of a special "one-sided" form.
Abstract: We give a comprehensive discussion of the structure of the likelihood ratio (LR) for discrimination between two Gaussian processes, one of which is white. Several more general problems can be reduced, usually by differentiation, to this form. We shall show that nonsingular detection problems of this form can always be interpreted as problems of the apparently more special "signal-in-noise" type, where the cross-covariance function of the signal and noise must be of a special "one-sided" form. Moreover, the LR for this equivalent problem can be written in the same form as that for known signals in white Gaussian noise, with the causal estimate of the signal process replacing the known signal. This single formula will be shown to be equivalent to a variety of other formulas, including all those previously known. The proofs are based on a resolvent identity and on a representation theorem for second-order processes, both of which have other applications. This paper also contains a discussion of the various stochastic integrals and infinite determinants that arise in Gaussian detection problems.

Journal ArticleDOI
TL;DR: A stochastic approximation algorithm is developed for estimating a mixture of normal density functions with unknown means and unknown variances that minimizes an information criterion that has interesting properties for density approximations.
Abstract: A stochastic approximation algorithm is developed for estimating a mixture of normal density functions with unknown means and unknown variances. The algorithm minimizes an information criterion that has interesting properties for density approximations. The conditions on the convergence of this nonlinear estimation algorithm are discussed, and a numerical example is presented.

Journal ArticleDOI
TL;DR: A construction is given that yields a new family of nongroup single-error correcting codes of all lengths n from 2^m to 3 · 2^{m-1} - 1 , for every m \geq 3 .
Abstract: A construction is given that combines an (n, M_1, d_1) code with an (n, M_2, d_2 = [\frac{1}{2}(d_1 + 1)]) code to form a (2n, M_1 M_2, d_1) code. This is used to construct a new family of nongroup single-error correcting codes of all lengths n from 2^m to 3 · 2^{m-1} - 1 , for every m \geq 3 . These codes have more codewords than any group code of the same length and minimum distance. A number of other nongroup codes are also obtained. Examples of the new codes are (16,2560,3) and (16,36,7) codes, both having more codewords than any comparable group code.
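
The combining step reads like the classical (u, u+v) construction, under which the minimum distance is \min(2d_2, d_1) = d_1 when d_2 = [\frac{1}{2}(d_1 + 1)] ; that identification is an assumption, not a quotation from the paper. A brute-force check on small codes:

```python
import itertools

def min_distance(code):
    return min(sum(a != b for a, b in zip(u, v))
               for u, v in itertools.combinations(code, 2))

# (n, M1, d1) = (3, 2, 3) repetition code; (n, M2, d2) = (3, 4, 2) even-weight
# code, so that d2 = [ (d1 + 1) / 2 ] = 2 as the construction requires.
A = [(0, 0, 0), (1, 1, 1)]
B = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

# (u, u+v)-style combination: one plausible reading of the combining step.
C = [u + tuple((ui + vi) % 2 for ui, vi in zip(u, v)) for u in B for v in A]

print("length:", len(C[0]), " codewords:", len(C), " min distance:", min_distance(C))
# Expect a (6, 8, 3) code: d = min(2*d2, d1) = d1 = 3 under this reading.
```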

Journal ArticleDOI
TL;DR: The study of the running time of programs for computing infinite sets of natural numbers leads to an arithmetic of computers, which is a distributive lattice.
Abstract: Two practical considerations concerning the use of computing machines are the amount of information that must be given to the machine for it to perform a given task and the time it takes the machine to perform it. The size of programs and their running time are studied for mathematical models of computing machines. The study of the amount of information (i.e., number of bits) in a computer program needed for it to put out a given finite binary sequence leads to a definition of a random sequence; the random sequences of a given length are those that require the longest programs. The study of the running time of programs for computing infinite sets of natural numbers leads to an arithmetic of computers, which is a distributive lattice.


Journal ArticleDOI
TL;DR: It is demonstrated that identification theory implies unsupervised learning is possible in many important cases, and a general method is presented that is effective for all the many cases wherein unsupervised learning is known to be possible.
Abstract: The first portion of this paper is tutorial. Beginning with a standard definition of an abstract pattern-recognition machine, "learning" is given a mathematical meaning and the distinction is made between supervised and unsupervised learning. The bibliography will help the interested reader retrace the history of learning in pattern recognition. The exposition now focuses attention on unsupervised learning. Carefully, it is explained how problems in this subject can be viewed as problems in the identification of finite mixtures, a statistical theory that has achieved some maturity. From this vantage point, it is demonstrated that identification theory implies unsupervised learning is possible in many important cases. The remaining sections present a general method for achieving unsupervised learning. Other authors have proposed schemes having greater computational convenience, but no method previously published is as inclusive as the one revealed here, which we demonstrate to be effective for all the many cases wherein unsupervised learning is known to be possible.

Journal ArticleDOI
TL;DR: The number of codewords of weight i in the second-order binary Reed-Muller code of length 2^m is given by A_i = 0 unless i = 2^{m-1} or 2^{m-1} \pm 2^{m-1-j} , for some j, 0 \leq j \leq [m/2] , with A_0 = A_{2^m} = 1 .
Abstract: In this paper, we establish the following result. Theorem: A_i , the number of codewords of weight i in the second-order binary Reed-Muller code of length 2^m , is given by A_i = 0 unless i = 2^{m-1} or 2^{m-1} \pm 2^{m-1-j} , for some j, 0 \leq j \leq [m/2] ; A_0 = A_{2^m} = 1 ; and \begin{equation} A_{2^{m-1} \pm 2^{m-1-j}} = 2^{j(j+1)} \frac{(2^m - 1)(2^{m-1} - 1)}{4 - 1} \cdot \frac{(2^{m-2} - 1)(2^{m-3} - 1)}{4^2 - 1} \cdots \frac{(2^{m-2j+2} - 1)(2^{m-2j+1} - 1)}{4^j - 1}, \qquad 1 \leq j \leq [m/2], \end{equation} \begin{equation} A_{2^{m-1}} = 2 \{ 2^{m(m+1)/2} - \sum_{j=0}^{[m/2]} A_{2^{m-1} - 2^{m-1-j}} \}. \end{equation}
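
The closed form can be checked exhaustively for small m . The sketch below enumerates all 2^{11} codewords of the second-order Reed-Muller code of length 16 ( m = 4 ), for which the theorem gives A_4 = A_{12} = 140 , A_6 = A_{10} = 448 , A_8 = 870 .

```python
import itertools
from collections import Counter
import numpy as np

m = 4
n = 2 ** m
pts = list(itertools.product((0, 1), repeat=m))

# Basis of RM(2, m): the all-ones vector, the x_i, and the products x_i x_j.
basis = [np.ones(n, dtype=int)]
basis += [np.array([p[i] for p in pts]) for i in range(m)]
basis += [np.array([p[i] * p[j] for p in pts])
          for i in range(m) for j in range(i + 1, m)]

# Brute-force weight enumerator over all 2^11 codewords.
weights = Counter()
for coeffs in itertools.product((0, 1), repeat=len(basis)):
    word = np.zeros(n, dtype=int)
    for c, b in zip(coeffs, basis):
        if c:
            word ^= b                  # addition over GF(2)
    weights[int(word.sum())] += 1

print(sorted(weights.items()))
# Expected: [(0, 1), (4, 140), (6, 448), (8, 870), (10, 448), (12, 140), (16, 1)]
```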

Journal ArticleDOI
TL;DR: The comparison shows that spherically invariant processes are slightly more general than Gaussian compound processes, and a simple expression for the probability distribution is given and some expectation values are calculated.
Abstract: This correspondence discusses the comparison between the class of spherically invariant processes and a particular class of Gaussian compound processes. We give a simple expression for the probability distribution and calculate some expectation values. The comparison shows that spherically invariant processes are slightly more general.

Journal ArticleDOI
TL;DR: A weighted-square error-distortion measure is considered, together with the class of all stationary random processes that satisfy a certain strong mixing property and that have zero mean, known power, and a bounded fourth moment.
Abstract: In certain situations, a single transmission system must be designed to function satisfactorily when used for any source from a class \mathcal{A} of sources. In this situation, the rate-distortion function R_{\mathcal{A}} (d) is the minimum capacity required by any transmission system that can transmit each source from \mathcal{A} with average distortion \leq d . One of the most interesting classes of sources is a class of random processes. Here we consider a weighted-square error-distortion measure and the class of all stationary random processes that satisfy a certain strong mixing property, that have zero mean, known power, and a bounded fourth moment, and that satisfy one of the following alternative specifications on the spectrum: 1) the spectrum is known exactly; 2) the amount of power within the band 0 \leq f \leq f_k is known for N - 1 frequencies f_1 \leq f_2 \leq \cdots \leq f_{N-1} ; or 3) the fraction of power outside some frequency f_l is \leq 1 - \gamma . For the class of sources determined by each of the above three cases and for an arbitrary error-weighting function we evaluate the rate-distortion function.

Journal ArticleDOI
TL;DR: Two approximations are considered: the first is a power-series expansion of the probability-density function; the second is a discrete-time version of a previously proposed algorithm that assumes the conditional distribution is normal.
Abstract: Statistics conditioned on quantized measurements are considered in the general case. These results are specialized to Gaussian parameters and then extended to discrete-time linear systems. The conditional mean of the system's state vector may be found by passing the conditional mean of the measurement history through the Kalman filter that would be used had the measurements been linear. Repetitive use of Bayes' rule is not required. Because the implementation of this result requires lengthy numerical quadrature, two approximations are considered: the first is a power-series expansion of the probability-density function; the second is a discrete-time version of a previously proposed algorithm that assumes the conditional distribution is normal. Both algorithms may be used with any memory length on stationary or nonstationary data. The two algorithms are applied to the noiseless-channel versions of the PCM, predictive quantization, and predictive-comparison data compression systems; ensemble-average performance estimates of the nonlinear filters are derived. Simulation results show that the performance estimates are quite accurate for most of the cases tested.

Journal ArticleDOI
TL;DR: A computational procedure, based on information theory, is used to obtain the maximum entropy estimates of p(x) in a number of cases.
Abstract: The problem of estimating a probability density function p(x) over [0, \infty) when several low-order moments are known is considered. A computational procedure, based on information theory, is used to obtain the maximum entropy estimates of p(x) in a number of cases. The situation when the first two moments only are known is considered in some detail. A table is included for estimating p(x) when given \mu_1 ^{\prime} , \mu_2 ^{\prime} with \mu_2 ^{\prime} \leq 2(\mu_1 ^{\prime}) ^ 2 .
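
A numerical sketch of the two-moment case: the maximum-entropy density on [0, \infty) has the exponential-quadratic form p(x) = \exp(-\lambda_0 - \lambda_1 x - \lambda_2 x^2) , and the multipliers can be found by solving the moment equations. The moment values, the positivity parameterization of \lambda_2 , and the starting point are illustrative assumptions, not the paper's tabulated procedure.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import fsolve

mu1, mu2 = 1.0, 1.5    # given first two moments; note mu2 <= 2*mu1^2 (assumed values)

# Maximum-entropy density on [0, inf) under two moment constraints:
# p(x) = exp(-l0 - l1*x - l2*x^2).  Parameterize l2 = exp(a) > 0 so every
# integral along the solver's path converges.
def residual(params):
    l1, a = params
    l2 = np.exp(a)
    f = lambda x, k: x ** k * np.exp(-l1 * x - l2 * x * x)
    z = quad(f, 0, np.inf, args=(0,))[0]          # normalizing integral
    return [quad(f, 0, np.inf, args=(1,))[0] / z - mu1,
            quad(f, 0, np.inf, args=(2,))[0] / z - mu2]

l1, a = fsolve(residual, [0.0, 0.0])
print("lambda_1 =", round(float(l1), 4), "  lambda_2 =", round(float(np.exp(a)), 4))
```

With the constraint mu2 <= 2*mu1^2 satisfied, the solver lands on a truncated-Gaussian-shaped density; the boundary case mu2 = 2*mu1^2 degenerates to the exponential density with \lambda_2 = 0 .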