# Showing papers in "IEEE Transactions on Information Theory in 1967"

••

TL;DR: The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points, so it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.

Abstract: The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points. This rule is independent of the underlying joint distribution on the sample points and their classifications, and hence the probability of error R of such a rule must be at least as great as the Bayes probability of error R^{\ast} , the minimum probability of error over all decision rules taking underlying probability structure into account. However, in a large sample analysis, we will show in the M -category case that R^{\ast} \leq R \leq R^{\ast}(2 - MR^{\ast}/(M-1)) , where these bounds are the tightest possible, for all suitably smooth underlying distributions. Thus for any number of categories, the probability of error of the nearest neighbor rule is bounded above by twice the Bayes probability of error. In this sense, it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.

12,243 citations
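The nearest neighbor rule and the large-sample bound quoted in the abstract above can be sketched in a few lines of Python. This is only an illustration: the function names, the Euclidean distance, and the toy training set are assumptions, not taken from the paper.

```python
import math

def nearest_neighbor_label(train, query):
    """Return the label of the training point closest to `query`.

    train: list of (point, label) pairs, each point a tuple of floats.
    Euclidean distance is an assumption; the rule works with any metric.
    """
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label

def nn_error_upper_bound(bayes_error, num_classes):
    """Large-sample upper bound R*(2 - M R*/(M-1)) from the abstract."""
    m = num_classes
    return bayes_error * (2 - m * bayes_error / (m - 1))

# Hypothetical toy data: two classes in the plane.
train = [((0.0, 0.0), "a"), ((1.0, 1.0), "b"), ((0.2, 0.1), "a")]
print(nearest_neighbor_label(train, (0.9, 0.8)))
print(nn_error_upper_bound(0.1, 2))
```

When the Bayes error reaches (M-1)/M, the upper bound collapses onto the Bayes error itself, which is one way to see that the two bounds meet at the extremes.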

••

TL;DR: The upper bound is obtained for a specific probabilistic nonsequential decoding algorithm which is shown to be asymptotically optimum for rates above R_{0} and whose performance bears certain similarities to that of sequential decoding algorithms.

Abstract: The probability of error in decoding an optimal convolutional code transmitted over a memoryless channel is bounded from above and below as a function of the constraint length of the code. For all but pathological channels the bounds are asymptotically (exponentially) tight for rates above R_{0} , the computational cutoff rate of sequential decoding. As a function of constraint length the performance of optimal convolutional codes is shown to be superior to that of block codes of the same length, the relative improvement increasing with rate. The upper bound is obtained for a specific probabilistic nonsequential decoding algorithm which is shown to be asymptotically optimum for rates above R_{0} and whose performance bears certain similarities to that of sequential decoding algorithms.

6,804 citations

••

916 citations

••

TL;DR: In this paper, the error-free run and error-cluster distributions are derived for binary communication channels using functions of finite-state Markov chains, and a simple class of models is investigated and shown experimentally to be capable of representing HF radio statistics.

Abstract: The characterization of binary communication channels using functions of finite-state Markov chains is considered. Two distributions which are relevant to code evaluation, i.e., the error-free run and error-cluster distributions, are derived. It is shown for an N -state model, partitioned into a group of k error-free states and N-k error states, that the general form of the error-free run distribution is the weighted sum of at most k exponentials, and that of the error-cluster distribution the weighted sum of at most N-k exponentials. As evidence of the capability of such models to characterize real communication channels, a simple class of models is investigated and shown experimentally to be capable of representing HF radio statistics.

438 citations
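As a rough illustration of the error-free run distribution described in the abstract above, the sketch below computes P(0^m | 1), the probability that an error is followed by at least m error-free digits, for a Markov chain whose states are split into error-free and error states. The function name, the argument layout, and the two-state example matrix are assumptions chosen for illustration; they are not the paper's models.

```python
def error_free_run_distribution(P, good, bad, start, max_len):
    """Return [P(0^m | 1) for m = 1..max_len].

    P: transition matrix as a list of rows.
    good, bad: lists of error-free and error state indices.
    start: distribution over `bad` at the moment an error occurs
           (assumed to be supplied by the caller).
    """
    # Probability mass that moves from the error states into each good state.
    mass = [sum(start[j] * P[b][g] for j, b in enumerate(bad)) for g in good]
    result = []
    for _ in range(max_len):
        result.append(sum(mass))
        # Stay inside the error-free states for one more digit.
        mass = [sum(mass[i] * P[gi][g] for i, gi in enumerate(good))
                for g in good]
    return result

# Hypothetical two-state chain: state 0 is error-free, state 1 always errs.
P = [[0.9, 0.1],
     [0.4, 0.6]]
runs = error_free_run_distribution(P, good=[0], bad=[1], start=[1.0], max_len=5)
print(runs)
```

With a single error-free state (k = 1) the result is one geometric term, 0.4 * 0.9^(m-1) in this example, consistent with the "at most k exponentials" statement.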

••

360 citations

••

TL;DR: The class of codes discussed in this paper has the property that its error-correction capability is described in terms of correcting errors in specific digits of a code word even though other digits in the code may be decoded incorrectly.

Abstract: The class of codes discussed in this paper has the property that its error-correction capability is described in terms of correcting errors in specific digits of a code word even though other digits in the code may be decoded incorrectly. To each digit of the code words is assigned an error protection level f_{i} . Then, if f errors occur in the reception of a code word, all digits which have protection f_{i} greater than or equal to f will be decoded correctly even though the entire code word may not be decoded correctly. Methods for synthesizing these codes are described and illustrated by examples. One method of synthesis involves combining the parity check matrices of two or more ordinary random error-correcting codes to form the parity check matrix of the new code. A decoding algorithm based upon the decoding algorithms of the component codes is presented. A second method of code generation is described which follows from the observation that for a linear code, the columns of the parity check matrix corresponding to the check positions must span the column space of the matrix. Upper and lower bounds are derived for the number of check digits required for such codes. The lower bound is based upon counting the number of unique syndromes required for a specified error-correction capability. The upper bound is the result of a constructive procedure for forming the parity check matrices of these codes. Tables of numerical values for the upper and lower bounds are presented.

359 citations

••

TL;DR: The present paper reinterprets spectral measurements without paradox, by introducing a concept to be called "conditional spectrum," and examples are given of functions ruled by chance, that have the observed "erratic" behavior and conditional spectral density.

Abstract: Noises in thin metallic films, semiconductors, nerve tissues, and many other media, have measured spectral densities proportional to f^{\theta - 2} , with f the frequency and \theta a constant, 0 \leq \theta < 2 . The energy of these " f^{\theta - 2} noises" behaves more "erratically" in time than expected from functions subject to the Wiener-Khinchin spectral theory. Moreover, blind extrapolation of the " f^{\theta - 2} law" to f=0 incorrectly suggests, when 0 \leq \theta \leq 1 , that the total energy is infinite ("infrared catastrophe"). The problems thus raised are of the greatest theoretical interest, and of the greatest practical importance in the design of electronic devices. The present paper reinterprets these spectral measurements without paradox, by introducing a concept to be called "conditional spectrum." Examples are given of functions ruled by chance, that have the observed "erratic" behavior and conditional spectral density. A conditional spectrum is obtained when a procedure, meant to measure a sample Wiener-Khinchin spectrum, is applied to a sample conditioned to be nonconstant. The conditional spectrum is defined, not only for nonconstant samples from all random functions of the Wiener-Khinchin theory, but also for nonconstant samples from certain nonstationary random functions, and for nonconstant samples from a new generalization of random functions, called "sporadic functions." The simplest sporadic function, having an f^{-2} conditional spectral density, is a "direct current" with a single discontinuity uniformly distributed over -\infty < t < \infty . The other f^{\theta - 2} noises to be described partake both of direct current and of white noise (the \theta = 2 limit of f^{\theta - 2} noise), and continuously span the gap between these limits. In many cases, their noise energy can be said to be proportional to the square of their "dc" component.
Empirical studies will be suggested, and the descriptive value of the concepts of dc component and of spectrum will be discussed.

334 citations

••

TL;DR: The problem to be considered in this paper is that of designing radar signals and receivers that are optimum for detecting a point target masked by a background of clutter returns and thermal noise.

Abstract: The problem to be considered in this paper is that of designing radar signals and receivers that are optimum for detecting a point target masked by a background of clutter returns and thermal noise. The problem of choosing an optimum signal when no constraints are placed on the type of signals allowed is discussed briefly, but the remainder of the paper is restricted to signals and receiver impulse responses that are uniformly spaced, phase and amplitude-tapered pulse trains. Expressions for the signal-to-interference ratio obtained when a signal is used with its matched filter (\rho_{mf}) and with the optimum filter or clutter filter (\rho_{cf}) are then derived together with an explicit expression for the clutter filter. An iterative technique for maximizing \rho_{cf} is devised. This scheme has the useful property that it generates a sequence of signals whose \rho_{cf} 's form a monotone, nondecreasing sequence. This is followed by an application of the calculus of variations to derive the Euler equations for the stationary points of \rho_{cf} and \rho_{mf} . The form of the Euler equations suggests iterative techniques for their solution; in fact, the technique suggested for the solution of the Euler equation associated with \rho_{cf} is essentially the iterative technique that was described above.

180 citations

••

TL;DR: Performance characteristics of the optimal PAM systems, showing the mean-squared error versus the signal-to-noise ratio, are determined explicitly for several examples and are compared both with those of certain suboptimal systems and with the optimal performance theoretically attainable (OPTA).

Abstract: Intersymbol interference and additive noise are two common sources of distortion in data transmission systems. For pulse amplitude modulation (PAM) communication links, the combination of transmitter waveform and linear receiver that minimizes the mean-squared error arising from these sources is determined. An extension to include the effects of timing jitter is performed in a companion paper. Performance characteristics of the optimal PAM systems, showing the mean-squared error versus the signal-to-noise ratio, are determined explicitly for several examples. These characteristics are compared both with those of certain suboptimal systems and with the optimal performance theoretically attainable (OPTA), derived by combining Shannon's concepts of the capacity of a channel and the rate distortion function of a source. The optimal PAM systems are seen to perform very close to the OPTA for low signal-to-noise ratios. For high signal-to-noise ratios, however, the mean-squared error of optimal PAM systems decreases as the reciprocal of the signal-to-noise ratio, but the OPTA decreases more rapidly, except for band-limited channels. The performance of PAM systems can be improved at high signal-to-noise ratios by coding techniques. One such technique, called Shannon-Cantor coding, is discussed briefly.

180 citations

••

TL;DR: A new class of linear block codes, called self-orthogonal quasi-cyclic codes, is defined, and it is shown that the problem of designing these codes is equivalent to the problem of designing disjoint difference sets.

Abstract: A new class of linear block codes, called self-orthogonal quasi-cyclic codes, is defined. It is shown that the problem of designing these codes is equivalent to the problem of designing disjoint difference sets. As a result, several classes of optimal and near-optimal codes can be constructed analytically and other codes can be found by a computer-aided search procedure. A list of codes is given for practical values of minimum distance and efficiency. Two easily implemented decoding algorithms are described, and a Monte Carlo evaluation of the performance of several codes on the binary symmetric channel is presented. This evaluation shows that, when decoded with the better of the two algorithms, these codes perform nearly as well as the Bose-Chaudhuri-Hocquenghem (BCH) codes with the same minimum distance and efficiency in the cases examined. Although these codes must be long relative to the BCH codes, the low cost and lack of complexity of the equipment required to correct large numbers of errors should make them competitive for practical systems.

175 citations

••

TL;DR: A class of binary recurrent codes for correcting independent errors is given which has guaranteed error-limiting properties and the results of a computer simulation indicate that these codes perform better in some situations than other codes using threshold decoding.

Abstract: A class of binary recurrent codes for correcting independent errors is given which has guaranteed error-limiting properties. These codes can be simply decoded using threshold decoding, and will recover from any decoding error caused by either an uncorrectable transmission error or a temporary malfunction of the encoder or decoder. A number of such codes are given along with a synthesis procedure. The results of a computer simulation are given which indicate that these codes perform better in some situations than other codes using threshold decoding.

••

TL;DR: The performance of systems using sequential decoding is limited by the computational and buffer capabilities of the decoder, not by the probability of making a decoding error.

Abstract: In sequential decoding, the number of computations which the decoder must perform to decode the received digits is a random variable. In this paper, we derive a Paretian lower bound to the distribution of this random variable. We show that P[C > L] is bounded below by a term that behaves as L^{-\rho} , where C is the number of computations which the sequential decoder must perform to decode a block of \Lambda transmitted bits, and \rho is a parameter which depends on the channel and the rate of the code. Our bound is valid for all sequential decoding schemes and all discrete memoryless channels. In Section II we give an example of a special channel for which a Paretian bound can be easily derived. In Sections III and IV we treat the general channel. In Section V we relate this bound to the memory buffer requirements of real-time sequential decoders. In Section VI, we show that this bound implies that certain moments of the distribution of the computation per digit are infinite, and we determine lower bounds to the rates above which these moments diverge. In most cases, our bounds coincide with previously known upper bounds to rates above which the moments converge. We conclude that the performance of systems using sequential decoding is limited by the computational and buffer capabilities of the decoder, not by the probability of making a decoding error. We further note that our bound applies only to sequential decoding, and that, in certain special cases (Section II), algebraic decoding methods prove superior.

••

TL;DR: It is shown that the time average of a slowly-varying carrier strength can be estimated despite the presence of noise, and theoretical Cramer-Rao lower bounds on the standard deviation in unbiased estimation are determined.

Abstract: The separate estimation of carrier strength and of narrowband additive noise strength from N envelope samples is considered. The mathematical problem is one of the joint estimation of the two parameters in the Rice probability-density function. The estimation is useful, for example, in the radar analysis of targets containing both specular and distributed components. In the mathematical analysis, theoretical Cramer-Rao lower bounds on the standard deviation in unbiased estimation are first determined. Estimation procedures based upon the method of maximum likelihood and two simple methods of moments are then developed. An error analysis for each procedure is performed by Monte Carlo simulation. For the purpose of comparison, two direct averaging methods are also analyzed. Results are extended to the case where an additional noise is also present, but with known strength. It is also shown that the time average of a slowly-varying carrier strength can be estimated despite the presence of noise.

••

TL;DR: This paper presents a model suitable for many problems and evolves a solution in the form of a machine that "learns" to solve the problem without external aid; such machines are said to "learn without a teacher."

Abstract: An important problem in pattern recognition or signal detection is the recognition of a pattern that is completely characterized statistically except for a finite set of unknown parameters. If a machine is required to solve such a problem on a number of occasions, it is possible to take advantage of this repetition. One can design a machine that will extract more and more of the pertinent information about these unknown parameters as it recognizes the patterns and readjusts itself to be more selective to them; the machine improves in performance as it gains experience on the problem. This paper presents a model suitable for many such problems and evolves a solution in the form of a machine that "learns" to solve the problem without external aid. Such machines are said to "learn without a teacher." The Bayes solution to the model problem requires the computation of the a posteriori probability density of the unknown parameters. A recursive equation for this density is derived. This equation describes the structure of a relatively simple system of finite size that may be realized in a delay-feedback form. The application of the model and the synthesis of a learning system are illustrated by the derivation of a receiver for the detection of signals of unknown amplitude in white Gaussian noise.

••

TL;DR: The optimal decision rule is derived under the assumption of Markov dependence among the patterns to be recognized and applied to character-recognition problems.

Abstract: In many pattern-recognition problems there exist dependencies among the patterns to be recognized. In the past, these dependencies have not been introduced into the mathematical model when designing an optimal pattern-recognition system. In this paper the optimal decision rule is derived under the assumption of Markov dependence among the patterns to be recognized. Subsequently, this decision rule is applied to character-recognition problems. The main idea is to balance appropriately the information which is obtained from contextual considerations and the information from measurements on the character being recognized and thus arrive at a decision using both. Bayes' decision in Markov chains is presented and this mode of decision is adapted to character recognition. A look-ahead mode of decision is presented. The problem of estimation of transition probabilities is discussed. The experimental system is described and results of experiments on English legal text and names are presented.
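To make the idea of balancing contextual information against measurements concrete, here is a minimal sketch of a Bayes update under first-order Markov dependence between successive patterns. It is a standard forward recursion written under assumed inputs (per-class likelihoods, a transition matrix, and a prior); it is not claimed to be the exact decision rule or look-ahead mode derived in the paper.

```python
def contextual_posteriors(likelihoods, transition, prior):
    """Posterior over classes for each pattern, given all measurements so far.

    likelihoods[t][j]: P(measurement at position t | class j)  (assumed given)
    transition[i][j]:  P(class j at position t+1 | class i at position t)
    prior[j]:          P(class j) for the first pattern
    """
    n = len(prior)
    belief = list(prior)
    posteriors = []
    for t, like in enumerate(likelihoods):
        if t > 0:
            # Context step: propagate the previous posterior through the chain.
            belief = [sum(belief[i] * transition[i][j] for i in range(n))
                      for j in range(n)]
        # Measurement step: weight by the likelihood and renormalize.
        unnormalized = [belief[j] * like[j] for j in range(n)]
        total = sum(unnormalized)
        belief = [u / total for u in unnormalized]
        posteriors.append(belief)
    return posteriors

# Hypothetical two-class example: the second measurement is uninformative,
# so the decision for it rests entirely on context.
transition = [[0.9, 0.1], [0.1, 0.9]]
likelihoods = [[0.9, 0.1], [0.5, 0.5]]
posts = contextual_posteriors(likelihoods, transition, prior=[0.5, 0.5])
print(posts)
```

In the example the second pattern is still assigned to class 0, purely because the first pattern was confidently class 0 and the assumed chain favors staying in the same class.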

••

TL;DR: The emphasis is on broadband (frequency-dependent) structures, and their associated space-time operators, by which the system as a whole is represented, and with the help of which one can apply the general methods of statistical communication theory to the central problems of signal processing for detection, communication, and classification.

Abstract: A theory of reverberation and related first-order scattered fields is developed, based on the assumption of weak inhomogeneities (i.e., primary scattering only), and a consequent representation in terms of Poisson point processes in space and time. Both surface and volume reverberation are included, separately and together, for general geometries, source and receiver at the same and different locations, and arbitrary transmitting and receiving apertures. A combination of field- and ray-theory is employed to obtain a characteristic scattered waveform, where the inhomogeneous medium is replaced by a homogeneous and isotropic one in which a spatially and temporally random ensemble of point scatterers is embedded. The effects of the scattering mechanism are described generally by a linear, time-varying filter response. The medium itself is seen to be dispersive and is represented by a set of linear (statistical) space-time operators, by which the signal source and the receiver are coupled to one another, as well as to the point scatterers. Broadband as well as narrowband signals and reverberation are included in the model, which is capable of handling general apertures, illuminating signals, doppler of the scatterers, multiple sources and receivers (overlapping beams), and a characteristic time-varying scatter mechanism, that reveals in detail the inherent nonstationarity of the reverberation. Shadowing effects of "rough" surfaces are included, and a variety of important special results, such as the case of narrowband excitation, and simple (time- and frequency-independent) scattering, are also described. 
The emphasis is on broadband (frequency-dependent) structures, and their associated space-time operators, by which the system as a whole is represented, and with the help of which one can apply the general methods of statistical communication theory to the central problems of signal processing for detection, communication, and classification, in an environment dominated by reverberation or clutter and analogous signal-dependent noise processes.

••

TL;DR: The problem of discriminating two Gaussian signals by using only a finite number of linear observables is considered and it is found that the set of observables that minimizes H is a set of coefficients of the simultaneously orthogonal expansions of the two signals.

Abstract: Consider the problem of discriminating two Gaussian signals by using only a finite number of linear observables. How to choose the set of n observables to minimize the error probability P_{e} , is a difficult problem. Because H , the Hellinger integral, and H^{2} form an upper and a lower bound for P_{e} , we minimize H instead. We find that the set of observables that minimizes H is a set of coefficients of the simultaneously orthogonal expansions of the two signals. The same set of observables maximizes the Hajek J -divergence as well.

••

TL;DR: This analytical technique produced codes for lengths near 100 digits as good as, or better than, any previously known binary pulse compression codes in less than 15 minutes computer time.

Abstract: An analytical technique for generating good binary pulse compression codes is developed. The first step in constructing a code of a given length N is to divide all the residues modulo N and less than N into residue classes. A code digit a(i)=+1 is assigned to all members, i , of certain of these classes and a(i)=-1 to N and all members, i , of the remaining classes. Many of these divisions resulted in difference sets and corresponding binary codes with single-level periodic code correlations. Other divisions resulted in two-level periodic code correlations. In order for a binary pulse compression code to have low autocorrelation sidelobes, its periodic correlation sidelobes must be low. Therefore, codes with low periodic correlations were sought. Good binary codes for lengths just above 100 digits down to lengths near 10 digits were found. Several of them are known to be optimum codes. When programmed on an IBM 7094, this analytical technique produced codes for lengths near 100 digits as good as, or better than, any previously known binary pulse compression codes in less than 15 minutes computer time.
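The construction in the abstract above can be illustrated with one simple choice of residue classes. The sketch below assumes the quadratic residues modulo a prime N as the "+1" class, which is only one possible division and is not stated in the abstract; the function names are likewise illustrative.

```python
def residue_class_code(n, plus_class):
    """Digits a(1)..a(n): +1 if i mod n lies in plus_class, else -1.

    The digit for i = n (residue 0) falls in the -1 group, as in the abstract.
    """
    return [1 if (i % n) in plus_class else -1 for i in range(1, n + 1)]

def periodic_autocorrelation(code):
    """Periodic (cyclic) autocorrelation for every shift 0..n-1."""
    n = len(code)
    return [sum(code[i] * code[(i + shift) % n] for i in range(n))
            for shift in range(n)]

n = 7  # assumed example length
quadratic_residues = {(x * x) % n for x in range(1, n)}  # {1, 2, 4}
code = residue_class_code(n, quadratic_residues)
correlation = periodic_autocorrelation(code)
print(code)
print(correlation)
```

For N = 7 the set {1, 2, 4} is a difference set, so every off-peak periodic correlation takes the same value, an instance of the single-level behavior the abstract mentions.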

••

TL;DR: The classical Fokker-Planck-Kolmogorov equations are generalized to hold for conditional probability density functions of arbitrary random processes both for one-dimensional and for vector random processes.

Abstract: In this paper, the classical Fokker-Planck-Kolmogorov equations are generalized to hold for conditional probability density functions of arbitrary random processes. Conditions are derived under which the generalized equations are of finite order both for one-dimensional and for vector random processes. An extension of the generalized equations which overcomes degeneracy occurring in the steady-state case is also presented.

••

Bell Labs

TL;DR: A model for a channel with multiple adjacent synchronization errors is proposed, and a lower and upper bound on the redundancy necessary to correct a given error rate will be derived.

Abstract: A synchronization error is said to occur when either a bit which does not belong appears, or is detected in a channel between bits which were transmitted; or a bit which was transmitted is lost or not detected. A model for such a channel will be proposed, and a lower and upper bound on the redundancy necessary to correct a given error rate will be derived. We will consider the case of single synchronization error correction in detail, and stronger bounds will be derived for that case. We will consider multiple adjacent synchronization errors as a special case, and show that the bounds can be tightened in this case as well.

••

TL;DR: The purpose of this correspondence is to show that the cited optimum properties can be retained by defining a generalized Karhunen-Loeve expansion which considers the possibility of two or more stochastic processes generating the random functions.

Abstract: In a recent paper Watanabe (1965) has established two useful theorems concerning the optimum properties of the Karhunen-Loeve expansion for the random functions of a stochastic process. The resulting coordinate system (Karhunen-Loeve system) was found to be optimum in the sense of i) minimizing the mean-square error committed by approximating the expansion of an infinite series by a finite number of terms, ii) minimizing the entropy function defined over the probability distribution of the coordinate coefficients for the entire ensemble. These optimum features were then effectively applied to the preprocessing of input data for the speech recognition problem. However, in the development of these properties consideration was given only to the situation where the random functions come from the same stochastic process, and consequently the coordinate coefficients in the expansion were treated as nonrandom. The question then arises as to whether the optimum properties will hold if the random functions are realizations of more than one stochastic process, as in the case of many preprocessing problems in pattern recognition and signal detection. The purpose of this correspondence is to show that the cited optimum properties can be retained by defining a generalized Karhunen-Loeve expansion which considers the possibility of two or more stochastic processes generating the random functions. Necessary conditions are derived to assure the existence of such an expansion. Applications of these results are indicated in the ranking and selection of feature measurements for the sequential recognition (decision) problems.

••

TL;DR: The present model of the dispersive reverberation channel cannot be fully described by the statistical structures postulated earlier (and successfully) in radar astronomy: a considerably more involved second-order statistic is required (and obtained) here.

Abstract: In Part I a general, first-order scattering model for reverberation has been constructed, from which the characteristic aperture and medium operators have been obtained for general geometries, signals, and arrays. Here, in Part II, the second-order statistics (intensities, covariances, and spectra) of such reverberation processes are determined in detail. In addition to the fluctuation, or purely random component of the received scatter process, an average or purely deterministic component often exists and may be significant, particularly for scattering from surfaces. Since the medium is dispersive (and time varying), reverberation is a non-stationary process (as is well known). Except for purely sinusoidal signals, these processes are (approximately) stationary only for signals of short duration (i.e., at fixed ranges), and only then can be said to possess an intensity spectrum in the usual sense. In this latter instance the familiar reverberation laws (intensity \sim input signal, and \sim range ^{-3} or range ^{-2} , respectively, for surface or volume scatter) hold for the fluctuation component, while different laws (intensity \sim range ^{-2} , or range ^{0} ) govern the deterministic components. With the composite process no such simple behavior usually appears, unless the geometry insures the dominance of a surface or volume effect. Moreover, in the general case these laws are only approximations, which may often be poor ones. Particular attention is also given to narrowband signals. A variety of expressions for the covariances of the slowly-varying components of the received reverberation is developed in detail, for both surface and volume scatter. These results are very simply combined (additively in the means, variances, etc.) to yield analogous results for the complete, or composite reverberation process. 
Conditions for normality are given and a number of new results for the probability densities of such nonstationary, narrowband Gaussian processes are obtained. The first-order statistics of the envelope and phase of the received scattered return are also derived, for general narrowband signals, extending the earlier work of Rayleigh, Rice, Hoyt, Nakagami, Middleton, and Beckmann. Many special cases for arbitrary illuminating signals are considered, including an approach to complex (distributed) targets, uniform beams, multiple receivers, transmitter and receiver at the same location, large and small doppler, etc., where the general formulation (as in Part I) is broadband, given in terms of frequency-selective apertures, time-varying dynamic cross sections, and arbitrary input signal waveforms. A number of second-order statistical properties of the medium response is defined and evaluated, which are needed for the combined theoretical and experimental exploration of the medium itself. It is found that the present model of the dispersive reverberation channel cannot be fully described by the statistical structures postulated earlier (and successfully) in radar astronomy: a considerably more involved second-order statistic is required (and obtained) here. To what extent it may provide an adequate description of reality remains to be studied. Part II concludes with a short summary of the principal results and assumptions.

••

TL;DR: The distribution of the ratio of minimum distance to code length of a random linear code approaches a step distribution as the code length becomes arbitrarily large at fixed code rate.

Abstract: The distribution of the ratio of minimum distance to code length of a random linear code approaches a step distribution as the code length becomes arbitrarily large at fixed code rate. The location of the step is at the smaller value of p satisfying 1 + p \log_{2}p + (1 - p) \log_{2} (1 - p) = k/n .
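The step location in the abstract above is the smaller root of 1 + p \log_{2}p + (1 - p) \log_{2}(1 - p) = k/n, that is, of 1 - H(p) = k/n with H the binary entropy function. A minimal numerical sketch, assuming a plain bisection on [0, 1/2] (the function names and tolerance are illustrative choices):

```python
import math

def binary_entropy(p):
    """H(p) in bits, with H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def step_location(rate, tol=1e-12):
    """Smaller root p in [0, 1/2] of 1 - H(p) = rate, for 0 <= rate <= 1.

    1 - H(p) decreases from 1 to 0 on [0, 1/2], so bisection applies.
    """
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - binary_entropy(mid) > rate:
            lo = mid  # left side still exceeds the rate: root is to the right
        else:
            hi = mid
    return (lo + hi) / 2

p_half_rate = step_location(0.5)
print(p_half_rate)
```

For a rate-1/2 code the root lies near p = 0.11, so the ratio of minimum distance to length of a long random linear code concentrates at about 11 percent.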

••

TL;DR: A generalized burst-error correcting code is constructed and it is pointed out that the above large distance codes may be utilized in the construction of this burst-error code.

Abstract: Arithmetic codes are error-correcting or detecting codes implemented by ordinary arithmetic operations. Arithmetic codes with large distance, and therefore, capable of multierror correction are constructed. These codes are analogous to the finite field codes corresponding to maximal recurring sequences generated by shift registers whose characteristic polynomial is a primitive polynomial. These arithmetic codes are generated by the recurring sequence formed by the inverse of a prime having two as a primitive root. The distance as well as the redundancy increases with the code length. These codes have large redundancy but may be useful in specialized cases. Since the difference between a cyclic shift of a code word and the code word itself is another code word, a two-level function can be formed and the code used as an acquirable code. They can detect error bursts whose length is half the code length. A generalized burst-error correcting code is constructed and it is pointed out that the above large distance codes may be utilized in the construction of this burst-error code.

••

TL;DR: A device for performing binary integration on quantized data from a multiple-resolution-element radar is described, and expressions for false-alarm and detection probabilities are given.

Abstract: A device for performing binary integration on quantized data from a multiple-resolution-element radar is described. The moving window detector is described, and expressions for false-alarm and detection probabilities are given. A scheme for determining the optimum detection thresholds and the optimum quantizing levels is given. Application of the moving window detector to a particular multiple-range-element radar is described.
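Binary integration over a moving window is commonly realized as an "at least m of the last n" test on quantized samples. The sketch below follows that generic scheme for a single resolution element; the two-level quantizer, the parameter names, and the sample values are assumptions and are not taken from the paper.

```python
from collections import deque

def moving_window_detections(samples, quantize_threshold, window, count_threshold):
    """Return one detection flag per sample.

    Each sample is quantized to 1 if it reaches quantize_threshold, else 0.
    A detection is declared when at least count_threshold of the most recent
    `window` quantized samples are 1.
    """
    recent = deque(maxlen=window)  # oldest bit drops out automatically
    flags = []
    for x in samples:
        recent.append(1 if x >= quantize_threshold else 0)
        flags.append(sum(recent) >= count_threshold)
    return flags

# Hypothetical envelope samples from one range cell.
samples = [0.1, 0.9, 0.8, 0.2, 0.95, 0.1, 0.0]
flags = moving_window_detections(samples, quantize_threshold=0.5,
                                 window=4, count_threshold=3)
print(flags)
```

Raising `count_threshold` lowers the false-alarm probability at the cost of detection probability, which is the trade-off behind the threshold optimization mentioned in the abstract.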

••

TL;DR: This paper discusses networks (directed graphs) having one input node, one output node, and an arbitrary number of intermediate nodes, whose branches are noisy communications channels, in which the input to each channel appears at its output corrupted by additive Gaussian noise.

Abstract: This paper discusses networks (directed graphs) having one input node, one output node, and an arbitrary number of intermediate nodes, whose branches are noisy communications channels, in which the input to each channel appears at its output corrupted by additive Gaussian noise. Each branch is labeled by a non-negative real parameter which specifies how noisy it is. A branch originating at a node has as input a linear combination of the outputs of the branches terminating at that node. The channel capacity of such a network is defined. Its value is bounded in terms of branch parameter values and procedures for computing values for general networks are described. Explicit solutions are given for the class D_{0} which includes series-parallel and simple bridge networks and all other networks having r paths, b branches, and v nodes with r = b - v + 2 , and for the class D_{1} of networks which is inductively defined to include D_{0} and all networks obtained by replacing a branch of a network in D_{1} by a network in D_{1} . The general results are applied to the particular networks which arise from the decomposition of a simple feedback system into successive forward and reverse (feedback) channels. When the feedback channels are noiseless, the capacities of the forward channels are shown to add. Some explicit expressions and some bounds are given for the case of noisy feedback channels.