# Showing papers in "Information & Computation in 1959"

••

TL;DR: A sequence of restrictions that limit grammars first to Turing machines, then to two types of system from which a phrase structure description of the generated language can be drawn, and finally to finite state Markov sources is shown to be increasingly heavy.

Abstract: A grammar can be regarded as a device that enumerates the sentences of a language. We study a sequence of restrictions that limit grammars first to Turing machines, then to two types of system from which a phrase structure description of the generated language can be drawn, and finally to finite state Markov sources (finite automata). These restrictions are shown to be increasingly heavy in the sense that the languages that can be generated by grammars meeting a given restriction constitute a proper subset of those that can be generated by grammars meeting the preceding restriction. Various formulations of phrase structure description are considered, and the source of their excess generative power over finite state sources is investigated in greater detail.
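A small illustration of the separation the abstract describes (my example, not the paper's): the language {aⁿbⁿ : n ≥ 1} can be generated by a phrase structure (context-free) grammar but by no finite state source, since a finite automaton cannot count unboundedly. A counter-based recognizer makes the needed memory explicit:

```python
# Sketch: recognizing {a^n b^n} needs one unbounded counter -- pushdown-style
# memory that a finite state Markov source lacks.

def is_anbn(s: str) -> bool:
    """Recognize a^n b^n (n >= 1) with a single counter."""
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:          # an 'a' after a 'b' is out of order
                return False
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:       # more b's than a's so far
                return False
        else:
            return False
    return seen_b and count == 0

print(is_anbn("aaabbb"))  # True
print(is_anbn("aabbb"))   # False
```

Any fixed-state device must confuse aᵐ and aⁿ for some m ≠ n, so it accepts some string outside the language; the counter avoids this.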

1,330 citations

••

TL;DR: A certain analogy is found to exist between a special case of Fisher's quantity of information I and the inverse of the “entropy power” of Shannon; this constitutes a sharpening of the uncertainty relation of quantum mechanics for canonically conjugated variables.

Abstract: A certain analogy is found to exist between a special case of Fisher's quantity of information I and the inverse of the “entropy power” of Shannon (1949, p. 60). This can be inferred from two facts: (1) Both quantities satisfy inequalities that bear a certain resemblance to each other. (2) There is an inequality connecting the two quantities. This last result constitutes a sharpening of the uncertainty relation of quantum mechanics for canonically conjugated variables. Two of these relations are used to give a direct proof of an inequality of Shannon (1949, p. 63, Theorem 15). Proofs are not elaborated fully. Details will be given in a doctoral thesis that is in preparation.
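In modern notation, the two quantities and the inequality connecting them can be sketched as follows (a reconstruction from standard accounts of this result, not quoted from the paper):

```latex
% One-dimensional case, density f with differential entropy h(X) = -\int f \log f:
%   Fisher information:  I(X) = \int \frac{f'(x)^2}{f(x)}\,dx
%   Entropy power:       N(X) = \frac{1}{2\pi e}\, e^{2h(X)}
% The inequality connecting the two (now known as Stam's inequality) reads
\[
  N(X)\, I(X) \;\ge\; 1 ,
\]
% with equality precisely for Gaussian densities.
```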

792 citations

••

TL;DR: This paper considers the possibility of storing several of the lower order component distributions and using this partial information to form an approximation to the actual high order distribution.

Abstract: The measurement and/or storage of high order probability distributions implies exponential increases in equipment complexity. This paper considers the possibility of storing several of the lower order component distributions and using this partial information to form an approximation to the actual high order distribution. The approximation method is based on an information measure for the “closeness” of two distributions and on the criterion of maximum entropy. Approximations consisting of products of appropriate lower order distributions are proved to be optimum under suitably restricted conditions. Two such product approximations can be compared and the better one selected without any knowledge of the actual high order distribution other than that implied by the lower order distributions.
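A minimal sketch of the idea (the toy distribution and names are mine, not the paper's): approximate a joint distribution over three binary variables by products of lower order marginals, and score each approximation by the information measure of "closeness" (KL divergence).

```python
# Compare two product approximations to a toy joint P(x, y, z) in which
# x and y are dependent and z is independent of both.
import itertools, math

P = {}
for x, y, z in itertools.product([0, 1], repeat=3):
    P[(x, y, z)] = (0.35 if x == y else 0.15) * 0.5   # z is a fair coin

def marginal(P, keep):
    """Marginalize P onto the index set `keep`."""
    out = {}
    for xs, p in P.items():
        key = tuple(xs[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

Px, Py, Pz = marginal(P, [0]), marginal(P, [1]), marginal(P, [2])
Pxy = marginal(P, [0, 1])

def kl(P, Q):
    """Information measure of closeness: KL divergence in bits."""
    return sum(p * math.log2(p / Q[k]) for k, p in P.items() if p > 0)

# Approximation 1: full independence P(x)P(y)P(z).
Q1 = {(x, y, z): Px[(x,)] * Py[(y,)] * Pz[(z,)] for x, y, z in P}
# Approximation 2: keep the (x, y) pair and factor out z: P(x, y)P(z).
Q2 = {(x, y, z): Pxy[(x, y)] * Pz[(z,)] for x, y, z in P}

print(kl(P, Q1), kl(P, Q2))  # Q2 is closer: it preserves the x-y dependence
```

As the abstract notes, Q1 and Q2 could also be compared using only the lower order marginals, without access to the true joint.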

178 citations

••

TL;DR: An iterative method is presented which gives an optimum approximation to the joint probability distribution of a set of binary variables given the joint probability distributions of any subsets of the variables (any set of component distributions).

Abstract: An iterative method is presented which gives an optimum approximation to the joint probability distribution of a set of binary variables given the joint probability distributions of any subsets of the variables (any set of component distributions). The most significant feature of this approximation procedure is that there is no limitation to the number or type of component distributions that can be employed. Each step of the iteration gives an improved approximation, and the procedure converges to give an approximation that is the minimum information (i.e., maximum entropy) extension of the component distributions employed.
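The iteration described is, in present-day terms, essentially iterative proportional fitting: start from the uniform (maximum entropy) distribution and repeatedly rescale the current joint so that one component marginal is matched exactly, cycling through the components. A hedged sketch with made-up component distributions:

```python
# Iterative proportional fitting toward the minimum-information extension of
# two consistent pairwise component distributions over three binary variables.
import itertools

vars3 = list(itertools.product([0, 1], repeat=3))

# Toy component distributions to honour (mine, not from the paper):
target_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}   # P(x, y)
target_yz = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.3}   # P(y, z)

def project(Q, idx, target):
    """One step: rescale Q so its marginal on `idx` equals `target`."""
    marg = {}
    for xs, q in Q.items():
        key = tuple(xs[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + q
    return {xs: q * target[tuple(xs[i] for i in idx)] / marg[tuple(xs[i] for i in idx)]
            for xs, q in Q.items()}

Q = {xs: 1 / 8 for xs in vars3}       # maximum-entropy starting point
for _ in range(50):                    # cycle until (numerical) convergence
    Q = project(Q, (0, 1), target_xy)
    Q = project(Q, (1, 2), target_yz)

# Q now matches both component distributions to numerical precision.
```

Each projection step can only improve the fit, matching the abstract's claim that every iteration gives an improved approximation.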

107 citations

••

IBM

TL;DR: This note is a discussion of H. A. Simon's model (1955) concerning the class of frequency distributions generally associated with the name of G. K. Zipf, showing that Simon's model is analytically circular in the case of the linguistic laws of Estoup-Zipf and Willis-Yule.

Abstract: This note is a discussion of H. A. Simon's model (1955) concerning the class of frequency distributions generally associated with the name of G. K. Zipf. The main purpose is to show that Simon's model is analytically circular in the case of the linguistic laws of Estoup-Zipf and Willis-Yule. Insofar as the economic law of Pareto is concerned, Simon has himself noted that his model is a particular case of that of Champernowne; this is correct, with some reservation. A simplified version of Simon's model is included.

81 citations

••

79 citations

••

TL;DR: This note establishes a connection between Hadamard matrices H4t and the maximal binary codes M(4t, 2t; 8t), M(4t − 1, 2t; 4t) and M(4t − 2, 2t; 2t) in two symbols 0 and 1.

Abstract: This note establishes a connection between Hadamard matrices H4t and the maximal binary codes M(4t, 2t; 8t), M(4t − 1, 2t; 4t) and M(4t − 2, 2t; 2t) in two symbols 0 and 1, where by M(n, d; m) we mean a set of m n-place sequences with 0 and 1 such that the Hamming distance between any two sequences is greater than or equal to d. The structure of these maximal codes is also studied in this paper.
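For the smallest case t = 1, the construction can be sketched directly (my reconstruction of the standard Hadamard-code recipe, not the paper's text): take the rows of the Sylvester matrix H4 and of −H4, map +1 → 0 and −1 → 1, and the result is an M(4, 2; 8) code.

```python
# Build H4 by the Sylvester (Kronecker) construction and derive the
# 8-codeword, length-4, minimum-distance-2 binary code M(4, 2; 8).
import itertools

def kron(A, B):
    """Kronecker product of two +/-1 matrices given as lists of rows."""
    return [[a * b for a in rowA for b in rowB] for rowA in A for rowB in B]

H2 = [[1, 1], [1, -1]]
H4 = kron(H2, H2)

# Rows of H4 and -H4, with +1 -> 0 and -1 -> 1.
code = ([[0 if v == 1 else 1 for v in row] for row in H4]
        + [[1 if v == 1 else 0 for v in row] for row in H4])

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

dists = [hamming(u, v) for u, v in itertools.combinations(code, 2)]
print(len(code), min(dists))  # 8 codewords, minimum distance 2
```

Distinct rows of a Hadamard matrix are orthogonal, so they agree in exactly half their positions; that is where the distance 2t comes from.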

61 citations

••

TL;DR: A minimum principle is obtained for the sum of entropies of two distributions related as the absolute squares of a Fourier transform pair and a generalized uncertainty principle, for any set of observables not simultaneously measurable, is conjectured.

Abstract: A minimum principle is obtained for the sum of entropies of two distributions related as the absolute squares of a Fourier transform pair. The minimum is shown to be attained for a Gaussian pair. The joint entropy is calculated for two other Fourier pairs of interest. Applications to the uncertainty principle are made by defining a joint entropy for position and momentum. A generalized uncertainty principle, for any set of observables not simultaneously measurable, is conjectured.
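For reference, one standard modern form of this entropic minimum principle is the following (a reconstruction under the 2π Fourier convention, not a quotation from the paper; the sharp constant was established later by Beckner):

```latex
% For a Fourier pair f, \hat f with \|f\|_2 = 1 and
% \hat f(\xi) = \int f(x)\, e^{-2\pi i x \xi}\, dx,
% writing h(g) = -\int g \log g for the entropy of a density,
\[
  h\!\left(|f|^2\right) + h\!\left(|\hat f|^2\right) \;\ge\; \log \frac{e}{2},
\]
% with equality attained by the Gaussian pair, in agreement with the
% abstract's statement that the minimum is Gaussian.
```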

51 citations

••

TL;DR: A new definition for the capacity C of a (discrete or semicontinuous) channel with finite memory is given, and the strong converse of the coding theorem is shown to hold for a particular finite-memory channel recently considered by Wolfowitz.

Abstract: A new definition for the capacity C of a (discrete or semicontinuous) channel with finite memory is given. In terms of this definition both the coding theorem and its weak converse are easily established. In particular, all questions of the ergodicity of the source-channel distribution are avoided, and we are able to show for discrete channels that both the ergodic and stationary capacities (as given by the Shannon-McMillan definition) coincide with that given here. Finally, the strong converse of the coding theorem is shown to hold for a particular finite-memory channel recently considered by Wolfowitz.

44 citations

••

TL;DR: A special purpose computer is described which calculates conditional probabilities; it uses the illogical principle of induction and can imitate many forms of animal learning.

Abstract: A special purpose computer is described which calculates conditional probabilities. The input to the computer is a set of channels which are in either an active or inactive state. At any instant a particular set of channels will, in general, be active; the computer calculates the conditional probability of all the other channels, based on what has happened in the past. The computer can be extended to forecast the probability of future signals and the past can be weighed in any desired manner. Such a computer uses the illogical principle of induction and it can imitate many forms of animal learning. Full details are given for the construction of such machines.

41 citations

••

TL;DR: Variational techniques are used to establish a time-optimal control result for a dynamical system governed by the differential equation ẏ = Ay + f(t), where A is a real constant matrix with distinct eigenvalues, and to show that the solution implies the concept of switching surfaces.

Abstract: In this paper variational techniques are used to establish the following result: suppose a dynamical system is governed by the differential equation ẏ = Ay + f(t) (1), where A is a real constant matrix with distinct eigenvalues. Suppose that these eigenvalues are further restricted to have nonpositive real parts but are not required to be purely real. Finally, let each component φi(t) of the vector forcing function f(t) satisfy, for all t, the conditions |φi(t)| ≤ γi (i = 1, 2, ..., n) (2), where the γi are preassigned constants. It is shown that, given an arbitrary initial condition y(0), the forcing function that will bring the system to its equilibrium position in the shortest possible time is such that φi(t) = ±γi, and the instants of time at which φi(t) switches between +γi and −γi are obtained by considering the output of the adjoint system. Further relationships between the given system and the adjoint system are discussed in the paper. It is also shown that this solution, obtained by variational techniques, implies the concept of switching surfaces.
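The role of the adjoint system can be sketched in modern notation (a reconstruction of the standard bang-bang rule; sign conventions vary with the formulation):

```latex
% The adjoint of \dot y = Ay + f(t) is
\[
  \dot \eta = -A^{\mathsf T}\eta ,
\]
% and the time-optimal forcing is "bang-bang",
\[
  \varphi_i(t) = \pm\,\gamma_i \,\operatorname{sgn}\bigl(\eta_i(t)\bigr),
  \qquad i = 1, \dots, n,
\]
% so each component sits at an extreme value +\gamma_i or -\gamma_i and
% switches exactly at the zero crossings of the corresponding adjoint output.
```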

••

IBM

TL;DR: The solution to the problem of efficient machine manipulation of formal systems in which the predicates display a high degree of symmetry is embodied in a theorem and a rule of syntactic symmetry.

Abstract: In the past few years, digital computer programs that depart from the traditional numerical computation and data processing for which these machines were conceived have become increasingly commonplace. In many of these programs, the computer is called upon to manipulate a complex formal logistic system as a tool to implement the solution of a problem. This paper is concerned with the problem of efficient machine manipulation of formal systems in which the predicates display a high degree of symmetry. The solution to the problem, embodied in a theorem and a rule of syntactic symmetry, is specified in Section I. The theorem is in fact a metatheorem concerning formal systems, and is used in the synthesis of proofs. On the other hand, the rule is an invaluable aid in the search for a proof by the so-called analytic method. In Section II, the set of all syntactic symmetries for a given set of formulas is constructed and displayed in a form conducive to minimum effort programming for a computer.

••

Bell Labs

TL;DR: The plan of a program that enables a computer to “learn” to play tic-tac-toe, and related 3×3 board games, is described, and the notion of an L-automaton is introduced via a formal, behaviouristic definition, in an attempt to give an abstract characterization of machine “learning”.

Abstract: The plan of a program that enables a computer to “learn” to play tic-tac-toe, and related 3×3 board games, is described. The programmed computer has no built-in knowledge of the game to be played, except for a rule for determining legal moves. It specifically does not “know” what constitutes a win, loss, or draw, but must be informed of the outcome at the end of each play. Experience indicates that a fair competence in tic-tac-toe playing is reached after 30 to 50 plays. Generalizing from this example of a “learning machine”, the notion of an L-automaton is introduced via a formal, behaviouristic definition, in an attempt to give an abstract characterization of machine “learning”. A solution to the design problem for a general class of L-automata is presented.

••

General Electric

TL;DR: An optimal way to estimate the higher order distributions is suggested, and the results are applied to a coding problem to determine the delay required before encoding a message in order to achieve a prescribed fraction of the optimal compression given by Shannon's coding theorem.

Abstract: This paper is part of a general study of efficient information selection, storage and processing. It is assumed that the information is contained in binary time series generated by a stochastic source. The main problem is to determine how to approximate the statistical properties of this information source by lower order probability distributions. First, it is determined what restrictions are imposed by known lower order probability distributions on the higher order distributions which are to be determined or estimated. This study suggests an optimal way to estimate the higher order distributions. In the second part the entropy changes which occur in going from lower to higher order probability distributions are studied. The upper and lower bounds for the entropy of the higher order distributions are computed in terms of the entropies of the lower order distributions. These results allow the computation of the “strength” of the conditions which are imposed on the higher order distribution and are not induced by the lower order ones. From this one can compute the importance of knowing the higher order probability distributions for information processing. In conclusion, these results are applied to a coding problem to determine the delay required before encoding a message in order to achieve a prescribed fraction of the optimal compression given by Shannon's coding theorem.
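Bounds of the kind the abstract mentions can be illustrated with a standard identity (stated in modern notation; this is my illustration, not a formula quoted from the paper). Given the order-k statistics of a series X₁, X₂, ..., the entropy of a longer block satisfies:

```latex
% Chain rule plus "conditioning reduces entropy", for n > k:
\[
  H(X_1,\dots,X_n) \;\le\; H(X_1,\dots,X_k)
    \;+\; \sum_{i=k+1}^{n} H\bigl(X_i \mid X_{i-k+1},\dots,X_{i-1}\bigr),
\]
% with equality exactly when the source is Markov of order k-1, i.e. when the
% higher order distribution adds no conditions beyond those induced by the
% lower order ones; the trivial lower bound is
% H(X_1,\dots,X_n) \ge H(X_1,\dots,X_k).
```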

••

TL;DR: It is shown that the class of possible restoring organs is wider than that discussed by von Neumann; this approach makes it possible to write truth-tables for restoring organs and to synthesize them by means of two-valued or multivalued devices.

Abstract: Redundant automata, as described by von Neumann, use “restoring organs” in order to remove the effect of malfunctions. It is shown that the class of possible restoring organs is wider than that discussed by von Neumann. If “triplication” is used, a class of majority elements for the multivalued case provides restoring action in approximately the same amount as von Neumann's two-valued majority organs. If “multiplexing” is used, possible restoring organs can be classified in terms of their “length.” The length-1 restoring organs are majority elements; the length-2 restoring organs are derived from the majority organ concept, etc. This approach makes it possible to write truth-tables for restoring organs, and to synthesize them by means of two-valued or multivalued devices.
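A minimal sketch of the length-1 restoring organ under "triplication" (the general majority idea, not the paper's multivalued construction): three independent copies of a signal, each wrong with probability eps, are merged by a two-valued majority element.

```python
# The majority element is the length-1 restoring organ: it suppresses
# independent errors on triplicated lines whenever eps < 1/2.

def majority(a: int, b: int, c: int) -> int:
    return 1 if a + b + c >= 2 else 0

def restored_error(eps: float) -> float:
    """Exact probability the majority output is wrong, given three
    independent copies each flipped with probability eps."""
    return 3 * eps**2 * (1 - eps) + eps**3   # two or three copies wrong

for eps in (0.1, 0.01):
    print(eps, restored_error(eps))  # the restoring organ suppresses the error
```

At eps = 0.1 the output error drops to 0.028, and repeated restoring stages drive it down further, which is the point of von Neumann's scheme.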

••

TL;DR: Modern physics recognizes that experimental errors are inevitable and that it is impossible to go to the limit of infinitely small errors; the new “matter of fact” point of view states that the classical assumption of complete determinism cannot be proved.

Abstract: Classical physics was based upon the assumption that experimental errors were just accidental and should be ignored by the theory. Modern physics realizes that errors are inevitable and that it is impossible to go to the limit of infinitely small errors. The uncertainty principle and the negentropy principle of information prove that the smaller the error, the greater the price that must be paid for the observation. There is no exact limit to the accuracy, but its high cost makes it unattainable. Classical physics assumed complete determinism. The new modern “matter of fact” point of view asserts that this assumption cannot be proved. If experimental errors are very small, it takes a longer time to reach the final statistical distribution, but this final state of statistical equilibrium will always be reached. Many examples are discussed, and the “matter of fact” point of view is compared with similar ideas presented by the Vienna school, by M. Born and by the Copenhagen school.

••

TL;DR: This paper discusses digital techniques by which habit-forming and learning may be simulated, focusing attention upon reinforcement, and uses the language of computer programming to describe the flow of control and mathematical probability to analyze the effect of various reinforcement functions on the asymptotic behavior of simulating programs.

Abstract: This paper discusses digital techniques by which habit-forming and learning may be simulated. After classifying the types of simulation mechanisms it discusses types of habit-forming and learning to be simulated, focusing attention upon reinforcement. It uses the language of computer programming to describe the flow of control, and the language of mathematical probability to analyze the effect of various reinforcement functions on the asymptotic behavior of simulating programs. It shows further, again in programming terms, how the “delayed random selector” part of the simulating process may be “factored out” as a separate unit applicable either to habit-forming or learning, which latter are distinguished by whether the reinforcements are applied immediately or upon “comparison with a goal.” Several reinforcement models are considered, including the “linear asymptotic” model used extensively by Bush and Mosteller, two simple “absorbing boundary” models, and a “nonlinear asymptotic” model currently being investigated by Bush, Galanter, and Luce. A sketch is given of the Harris-Bellman-Shapiro analysis of the linear asymptotic model. Contrasted with this, a complete analysis is given of the simpler absorbing boundary model, with explicit proof of eventual absorption, and formulae for probability of absorption in n trials, and the expected number of steps to absorption. Finally, a special example is given of the second absorbing boundary model to show how its structure differs from the others.
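The "linear asymptotic" reinforcement rule of Bush and Mosteller can be sketched in a few lines (parameter names are mine): after each trial the response probability moves a fixed fraction theta of the remaining distance toward 1 (reward) or toward 0 (non-reward).

```python
# Linear asymptotic reinforcement model: p approaches its asymptote
# geometrically but never reaches it, matching the "asymptotic" label.

def reinforce(p: float, rewarded: bool, theta: float = 0.2) -> float:
    """One trial of the linear operator model."""
    return p + theta * (1 - p) if rewarded else p - theta * p

p = 0.5
for _ in range(30):          # a run of consistently rewarded trials
    p = reinforce(p, True)
print(p)  # close to, but strictly below, the asymptote 1
```

An "absorbing boundary" variant, by contrast, would let p reach 0 or 1 exactly and stay there, which is why the two model families have such different asymptotic analyses.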

••

Brooklyn College

TL;DR: Methods are presented for calculation of the informational content, i.e. complexity, of interacting systems of parts in mechanisms and circuits, developed primarily to describe self-reproducing systems.

Abstract: This paper presents methods for calculation of the informational content, i.e. complexity, of interacting systems of parts in mechanisms and circuits. The methods have been developed primarily in order to describe self-reproducing systems (Jacobson, 1958) but can be applied with generality to any mechanism or circuit.

••

TL;DR: The nature of these exceptional systems is investigated; they are characterized by some extremal properties, and a simplified proof of a classical result of Neyman and Pearson is obtained incidentally.

Abstract: When the amount of information passing through a channel is estimated on the basis of a sample, situations of two kinds can arise: Either (A) the statistical structure of the source of information is unknown or (B) it is known. In the present note only discrete sources and channels without probability aftereffects are discussed. In either kind of situation the estimator proposed is found to be in general asymptotically normal with a variance of the order of the reciprocal of the sample size. There are, however, important exceptions: For some systems (i.e. combinations of channels and matching sources) the estimator in question has a variance of the order of the reciprocal of the squared sample size; it is shown that then the corresponding asymptotic distribution is that of a quadratic form in Gaussian random variables. The nature of these exceptional systems is investigated; they are characterized by some extremal properties. A simplified proof of a classical result of Neyman and Pearson is obtained incidentally. Finally, the situation in which the channel is known is treated briefly at the end.
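The estimator under discussion is, in present-day terms, the "plug-in" estimate of transmitted information computed from sample frequencies; the form below is my sketch (the paper's contribution is its asymptotic distribution, not the formula itself).

```python
# Plug-in estimate of the information transmitted through a toy discrete
# channel: X uniform on {0, 1}, Y = X flipped with probability 0.1.
import math, random
from collections import Counter

random.seed(0)
sample = []
for _ in range(5000):
    x = random.randint(0, 1)
    y = x if random.random() > 0.1 else 1 - x
    sample.append((x, y))

n = len(sample)
pxy = Counter(sample)                    # joint sample frequencies
px = Counter(x for x, _ in sample)       # input marginal
py = Counter(y for _, y in sample)       # output marginal

I_hat = sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
            for (x, y), c in pxy.items())
print(I_hat)  # should be near the true value 1 - H(0.1), about 0.53 bits
```

The abstract's point is the fluctuation of such an estimate: generically its error is of order n^(-1/2) and asymptotically normal, while for the exceptional systems it shrinks like n^(-1) with a quadratic-form limit law.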

••

General Electric

TL;DR: This article contains a description of the strategies a machine or player might use in a simple penny matching game, where the adversary is a simple indifferent opponent, who has the characteristic that his probability of playing heads is independent of the outcome of the game.

Abstract: A problem currently facing engineers is the design of machines capable of making decisions. This article contains a description of the strategies a machine or player might use in a simple penny matching game. The adversary in this game is a simple indifferent opponent, who has the characteristic that his probability of playing heads is independent of the outcome of the game. Throughout the game the player attempts to select heads or tails in a manner that maximizes his expected net gain. If the player does have this criterion, his decisions depend upon his estimate of his opponent's probability of playing heads. Under certain conditions, optimum decisions can be performed with a single analogue storage element. If the opponent plays according to conditional probabilities, the player should estimate the two quantities Ph(h), the probability a head follows a head, and Pt(h), the probability a head follows a tail. If the player is unaware that his opponent is playing conditional probabilities and assumes that the opponent's probabilities are independent of previous selections, he may suffer a loss. The amount of this loss, if it does occur, is a function of Ph(h) and Pt(h).
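The player's strategy against a conditional-probability opponent can be sketched directly (variable names and the toy history are mine): estimate Ph(h) and Pt(h) from the opponent's past plays, then select the side that is likelier to match the next toss.

```python
# Estimate P(head | previous head) and P(head | previous tail) from the
# opponent's history, and match the likelier next play.

def estimate_conditionals(history):
    """history: sequence of 'H'/'T' opponent plays."""
    counts = {"H": [0, 0], "T": [0, 0]}   # [heads following, total following]
    for prev, cur in zip(history, history[1:]):
        counts[prev][1] += 1
        if cur == "H":
            counts[prev][0] += 1
    return {k: (h / t if t else 0.5) for k, (h, t) in counts.items()}

def next_play(history):
    p_head = estimate_conditionals(history)[history[-1]]
    return "H" if p_head >= 0.5 else "T"   # match the likelier side

opponent = "HHTHHTHHTHHT"                  # a strongly patterned opponent
print(next_play(opponent))
```

For this patterned opponent the unconditional head frequency is 2/3, but the conditional estimates reveal that a head always follows a tail, which is exactly the extra gain the abstract says the naive (independence-assuming) player forgoes.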

••

TL;DR: A mathematical technique is developed for measuring the information in a message where no information is transmitted by the order in which the symbols composing the message are received; the theory is illustrated by calculating the information transmitted in recall and recognition tasks by a set of subjects in an actual experiment.

Abstract: In psychological experiments on recognition and recall, the subject clearly conveys information. If a technique could be developed for measuring the information transmitted in these situations, meaningful comparisons could be made between subjects' performances under different conditions. With this end in view, a mathematical technique is developed for measuring the information in a message where no information is transmitted by the order in which the symbols composing the message are received. The theory is presented in four stages. At each stage assumptions or, as at stage IV, approximations are made which enable the information transmitted by a subject to be estimated by performing fewer experiments than at the previous stage. A crucial assumption at stage III involves minimizing the information subject to certain parameters being held constant. The assumptions are discussed, and the theory is illustrated by calculating the information transmitted in recall and recognition tasks by a set of subjects in an actual experiment. Further applications are briefly discussed.

••

TL;DR: Recent studies suggest the possibility that in the visual system of man over-all temporal patterns of activity may be used for the transmission of certain sensory information and that this temporal coding may arise from the differing nerve impulse conduction velocities that exist in a nerve fiber bundle.

Abstract: Neurone theory and the concept of the all-or-none nerve impulse are fundamental to modern neurophysiology; but the simplest ideas derived from these lead to the difficulty as to how complex sensory perception is possible through such limited structure. Recent studies suggest the possibility that in the visual system of man over-all temporal patterns of activity may be used for the transmission of certain sensory information. It is possible that this temporal coding may arise from the differing nerve impulse conduction velocities that exist in a nerve fiber bundle. Such a coding system may have a significant part to play in the general functioning of the nervous system.

••

TL;DR: The behavior of a forced oscillatory system which is linearly damped but nonlinear in the restoring force is investigated according to the author's previous papers, showing under which conditions the system may contain subharmonics of order 1/2.

Abstract: The behavior of a forced oscillatory system which is linearly damped but nonlinear in the restoring force is investigated according to the author's previous papers (Magiros, 1957, 1958). It is shown under which conditions the system may contain subharmonics of order 1/2. The amplitudes of the subharmonics and their components, and the bounds for the amplitude of the external force, are given in terms of the coefficients of the differential equation of the system, which are not necessarily very small, as well as the regions in the (c1/c3, I)-plane where we have subharmonics with two, one, or no amplitudes. Also discussed are the stability of the subharmonics, the free vibrations of the system, and the case when one of the coefficients of the nonlinear terms is zero.

••

TL;DR: The proposal extends the range of the propositional variables so that residue class check symbols may be used in error detection; the principal consequence is that individual logical elements may be designed to process binary inputs with arbitrary reliability and nonzero channel capacity.

Abstract: A method of error detection is proposed for noisy logical computer elements. The proposal extends the range of the propositional variables so that residue class check symbols may be used in error detection. The principal consequence is that individual logical elements may be designed to process binary inputs with arbitrary reliability and nonzero channel capacity.
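The general idea of a residue class check can be illustrated in a few lines (this sketches the arithmetic principle, not the paper's specific logical construction): carry a check symbol modulo 3 alongside each operand, and note that a single-bit fault changes a result by ±2^k, which is never 0 mod 3, so the check always detects it.

```python
# Residue-class checking of an addition: the check modulus 3 catches any
# single-bit fault, since 2^k mod 3 alternates 1, 2, 1, 2, ... and is never 0.

M = 3  # check modulus

def checked_add(a: int, b: int):
    """Return (sum, check symbol) for a redundant addition."""
    return a + b, (a % M + b % M) % M

def verify(result: int, check: int) -> bool:
    return result % M == check

s, c = checked_add(25, 17)
assert verify(s, c)              # fault-free addition passes the check

faulty = s ^ (1 << 3)            # inject a single-bit fault (flip bit 3)
print(verify(faulty, c))         # False: the residue check catches it
```

Extending the range of the variables so that such check symbols travel with the logic is what lets each element trade redundancy for arbitrary reliability.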

••

IBM

TL;DR: It is shown that, provided no input-output relation is deterministic, this system will eventually become completely disorganized in the sense that it will behave like a collection of independent parts with completely unpredictable behavior.

Abstract: Consider a circle of devices, each characterized by a one-bit output which is statistically correlated with the outputs of its right and/or left neighbor at the preceding time period. It is shown that, provided no input-output relation is deterministic, this system will eventually become completely disorganized in the sense that it will behave like a collection of independent parts with completely unpredictable behavior. The rate at which this disorganization occurs under various conditions is calculated. A concrete version of this model is a sequential circuit of imperfect relays. A possible application to the determination of certain reliability parameters is given. The relation to a similar model for statistical mechanics by Kac is shown.