# Showing papers in "Information & Computation in 1959"

••

TL;DR: A sequence of restrictions that limit grammars first to Turing machines, then to two types of system from which a phrase structure description of the generated language can be drawn, and finally to finite state Markov sources is shown to be increasingly heavy.

Abstract: A grammar can be regarded as a device that enumerates the sentences of a language. We study a sequence of restrictions that limit grammars first to Turing machines, then to two types of system from which a phrase structure description of the generated language can be drawn, and finally to finite state Markov sources (finite automata). These restrictions are shown to be increasingly heavy in the sense that the languages that can be generated by grammars meeting a given restriction constitute a proper subset of those that can be generated by grammars meeting the preceding restriction. Various formulations of phrase structure description are considered, and the source of their excess generative power over finite state sources is investigated in greater detail.
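A small illustration of the separation the abstract describes (my example, not the paper's): the language {aⁿbⁿ : n ≥ 1} can be generated by a phrase structure (context-free) grammar but by no finite state source, since a finite automaton cannot count unboundedly. A counter-based recognizer makes the needed memory explicit:

```python
# Sketch: recognizing {a^n b^n} needs one unbounded counter -- pushdown-style
# memory that a finite state Markov source lacks.

def is_anbn(s: str) -> bool:
    """Recognize a^n b^n (n >= 1) with a single counter."""
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:          # an 'a' after a 'b' is out of order
                return False
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:       # more b's than a's so far
                return False
        else:
            return False
    return seen_b and count == 0

print(is_anbn("aaabbb"))  # True
print(is_anbn("aabbb"))   # False
```

Any fixed-state device must confuse aᵐ and aⁿ for some m ≠ n, so it accepts some string outside the language; the counter avoids this.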

1,330 citations

••

TL;DR: A certain analogy is found to exist between a special case of Fisher's quantity of information I and the inverse of the “entropy power” of Shannon; this constitutes a sharpening of the uncertainty relation of quantum mechanics for canonically conjugated variables.

Abstract: A certain analogy is found to exist between a special case of Fisher's quantity of information I and the inverse of the “entropy power” of Shannon (1949, p. 60). This can be inferred from two facts: (1) Both quantities satisfy inequalities that bear a certain resemblance to each other. (2) There is an inequality connecting the two quantities. This last result constitutes a sharpening of the uncertainty relation of quantum mechanics for canonically conjugated variables. Two of these relations are used to give a direct proof of an inequality of Shannon (1949, p. 63, Theorem 15). Proofs are not elaborated fully. Details will be given in a doctoral thesis that is in preparation.
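In modern notation, the two quantities and the inequality connecting them can be sketched as follows (a reconstruction from standard accounts of this result, not quoted from the paper):

```latex
% One-dimensional case, density f with differential entropy h(X) = -\int f \log f:
%   Fisher information:  I(X) = \int \frac{f'(x)^2}{f(x)}\,dx
%   Entropy power:       N(X) = \frac{1}{2\pi e}\, e^{2h(X)}
% The inequality connecting the two (now known as Stam's inequality) reads
\[
  N(X)\, I(X) \;\ge\; 1 ,
\]
% with equality precisely for Gaussian densities.
```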

792 citations

••

TL;DR: This paper considers the possibility of storing several of the lower order component distributions and using this partial information to form an approximation to the actual high order distribution.

Abstract: The measurement and/or storage of high order probability distributions implies exponential increases in equipment complexity. This paper considers the possibility of storing several of the lower order component distributions and using this partial information to form an approximation to the actual high order distribution. The approximation method is based on an information measure for the “closeness” of two distributions and on the criterion of maximum entropy. Approximations consisting of products of appropriate lower order distributions are proved to be optimum under suitably restricted conditions. Two such product approximations can be compared and the better one selected without any knowledge of the actual high order distribution other than that implied by the lower order distributions.
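A minimal sketch of the idea (the toy distribution and names are mine, not the paper's): approximate a joint distribution over three binary variables by products of lower order marginals, and score each approximation by the information measure of "closeness" (KL divergence).

```python
# Compare two product approximations to a toy joint P(x, y, z) in which
# x and y are dependent and z is independent of both.
import itertools, math

P = {}
for x, y, z in itertools.product([0, 1], repeat=3):
    P[(x, y, z)] = (0.35 if x == y else 0.15) * 0.5   # z is a fair coin

def marginal(P, keep):
    """Marginalize P onto the index set `keep`."""
    out = {}
    for xs, p in P.items():
        key = tuple(xs[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

Px, Py, Pz = marginal(P, [0]), marginal(P, [1]), marginal(P, [2])
Pxy = marginal(P, [0, 1])

def kl(P, Q):
    """Information measure of closeness: KL divergence in bits."""
    return sum(p * math.log2(p / Q[k]) for k, p in P.items() if p > 0)

# Approximation 1: full independence P(x)P(y)P(z).
Q1 = {(x, y, z): Px[(x,)] * Py[(y,)] * Pz[(z,)] for x, y, z in P}
# Approximation 2: keep the (x, y) pair and factor out z: P(x, y)P(z).
Q2 = {(x, y, z): Pxy[(x, y)] * Pz[(z,)] for x, y, z in P}

print(kl(P, Q1), kl(P, Q2))  # Q2 is closer: it preserves the x-y dependence
```

As the abstract notes, Q1 and Q2 could also be compared using only the lower order marginals, without access to the true joint.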

178 citations

••

TL;DR: An iterative method is presented which gives an optimum approximation to the joint probability distribution of a set of binary variables given the joint probability distributions of any subsets of the variables (any set of component distributions).

Abstract: An iterative method is presented which gives an optimum approximation to the joint probability distribution of a set of binary variables given the joint probability distributions of any subsets of the variables (any set of component distributions). The most significant feature of this approximation procedure is that there is no limitation to the number or type of component distributions that can be employed. Each step of the iteration gives an improved approximation, and the procedure converges to give an approximation that is the minimum information (i.e., maximum entropy) extension of the component distributions employed.
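The iteration described is, in present-day terms, essentially iterative proportional fitting: start from the uniform (maximum entropy) distribution and repeatedly rescale the current joint so that one component marginal is matched exactly, cycling through the components. A hedged sketch with made-up component distributions:

```python
# Iterative proportional fitting toward the minimum-information extension of
# two consistent pairwise component distributions over three binary variables.
import itertools

vars3 = list(itertools.product([0, 1], repeat=3))

# Toy component distributions to honour (mine, not from the paper):
target_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}   # P(x, y)
target_yz = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.3}   # P(y, z)

def project(Q, idx, target):
    """One step: rescale Q so its marginal on `idx` equals `target`."""
    marg = {}
    for xs, q in Q.items():
        key = tuple(xs[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + q
    return {xs: q * target[tuple(xs[i] for i in idx)] / marg[tuple(xs[i] for i in idx)]
            for xs, q in Q.items()}

Q = {xs: 1 / 8 for xs in vars3}       # maximum-entropy starting point
for _ in range(50):                    # cycle until (numerical) convergence
    Q = project(Q, (0, 1), target_xy)
    Q = project(Q, (1, 2), target_yz)

# Q now matches both component distributions to numerical precision.
```

Each projection step can only improve the fit, matching the abstract's claim that every iteration gives an improved approximation.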

107 citations

••

IBM

TL;DR: This note is a discussion of H. A. Simon's model (1955) concerning the class of frequency distributions generally associated with the name of G. K. Zipf, showing that Simon's model is analytically circular in the case of the linguistic laws of Estoup-Zipf and Willis-Yule.

Abstract: This note is a discussion of H. A. Simon's model (1955) concerning the class of frequency distributions generally associated with the name of G. K. Zipf. The main purpose is to show that Simon's model is analytically circular in the case of the linguistic laws of Estoup-Zipf and Willis-Yule. Insofar as the economic law of Pareto is concerned, Simon has himself noted that his model is a particular case of that of Champernowne; this is correct, with some reservation. A simplified version of Simon's model is included.

81 citations

••

79 citations

••

TL;DR: This note establishes a connection between Hadamard matrices H4t and the maximal binary codes M(4t, 2t; 8t), M(4t − 1, 2t; 4t) and M(4t − 2, 2t; 2t) in two symbols 0 and 1.

Abstract: This note establishes a connection between Hadamard matrices H4t and the maximal binary codes M(4t, 2t; 8t), M(4t − 1, 2t; 4t) and M(4t − 2, 2t; 2t) in two symbols 0 and 1, where by M(n, d; m) we mean a set of m n-place sequences with 0 and 1 such that the Hamming distance between any two sequences is greater than or equal to d. The structure of these maximal codes is also studied in this paper.
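For the smallest case t = 1, the construction can be sketched directly (my reconstruction of the standard Hadamard-code recipe, not the paper's text): take the rows of the Sylvester matrix H4 and of −H4, map +1 → 0 and −1 → 1, and the result is an M(4, 2; 8) code.

```python
# Build H4 by the Sylvester (Kronecker) construction and derive the
# 8-codeword, length-4, minimum-distance-2 binary code M(4, 2; 8).
import itertools

def kron(A, B):
    """Kronecker product of two +/-1 matrices given as lists of rows."""
    return [[a * b for a in rowA for b in rowB] for rowA in A for rowB in B]

H2 = [[1, 1], [1, -1]]
H4 = kron(H2, H2)

# Rows of H4 and -H4, with +1 -> 0 and -1 -> 1.
code = ([[0 if v == 1 else 1 for v in row] for row in H4]
        + [[1 if v == 1 else 0 for v in row] for row in H4])

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

dists = [hamming(u, v) for u, v in itertools.combinations(code, 2)]
print(len(code), min(dists))  # 8 codewords, minimum distance 2
```

Distinct rows of a Hadamard matrix are orthogonal, so they agree in exactly half their positions; that is where the distance 2t comes from.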

61 citations

••

TL;DR: A minimum principle is obtained for the sum of entropies of two distributions related as the absolute squares of a Fourier transform pair and a generalized uncertainty principle, for any set of observables not simultaneously measurable, is conjectured.

Abstract: A minimum principle is obtained for the sum of entropies of two distributions related as the absolute squares of a Fourier transform pair. The minimum is shown to be attained for a Gaussian pair. The joint entropy is calculated for two other Fourier pairs of interest. Applications to the uncertainty principle are made by defining a joint entropy for position and momentum. A generalized uncertainty principle, for any set of observables not simultaneously measurable, is conjectured.
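For reference, one standard modern form of this entropic minimum principle is the following (a reconstruction under the 2π Fourier convention, not a quotation from the paper; the sharp constant was established later by Beckner):

```latex
% For a Fourier pair f, \hat f with \|f\|_2 = 1 and
% \hat f(\xi) = \int f(x)\, e^{-2\pi i x \xi}\, dx,
% writing h(g) = -\int g \log g for the entropy of a density,
\[
  h\!\left(|f|^2\right) + h\!\left(|\hat f|^2\right) \;\ge\; \log \frac{e}{2},
\]
% with equality attained by the Gaussian pair, in agreement with the
% abstract's statement that the minimum is Gaussian.
```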

51 citations

••

TL;DR: A new definition for the capacity C of a (discrete or semicontinuous) channel with finite memory is given, and the strong converse of the coding theorem is shown to hold for a particular finite-memory channel recently considered by Wolfowitz.

Abstract: A new definition for the capacity C of a (discrete or semicontinuous) channel with finite memory is given. In terms of this definition both the coding theorem and its weak converse are easily established. In particular, all questions of the ergodicity of the source-channel distribution are avoided, and we are able to show for discrete channels that both the ergodic and stationary capacities (as given by the Shannon-McMillan definition) coincide with that given here. Finally, the strong converse of the coding theorem is shown to hold for a particular finite-memory channel recently considered by Wolfowitz.

44 citations

••

TL;DR: A special purpose computer is described which calculates conditional probabilities; it uses the illogical principle of induction and can imitate many forms of animal learning.

Abstract: A special purpose computer is described which calculates conditional probabilities. The input to the computer is a set of channels which are in either an active or inactive state. At any instant a particular set of channels will, in general, be active; the computer calculates the conditional probability of all the other channels, based on what has happened in the past. The computer can be extended to forecast the probability of future signals and the past can be weighed in any desired manner. Such a computer uses the illogical principle of induction and it can imitate many forms of animal learning. Full details are given for the construction of such machines.

41 citations

••

TL;DR: Variational techniques are used to establish a time-optimal control result for a dynamical system governed by the differential equation ẏ = Ay + f(t), where A is a real constant matrix with distinct eigenvalues, and to show that the solution implies the concept of switching surfaces.

Abstract: In this paper variational techniques are used to establish the following result: suppose a dynamical system is governed by the differential equation ẏ = Ay + f(t) (1), where A is a real constant matrix with distinct eigenvalues. Suppose that these eigenvalues are further restricted to have nonpositive real parts but are not required to be purely real. Finally, let each component φi(t) of the vector forcing function f(t) satisfy, for all t, the conditions |φi(t)| ≤ γi (i = 1, 2, ..., n) (2), where the γi are preassigned constants. It is shown that, given an arbitrary initial condition y(0), the forcing function that will bring the system to its equilibrium position in the shortest possible time is such that φi(t) = ±γi, and the instants of time at which φi(t) switches between +γi and −γi are obtained by considering the output of the adjoint system. Further relationships between the given system and the adjoint system are discussed in the paper. It is also shown that this solution, obtained by variational techniques, implies the concept of switching surfaces.
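The role of the adjoint system can be sketched in modern notation (a reconstruction of the standard bang-bang rule; sign conventions vary with the formulation):

```latex
% The adjoint of \dot y = Ay + f(t) is
\[
  \dot \eta = -A^{\mathsf T}\eta ,
\]
% and the time-optimal forcing is "bang-bang",
\[
  \varphi_i(t) = \pm\,\gamma_i \,\operatorname{sgn}\bigl(\eta_i(t)\bigr),
  \qquad i = 1, \dots, n,
\]
% so each component sits at an extreme value +\gamma_i or -\gamma_i and
% switches exactly at the zero crossings of the corresponding adjoint output.
```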

••

IBM

TL;DR: The solution to the problem of efficient machine manipulation of formal systems in which the predicates display a high degree of symmetry is embodied in a theorem and a rule of syntactic symmetry.

Abstract: In the past few years, digital computer programs that depart from the traditional numerical computation and data processing for which these machines were conceived have become increasingly commonplace. In many of these programs, the computer is called upon to manipulate a complex formal logistic system as a tool to implement the solution of a problem. This paper is concerned with the problem of efficient machine manipulation of formal systems in which the predicates display a high degree of symmetry. The solution to the problem, embodied in a theorem and a rule of syntactic symmetry, is specified in Section I. The theorem is in fact a metatheorem concerning formal systems, and is used in the synthesis of proofs. On the other hand, the rule is an invaluable aid in the search for a proof by the so-called analytic method. In Section II, the set of all syntactic symmetries for a given set of formulas is constructed and displayed in a form conducive to minimum effort programming for a computer.

••

Bell Labs

TL;DR: The plan of a program that enables a computer to “learn” to play tic-tac-toe, and related 3×3 board games, is described, and the notion of an L-automaton is introduced via a formal, behaviouristic definition, in an attempt to give an abstract characterization of machine “learning”.

Abstract: The plan of a program that enables a computer to “learn” to play tic-tac-toe, and related 3×3 board games, is described. The programmed computer has no built-in knowledge of the game to be played, except for a rule for determining legal moves. It specifically does not “know” what constitutes a win, loss, or draw, but must be informed of the outcome at the end of each play. Experience indicates that a fair competence in tic-tac-toe playing is reached after 30 to 50 plays. Generalizing from this example of a “learning machine”, the notion of an L-automaton is introduced via a formal, behaviouristic definition, in an attempt to give an abstract characterization of machine “learning”. A solution to the design problem for a general class of L-automata is presented.

••

General Electric

TL;DR: An optimal way to estimate the higher order distributions is suggested, and the results are applied to a coding problem to determine the delay required before encoding a message in order to achieve a prescribed fraction of the optimal compression given by Shannon's coding theorem.

Abstract: This paper is part of a general study of efficient information selection, storage and processing. It is assumed that the information is contained in binary time series generated by a stochastic source. The main problem is to determine how to approximate the statistical properties of this information source by lower order probability distributions. First, it is determined what restrictions are imposed by known lower order probability distributions on the higher order distributions which are to be determined or estimated. This study suggests an optimal way to estimate the higher order distributions. In the second part the entropy changes which occur in going from lower to higher order probability distributions are studied. The upper and lower bounds for the entropy of the higher order distributions are computed in terms of the entropies of the lower order distributions. These results allow the computation of the “strength” of the conditions which are imposed on the higher order distribution and are not induced by the lower order ones. From this one can compute the importance of knowing the higher order probability distributions for information processing. In conclusion, these results are applied to a coding problem to determine the delay required before encoding a message in order to achieve a prescribed fraction of the optimal compression given by Shannon's coding theorem.
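Bounds of the kind the abstract mentions can be illustrated with a standard identity (stated in modern notation; this is my illustration, not a formula quoted from the paper). Given the order-k statistics of a series X₁, X₂, ..., the entropy of a longer block satisfies:

```latex
% Chain rule plus "conditioning reduces entropy", for n > k:
\[
  H(X_1,\dots,X_n) \;\le\; H(X_1,\dots,X_k)
    \;+\; \sum_{i=k+1}^{n} H\bigl(X_i \mid X_{i-k+1},\dots,X_{i-1}\bigr),
\]
% with equality exactly when the source is Markov of order k-1, i.e. when the
% higher order distribution adds no conditions beyond those induced by the
% lower order ones; the trivial lower bound is
% H(X_1,\dots,X_n) \ge H(X_1,\dots,X_k).
```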

••

TL;DR: It is shown that the class of possible restoring organs is wider than that discussed by von Neumann; this approach makes it possible to write truth-tables for restoring organs and to synthesize them by means of two-valued or multivalued devices.

Abstract: Redundant automata, as described by von Neumann, use “restoring organs” in order to remove the effect of malfunctions. It is shown that the class of possible restoring organs is wider than that discussed by von Neumann. If “triplication” is used, a class of majority elements for the multivalued case provides restoring action in approximately the same amount as von Neumann's two-valued majority organs. If “multiplexing” is used, possible restoring organs can be classified in terms of their “length.” The length-1 restoring organs are majority elements; the length-2 restoring organs are derived from the majority organ concept, etc. This approach makes it possible to write truth-tables for restoring organs, and to synthesize them by means of two-valued or multivalued devices.
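A minimal sketch of the length-1 restoring organ under "triplication" (the general majority idea, not the paper's multivalued construction): three independent copies of a signal, each wrong with probability eps, are merged by a two-valued majority element.

```python
# The majority element is the length-1 restoring organ: it suppresses
# independent errors on triplicated lines whenever eps < 1/2.

def majority(a: int, b: int, c: int) -> int:
    return 1 if a + b + c >= 2 else 0

def restored_error(eps: float) -> float:
    """Exact probability the majority output is wrong, given three
    independent copies each flipped with probability eps."""
    return 3 * eps**2 * (1 - eps) + eps**3   # two or three copies wrong

for eps in (0.1, 0.01):
    print(eps, restored_error(eps))  # the restoring organ suppresses the error
```

At eps = 0.1 the output error drops to 0.028, and repeated restoring stages drive it down further, which is the point of von Neumann's scheme.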

••

TL;DR: Modern physics recognizes that experimental errors are inevitable and that it is impossible to go to the limit of infinitely small errors; the new “matter of fact” point of view states that the classical assumption of complete determinism cannot be proved.

Abstract: Classical physics was based upon the assumption that experimental errors were just accidental and should be ignored by the theory. Modern physics realizes that errors are inevitable and that it is impossible to go to the limit of infinitely small errors. The uncertainty principle and the negentropy principle of information prove that the smaller the error, the greater the price that must be paid for the observation. There is no exact limit to the accuracy, but its high cost makes it unattainable. Classical physics assumed complete determinism. The new modern “matter of fact” point of view asserts that this assumption cannot be proved. If experimental errors are very small, it takes a longer time to reach the final statistical distribution, but this final state of statistical equilibrium will always be reached. Many examples are discussed, and the “matter of fact” point of view is compared with similar ideas presented by the Vienna school, by M. Born and by the Copenhagen school.

••

TL;DR: This paper discusses digital techniques by which habit-forming and learning may be simulated, focusing attention upon reinforcement, and uses the language of computer programming to describe the flow of control and mathematical probability to analyze the effect of various reinforcement functions on the asymptotic behavior of simulating programs.

Abstract: This paper discusses digital techniques by which habit-forming and learning may be simulated. After classifying the types of simulation mechanisms it discusses types of habit-forming and learning to be simulated, focusing attention upon reinforcement. It uses the language of computer programming to describe the flow of control, and the language of mathematical probability to analyze the effect of various reinforcement functions on the asymptotic behavior of simulating programs. It shows further, again in programming terms, how the “delayed random selector” part of the simulating process may be “factored out” as a separate unit applicable either to habit-forming or learning, which latter are distinguished by whether the reinforcements are applied immediately or upon “comparison with a goal.” Several reinforcement models are considered, including the “linear asymptotic” model used extensively by Bush and Mosteller, two simple “absorbing boundary” models, and a “nonlinear asymptotic” model currently being investigated by Bush, Galanter, and Luce. A sketch is given of the Harris-Bellman-Shapiro analysis of the linear asymptotic model. Contrasted with this, a complete analysis is given of the simpler absorbing boundary model, with explicit proof of eventual absorption, and formulae for probability of absorption in n trials, and the expected number of steps to absorption. Finally, a special example is given of the second absorbing boundary model to show how its structure differs from the others.
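The "linear asymptotic" reinforcement rule of Bush and Mosteller can be sketched in a few lines (parameter names are mine): after each trial the response probability moves a fixed fraction theta of the remaining distance toward 1 (reward) or toward 0 (non-reward).

```python
# Linear asymptotic reinforcement model: p approaches its asymptote
# geometrically but never reaches it, matching the "asymptotic" label.

def reinforce(p: float, rewarded: bool, theta: float = 0.2) -> float:
    """One trial of the linear operator model."""
    return p + theta * (1 - p) if rewarded else p - theta * p

p = 0.5
for _ in range(30):          # a run of consistently rewarded trials
    p = reinforce(p, True)
print(p)  # close to, but strictly below, the asymptote 1
```

An "absorbing boundary" variant, by contrast, would let p reach 0 or 1 exactly and stay there, which is why the two model families have such different asymptotic analyses.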

••

Brooklyn College

TL;DR: Methods are presented for calculation of the informational content, i.e. complexity, of interacting systems of parts in mechanisms and circuits, developed primarily to describe self-reproducing systems.

Abstract: This paper presents methods for calculation of the informational content, i.e. complexity, of interacting systems of parts in mechanisms and circuits. The methods have been developed primarily in order to describe self-reproducing systems (Jacobson, 1958) but can be applied with generality to any mechanism or circuit.

••

TL;DR: The nature of these exceptional systems is investigated; they are characterized by some extremal properties, and a simplified proof of a classical result of Neyman and Pearson is obtained incidentally.

Abstract: When the amount of information passing through a channel is estimated on the basis of a sample, situations of two kinds can arise: Either (A) the statistical structure of the source of information is unknown or (B) it is known. In the present note only discrete sources and channels without probability aftereffects are discussed. In either kind of situation the estimator proposed is found to be in general asymptotically normal with a variance of the order of the reciprocal of the sample size. There are, however, important exceptions: For some systems (i.e. combinations of channels and matching sources) the estimator in question has a variance of the order of the reciprocal of the squared sample size; it is shown that then the corresponding asymptotic distribution is that of a quadratic form in Gaussian random variables. The nature of these exceptional systems is investigated; they are characterized by some extremal properties. A simplified proof of a classical result of Neyman and Pearson is obtained incidentally. Finally, the situation in which the channel is known is treated briefly at the end.
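The estimator under discussion is, in present-day terms, the "plug-in" estimate of transmitted information computed from sample frequencies; the form below is my sketch (the paper's contribution is its asymptotic distribution, not the formula itself).

```python
# Plug-in estimate of the information transmitted through a toy discrete
# channel: X uniform on {0, 1}, Y = X flipped with probability 0.1.
import math, random
from collections import Counter

random.seed(0)
sample = []
for _ in range(5000):
    x = random.randint(0, 1)
    y = x if random.random() > 0.1 else 1 - x
    sample.append((x, y))

n = len(sample)
pxy = Counter(sample)                    # joint sample frequencies
px = Counter(x for x, _ in sample)       # input marginal
py = Counter(y for _, y in sample)       # output marginal

I_hat = sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
            for (x, y), c in pxy.items())
print(I_hat)  # should be near the true value 1 - H(0.1), about 0.53 bits
```

The abstract's point is the fluctuation of such an estimate: generically its error is of order n^(-1/2) and asymptotically normal, while for the exceptional systems it shrinks like n^(-1) with a quadratic-form limit law.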

••

General Electric

TL;DR: This article contains a description of the strategies a machine or player might use in a simple penny matching game, where the adversary is a simple indifferent opponent, who has the characteristic that his probability of playing heads is independent of the outcome of the game.

Abstract: A problem currently facing engineers is the design of machines capable of making decisions. This article contains a description of the strategies a machine or player might use in a simple penny matching game. The adversary in this game is a simple indifferent opponent, who has the characteristic that his probability of playing heads is independent of the outcome of the game. Throughout the game the player attempts to select heads or tails in a manner that maximizes his expected net gain. If the player does have this criterion, his decisions depend upon his estimate of his opponent's probability of playing heads. Under certain conditions, optimum decisions can be performed with a single analogue storage element. If the opponent plays according to conditional probabilities, the player should estimate the two quantities Ph(h), the probability a head follows a head, and Pt(h), the probability a head follows a tail. If the player is unaware that his opponent is playing conditional probabilities and assumes that the opponent's probabilities are independent of previous selections, he may suffer a loss. The amount of this loss, if it does occur, is a function of Ph(h) and Pt(h).
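The player's strategy against a conditional-probability opponent can be sketched directly (variable names and the toy history are mine): estimate Ph(h) and Pt(h) from the opponent's past plays, then select the side that is likelier to match the next toss.

```python
# Estimate P(head | previous head) and P(head | previous tail) from the
# opponent's history, and match the likelier next play.

def estimate_conditionals(history):
    """history: sequence of 'H'/'T' opponent plays."""
    counts = {"H": [0, 0], "T": [0, 0]}   # [heads following, total following]
    for prev, cur in zip(history, history[1:]):
        counts[prev][1] += 1
        if cur == "H":
            counts[prev][0] += 1
    return {k: (h / t if t else 0.5) for k, (h, t) in counts.items()}

def next_play(history):
    p_head = estimate_conditionals(history)[history[-1]]
    return "H" if p_head >= 0.5 else "T"   # match the likelier side

opponent = "HHTHHTHHTHHT"                  # a strongly patterned opponent
print(next_play(opponent))
```

For this patterned opponent the unconditional head frequency is 2/3, but the conditional estimates reveal that a head always follows a tail, which is exactly the extra gain the abstract says the naive (independence-assuming) player forgoes.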

••

TL;DR: A mathematical technique is developed for measuring the information in a message where no information is transmitted by the order in which the symbols composing the message are received; the theory is illustrated by calculating the information transmitted in recall and recognition tasks by a set of subjects in an actual experiment.

Abstract: In psychological experiments on recognition and recall, the subject clearly conveys information. If a technique could be developed for measuring the information transmitted in these situations, meaningful comparisons could be made between subjects' performances under different conditions. With this end in view, a mathematical technique is developed for measuring the information in a message where no information is transmitted by the order in which the symbols composing the message are received. The theory is presented in four stages. At each stage assumptions or, as at stage IV, approximations are made which enable the information transmitted by a subject to be estimated by performing fewer experiments than at the previous stage. A crucial assumption at stage III involves minimizing the information subject to certain parameters being held constant. The assumptions are discussed, and the theory is illustrated by calculating the information transmitted in recall and recognition tasks by a set of subjects in an actual experiment. Further applications are briefly discussed.

••

TL;DR: Recent studies suggest the possibility that in the visual system of man over-all temporal patterns of activity may be used for the transmission of certain sensory information and that this temporal coding may arise from the differing nerve impulse conduction velocities that exist in a nerve fiber bundle.

Abstract: Neurone theory and the concept of the all-or-none nerve impulse are fundamental to modern neurophysiology; but the simplest ideas derived from these lead to the difficulty as to how complex sensory perception is possible through such limited structure. Recent studies suggest the possibility that in the visual system of man over-all temporal patterns of activity may be used for the transmission of certain sensory information. It is possible that this temporal coding may arise from the differing nerve impulse conduction velocities that exist in a nerve fiber bundle. Such a coding system may have a significant part to play in the general functioning of the nervous system.

••

TL;DR: The behavior of a forced oscillatory system which is linearly damped but nonlinear in the restoring force is investigated according to the author's previous papers, showing under which conditions the system may contain subharmonics of order 1/2.

Abstract: The behavior of a forced oscillatory system which is linearly damped but nonlinear in the restoring force is investigated according to the author's previous papers (Magiros, 1957, 1958). It is shown under which conditions the system may contain subharmonics of order 1/2. The amplitudes of the subharmonics and their components, and the bounds for the amplitude of the external force, are given in terms of the coefficients of the differential equation of the system, which are not necessarily very small, as well as the regions in the (c1/c3, I)-plane where we have subharmonics with two, one, or no amplitudes. Also discussed are the stability of the subharmonics, the free vibrations of the system, and the case when one of the coefficients of the nonlinear terms is zero.

••

TL;DR: The proposal extends the range of the propositional variables so that residue class check symbols may be used in error detection; the principal consequence is that individual logical elements may be designed to process binary inputs with arbitrary reliability and nonzero channel capacity.

Abstract: A method of error detection is proposed for noisy logical computer elements. The proposal extends the range of the propositional variables so that residue class check symbols may be used in error detection. The principal consequence is that individual logical elements may be designed to process binary inputs with arbitrary reliability and nonzero channel capacity.
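The general idea of a residue class check can be illustrated in a few lines (this sketches the arithmetic principle, not the paper's specific logical construction): carry a check symbol modulo 3 alongside each operand, and note that a single-bit fault changes a result by ±2^k, which is never 0 mod 3, so the check always detects it.

```python
# Residue-class checking of an addition: the check modulus 3 catches any
# single-bit fault, since 2^k mod 3 alternates 1, 2, 1, 2, ... and is never 0.

M = 3  # check modulus

def checked_add(a: int, b: int):
    """Return (sum, check symbol) for a redundant addition."""
    return a + b, (a % M + b % M) % M

def verify(result: int, check: int) -> bool:
    return result % M == check

s, c = checked_add(25, 17)
assert verify(s, c)              # fault-free addition passes the check

faulty = s ^ (1 << 3)            # inject a single-bit fault (flip bit 3)
print(verify(faulty, c))         # False: the residue check catches it
```

Extending the range of the variables so that such check symbols travel with the logic is what lets each element trade redundancy for arbitrary reliability.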

••

IBM

TL;DR: It is shown that, provided no input-output relation is deterministic, this system will eventually become completely disorganized in the sense that it will behave like a collection of independent parts with completely unpredictable behavior.

Abstract: Consider a circle of devices, each characterized by a one-bit output which is statistically correlated with the outputs of its right and/or left neighbor at the preceding time period. It is shown that, provided no input-output relation is deterministic, this system will eventually become completely disorganized in the sense that it will behave like a collection of independent parts with completely unpredictable behavior. The rate at which this disorganization occurs under various conditions is calculated. A concrete version of this model is a sequential circuit of imperfect relays. A possible application to the determination of certain reliability parameters is given. The relation to a similar model for statistical mechanics by Kac is shown.