Showing papers in "Information & Computation in 1961"


Journal ArticleDOI
TL;DR: The definition of a family $\mathcal{A}$ of automata derived from the family $\mathcal{A}_0$ of the finite one-way one-tape automata is discussed; $\mathcal{A}$ is a very elementary modification of $\mathcal{A}_0$.
Abstract: In this note we discuss the definition of a family $\mathcal{A}$ of automata derived from the family $\mathcal{A}_0$ of the finite one-way one-tape automata (Rabin and Scott, 1959). In loose terms, the automata from $\mathcal{A}$ are among the machines characterized by the following restrictions: (a) Their output consists in the acceptance (or rejection) of input words belonging to the set F of all words in the letters of a finite alphabet X. (b) The automaton operates sequentially on the successive letters of the input word without the possibility of coming back on the previously read letters and, thus, all the information to be used in the further computations has to be stored in the internal memory. (c) The unbounded part of the memory, $V_N$, is the finite dimensional vector space of the vectors with N integral coordinates; this part of the memory plays only a passive role and all the control of the automaton is performed by the finite part. (d) Only elementary arithmetic operations are used and the amount of computation allowed for each input letter is bounded in terms of the total number of additions and subtractions. (e) The rule by which it is decided to accept or reject a given input word is submitted to the same type of requirements and it involves only the storage of a finite amount of information. Thus the family $\mathcal{A}$ is a very elementary modification of $\mathcal{A}_0$ and it is not

724 citations


Journal ArticleDOI
Jay M. Berger
TL;DR: Some new codes are described which are separable and are perfect error detection codes in a completely asymmetric channel and the new code is found to compare favorably in error detection capability in several cases.
Abstract: Some new codes are described which are separable and are perfect error detection codes in a completely asymmetric channel. Some results are given of comparisons between one simple form of the code in which the check bits correspond to the sum of ones in the information bits and the four out of eight code. The new code is found to compare favorably in error detection capability in several cases. In addition, some more complex codes of this type are indicated.

429 citations
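
The sum-code idea in the abstract above can be sketched in a few lines. This is a minimal illustration, not the paper's construction: it assumes the commonly described form in which the check bits hold the binary count of zeros among the information bits, and the function names `berger_encode` and `berger_check` are hypothetical. In a completely asymmetric channel (only 1→0 errors, or only 0→1 errors) the zero count of the information part and the value of the check part move in opposite directions, so any such error breaks the equality.

```python
def berger_encode(info_bits):
    """Append check bits holding the binary count of zeros in info_bits."""
    k = len(info_bits)
    r = max(1, k.bit_length())           # enough bits to store any count 0..k
    zeros = k - sum(info_bits)
    check = [(zeros >> i) & 1 for i in reversed(range(r))]  # most significant bit first
    return list(info_bits) + check


def berger_check(word, k):
    """Return True when the stored count matches the zeros actually present."""
    info, check = word[:k], word[k:]
    stored = int("".join(map(str, check)), 2)
    return stored == k - sum(info)


# Usage: encode 8 information bits, then apply two 1->0 errors.
word = berger_encode([1, 0, 1, 1, 0, 0, 1, 0])
corrupted = list(word)
corrupted[0] = 0   # 1 -> 0 in the information part
corrupted[9] = 0   # 1 -> 0 in the check part
print(berger_check(word, 8), berger_check(corrupted, 8))
```

The code is separable in the sense used in the abstract: the information bits appear unchanged, followed by the check bits.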


Journal ArticleDOI
TL;DR: A very restricted class of transducers is proposed, the transformation consisting in the replacement of every input letter x by an output word (x) which is eventually the empty word $e_Y$.
Abstract: A very restricted class of transducers, i.e. of aut[…] the transformation consisting in the replacement of every input letter x by an output word (x) which is eventually the empty word $e_Y$. (Author)

86 citations


Journal ArticleDOI
TL;DR: A functional defined by means of entropy is considered and it is shown that it is a distance in the set of discrete probability distributions.
Abstract: A functional defined by means of entropy is considered. It is shown that it is a distance in the set of discrete probability distributions.

75 citations


Journal ArticleDOI
TL;DR: A unified approach to sampling theorems for (wide sense) stationary random processes rests upon Hilbert space concepts and shows that (almost) arbitrary linear operations on x(t) can be reproduced by linear combinations of the samples.
Abstract: A unified approach to sampling theorems for (wide sense) stationary random processes rests upon Hilbert space concepts. New results in sampling theory are obtained along the following lines: recovery of the process x(t) from nonperiodic samples, or when any finite number of samples are deleted; conditions for obtaining x(t) when only the past is sampled; a criterion for restoring x(t) from a finite number of consecutive samples; and a minimum mean square error estimate of x(t) based on any (possibly nonperiodic) set of samples. In each case, the proofs apply not only to the recovery of x(t), but are extended to show that (almost) arbitrary linear operations on x(t) can be reproduced by linear combinations of the samples. Further generality is attained by use of the spectral distribution function F(·) of x(t), without assuming F(·) absolutely continuous.

75 citations


Journal ArticleDOI
TL;DR: Procedures are presented for solving general state-identification problems in finite automata by experiments of specified characteristics (preset or adaptive, simple or multiple), covering identification of both the initial and the final state, together with bounds associated with the procedures.
Abstract: Work done in the past on the subject of state-identification in finite automata has been limited to devising procedures for some special-purpose identification experiments, and to estimating the lengths of more general experiments. The purpose of the present paper is to complement this work by presenting procedures for solving general state-identification problems by experiments of various specified characteristics. Classifying identification experiments into preset or adaptive and into simple or multiple, the results contain procedures for identifying the initial state of an automaton by minimal, simple, preset or adaptive experiments, whenever simple, preset experimentation is realizable, and by multiple, preset or adaptive experiments, whenever simple experimentation is not realizable. Also, procedures are described for identifying the final state of an automaton by minimal or nonminimal, preset or adaptive experiments. Bounds associated with the various procedures are determined, and the applicability of the results to machine-identification problems is discussed.

74 citations


Journal ArticleDOI
TL;DR: Data which had previously been published to describe the statistical characteristics of English words was examined and a detailed empirical study was made of two special types of English word: subject words and proper names.
Abstract: Data which had previously been published by several authors (Ohlman, 1958; Pratt, 1942; Ohaver, 1933; Smith, 1943; Cox, 1947; Griffith, 1949; Gaines, 1956; Luhn, 1958) to describe the statistical characteristics of English words was examined to show the extent of their agreement. In addition, a detailed empirical study was made of two special types of English word: subject words and proper names. The data for the subject words and proper names was compared with previously reported data on subject words, proper names, and continuous text material. The statistical parameters which were measured and compared are: the distribution of initial letters, the distribution of terminal letters, the composite or total distribution of letters, the distribution of characters for each letter position, the distribution of bigrams, and the distribution of word lengths.

37 citations
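
The statistical parameters listed in the abstract above are all simple counts over a word list. The following is a minimal sketch of how they might be tabulated; the function name `word_statistics`, the dictionary keys, and the sample words are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter


def word_statistics(words):
    """Tabulate the letter and length distributions named in the abstract."""
    words = [w.lower() for w in words if w.isalpha()]  # keep purely alphabetic words
    return {
        "initial": Counter(w[0] for w in words),            # initial letters
        "terminal": Counter(w[-1] for w in words),          # terminal letters
        "letters": Counter(ch for w in words for ch in w),  # composite distribution
        "by_position": Counter((i, ch) for w in words for i, ch in enumerate(w)),
        "bigrams": Counter(w[i:i + 2] for w in words for i in range(len(w) - 1)),
        "lengths": Counter(len(w) for w in words),          # word lengths
    }


# Usage on a tiny, made-up list of subject words.
stats = word_statistics(["Coding", "theory", "codes"])
print(stats["initial"].most_common(1), stats["lengths"])
```

Dividing each count by the corresponding total turns the tallies into the relative frequencies that are compared across sources.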


Journal ArticleDOI
TL;DR: The purpose of this paper is to indicate a systematic way in which the theory of dynamic programming can be used to provide a computational solution to the determination of optimal and suboptimal testing policies.
Abstract: The problem of ascertaining the minimum number of weighings which suffice to determine the defective coin in a set of N coins of the same appearance, given an equal arm balance and the information that there is precisely one defective coin present, is well known. A large number of ingenious solutions exist, some based upon sequential procedures and some not. The problem in the case where there are known to be two or more defective coins is far more complex because we cannot draw any simple definite conclusions at the end of a single test. We shall analyze this in detail in the following paper. The purpose of this paper is to indicate a systematic way in which the theory of dynamic programming can be used to provide a computational solution to the determination of optimal and suboptimal testing policies. We shall illustrate this by means of some numerical results obtained using a digital computer.

35 citations
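
For the single-defective case described in the abstract above, a dynamic programming recursion is easy to write down. The sketch below makes an assumption the abstract does not state: the defective coin is known to be heavier. Placing k coins on each pan leaves either k suspects (if a pan drops) or n − 2k suspects (if the pans balance). The function name `min_weighings` is hypothetical, and this is not the authors' program.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def min_weighings(n):
    """Worst-case weighings needed to find one heavier coin among n coins."""
    if n <= 1:
        return 0                      # zero or one suspect: nothing left to test
    best = n                          # loose upper bound
    for k in range(1, n // 2 + 1):    # put k coins on each pan
        worst = max(min_weighings(k), min_weighings(n - 2 * k))
        best = min(best, 1 + worst)
    return best


# Usage: the recursion reproduces the familiar ceil(log3 n) pattern.
print([min_weighings(n) for n in (1, 3, 9, 10, 27, 28)])
```

The same recursive structure carries over to two or more defective coins only with a richer state description, which is the harder case the abstract defers to a following paper.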


Journal ArticleDOI
TL;DR: Two results on the correction of multiple bursts of errors are presented: a theorem which increases the feasibility of correcting such errors in codes over GF(2), and a method for constructing quasi-cyclic codes over GF($p^k$) which correct multiple bursts of errors.
Abstract: Two results on the correction of multiple bursts of errors are presented. In Section II, a theorem is given which increases the feasibility of correcting such errors in codes over GF(2). In Section III, a method is given for constructing quasi-cyclic codes over GF($p^k$) which will correct multiple bursts of errors.

30 citations



Journal ArticleDOI
TL;DR: The authors' 1959 objections to Simon's 1955 model for the Pareto-Yule-Zipf distributions are valid quite irrespectively of the sign of ρ − 1, so that most of Simon's (1960) reply was irrelevant.
Abstract: We shall restate in detail our 1959 objections to Simon's 1955 model for the Pareto-Yule-Zipf distributions. Our objections are valid quite irrespectively of the sign of ρ − 1, so that most of Simon's (1960) reply was irrelevant. We shall also analyze the other points brought up in that reply.

Journal ArticleDOI
TL;DR: Recognition of a pattern is achieved by comparing the normalized signal values for an unknown pattern with sets of stored values representing known patterns; a “best match” of such values provides a basis for recognition of imperfect patterns in the presence of noise and measurement error.
Abstract: Signals representing coefficients in a series approximation to the two-dimensional density function represented by a pattern are directly measured by focusing an image of the pattern on a small number of weighted masks with a photocell behind each, or alternatively by passing the image of the pattern past a small number of weighted slits. The functions weighting the masks or slits are determined so as to enable subsequent normalization of the signals to make them independent of the incidentals of the configuration of the pattern; such incidentals may include variations in position, density, scale, orientation, and viewing perspective. Finally, recognition of a pattern is achieved by comparing the normalized signal values for an unknown pattern with sets of stored values representing known patterns; a “best match” of such values provides a basis for recognition of imperfect patterns in the presence of noise and measurement error. An example of one application of the method is discussed, as are the results of a computer simulation of a character reader based on the application.

Journal ArticleDOI
TL;DR: The present “Reply” refutes the almost entirely new arguments introduced by Dr. Mandelbrot in his “Final Note,” and demonstrates again the adequacy of the models in 1955.
Abstract: Dr. Mandelbrot's original objections (1959) to using the Yule process to explain the phenomena of word frequencies were refuted in Simon (1960), and are now mostly abandoned. The present “Reply” refutes the almost entirely new arguments introduced by Dr. Mandelbrot in his “Final Note,” and demonstrates again the adequacy of the models in 1955.

Journal ArticleDOI
TL;DR: Dr. Mandelbrot has proposed a new set of objections to my 1955 models of the Yule distribution; like his earlier objections, these are invalid.
Abstract: Dr. Mandelbrot has proposed a new set of objections to my 1955 models of the Yule distribution. Like his earlier objections, these are invalid.

Journal ArticleDOI
TL;DR: Theorems 1, 4, and the first part of Theorem 3, however, carry over without difficulty to the more general setup described by Feinstein (1958) and Wolfowitz (1961) where the output alphabet belongs to any space with a given Borel field.
Abstract: Since we use the standard terminology of coding and information theory as it can be found in Feinstein (1958) or Wolfowitz (1961) we shall be brief in describing the setup. Consider a situation where a sender can transmit n symbols over a (noisy) channel s. The symbols are to be chosen from an input alphabet which is assumed to be the set $\{1, 2, \ldots, a\}$ for all channels under consideration. The channel s may be any one from a given set S and remains the same for all n letters (this is the meaning of the term compound channel, this name being introduced in Wolfowitz, 1961). The choice of the transmission channel cannot be influenced by sender or receiver but in some circumstances (cf. Section III) may be known to one or both of them. The symbols received by the receiver belong to an output alphabet which may depend on s but which (by definition of the term semicontinuous) may be infinite. In order to make life easier we assume the output alphabet to be the set of integers for all $s \in S$. Theorems 1, 4, and the first part of Theorem 3, however, carry over without difficulty to the more general setup described by Feinstein (1958) and Wolfowitz (1961) where the output alphabet belongs to any space with a given Borel field. If a sequence $u = (i_1, \ldots, i_n)$ of n letters is transmitted, the received sequence of n letters, say, $v(u) = (Y_1(u), \ldots, Y_n(u))$ is a random variable. We assume the channels in S to be stationary, memoryless, and without anticipation, i.e., there exist channel probability functions

Journal ArticleDOI
TL;DR: My criticism has not changed since I first had the privilege of commenting upon a draft of Simon 1955.
Abstract: My criticism has not changed since I first had the privilege of commenting upon a draft of Simon 1955.

Journal ArticleDOI
TL;DR: A new burst error detecting code is described which has the form of the sum codes in that the check bits are determined from the algebraic sum of suitably weighted information bits.
Abstract: A new burst error detecting code is described which has the form of the sum codes in that the check bits are determined from the algebraic sum of suitably weighted information bits. With the use of approximately $k + \log_2(n/k)$ redundancy bits, where n is the number of information bits, the resultant code will detect all bursts of errors of length k or less in any channel and will also be a perfect error detection code in a completely asymmetric channel.

Journal ArticleDOI
TL;DR: Processes of this type possess some interesting and novel aspects and present some complex analytic and computational questions.
Abstract: Consider a discrete stochastic control process in which the state of the system at time n is specified by the state vector $x_n$, the control vector is $y_n$, and the change of state is determined by the relation $x_{n+1} = g(x_n, y_n, r_n)$, $x_0 = c$, where $r_n$ is a sequence of independent random vectors with common known distribution. If we ask that feedback control, as represented by the $y_n$, be applied in such a way as to minimize the expected value of some prescribed function of the terminal state $x_N$, a straightforward dynamic programming treatment yields an algorithm for the solution. Suppose we now consider the foregoing process with the added feature that there is a chance at any particular stage that the true state of the system will not be known to the decision maker. We shall call this an interrupted stochastic control process. The problem of determining optimal control in a situation of this type was posed to us by J. Craig of The RAND Corporation. As we shall see, processes of this type possess some interesting and novel aspects and present some complex analytic and computational questions.
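
The "straightforward dynamic programming treatment" mentioned in the abstract above, for the case in which the state is always observed, can be sketched as backward induction over a finite state set. This covers only the ordinary (non-interrupted) process; the function name `terminal_cost_dp`, the finite state and control sets, and the example dynamics are all assumptions made for illustration.

```python
def terminal_cost_dp(states, controls, noise, g, phi, N):
    """Return f_0, where f_N = phi and f_n(x) = min over y of E[f_{n+1}(g(x, y, r))].

    noise maps each value of r to its probability; g must map into states.
    """
    f = {x: phi(x) for x in states}              # f_N: the terminal cost itself
    for _ in range(N):                           # step back from stage N to stage 0
        f = {
            x: min(
                sum(p * f[g(x, y, r)] for r, p in noise.items())
                for y in controls
            )
            for x in states
        }
    return f


# Usage: steer an integer state in 0..4 toward 2 under a random +/-1 disturbance.
states = range(5)
step = lambda x, y, r: min(4, max(0, x + y + r))   # hypothetical dynamics g
cost = lambda x: (x - 2) ** 2                      # hypothetical terminal cost
f0 = terminal_cost_dp(states, (-1, 0, 1), {-1: 0.5, 1: 0.5}, step, cost, N=1)
print(f0[2])
```

In the interrupted process the decision maker may not know the current state, so the minimization can no longer be carried out state by state in this simple form.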

Journal ArticleDOI
TL;DR: Two rules of derivation are exhibited and shown to yield valid metalinguistic theorems concerning phrase structure grammars (type 2 or context-free grammars, in Chomsky's notation).
Abstract: Two rules of derivation are exhibited and shown to yield valid metalinguistic theorems concerning phrase structure grammars (type 2 or context-free grammars, in Chomsky's notation).

Journal ArticleDOI
G. Herdan

Journal ArticleDOI
Satosi Watanabe
TL;DR: The method of information-theoretical correlation analysis (Watanabe, 1960) (hereafter referred to as ITCA) provides a powerful tool in producing mechanizable models of a certain type of cognitive and recognitive processes, such as concept formation, formation of association, pattern recognition, indexing, taxonomical and other classification, identification of "clusters," medical diagnosis, etc.
Abstract: The aim of this note is to point out that the method of information-theoretical correlation analysis (Watanabe, 1960) (hereafter referred to as ITCA) provides a powerful tool in producing mechanizable models of a certain type of cognitive and recognitive processes, such as concept formation, formation of association, pattern recognition, indexing, taxonomical and other classification, identification of "clusters," medical diagnosis, etc. One of the advantages of the present method over the competing methods stems from the additivity of the quantity called "correlation" (Watanabe, 1960). Namely, all kinds of correlation, including two-object relations such as similarity and dissimilarity as well as characteristically more-than-two-object relations such as "exclusion-of-the-third," can all be added on an equal footing to form the total sum of correlation, which is a constant and characteristic of a given set of objects. Conversely, the constant total correlation in a given set of objects can be decomposed into various portions bearing clear meanings, according to the usage to which the method is applied. It is hereby assumed that a set of objects is given, together with the information as to whether each element of the set possesses or does not possess each of a given set of properties. The method consists of extracting a set of multivariate probabilities from this list of data, and applying the method of ITCA to these probabilities. Starting from these data, one can further pass on to a set of object groups and a set of group properties (emergent properties) and apply the same method to analyze the relation among the object groups. Let $X = \{x_i\}$, $i = 1, 2, \ldots, m$, be a set of objects and $Y = \{y_j\}$, $j = 1, 2, \ldots, n$, be a set of predicates, such that each predicate $y_j$ can be meaningfully applied to each object $x_i$, either affirmatively or negatively. The objective-predicate matrix T is an $(m \times n)$-matrix, whose

Journal ArticleDOI
TL;DR: Lower bounds are obtained for |M(n, d)| for special values of n and d, where M(n, d) is a maximal set of n-place binary sequences such that the Hamming distance between any two sequences of the set is at least d.
Abstract: M(n, d) denotes a maximal set of n-place binary sequences with entries 0 and 1 such that the Hamming distance between any two sequences of the set is at least d. |M(n, d)| denotes the number of sequences in the set M(n, d). In this paper we obtain some lower bounds for |M(n, d)| for special values of n and d. The results are better than the known results due to Gilbert.
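
As a point of reference for the quantity |M(n, d)|, any explicitly constructed set of sequences with pairwise distance at least d gives a lower bound. The sketch below builds such a set greedily in numerical order and compares its size with the Gilbert-type counting bound $2^n / \sum_{i<d} \binom{n}{i}$. It illustrates the baseline argument only, not the improved bounds of the paper; the names `greedy_code` and `gilbert_bound` are hypothetical.

```python
from math import comb


def greedy_code(n, d):
    """Scan all n-bit words in order, keeping each one at distance >= d from those kept."""
    code = []
    for w in range(2 ** n):
        if all(bin(w ^ c).count("1") >= d for c in code):  # Hamming distance via XOR
            code.append(w)
    return code


def gilbert_bound(n, d):
    """Counting bound: each kept word rules out at most a ball of radius d - 1."""
    return 2 ** n / sum(comb(n, i) for i in range(d))


# Usage: the greedy set for n = 7, d = 3 against the counting bound.
code = greedy_code(7, 3)
print(len(code), gilbert_bound(7, 3))
```

The greedy set always meets the counting bound, since the scan stops only when every word lies within distance d − 1 of a kept word; it may do considerably better for particular n and d.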

Journal ArticleDOI
P. M. Lewis
TL;DR: It is shown that networks consisting of diodes in the form of “and” gates and resistors in the form of summation elements can realize any probability approximation satisfying the maximum entropy criterion.
Abstract: Frequently engineering considerations place limitations on the size of decision making systems and on the resources of the system designer. The pertinent high order probability distributions may be unknown and it may not be possible to measure and/or store these distributions in their entirety; some type of approximation is then necessary. One type of approximation that has been studied previously involves measuring and storing several of the lower order component distributions and using these to approximate the high order distribution, using the criterion of maximum entropy. This note considers the related realization problem for binary distributions. By a realization of such an approximation is meant a physical network with the following properties: (1) Its inputs are the variables on which the decision is to be based. (2) Stored within it are the lower order component distributions on which the approximation is to be based. (3) Its outputs (one for each possible decision) are approximations to the high order distributions sufficient to make the decisions. The realization problem for maximum entropy approximations is particularly simple because of their functional form. Maximum entropy distributions are always products of functions of the variables in the given component distributions. Therefore, the logarithms of these distributions are sums of functions of these same variables and hence can be easily realized. It is shown that networks consisting of diodes in the form of “and” gates and resistors in the form of summation elements can realize any probability approximation satisfying the maximum entropy criterion.

Journal ArticleDOI
TL;DR: As a contribution to the mathematics of the philosophy of psychology, explicata are obtained for the net amount of deciding contained in a mental event, F, in favor of an act or of a class of acts.
Abstract: As a contribution to the mathematics of the philosophy of psychology, explicata are obtained for (1) the net amount of deciding contained in a mental event, F, in favor of an act or of a class of acts; (2) the decisionary effort contained in F, with respect to a class of acts. It is found, for example, that much deciding can be done effortlessly, and on the other hand a small amount of deciding can consume a lot of effort. A by-product of the discussion is a contribution to the axiomatics of what Kullback calls the “divergence” between two probability distributions. The meanings of “decision” and “conclusion” are briefly considered. My purpose throughout is clarification only, not specific application.

Journal ArticleDOI
TL;DR: It is thought that this short-cut procedure may be of value to scanning-type logical computers for working with large numbers of variables.
Abstract: A simple scanning-type binary logical computer starts by assigning the truth-value 0 to all the variables (propositions) and systematically executes a binary count until the truth value 1 is assigned to all variables. With n variables a scan of $2^n$ steps is then needed. At each step, all the logical constraints usually are tested simultaneously and only when all are satisfied is a solution to the problem obtained. The principle, and the machine, described here give a greatly compressed procedure; they are such that the constraints are examined individually, not simultaneously; if one or more is unsatisfied, a simple criterion is acted upon which greatly reduces the total number of steps required to find the first solution or any number of solutions. A further criterion is established when the last solution has been found, and the machine may be stopped. It is thought that this short-cut procedure may be of value to scanning-type logical computers for working with large numbers of variables.
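
The baseline procedure that the abstract above starts from, a full binary count over all truth assignments, can be sketched as follows. The short-cut criterion itself is not specified in the abstract, so it is not reproduced here; the function name `scan_solutions` and the representation of constraints as Python callables are assumptions for illustration.

```python
def scan_solutions(n, constraints):
    """Step a binary counter through all assignments and keep those satisfying every constraint."""
    solutions = []
    for count in range(2 ** n):                                  # the full scan
        values = tuple((count >> i) & 1 for i in range(n))       # variable i is bit i of the counter
        if all(c(values) for c in constraints):
            solutions.append(values)
    return solutions


# Usage: two variables, constraints "v0 or v1" and "not v0".
constraints = [lambda v: v[0] or v[1], lambda v: not v[0]]
print(scan_solutions(2, constraints))
```

The loop body runs once per counter value regardless of how early a constraint fails, which is the cost the compressed procedure is meant to reduce.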

Journal ArticleDOI
TL;DR: A computational method is described which will generate codes of any given word length correcting any arbitrarily chosen set of error patterns, and the resulting codes have been compared to other known codes.
Abstract: The problem of constructing systematic error correcting codes has been stated as follows, “Construct a group code such that each word representing an error pattern to be corrected lies in a separate coset.” A computational method is described which will generate such codes of any given word length correcting any arbitrarily chosen set of error patterns. The method suggested by Sacks (1958) turns out to be a special case of the method here described, where the set of error patterns is the set of all n-tuple errors. Codes having up to 10 check bits have been constructed using this method for correcting double and triple errors as well as burst-type errors of 3 digit width. The computation for each code took an LGP-30 computer 3 to 4 hr. The resulting codes have been compared to other known codes.
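
The coset condition quoted in the abstract above has a direct computational test: for a binary group code with parity-check matrix H, two error patterns lie in the same coset exactly when they have the same syndrome. The sketch below checks a chosen set of patterns against a given H. It is a verification aid under that standard fact, not the paper's generation method, and the names `syndrome`, `patterns_up_to_weight`, and `corrects` are hypothetical.

```python
from itertools import combinations


def syndrome(H, e):
    """Syndrome of error pattern e over GF(2): one parity per row of H."""
    return tuple(sum(h * x for h, x in zip(row, e)) % 2 for row in H)


def patterns_up_to_weight(n, t):
    """All nonzero length-n binary patterns with at most t ones."""
    for w in range(1, t + 1):
        for pos in combinations(range(n), w):
            yield tuple(1 if i in pos else 0 for i in range(n))


def corrects(H, patterns):
    """True when every pattern has a distinct, nonzero syndrome (a separate coset)."""
    seen = set()
    for e in patterns:
        s = syndrome(H, e)
        if not any(s) or s in seen:      # zero syndrome collides with "no error"
            return False
        seen.add(s)
    return True


# Usage: parity checks whose columns are the numbers 1..7 in binary.
H = [[(j >> b) & 1 for j in range(1, 8)] for b in range(3)]
print(corrects(H, patterns_up_to_weight(7, 1)), corrects(H, patterns_up_to_weight(7, 2)))
```

With three check bits there are only seven nonzero syndromes, so the seven single errors fit but the double errors cannot; correcting larger pattern sets requires more check bits.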

Journal ArticleDOI
TL;DR: This work introduces the so-called jump-shift register codes, single error-correcting codes for p, a prime for which 2 has multiplicative order p − 1, which are placed in a pseudo-cyclic setting and are easily encodable and decodable.
Abstract: Error correcting codes of all (k, p) group codes (p odd), i.e., linear mappings of k-tuples of zeros and ones into p-tuples of zeros and ones, are viewed as a purely algebraic problem. This problem concerns the zeros of certain polynomials on pth roots of unity. These polynomials are parameterized via elements of subgroups of the smallest field containing the pth roots of unity. We introduce, in addition, the so-called jump-shift register codes. These are ((p + 1)/2, p) single error-correcting codes for p, a prime for which 2 has multiplicative order p − 1. These noncyclic codes are placed in a pseudo-cyclic setting and are easily encodable and decodable.
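
The condition on p in the abstract above, that 2 has multiplicative order p − 1 modulo p, is easy to test numerically. The sketch below lists the odd primes below a limit that satisfy it, together with the corresponding ((p + 1)/2, p) code parameters. The function names are hypothetical, and nothing here constructs the codes themselves.

```python
def multiplicative_order(a, p):
    """Smallest m >= 1 with a**m congruent to 1 mod p; assumes gcd(a, p) == 1."""
    order, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        order += 1
    return order


def is_prime(n):
    """Trial division, adequate for small n."""
    return n >= 2 and all(n % q for q in range(2, int(n ** 0.5) + 1))


def primes_with_full_order_of_2(limit):
    """Odd primes p < limit for which 2 has multiplicative order p - 1."""
    return [p for p in range(3, limit, 2)
            if is_prime(p) and multiplicative_order(2, p) == p - 1]


# Usage: admissible primes below 30 and the matching (k, p) parameters.
primes = primes_with_full_order_of_2(30)
print(primes, [((p + 1) // 2, p) for p in primes])
```

For example, p = 7 is excluded because 2 has order 3 modulo 7, while p = 11 qualifies and would give parameters (6, 11).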

Journal ArticleDOI
TL;DR: By using a somewhat different approach, one can reduce the problem of finding optimal policy for an interrupted stochastic control process to the same problem for a noninterrupted control process having a larger number of states.
Abstract: The notion of an interrupted control process introduced in the paper by Bellman and Kalaba is a very significant one, since it substantially enlarges the class of control processes which can be treated by the techniques of dynamic programming. By applying the principle of optimality, Bellman and Kalaba arrive at a functional equation with an implicit structure which, as they observe, is different from the usual functional equations of dynamic programming. The substance of our remark is that by using a somewhat different approach which is sketched in the sequel, one can reduce the problem of finding optimal policy for an interrupted stochastic control process to the same problem for a noninterrupted control process having a larger number of states. The simplification, however, is largely conceptual in nature, and we do not claim that it reduces computational labor. More specifically, consider the same type of process as is treated in the paper by Bellman and Kalaba, and let $p(x_{t+1} / x_t, y_t)$, $t = 0, 1, \ldots, N - 1$, denote the conditional distribution of $x_{t+1}$, the state at time t + 1, given $x_t$, the state at time t, and $y_t$, the input at time t. We assume that both $x_t$ and $y_t$ range over finite sets, $x_t = q_1, \ldots, q_n$ and $y_t = a_1, \ldots, a_s$, $t = 0, 1, \ldots, N$. The criterion function, C, is taken to be the expected value of a reward function h defined on the states at time N; i.e., $C = E\{h(x_N)\}$. Furthermore, the probability of nonobservation of $x_t$ at time t ($t = 0, 1, \ldots, N$) is assumed to be a fixed constant p. Let us enlarge the state space of the process by adding to the set of states $\{q_1, \ldots, q_n\}$, all states of the form $(q_i ; a_{j_1}, a_{j_2}, \ldots, a_{j_k})$, $1 \le k \le N - 1$, $i = 1, \ldots, n$, where the symbol $(q_i ; a_{j_1}, \ldots, a_{j_k})$

Journal ArticleDOI
TL;DR: The notions of “rank” and “nullity” are introduced into coding theory as possible new tools and various results demonstrate their relevance and suggest their potential utility.
Abstract: The notions of “rank” and “nullity” are introduced into coding theory as possible new tools. Various results demonstrate their relevance and suggest their potential utility.