
Showing papers in "Biometrika in 1955"


Journal ArticleDOI
TL;DR: In this paper, the authors analyse a class of distribution functions that appear in a wide range of empirical data-particularly data describing sociological, biological and economic phenomena-and look for an explanation of the observed close similarities among the five classes of distributions listed above.
Abstract: It is the purpose of this paper to analyse a class of distribution functions that appears in a wide range of empirical data-particularly data describing sociological, biological and economic phenomena. Its appearance is so frequent, and the phenomena in which it appears so diverse, that one is led to the conjecture that if these phenomena have any property in common it can only be a similarity in the structure of the underlying probability mechanisms. The empirical distributions to which we shall refer specifically are: (A) distributions of words in prose samples by their frequency of occurrence, (B) distributions of scientists by number of papers published, (C) distributions of cities by population, (D) distributions of incomes by size, and (E) distributions of biological genera by number of species. No one supposes that there is any connexion between horse-kicks suffered by soldiers in the German army and blood cells on a microscope slide other than that the same urn scheme provides a satisfactory abstract model of both phenomena. It is in the same direction that we shall look for an explanation of the observed close similarities among the five classes of distributions listed above. The observed distributions have the following characteristics in common: (a) They are J-shaped, or at least highly skewed, with very long upper tails. The tails can generally be approximated closely by a function of the form
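The urn mechanism alluded to above (new words entering at a constant rate, old words recurring in proportion to their past frequency) can be sketched in a few lines. This is an illustrative simulation, not the paper's own formulation; the function name `simon_urn` and the parameter `alpha` (probability of introducing a new word) are assumptions of this sketch:

```python
import random

def simon_urn(total_words, alpha, seed=0):
    """Simulate a 'rich get richer' urn scheme for word frequencies.

    With probability alpha the next token is a brand-new word; otherwise
    an existing word is chosen with probability proportional to how often
    it has already occurred.  Returns occurrence counts, one per word.
    """
    rng = random.Random(seed)
    counts = [1]           # the first token introduces the first word
    occurrences = [0]      # one entry per token: the index of its word
    for _ in range(total_words - 1):
        if rng.random() < alpha:
            counts.append(1)
            occurrences.append(len(counts) - 1)
        else:
            # picking a uniformly random past token selects word i
            # with probability counts[i] / (number of tokens so far)
            w = occurrences[rng.randrange(len(occurrences))]
            counts[w] += 1
            occurrences.append(w)
    return counts
```

Counts generated this way are J-shaped in the sense described: a handful of early words accumulate very large counts while most words occur only once or twice.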

2,630 citations




Journal ArticleDOI
E. S. Page

674 citations


Journal ArticleDOI
TL;DR: In this paper, a large-sample test for marginal homogeneity is derived and illustrated; when the classification is a double dichotomy, the solution is a special case of the large-sample solution of the more general 2 × K classification problem given by Cochran (1950).
Abstract: There are several circumstances in which we may wish to test the homogeneity of the two sets of marginal probabilities in a two-way classification. For example, a sample from a bivariate distribution (say height of father, height of son) may be classified into a two-way table with identical (height) groupings in each margin. Or a similar classification may be possible for a non-measurable variable (say strength of right hand - strength of left hand). Again, in surveys of the same sample (a 'panel') on two different occasions, the interrelation of the results on the two occasions may be displayed in a two-way table, with one margin corresponding to each occasion. In all these cases, the question may arise: are the two sets of marginal probabilities identical? If the variable is measurable, we may test the difference between the means of the two marginal distributions by a large-sample standard-error test. However, we may be interested in the overall distributions, rather than only in their means. For the more stringent hypothesis of homogeneity, a test exists if we have two completely independent samples, when an ordinary χ² test of homogeneity may be applied (Cramér, 1946, p. 445). This test does not meet the essentially bivariate situations described above, where non-independence of the marginal distributions is a fundamental feature of the problem. When the classification is a double dichotomy, the problem of testing marginal homogeneity is simple, and its solution is a special case of the large-sample solution of the more general 2 × K classification problem given by Cochran (1950). Bowker (1948) gave a large-sample test for complete symmetry in a two-way classification, a more restrictive hypothesis which is concerned with the entire set of probabilities in the classification, and not only with the marginal probabilities as we are here. In the present paper, a large-sample test for marginal homogeneity is derived and illustrated.
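The large-sample statistic for marginal homogeneity in a square K × K table takes the quadratic form d′S⁻¹d, where d holds the first K − 1 differences between row and column marginal totals and S is their estimated covariance matrix under the null hypothesis; the result is referred to χ² with K − 1 degrees of freedom. The sketch below is one standard way of computing it (now commonly called the Stuart test); the function names are illustrative, not from the paper:

```python
def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def marginal_homogeneity_stat(table):
    """Large-sample chi-square statistic for marginal homogeneity in a
    square K x K contingency table (K - 1 degrees of freedom).
    A sketch, not the paper's own notation.
    """
    K = len(table)
    row = [sum(table[i]) for i in range(K)]
    col = [sum(table[i][j] for i in range(K)) for j in range(K)]
    d = [row[i] - col[i] for i in range(K - 1)]      # marginal differences
    S = [[0.0] * (K - 1) for _ in range(K - 1)]      # null covariance of d
    for i in range(K - 1):
        for j in range(K - 1):
            if i == j:
                S[i][j] = row[i] + col[i] - 2 * table[i][i]
            else:
                S[i][j] = -(table[i][j] + table[j][i])
    return sum(di * xi for di, xi in zip(d, _solve(S, d)))
```

For a perfectly symmetric table the marginal differences vanish and the statistic is zero; any asymmetry in the margins drives it upward.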

448 citations
















Journal ArticleDOI
TL;DR: Methods are proposed for estimating the parameters of the truncated negative binomial distribution, in which the zero class cannot be observed; the corresponding problem for the truncated Poisson distribution was treated by David & Johnson (1952).
Abstract: (1920), Fisher (1941), Haldane (1941), Anscombe (1950) and Bliss & Fisher (1953), and is extensively used for the description of data too heterogeneous to be fitted by a Poisson distribution. Observed samples, however, may be truncated, in the sense that the number of individuals falling into the zero class cannot be determined. For example, if chromosome breaks in irradiated tissue can occur only in those cells which are at a particular stage of the mitotic cycle at the time of irradiation, a cell can be demonstrated to have been at that stage only if breaks actually occur. Thus in the distribution of breaks per cell, cells not susceptible to breakage are indistinguishable from susceptible cells in which no breaks occur. Methods for estimation of the parameters of the truncated distribution are considered in this paper. The corresponding problem of estimation of the truncated Poisson distribution has been discussed by David & Johnson (1952), who also discuss the present problem.
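The simpler companion problem mentioned at the end of the abstract, the zero-truncated Poisson, illustrates the estimation idea compactly: if the zero class is unobservable, the mean of the observed positive counts estimates λ/(1 − e^{−λ}) rather than λ, and the maximum-likelihood estimate solves that equation for λ. The sketch below solves it by bisection; the function name is illustrative and this is not the paper's own method:

```python
import math

def zero_truncated_poisson_mle(counts):
    """MLE of the Poisson mean when the zero class cannot be observed.

    The likelihood equation is  lambda / (1 - exp(-lambda)) = mean(counts),
    where counts are the observed positive values; the left side increases
    from 1 as lambda grows, so the root is bracketed by (0, mean) and
    found by bisection.
    """
    m = sum(counts) / len(counts)
    if m <= 1.0:
        raise ValueError("mean of truncated counts must exceed 1")
    lo, hi = 1e-9, m
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid / (1.0 - math.exp(-mid)) < m:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The same moment-matching logic carries over to the truncated negative binomial, though with two parameters the likelihood equations must be solved jointly.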

Journal ArticleDOI
M. B. Wilk
TL;DR: The generalized randomized block (GRB) design, as discussed by the author, generalizes the randomized block design by letting each treatment appear with p units in each of the r blocks; it includes both the completely randomized design and the randomized block design as special cases.
Abstract: Suppose that t treatments are given whose properties (yields, responses, effects, etc.) we wish to compare when they interact with a given set of rs experimental units, the latter being classified into r blocks, each containing s = pt units. Suppose, further, that an experiment is carried out in which the treatments are applied at random to the experimental units, with the restriction that each treatment appears with p units in each of the r blocks. We refer to this design as the generalized randomized block design and note that it includes as special cases the completely randomized design (r = 1, p > 1) and the randomized block design (r> 1, p = 1). The object of this paper is to study the basis for statistical inference which is provided by the randomization procedure.
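The layout and randomization restriction described above can be sketched directly: each of the r blocks receives every one of the t treatments exactly p times, in random order within the block. The function name and label convention below are illustrative assumptions, not the paper's notation:

```python
import random

def generalized_randomized_block(t, r, p, seed=0):
    """Randomize a generalized randomized block design.

    Returns r blocks, each a random arrangement of s = p*t experimental
    units in which every treatment label 0..t-1 appears exactly p times.
    r = 1 gives a completely randomized design; p = 1 gives the ordinary
    randomized block design.
    """
    rng = random.Random(seed)
    blocks = []
    for _ in range(r):
        units = [treat for treat in range(t) for _ in range(p)]
        rng.shuffle(units)          # randomize within the block only
        blocks.append(units)
    return blocks
```

Having p > 1 replicates of each treatment inside every block is what allows a within-block (treatment × block interaction) error term to be estimated, which is the point of the generalization.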

Journal ArticleDOI
TL;DR: The problem of biserial correlation arises when one is sampling from a bivariate normal population in which one of the variables has been dichotomized, giving rise to only two observable values, say 0 and 1, and one wishes to use this dichotomized sample to estimate, or to test hypotheses concerning, the correlation coefficient ρ of the original bivariate normal distribution.
Abstract: The problem of biserial correlation arises when one is sampling from a bivariate normal population in which one of the variables has been dichotomized, giving rise to only two observable values, say 0 and 1, and one wishes to use this dichotomized sample to estimate, or to test hypotheses concerning, the correlation coefficient ρ of the original bivariate normal distribution. The problem of biserial correlation occurs frequently in psychological work, especially in test construction and validation. The term biserial correlation was introduced by Karl Pearson (1909), who was the first to perceive the statistical importance of this particular type of problem. He proposed as an estimator the sample biserial correlation coefficient. The asymptotic variance of this estimator was derived by Soper (1913). Much literature exists on the subject of how best to compute Pearson's coefficient. In this connexion the reader should see Du Bois (1942), Dunlap (1936) and Royer (1941). Prof. Harold Hotelling realized some years ago that the existing methods for dealing with the problem of biserial correlation were far from satisfactory, and suggested to the author that the whole situation be reconsidered. The results of this examination are contained in the present paper. § 2 contains a list of most of the notation which has been adopted, and § 3 deals with the mathematical model. In § 4 the question of maximum likelihood is treated. Asymptotic variances are derived for the estimators ω̂₀ and ρ̂. The asymptotic variance for ρ̂ is compared with the approximate expression arrived at by Maritz (1953) when he considered a somewhat restricted model. Both expressions are shown to achieve their minimum value at ω₀ = 0 when ρ is fixed. Matters concerning asymptotic normality and asymptotic efficiency are also considered. An appraisal of r*, the sample biserial correlation coefficient, is given in detail in § 5.
It is shown to have asymptotic efficiency for estimating ρ which is 1 when ρ = 0 but which approaches 0 as |ρ| approaches 1. The well-known fact that r* may be greater than 1 is pointed out, and some notion of the magnitude of r* is obtained by a consideration of the product-moment correlation coefficient r. Asymptotic normality of r* is verified by the use of a theorem of Cramér. The asymptotic standard deviation is tabulated at the end of the paper (Table 2). A proof is given for the customarily assumed fact that the asymptotic variance has a minimum for fixed ρ when ω = 0. For the case ω₀ = 0 an approximate variance stabilizing transformation is derived. Calculations pertaining to this transformation may
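Pearson's sample biserial coefficient r*, the estimator appraised above, has the classical closed form r* = [(Ȳ₁ − Ȳ₀)/s_Y] · pq/h, where p and q are the proportions in the two classes and h is the standard normal ordinate at the dichotomizing threshold. A minimal sketch (the function name is illustrative; this reproduces the textbook formula, not the paper's maximum-likelihood estimators):

```python
from statistics import NormalDist, mean, pstdev

def biserial_r(y, x):
    """Pearson's sample biserial correlation coefficient r*.

    y : continuous measurements;
    x : 0/1 labels obtained by dichotomizing the second variable of a
        bivariate normal pair.
    """
    nd = NormalDist()
    y1 = [yi for yi, xi in zip(y, x) if xi == 1]
    y0 = [yi for yi, xi in zip(y, x) if xi == 0]
    p = len(y1) / len(y)              # proportion falling in the '1' class
    h = nd.pdf(nd.inv_cdf(p))         # normal ordinate at the cut point
    return (mean(y1) - mean(y0)) / pstdev(y) * p * (1.0 - p) / h
```

Because h appears in the denominator, sampling noise in the tails can push r* outside [−1, 1], which is exactly the defect of r* noted in the abstract.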



Journal ArticleDOI
TL;DR: In this paper, the author develops a theory of two-dimensional point processes for the ecological study of plant distribution patterns, noting that fitting a distribution such as the negative binomial to random-quadrat counts will not give any evidence of trends, or indicate the pattern of the distribution over the area or the way in which this pattern may have arisen, all factors of prime importance in the study of the structure of a plant community.
Abstract: The term 'point processes ', referring to stochastic processes in which events occur at more or less irregular intervals and which are represented by points on the time-axis, is of comparatively recent origin, although the existence of such processes has in fact been well known for a long time. They have been discussed fairly extensively in such diverse applications as the counting of radioactive impulses, telephone calls and cases of contagious diseases. Wold (1949) developed a statistical theory for treating processes of this type, and also mentioned briefly how the events could take place in a two-dimensional or higher field. Such a generalization, from events with no time extension to those with no 'space' extension (i.e. specifically of a point character), has a suitable field of application in the ecological study of the distributional pattern of plants. If we can assume to a first approximation that the plants have the dimensions of a point, then we shall see that it is possible to discuss precisely probability relationships between the numbers of plants in different areas of the region under investigation. The main aims of quantitative ecology are the precise description of a community of plants with interpretations in terms of the biology of the species, and the correlation of vegetational and environmental data, and ecologists have used several methods in an attempt to achieve these aims. In most of the initial work on field sampling for ecological data, the procedure was to take 'quadrats' (sample areas small in relation to the total area of the region) scattered at random over the area, and study statistics derived from the frequency distribution of the numbers of plants per quadrat. While this approach is useful to some extent, in that any given type of distribution function may be fitted to the data, it does not necessarily furnish the kind of information required by an ecologist. 
It will not give any evidence of trends, or indicate the pattern of the distribution over the area or the way in which this pattern may have arisen, all factors of prime importance in the study of the structure of a plant community. We only have to cite the negative binomial distribution, which is known to arise in at least four different ways, all based on widely differing assumptions, to illustrate this point. In recent years ecologists have become aware of the need for a more satisfactory approach to the problem, and Greig-Smith (1952) provided a potentially great advance on the statistical side when he recommended the use of a grid of contiguous quadrats over some portion or portions of the region. The advantage, of course, in arranging the quadrats in a grid is that the analysis of variance technique may be employed, either for the detection of trends, or, more importantly, for the detection of a mosaic variation in density (due to ecological causes connected with the spread of the plants) by a 'nested sampling' type of analysis of variance, associating the quadrats into successively larger blocks and comparing the component block variances. The details and applications of this method are described at length by Greig-Smith, together with the results from sampling experiments on artificial
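The 'nested sampling' analysis described above, associating quadrats into successively larger blocks and comparing the block variances, can be sketched for a one-dimensional transect of contiguous quadrat counts. This is an illustrative simplification of the Greig-Smith grid procedure (the function name and the use of a transect rather than a full grid are assumptions of the sketch):

```python
def nested_block_variances(counts):
    """Greig-Smith style nested analysis for a transect of contiguous
    quadrat counts (length a power of two).

    Adjacent quadrats are merged pairwise into blocks of size 1, 2, 4, ...
    and the variance among block totals is reported at each scale; a peak
    at some block size suggests mosaic patchiness at that spatial scale.
    """
    out = {}
    block = list(counts)
    size = 1
    while len(block) >= 2:
        m = sum(block) / len(block)
        out[size] = sum((b - m) ** 2 for b in block) / len(block)
        # merge adjacent blocks to move to the next, coarser scale
        block = [block[i] + block[i + 1] for i in range(0, len(block), 2)]
        size *= 2
    return out
```

For counts laid out in patches of two high quadrats followed by two empty ones, the variance peaks at block size 2 and collapses at block size 4, mirroring how the full two-dimensional analysis localizes the scale of the mosaic.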








Journal ArticleDOI
TL;DR: The problem of estimating directly the spectral density function f(λ) (or g(λ)), previously considered by Daniell (see the discussion to the paper by Bartlett, 1946), Bartlett (1950) and Grenander (1951), is discussed further in this paper.
Abstract: The problem of estimating directly the spectral density function f(λ) (or g(λ)), previously considered by Daniell (see the discussion to the paper by Bartlett, 1946), Bartlett (1950) and Grenander (1951), will be discussed further in this paper. Grenander & Rosenblatt (1952, 1954) have recently investigated also the problem of constructing an entire confidence band for the integrated density function g(λ); for further suggestions on this problem, which will not be considered here, see Bartlett (1954).
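The direct estimator associated with Daniell's suggestion is the periodogram smoothed by a simple moving average over neighbouring Fourier frequencies. The sketch below uses a naive DFT for clarity; the function name, the 1/2πn normalization, and the rectangular (Daniell) smoothing window are conventional choices of this sketch, not details taken from the paper:

```python
import cmath
import math

def daniell_spectrum(x, half_width):
    """Periodogram of the series x smoothed with a moving average of
    2*half_width + 1 neighbouring ordinates (Daniell's window).

    Ordinates are computed at the Fourier frequencies 2*pi*k/n for
    k = 1 .. n // 2, via a naive O(n^2) DFT for clarity.
    """
    n = len(x)
    I = []
    for k in range(1, n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        I.append(abs(s) ** 2 / (2 * math.pi * n))
    sm = []
    for k in range(len(I)):
        lo = max(0, k - half_width)
        hi = min(len(I), k + half_width + 1)
        sm.append(sum(I[lo:hi]) / (hi - lo))   # local average of ordinates
    return sm
```

Averaging 2m + 1 raw ordinates trades resolution for stability: each smoothed value has roughly 2(2m + 1) degrees of freedom instead of the periodogram's 2, which is the sense in which the smoothed estimate is consistent while the raw periodogram is not.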