
Showing papers in "Biometrics in 1961"




Journal Article•DOI•
TL;DR: In this article, the authors define a family of growth curves which includes as special cases the monomolecular (diminishing returns) curve, the exponential curve, the logistic curve, and the Gompertz curve.
Abstract: Special cases were originally proposed by Putter [1920] for various types of animal growth (see e.g. von Bertalanffy [1957]), and recently Richards [1959] has exemplified the general form of the curves and suggested that they may be useful for the empirical description of plant growth. For further details of the history of these curves and their mathematical properties reference should be made to Richards' paper. It suffices to say here that the family defined by (1) includes as special cases several curves which have been used empirically for the description of growth, including the 'monomolecular' (diminishing returns) curve (θ = -1), the exponential curve (θ → 0 through positive values), the logistic curve (θ = 1), and the Gompertz curve (θ → ∞ with A fixed and K a linear function of θ).
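Equation (1) is not reproduced in this listing, so the following is only a minimal Python sketch assuming the common Richards parametrisation dW/dt = (k/θ)·W·(1 − (W/A)^θ); under this convention the Gompertz curve appears as the limit θ → 0, which differs from the parameter values quoted above, and the names A, k, W0 and θ are purely illustrative.

```python
import numpy as np

def richards_curve(t, A, k, W0, theta):
    """Generalised (Richards) growth curve W(t), a solution of
    dW/dt = (k/theta) * W * (1 - (W/A)**theta).

    theta =  1  -> logistic curve
    theta = -1  -> monomolecular (diminishing-returns) curve
    theta -> 0  -> Gompertz curve (handled as a limiting case below)
    """
    t = np.asarray(t, dtype=float)
    if abs(theta) < 1e-8:                       # Gompertz limit
        return A * np.exp(np.log(W0 / A) * np.exp(-k * t))
    return A * (1.0 + ((A / W0) ** theta - 1.0) * np.exp(-k * t)) ** (-1.0 / theta)

if __name__ == "__main__":
    t = np.linspace(0.0, 10.0, 6)
    for theta in (-1.0, 0.0, 1.0):
        print(theta, np.round(richards_curve(t, A=100.0, k=0.8, W0=5.0, theta=theta), 2))
```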

392 citations


Journal Article•DOI•

291 citations


Journal Article•DOI•
TL;DR: This paper investigates the relationship between the phenotypic, genetic, and environmental correlations on the basis of a linear model and demonstrates the situations in which a negative environmental correlation explains a phenotypic correlation that is smaller than the corresponding genetic correlation.
Abstract: In many instances the estimate of a phenotypic correlation is reported smaller in magnitude than that of the corresponding genetic correlation, e.g. with certain poultry records in Lerner & Cruden [1948], sheep records in Morley [1951], and with certain dairy records in Van Vleck [1960] and Searle [1961]. Such results may seem a little unexpected at first sight since phenotype includes genotype and one might anticipate the correlation between phenotypes to be larger than that between genotypes. When estimates have not followed this pattern the explanation is sometimes given that a phenotypic correlation less than a genetic correlation is the result of a negative environmental correlation in the records of the two traits. This paper investigates the relationship between these three correlations on the basis of a linear model, and demonstrates the situations in which this explanation is correct. Other comparisons are also made.
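A minimal sketch of the textbook relation implied by an additive model P = G + E with G and E uncorrelated within each trait; this is an assumed standard form, not necessarily the exact linear model of the paper. It illustrates that the phenotypic correlation can fall below the genetic correlation even without a negative environmental correlation.

```python
import numpy as np

def phenotypic_correlation(r_g, r_e, h2_x, h2_y):
    """Phenotypic correlation implied by the additive model P = G + E with
    G and E uncorrelated within each trait:
        r_P = hx*hy*r_G + ex*ey*r_E,  where h = sqrt(h2) and e = sqrt(1 - h2).
    """
    hx, hy = np.sqrt(h2_x), np.sqrt(h2_y)
    ex, ey = np.sqrt(1 - h2_x), np.sqrt(1 - h2_y)
    return hx * hy * r_g + ex * ey * r_e

if __name__ == "__main__":
    # With moderate heritabilities, r_P falls below r_G even when r_E = 0,
    # so a negative environmental correlation is not the only explanation.
    print(phenotypic_correlation(r_g=0.5, r_e=0.0, h2_x=0.3, h2_y=0.3))   # ~0.15
    print(phenotypic_correlation(r_g=0.5, r_e=-0.2, h2_x=0.3, h2_y=0.3))  # lower still
```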

253 citations


Journal Article•DOI•
TL;DR: The class of experimental designs known as augmented designs was introduced by the author in 1955 to fill a need arising in screening new strains of sugar cane and soil fumigants used in growing pineapples, as mentioned in this paper.
Abstract: One of the principal problems in plant breeding and in biochemical research on new pesticides, herbicides, soil fumigants, drugs, etc., is the evaluation of the new strain or chemical. Efficient experimental designs and efficient screening procedures are necessary in order to make the most efficient use of available resources. In some instances sufficient material of a new strain or a new chemical is available for only one or two observations (plots). Hence, the experimenter should use an experimental design and a screening procedure suitable for these conditions. In other cases, the experimenter may wish to limit his observations to a single observation on the new material. In still other cases (e.g., in physics), a single observation on new material may be desirable because of relatively low variability in the experimental material. Furthermore, it may be desired to combine screening experiments on new material and preliminary testing experiments on promising material. The experimental design should be selected to meet the requirements of such experiments rather than selecting the material and experiments to meet the requirements of the experimental design. The experimental designs described in the present paper were developed to satisfy requirements such as those described above. The class of experimental designs known as augmented designs was introduced by the author in 1955 to fill a need arising in screening new strains of sugar cane and soil fumigants used in growing pineapples (Federer [1956a, 1956b, 1956c, 1958]). An augmented experimental design is any standard design augmented with additional treatments in the complete block, the incomplete block, the row, the column, etc.
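A hypothetical illustration of one member of the class, an augmented randomised complete block layout in which replicated check treatments are augmented with unreplicated new strains; the entry names, block counts, and randomisation below are made up and only sketch the idea.

```python
import random

def augmented_rcb_layout(checks, new_entries, n_blocks, seed=0):
    """Sketch of an augmented randomised complete block layout:
    every check appears in every block, and each new (augmented) entry
    appears once, spread as evenly as possible over the blocks."""
    rng = random.Random(seed)
    entries = list(new_entries)
    rng.shuffle(entries)
    blocks = [[c for c in checks] for _ in range(n_blocks)]
    for i, entry in enumerate(entries):
        blocks[i % n_blocks].append(entry)
    for block in blocks:
        rng.shuffle(block)              # randomise plot order within each block
    return blocks

if __name__ == "__main__":
    layout = augmented_rcb_layout(["check A", "check B"],
                                  [f"new strain {i}" for i in range(1, 7)], n_blocks=3)
    for b, plots in enumerate(layout, start=1):
        print(f"block {b}: {plots}")
```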

212 citations


Journal Article•DOI•
TL;DR: In this paper, the authors considered some aspects of the classification problem when the data are qualitative, each measurement taking only a finite (and usually small) number of distinct values, which they shall call states.
Abstract: Since 1935, when Fisher's discriminant function appeared in the literature, methods for classifying specimens into one of a set of universes, given a series of measurements made on each specimen, have been extensively developed for the case in which the measurements are continuous variates. This paper considers some aspects of the classification problem when the data are qualitative, each measurement taking only a finite (and usually small) number of distinct values, which we shall call states. Our interest in the problem arose from discussions about the possible use of discriminant analysis in medical diagnosis. Some diagnostic measurements, particularly those from laboratory tests, give results of the form: -, + (2 states); or -, doubtful, + (3 states); or (with a liquid), clear, milky, brownish, dark (4 states). With qualitative data of this type an optimum rule for classification can be obtained as a particular case of the general rule (Rao [1952], Anderson [1958]). The rule is exceedingly simple to apply (Section 2). In practice, qualitative data are frequently ordered, as with -, doubtful, +. The classification rule discussed in this paper takes no explicit advantage of the ordering, as might be done, for instance, by assigning scores to the different states so as to produce quasi-continuous data. The best method of handling ordered qualitative data is a subject worth future investigation.
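A minimal sketch of the general classification rule for qualitative data: estimate, for each universe, the relative frequency of every observed combination of states and assign a specimen to the universe with the largest prior times estimated probability. The test names, states, and counts below are hypothetical, and the frequency estimates are unsmoothed for simplicity.

```python
from collections import Counter, defaultdict

def train(samples):
    """samples: dict mapping universe name -> list of tuples of qualitative states.
    Returns, per universe, the relative frequency of each state combination."""
    freq = {}
    for universe, rows in samples.items():
        counts = Counter(tuple(r) for r in rows)
        n = len(rows)
        freq[universe] = defaultdict(float, {k: v / n for k, v in counts.items()})
    return freq

def classify(freq, observation, priors=None):
    """Assign the observation to the universe with the largest
    prior * estimated probability of its state combination."""
    priors = priors or {u: 1.0 / len(freq) for u in freq}
    scores = {u: priors[u] * freq[u][tuple(observation)] for u in freq}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    # Hypothetical two-test data with states '-' and '+'.
    training = {
        "disease": [("+", "+"), ("+", "-"), ("+", "+"), ("-", "+")],
        "healthy": [("-", "-"), ("-", "+"), ("-", "-"), ("+", "-")],
    }
    freq = train(training)
    print(classify(freq, ("+", "+")))
```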

99 citations


Journal Article•DOI•
TL;DR: In this article, the terminology of the medical follow-up study is used as a matter of convenience: a group of individuals with some common morbidity experience is followed from a well-defined zero point, such as the date of hospital admission.
Abstract: Statistical studies falling into the general category of life testing and medical follow-up have as their common immediate objective the estimation of life expectation and survival rates for a defined population at risk. Usually such a study must be brought to a close before all the information on survival (of patients, electric bulbs, automobiles, etc.) is complete, and thus the study is said to be truncated. Whether the investigation is basically concerned with life testing or with medical follow-up, the nature of the problem is the same, although differences in sample size may call for different approaches. Thus methods developed for life testing may be applied to follow-up studies when the underlying conditions are met, and vice versa. In this study cancer survival data utilizing a large sample will be used as illustrative material, and we shall accordingly use the terminology of the medical follow-up study as a matter of convenience. We are concerned then with a typical follow-up study in which a group of individuals with some common morbidity experience are followed from a well-defined zero point, such as date of hospital admission. Perhaps we wish to evaluate a certain therapeutic measure by comparing the expectation of life and survival rates of treated and untreated patients. Or we may wish to compare the expectation of life of treated and presumably cured patients with that of normal persons. When the period of observation is ended, there will usually remain a number of individuals on whom the mortality data in a typical study will be incomplete. Of first importance among these are the
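The paper's own estimators are not reproduced here; the sketch below shows the standard actuarial life-table calculation often used in such truncated follow-up studies, in which individuals withdrawn during an interval are counted as exposed for half of it. All counts are hypothetical.

```python
def life_table_survival(intervals):
    """intervals: list of (entering, deaths, withdrawn) counts per follow-up interval.
    Returns the cumulative survival rate at the end of each interval, using the
    actuarial convention that withdrawals are exposed for half the interval."""
    surv, cumulative = [], 1.0
    for entering, deaths, withdrawn in intervals:
        effective = entering - withdrawn / 2.0     # effective number exposed to risk
        q = deaths / effective                     # conditional probability of dying
        cumulative *= (1.0 - q)
        surv.append(cumulative)
    return surv

if __name__ == "__main__":
    # Hypothetical yearly follow-up of 120 patients.
    data = [(120, 15, 5), (100, 10, 8), (82, 6, 10), (66, 4, 12)]
    for year, s in enumerate(life_table_survival(data), start=1):
        print(f"{year}-year survival rate: {s:.3f}")
```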

95 citations


Journal Article•DOI•
TL;DR: To estimate the density of a specific organism in a suspension, a common method is to form a series of dilutions of the original suspension, then from each dilution a specified volume, hereafter referred to as a dose, is placed in each of several tubes.
Abstract: To estimate the density of a specific organism in a suspension, a common method is to form a series of dilutions of the original suspension. Then from each dilution a specified volume, hereafter referred to as a dose, is placed in each of several tubes. Later the tubes are examined for evidence of growth of the organism. Suppose k + 1 dilutions are used with concentrations z_i = a^(-i) z_0, i = 0, 1, 2, ..., k, where z_0 is the highest concentration of the original suspension used and a > 1 is the dilution factor. The following notation will be used:
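A hedged sketch of the maximum-likelihood density estimate for such a dilution series, assuming organisms are distributed at random (Poisson) so that a tube receiving average dose × density organisms shows growth with probability 1 − exp(−density × dose); the dilution factors, tube counts, and positive counts below are illustrative, and this is not necessarily the estimator developed in the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mpn_estimate(doses, n_tubes, n_positive):
    """Maximum-likelihood estimate of organism density (per unit of the original
    suspension) from a dilution series, assuming Poisson-distributed organisms."""
    doses = np.asarray(doses, dtype=float)
    n_tubes = np.asarray(n_tubes, dtype=float)
    n_positive = np.asarray(n_positive, dtype=float)

    def neg_log_lik(log_density):
        p = 1.0 - np.exp(-np.exp(log_density) * doses)   # P(tube shows growth)
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -np.sum(n_positive * np.log(p) + (n_tubes - n_positive) * np.log(1 - p))

    res = minimize_scalar(neg_log_lik, bounds=(-10, 10), method="bounded")
    return float(np.exp(res.x))

if __name__ == "__main__":
    # Hypothetical 10-fold dilution series (a = 10), 5 tubes per dilution.
    doses = [1.0, 0.1, 0.01]          # relative volume of original suspension per tube
    print(mpn_estimate(doses, n_tubes=[5, 5, 5], n_positive=[5, 3, 0]))
```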

88 citations


Journal Article•DOI•
TL;DR: The regression of a growth measurement y on time x can often be reasonably represented by two intersecting straight lines, one appropriate when x takes values below and the other when x takes values above a certain fixed but often unknown value; such regressions are called two-phase regressions, the intersection being the changeover point and the value of x at which it occurs the changeover value.
Abstract: The regression of a growth measurement y on time x can often be reasonably represented by two intersecting straight lines, one being appropriate when x takes values below and the other when x takes values above a certain fixed but often unknown value corresponding to the intersection. Such regressions are here called two-phase regressions, the intersection of the phases being referred to as the changeover point, and the value of x at which it occurs being called the changeover value. Situations in which such regressions might occur include the onset of a disease resulting in a reduced growth rate; the application of a treatment having an immediate stimulating or inhibiting effect; the occurrence of an extremely hot or cold day or some other change in external conditions; physical injury of an organism. In a study of the compatibility of peach scions on plum rootstocks Garner and Hammond [1938] noted that the peach variety Hale's Early developed at constant but different rates on compatible and incompatible rootstock-scion unions up to a certain date. After that date the growth rate in the compatible case continued at a new constant rate, whilst in the incompatible case all growth then ceased. Thus for a compatible union there was a typical two-phase regression, whilst for the incompatible union a rather special case occurred in which the slope of the second phase was zero. A further example is given in Section 4 in which the date of phase change in relation to time elapsed after application of treatments is of interest. If x and y are growth measurements on two different parts of the same organism, and Huxley's allometric growth law operates (i.e. there is a linear relation between log x and log y) it has sometimes been found that sudden changes in slope occur. Reeve and Huxley [1945] have discussed this situation with reference to changes in growth equilibrium in crustacea at sexual maturity. Skellam et al. [1959] also
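A minimal grid-search sketch for fitting a continuous two-phase regression with an unknown changeover value; this is an illustration of the idea on simulated data, not the estimation procedure of the paper.

```python
import numpy as np

def fit_two_phase(x, y, candidates):
    """Fit y = b0 + b1*x + b2*max(x - c, 0) for each candidate changeover value c
    by ordinary least squares and keep the c with the smallest residual sum of
    squares.  A simple grid-search sketch, not the paper's method."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    best = None
    for c in candidates:
        X = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
        beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(rss[0]) if rss.size else float(np.sum((y - X @ beta) ** 2))
        if best is None or rss < best[0]:
            best = (rss, c, beta)
    return {"changeover": best[1], "coefficients": best[2], "rss": best[0]}

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.arange(0, 20, dtype=float)
    # Growth with slope 1.5 up to x = 12, then a much slower second phase.
    y = np.where(x < 12, 2 + 1.5 * x, 2 + 1.5 * 12 + 0.2 * (x - 12)) + rng.normal(0, 0.5, x.size)
    print(fit_two_phase(x, y, candidates=np.linspace(2, 18, 81)))
```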

75 citations



Journal Article•DOI•
TL;DR: A number of nonparametric tests based on ranks have been proposed for the comparison of treatments in a completely random design, including the two-sample Wilcoxon-Mann-Whitney test with a per-comparison error rate, the Kruskal-Wallis analogue of the F-test, and Steel's rank tests with experimentwise error rates.
Abstract: A number of nonparametric tests based on ranks have been proposed for the comparison of treatments in a completely random design. For example, we have the Wilcoxon-Mann-Whitney test [21, 10], basically a two-sample test with a per-comparison error rate. Also, Kruskal and Wallis [9] have proposed a rank test which is an analogue of Snedecor's F-test. This test provides evidence concerning the presence of real differences but is of limited use in locating them. Steel [16, 17] has presented rank tests for comparing treatments against control and for all pairwise comparisons. Both of these tests use experimentwise error rates. Pfanzagl [13], as part of a more general theory, has discussed a two-step nonparametric decision process based on ranks, for testing the null hypothesis that k samples come from the same population and, if this is rejected, for deciding which one of the samples comes from a different population. No tables are given but it is suggested that they might be obtained by random sampling. It is also shown that the limiting distribution of the multivariate criterion is multinormal. The per-comparison error rate test is sometimes criticized, particularly when all possible paired comparisons are made, because it will almost certainly lead to false declarations of significance when the experiment includes many treatments and if customary significance levels are used. It is also deemed inappropriate when the experiment is considered to be the conceptual unit. The experimentwise error rate test is sometimes criticized because it
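For illustration, the sketch below runs the Kruskal-Wallis test and all pairwise Wilcoxon-Mann-Whitney tests on hypothetical data, using a Bonferroni adjustment as a crude stand-in for an experimentwise error rate in place of Steel's tabulated procedures.

```python
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

# Hypothetical responses for three treatments in a completely random design.
samples = {
    "A": [12, 15, 14, 10, 13],
    "B": [22, 19, 24, 21, 20],
    "C": [14, 16, 13, 17, 15],
}

# Overall rank test (analogue of the F-test): indicates that differences exist
# but is of limited use in locating them.
H, p_overall = kruskal(*samples.values())
print(f"Kruskal-Wallis: H = {H:.2f}, p = {p_overall:.4f}")

# All pairwise Wilcoxon-Mann-Whitney tests.  Dividing the nominal level by the
# number of comparisons (Bonferroni) gives a rough experimentwise error rate.
alpha, pairs = 0.05, list(combinations(samples, 2))
for a, b in pairs:
    U, p = mannwhitneyu(samples[a], samples[b], alternative="two-sided")
    print(f"{a} vs {b}: U = {U:.1f}, p = {p:.4f}, "
          f"significant experimentwise: {p < alpha / len(pairs)}")
```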

Journal Article•DOI•
TL;DR: In this article, the authors considered the maximum likelihood estimation of repeatability and heritability from records subject to culling and found that the more usual regression estimators are often very inefficient compared with these maximum likelihood estimators.
Abstract: In any animal breeding selection programme, estimates of repeatability and heritability are needed to choose between the various selection schemes available and also to ensure that the highest possible genetic gains are obtained from the chosen scheme. Estimates of repeatability and heritability are generally subject to large sampling errors. Therefore, the most efficient methods of estimation should be used even if they do involve rather lengthy computations. In this paper, the maximum likelihood estimation of repeatability and heritability from records subject to culling will be considered. The more usual regression estimators are often very inefficient compared with these maximum likelihood estimators. Suppose that we wish to estimate the repeatability of lactation yield in a herd of dairy cattle. We shall assume that only first and second lactation yields are available and that, if there had been no culling (i.e., if all the cows had had second records as well as first records), the first and second records would have been normally distributed over the herd with means μ1 and μ2, variances σ1² and σ2², and covariance between the two records of the same cow σ12. The first and second records of cow i will be written yi1 and yi2 respectively. The assumption of normality for the distributions will probably be a reasonably good approximation unless the herd can be split into groups so that any two cows in the same group are much more alike than two cows in different groups. These groups may, for example, be groups of daughters of the same sire or groups of cows according to the year in which they gave their first record or the month in which they calved. Methods are available for making allowances for such groupings, but they will be assumed absent in the rest of this paper. Very rarely will the culling
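An illustrative simulation (not the paper's maximum-likelihood estimator) of why culling matters: when second records exist only for cows with high first records, the correlation computed from the surviving pairs underestimates the repeatability.

```python
import numpy as np

rng = np.random.default_rng(2)
true_repeatability = 0.5
n_cows = 5000

# First and second lactation yields (standardised), bivariate normal with
# correlation equal to the repeatability.
cov = [[1.0, true_repeatability], [true_repeatability, 1.0]]
first, second = rng.multivariate_normal([0.0, 0.0], cov, size=n_cows).T

# Culling: only cows whose first record exceeds the herd's 40th percentile
# are kept long enough to produce a second record.
kept = first > np.quantile(first, 0.40)

naive = np.corrcoef(first[kept], second[kept])[0, 1]
print(f"true repeatability: {true_repeatability}")
print(f"correlation among surviving cows only: {naive:.3f}")  # noticeably too small
```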


Journal Article•DOI•
TL;DR: The Poisson Pascal distribution studied in this paper includes the Neyman Type A and the Negative Binomial as limiting cases and serves as a natural complement of the Poisson Binomial for describing counts from biological populations.
Abstract: Elementary distributions such as the Poisson, the Logarithmic and the Binomial which can be formulated on the basis of simple models have been found to be inadequate to describe the situations which occur in a number of phenomena. The Neyman Type A (cf. Evans [5]), the Negative Binomial (cf. Bliss and Fisher [3]), and the Poisson Binomial (cf. McGuire et al. [8]), which combine two of the elementary distributions through the processes of compounding and generalizing (cf. Gurland [7]), have been fitted with varying degrees of success to data from a number of biological populations. The aim of this paper is to study what may be called the Poisson Pascal distribution which includes the Neyman Type A and Negative Binomial as particular limiting cases and serves as a natural complement of the Poisson Binomial.
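A sketch of the "generalizing" construction assumed here for the Poisson Pascal distribution: a Poisson number of clusters, each contributing an independent Pascal (negative binomial) count. The parameter names and values are illustrative.

```python
import numpy as np

def poisson_pascal_sample(lam, r, p, size, rng=None):
    """Draw from the Poisson Pascal distribution by generalizing: the observed
    count is a sum of N independent Pascal (negative binomial) variates,
    where N itself is Poisson(lam)."""
    rng = rng or np.random.default_rng()
    n_clusters = rng.poisson(lam, size=size)
    return np.array([rng.negative_binomial(r, p, size=n).sum() for n in n_clusters])

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    x = poisson_pascal_sample(lam=2.0, r=3, p=0.4, size=20000, rng=rng)
    # The mean of a Pascal(r, p) variate is r*(1-p)/p, so the compound mean is lam*r*(1-p)/p.
    print(x.mean(), 2.0 * 3 * 0.6 / 0.4)
```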

Journal Article•DOI•
TL;DR: In this paper, the authors investigate the nature and magnitude of the correlation coefficient p between deviations from regression for progeny of the same parent, and compare the efficiencies of the various offspring-parent regression techniques using data from a population of poultry.
Abstract: The problem of optimal estimation of the coefficient of regression of offspring on parent (in the sense of minimum variance), when the number of progeny per parent is arbitrary, was completely solved by Kempthorne and Tandon [1953]. Prior to this paper, two methods were (and still are) commonly used: (1) the regression of the phenotypic mean of all offspring of a given parent on the parent's record; (2) the regression of offspring on parent, in which the parent's record is repeated for each of its progeny. Kempthorne and Tandon's technique, which they refer to as (3) the weighted regression technique, assigns weights to the progeny means which are functions of the number of progeny and a guessed value of a correlation coefficient p between deviations from regression associated with two progeny of the same parent. The difficulty here lies in the fact that p is unknown. The success of the general technique depends upon guessing p accurately. Presumably if the guessed value is close to the true value of p, the weighted technique is close to optimal. The precise effect of a poor guess for p does not seem to be known. The purposes of this paper are (1) to investigate the nature and magnitude of the correlation coefficient p, and (2) to compare the efficiencies of the various techniques with respect to data from a population of poultry.
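A minimal sketch of the weighted-regression idea, assuming family weights of the form n/(1 + (n − 1)ρ) with a guessed ρ; Kempthorne and Tandon's exact weights are not reproduced in this abstract, so the form used here, and all the data below, are assumptions for illustration only.

```python
import numpy as np

def weighted_offspring_parent_regression(parent, progeny_means, n_progeny, rho_guess):
    """Weighted regression of progeny-family means on the parent's record, with
    each family weighted by w = n / (1 + (n - 1)*rho); rho_guess plays the role
    of the guessed correlation discussed in the abstract."""
    x = np.asarray(parent, float)
    y = np.asarray(progeny_means, float)
    n = np.asarray(n_progeny, float)
    w = n / (1.0 + (n - 1.0) * rho_guess)
    xbar = np.sum(w * x) / np.sum(w)
    ybar = np.sum(w * y) / np.sum(w)
    slope = np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w * (x - xbar) ** 2)
    return slope

if __name__ == "__main__":
    parent = [3.1, 2.4, 4.0, 3.6, 2.9]
    progeny_means = [2.9, 2.5, 3.5, 3.4, 2.8]
    n_progeny = [8, 2, 5, 12, 3]
    print(weighted_offspring_parent_regression(parent, progeny_means, n_progeny, rho_guess=0.2))
```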


Journal Article•DOI•
TL;DR: In this paper, it is shown that the information contained in an analysis of experimental data is the sum of the a priori information built into the statistical model employed and the a posteriori information contained within the data itself.
Abstract: It is a fundamental fact of statistical inference that the information contained in an analysis of experimental data is the sum of the a priori information built into the statistical model employed and the a posteriori information contained in the data itself. For this reason the biometrician must be concerned not only with the efficiency of his estimation procedures but also with the adequacy of his descriptive model. Much care must be taken to ensure that all relevant a priori information is utilized in the construction of the model. In some cases it is possible to derive rather sophisticated theoretical models on the basis of acquired knowledge and intelligent hypothesizing. These models are often conveniently found as solutions of differential equations. In other cases little more may be known than that the biological process in question is continuous. In this latter case one may resort to polynomial models where the degree of the polynomial is either found empirically or by prior consideration of the number of "bends" which one can reasonably assume to take place in the process being studied. In certain other cases it may be known that the process approaches some asymptotic value. This is especially true in those cases known as "growth" processes. The general ineptness of polynomial models for purposes of describing such asymptotic situations has been repeatedly pointed out, although polynomials in the reciprocals may sometimes be used conveniently. Stevens [1951] and Pimentel-Gomes [1953], writing in this journal, have discussed inferential methods related to one form of transcendental asymptotic model, the so-called exponential model,
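The exponential model itself is cut off in this abstract; a commonly used asymptotic form is y = α − β·exp(−kx), and the sketch below fits that assumed form to hypothetical data by nonlinear least squares.

```python
import numpy as np
from scipy.optimize import curve_fit

def asymptotic_exponential(x, alpha, beta, k):
    """One common form of the asymptotic ('exponential') growth model:
    y approaches the asymptote alpha as x increases."""
    return alpha - beta * np.exp(-k * x)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    x = np.linspace(0, 10, 25)
    y = asymptotic_exponential(x, 50.0, 40.0, 0.5) + rng.normal(0, 1.0, x.size)
    params, _ = curve_fit(asymptotic_exponential, x, y, p0=[45.0, 30.0, 0.3])
    print(dict(zip(["alpha", "beta", "k"], np.round(params, 3))))
```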


Journal Article•DOI•
TL;DR: This paper outlines the analysis of unbalanced data from a two-way classification with one classification considered fixed, and presents simplified computing procedures suitable for a large number of levels in the random classification and a reasonable number of levels in the fixed classification.
Abstract: Methods for estimating variance components from unbalanced data are given in Henderson [1953] for both the random model and the mixed model. The calculations involved in the latter case are somewhat tedious and usually not computationally feasible for data having many classes. This paper outlines the analysis for unbalanced data from a two-way classification with one classification considered fixed. Simplified computing procedures are presented suitable for a large number of levels in the random classification and a reasonable number of levels of the fixed classification.
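Henderson's computing procedures are not reproduced here; as a modern stand-in, the sketch below fits a two-way model with one fixed and one random classification to unbalanced simulated data by REML, which is explicitly not the method of the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)

# Unbalanced two-way data: 'treatment' is the fixed classification, 'sire' the
# random one; cell sizes are deliberately unequal.
rows = []
for sire in range(1, 21):
    sire_effect = rng.normal(0, 2.0)
    for treatment in ["a", "b", "c"]:
        for _ in range(rng.integers(1, 6)):          # unbalanced cell sizes
            y = {"a": 10, "b": 12, "c": 9}[treatment] + sire_effect + rng.normal(0, 1.5)
            rows.append({"y": y, "treatment": treatment, "sire": sire})
data = pd.DataFrame(rows)

# REML fit of the mixed model: fixed treatment effects, random sire effects.
model = smf.mixedlm("y ~ C(treatment)", data, groups=data["sire"]).fit()
print(model.summary())
print("sire variance component:", float(model.cov_re.iloc[0, 0]))
print("residual variance:", model.scale)
```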




Journal Article•DOI•
TL;DR: A wider class of host-pathogen systems is considered and some assumptions of an earlier model are relaxed; specifically, the assumptions of random association of the host and pathogen, of equal numbers of varieties of the host and races of the pathogen, and of constant fitness functions are dropped.
Abstract: This paper is a sequel to a paper, "A Model of a Host-pathogen System with Particular Reference to the Rusts of Cereals", which appeared in Biometrical Genetics [1960]. The former paper was restricted to the summer stage of the rusts of cereals, but in this paper a wider class of host-pathogen systems will be considered and some assumptions will be relaxed. Specifically, the assumptions of random association of the host and pathogen, equal number of varieties of the host and races of the pathogen, and constant fitness functions will be dropped. The results of this paper are intended to apply to host-pathogen systems satisfying the following conditions: 1. The pathogen reproduces on the host. 2. The host may be differentiated into varieties on the basis of its resistance to the races of the pathogen. 3. The pathogen may be differentiated into races on the basis of its ability to grow on a set of host varieties. 4. Host resistance to a particular race of the pathogen is genetically controlled. 5. The damage to the host caused by the pathogen in a given time interval is directly related to the increase in number in the pathogen population during the given time interval. It should be pointed out that conditions (1) and (4) imply that no assumptions are made with respect to the mode of reproduction of the pathogen and the mode of inheritance of host resistance to the pathogen. Many economic crop plants and their foliar diseases caused by pathogenic fungi are examples of host-pathogen systems satisfying the above conditions. These host-pathogen systems are characterized by frequent shifts in the racial frequencies of the pathogen population, making it difficult to maintain host resistance to the pathogen. It seems plausible that damage to the host in such host-pathogen systems

Journal Article•DOI•
TL;DR: A method devised by Brodman et al. for making presumptive medical diagnoses by high-speed electronic computer, using only the age, sex, and questionnaire responses of patients, is described and evaluated against the diagnoses made by hospital physicians.
Abstract: In recent years, several methods have been proposed for making medical diagnoses by machine (Ledley and Lusted [1959], Crumb and Rupe [1959]). A method devised by Brodman et al. [1959, 1960] has been used to program a high-speed electronic computer for making presumptive medical diagnoses using only information relating to the age, sex, and responses of patients to a standardized health questionnaire. The method assigns patients to none, one, or several of 60 selected disease categories. The 60 diseases most frequently diagnosed by hospital physicians in men and the 60 diseases in women were chosen for study. The method was developed with data referring to 5,929 consecutive adult white patients (2,718 men and 3,211 women) admitted to the outpatient departments of The New York Hospital, a large general hospital, during the 18-month period beginning July 1, 1948. It was tested with data referring to 2,745 consecutive adult white patients (1,280 men and 1,465 women) admitted during the 12-month period beginning January 1, 1956. Each patient's symptoms were elicited through a printed form, the Cornell Medical Index-Health Questionnaire (CMI). The CMI was devised to collect diagnostically important elements of the medical history given by general medical patients, without expenditure of a physician's time. Solely with these data, a physician can often correctly predict which diseases will be found in subsequent examination (Brodman et al. [1951]). Additional data abstracted for analysis from the hospital records include each patient's sex and age, along with the diagnoses made by hospital physicians after eliciting a history and performing physical and laboratory examinations. These diagnoses for the 1948-1949 data were coded according to the U. S. Public Health Service Manual for Coding Causes of Illness [1944] and form the standard against which the method is evaluated.

Journal Article•DOI•
TL;DR: In this article, two methods that have already been proposed for testing for non-additivity are developed from a somewhat different viewpoint, in an attempt to clarify their properties; and a generalization is given of a method that has been suggested for finding an appropriate transformation of the data.
Abstract: The problem relating to additivity in the analysis of variance is twofold. In the first place we wish to know whether we can remove any of the non-additivity present in our data, and in the second place we wish to know, given that it can be done, how to do so. Here two methods that have already been proposed for testing for non-additivity are developed from a somewhat different viewpoint, in an attempt to clarify their properties; and a generalization is given of a method that has been suggested for finding an appropriate transformation of the data.
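One of the previously proposed tests is presumably Tukey's one degree of freedom for non-additivity; the sketch below implements that standard computation for an unreplicated two-way table as an assumed example, not the paper's own development.

```python
import numpy as np
from scipy.stats import f as f_dist

def tukey_nonadditivity(table):
    """Tukey's one-degree-of-freedom test for non-additivity in an unreplicated
    two-way table (rows x columns)."""
    y = np.asarray(table, float)
    r, c = y.shape
    grand = y.mean()
    row_eff = y.mean(axis=1) - grand            # row effects
    col_eff = y.mean(axis=0) - grand            # column effects
    fitted = grand + row_eff[:, None] + col_eff[None, :]
    resid = y - fitted

    # Sum of squares for non-additivity: N^2 / (sum r_i^2 * sum c_j^2),
    # where N = sum_ij y_ij * r_i * c_j.
    num = np.sum(y * np.outer(row_eff, col_eff)) ** 2
    ss_nonadd = num / (np.sum(row_eff ** 2) * np.sum(col_eff ** 2))
    ss_resid = np.sum(resid ** 2) - ss_nonadd
    df_resid = (r - 1) * (c - 1) - 1
    F = ss_nonadd / (ss_resid / df_resid)
    p = 1.0 - f_dist.cdf(F, 1, df_resid)
    return F, p

if __name__ == "__main__":
    # Small multiplicative (hence non-additive) table as an illustration.
    rows = np.array([1.0, 2.0, 3.0, 4.0])
    cols = np.array([1.0, 1.5, 2.5])
    table = np.outer(rows, cols) + np.random.default_rng(6).normal(0, 0.05, (4, 3))
    print(tukey_nonadditivity(table))
```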

Journal Article•DOI•
TL;DR: In this article, the authors set up formulae for certain statistics for discrete distributions of three types: Type A: g_A(z) = exp{h(z)}; Type B: g_B(z) = {h(z)}^n; Type C: g_C(z) = c log{h(z)}, where h(z) is a probability generating function (p.g.f.).
Abstract: Families of discrete distributions have been developed and studied by many authors, including Neyman [1939], Feller [1943], Skellam [1952], Beall and Rescia [1953] and Gurland [1957, 1958]. These families are of three types: Type A: g_A(z) = exp{h(z)}; Type B: g_B(z) = {h(z)}^n; Type C: g_C(z) = c log{h(z)}, where g(z) represents a probability generating function (p.g.f.) and h(z) is a p.g.f., except possibly for additive and multiplicative constants. The aim of this paper is to set up formulae for certain statistics for these types. It is hoped that these will be of use to research workers in practical fields, who will be formulating compound and generalised distributions of these types by using specific forms of h(z).
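A small illustration of the Type A construction: choosing h(z) = λ(e^{φ(z−1)} − 1) in g(z) = exp{h(z)} gives the Neyman Type A distribution, whose probabilities are the Taylor coefficients of g(z) about z = 0. The parameter values below are arbitrary.

```python
import sympy as sp

z = sp.symbols("z")
lam, phi = sp.Rational(2), sp.Rational(1)   # illustrative parameter values

# Type A family: g(z) = exp{h(z)}, with h(z) a p.g.f. up to additive and
# multiplicative constants.  This particular h(z) yields the Neyman Type A p.g.f.
h = lam * (sp.exp(phi * (z - 1)) - 1)
g = sp.exp(h)

# The probabilities P(X = k) are the Taylor coefficients of g(z) about z = 0.
series = sp.series(g, z, 0, 8).removeO()
probs = [float(series.coeff(z, k)) for k in range(8)]
print([round(p, 4) for p in probs], "sum so far:", round(sum(probs), 4))
```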


Journal Article•DOI•
TL;DR: This paper considers the analysis of proportion data that can be arranged in a multiway classification, for example a factorial arrangement, investigating treatment effects with regard to both main effects and interactions.
Abstract: Data collected in many types of research often consist of the proportion of experimental units having a specified attribute. In this paper we shall be concerned with the analysis of this type of data when it can be arranged in a multiway classification as, for example, a factorial arrangement. An example of this type of data is given in Table 1. Several methods may be used in analyzing these data. Regardless of the method, a model relating the proportion having the attribute to the treatments must be assumed for a meaningful interpretation of the experimental results. The problem of the most appropriate model becomes particularly acute when some of the treatments are applied at several levels and it is desired to investigate the nature of treatment effects with regard to both main effects and interactions. If the sample sizes in the cells are equal, the observed proportions are often analyzed by the analysis of variance. A more desirable procedure is to analyze the arc sine transformation of the proportion. It is well known that this transformation stabilizes the variance if the sample sizes are equal and not too small, and, for a large class of data, it provides a unit of measurement on which treatment effects are approximately linear except at values of the proportion near zero or one. In bioassay both the logit and the probit transformations have a long history of use. With appropriate extensions these models can be used in analyzing data of the type being discussed. Since the proportion responding has been found to increase sigmoidally with increasing stimulus for many phenomena, these transformations are particularly effective in providing a scale on which treatment effects are linear. Dyke and Patterson [1952] gave an example of how the logit trans
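A hypothetical illustration of the arc sine analysis recommended above for equal cell sizes: transform each observed proportion to an angle and run a factorial analysis of variance on the transformed scale. The factor names, cell sizes, and data are made up.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_per_cell = 50   # equal sample sizes, as the transformation assumes

# Hypothetical 2x3 factorial: observed proportion with the attribute per cell,
# two replicates per treatment combination.
rows = []
for a in ["a0", "a1"]:
    for b in ["b0", "b1", "b2"]:
        true_p = 0.3 + 0.2 * (a == "a1") + 0.1 * (b == "b2")
        for rep in range(2):
            p_hat = rng.binomial(n_per_cell, true_p) / n_per_cell
            rows.append({"A": a, "B": b, "angle": np.degrees(np.arcsin(np.sqrt(p_hat)))})
data = pd.DataFrame(rows)

# Analysis of variance on the arc sine (angular) scale: main effects and interaction.
model = smf.ols("angle ~ C(A) * C(B)", data).fit()
print(sm.stats.anova_lm(model, typ=2))
```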