
Showing papers in "Biometrics in 1971"


Journal Article•DOI•
TL;DR: A general coefficient measuring the similarity between two sampling units is defined, and the matrix of similarities between all pairs of sampling units is shown to be positive semidefinite.
Abstract: A general coefficient measuring the similarity between two sampling units is defined. The matrix of similarities between all pairs of sampling units is shown to be positive semidefinite (except possibly when there are missing values). This is important for the multidimensional Euclidean representation of the sample and also establishes some inequalities amongst the similarities relating three individuals. The definition is extended to cope with a hierarchy of characters.
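
As a rough illustration of how such a coefficient is typically computed (a minimal sketch, not the paper's own notation: the function name, the equal weighting, and the missing-value handling are assumptions here), a quantitative character contributes 1 - |x - y|/range, a qualitative character contributes a simple match, and missing values drop out of the average:

```python
def gower_similarity(x, y, ranges, is_numeric):
    """Similarity between two sampling units with mixed characters.

    x, y       : attribute values for the two units (None = missing)
    ranges     : sample range of each quantitative character
    is_numeric : flag per character (quantitative vs qualitative)
    """
    total, weight = 0.0, 0.0
    for xi, yi, r, num in zip(x, y, ranges, is_numeric):
        if xi is None or yi is None:
            continue                                  # missing: zero weight
        total += 1.0 - abs(xi - yi) / r if num else float(xi == yi)
        weight += 1.0
    return total / weight

# e.g. one quantitative character (range 10) and one qualitative one:
# gower_similarity([2.0, 'a'], [7.0, 'a'], [10, None], [True, False]) -> 0.75
```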

4,204 citations




Journal Article•DOI•
TL;DR: In this paper, the authors consider the case of a single quantitative variate and assume that the response, if any, of the variate to the substance is a change in the mean.
Abstract: SUMMARY Information on the biological activity of a substance is often obtained from experiments in which the treatments comprise a series of doses of the substance and a zero dose control. The aim of such experiments, particularly in toxicity studies, may be to determine the lowest dose, if any, at which there is activity. A new test procedure is proposed for this situation in the case where the activity of interest is a change in the mean of a single response variate. Tables are given whereby this procedure can be used when all treatments are equally replicated. The advantages of this test over other established tests are discussed. In experiments designed to assess the biological activity of a substance the treatments often comprise a series of doses of the substance and also a control treatment equivalent to zero dose. Usually more than one variate is recorded, but statements are required on the effect of the substance on each of the recorded variates, or given functions of them, and separate univariate analyses are appropriate rather than a multivariate analysis. In this paper we consider the case of a single quantitative variate and we assume that the response, if any, of the variate to the substance is a change in the mean. Two distinct situations arise depending on whether the response is desirable or undesirable. If the response is desirable, for example if it is a measure of the efficacy of a drug or a pesticide, then interest usually centres on estimating the dose level at which the response attains a given magnitude. The most appropriate analysis is to fit a dose response regression and to estimate from this the dose at which the regression achieves a certain level, usually defined in terms of a difference from the control mean. This paper is concerned with the second situation, where the response is undesirable; the most common examples occur in toxicity studies.
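
The proposed statistic compares the highest-dose mean, after amalgamation under a monotonicity constraint, with the control mean; critical values come from the tables in the paper. A minimal sketch under equal replication (the function names and this pool-adjacent-violators formulation are illustrative choices, not the paper's code):

```python
def pava(means, weights):
    """Pool-adjacent-violators: isotonic (non-decreasing) fit to the
    dose means, pooling weighted averages wherever order is violated."""
    blocks = []                       # each block: [mean, weight, n_doses]
    for m, w in zip(means, weights):
        blocks.append([m, w, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for m, w, c in blocks:
        fitted.extend([m] * c)        # expand pooled means back per dose
    return fitted

def williams_statistic(control_mean, dose_means, n, s2):
    """t-bar for the highest dose under equal replication n, with pooled
    error variance s2; referred to the tables in the paper, not to a
    standard t distribution."""
    iso = pava(dose_means, [n] * len(dose_means))
    return (iso[-1] - control_mean) / (2.0 * s2 / n) ** 0.5
```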

684 citations





Journal Article•DOI•
TL;DR: In this paper, a taxonomy of incomplete data problems is provided and a unified method of analysis is developed. The emphasis is on techniques which are natural extensions of the complete-data analysis and which will handle rather general classes of incomplete-data problems as opposed to custom-made techniques for special problems.
Abstract: In this paper, we attempt to provide a simple taxonomy for incomplete-data problems and at the same time develop unified methods of analysis. The emphasis is on techniques which are natural extensions of the complete-data analysis and which will handle rather general classes of incomplete-data problems as opposed to custom-made techniques for special problems. The principle of estimation is either maximum likelihood or is at least based on maximum likelihood.
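
The paper's emphasis is on extending complete-data analyses to incomplete data via maximum likelihood. A minimal sketch of that idea in a toy setting (a multinomial with one incompletely classified count; the three-cell setup and function name are illustrative assumptions, not the paper's own example):

```python
def em_partial_counts(n_a, n_b, n_c, n_ab, iters=200):
    """ML estimates of multinomial probabilities (p_a, p_b, p_c) when
    n_ab units are known only to fall in 'A or B'.  The complete-data
    analysis (simple proportions) is extended by an E-step that
    allocates the incompletely classified count."""
    n = n_a + n_b + n_c + n_ab
    p_a, p_b = n_a / n, n_b / n                  # crude starting values
    for _ in range(iters):
        share_a = n_ab * p_a / (p_a + p_b)       # E-step: split n_ab
        p_a = (n_a + share_a) / n                # M-step: proportions
        p_b = (n_b + n_ab - share_a) / n
    return p_a, p_b, n_c / n
```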

308 citations


Journal Article•DOI•
TL;DR: In this article, the concept of Relative Yield Total (RYT) is made central to a new analysis of dry matter yields in mixture diallel experiments, and the standard errors of various interpretive parameters are derived.
Abstract: When grown in a mixture, plants of the dominant component show a greater dry matter yield than they do in a monoculture of the same overall density. Plants of the other component usually show a decrease, relative to their own monoculture. Recent studies suggest that the proportional changes are commonly very similar. An earlier analysis, which was based on absolute increases and decreases being approximately the same, is amended in the light of these studies. Central to the new analysis is the concept of Relative Yield Total introduced by de Wit and van den Bergh [1965]. The random error structure of mixture diallel experiments is examined and the standard errors of various interpretive parameters derived. A numerical example is given.
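
For reference, the Relative Yield Total of de Wit and van den Bergh [1965] mentioned above is simple to compute; a sketch (argument names and the interpretive comment are my additions):

```python
def relative_yield_total(y12, y11, y21, y22):
    """Relative Yield Total (de Wit and van den Bergh [1965]).

    y12 : yield per unit area of species 1 grown in mixture with species 2
    y11 : yield of species 1 in monoculture at the same overall density
    y21, y22 : the corresponding yields for species 2
    """
    return y12 / y11 + y21 / y22

# An RYT near 1 is often read as the proportional gain of the dominant
# component offsetting the proportional loss of the other.
```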

221 citations


Journal Article•DOI•
TL;DR: The method of classification based on minimizing the determinant of the within-group dispersion matrix is discussed from the viewpoint of the practical user; certain computational and interpretive difficulties are shown to be resolvable, giving a satisfactory answer to the question of the number of natural clusters and their bounds.
Abstract: The method of classification based on minimizing the determinant of the within-group dispersion matrix is discussed from the viewpoint of the practical user. Certain difficulties arise both in the computation and in the interpretation of the results. It is shown that these can be resolved to give a satisfactory answer to the question of the number of natural clusters and their bounds. The sampling behaviour of the method when applied to a uniform distribution is investigated, but no exact test of significance is derived.
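
A sketch of the criterion itself (comparing a quantity like g²|W| across numbers of groups g is the kind of device discussed for choosing the number of clusters; treat that detail, and the names below, as assumptions):

```python
import numpy as np

def log_det_within(X, labels):
    """log |W| for a partition of the rows of X, where W is the pooled
    within-group dispersion (SSP) matrix; smaller is better for a fixed
    number of groups (assumes W is nonsingular)."""
    p = X.shape[1]
    W = np.zeros((p, p))
    for g in np.unique(labels):
        D = X[labels == g] - X[labels == g].mean(axis=0)
        W += D.T @ D
    return np.linalg.slogdet(W)[1]
```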

219 citations


Journal Article•DOI•
TL;DR: In this paper, a number of methods are described which utilize spacing distances instead of fixed-area plots for estimating plant densities and the distributions and moments of these estimators are derived.
Abstract: SUMMARY In various journals a number of methods are described which utilize spacing distances instead of fixed-area plots for estimating plant densities. Some of these methods are studied in this paper. Maximum likelihood (ML) estimators for the forest density are given and the distributions and moments of these estimators are derived. Other distance estimators of density have been used in the past, but they do not have the advantages of ML estimators, and it even seems difficult to determine their moments. A distance method has the advantage over a fixed plot method that the sample size does not depend upon the density being measured. There are many practical difficulties in using them, however, and some of these are discussed. The trees in a natural forest will not be distributed uniformly at random, and some care must be exercised in using a distance method. The difficulty can sometimes be overcome by using a stratified sampling method. The model is described in terms of trees and forests, but it is clear that it can be applied to many different types of problem.
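
A sketch of the simplest such ML estimator (assuming a Poisson forest and point-to-nearest-tree distances; the function name and the bias-corrected variant noted in the comment are my additions):

```python
import math

def ml_density(distances):
    """ML estimate of stem density for a Poisson (uniformly random)
    forest from distances r_i between n random sample points and their
    nearest tree: pi * r_i**2 is exponential with rate lambda, giving
    lambda_hat = n / (pi * sum(r_i**2)).
    (Replacing n by n - 1 gives an unbiased variant.)"""
    n = len(distances)
    return n / (math.pi * sum(r * r for r in distances))
```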

206 citations


Journal Article•DOI•
TL;DR: Several methods of estimating heritability are compared, of which the realised heritability has the least variance; mass selection is likely to be best for comparing response from alternative selection programmes or populations.
Abstract: (i) Formulae are derived for the sampling variance of selection response and for estimates of realised heritability and realised genetic correlation. (ii) If a control population is maintained, or divergent selection practised, the greater part of the sampling variance comes from genetic drift and depends primarily on the total number of individuals recorded in the whole experiment, rather than on its duration. (iii) The optimal selection intensity for estimating realised heritabilities is investigated: proportions selected of about 15 per cent should be satisfactory. Similar designs will also be efficient for estimation of realised genetic correlations. (iv) Several methods of estimating heritability are compared; of these the realised heritability has the least variance. (v) Some selection indices for improving a single trait are evaluated. Mass selection is likely to be best for comparing response from alternative selection programmes or populations.
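
As a reminder of the quantity being estimated (a minimal sketch; the divergent-selection reading in the comment reflects point (ii) above):

```python
def realised_heritability(response, selection_differential):
    """Realised heritability h^2 = R / S: cumulative selection response
    divided by cumulative selection differential.  With divergent
    selection, R and S are taken as the divergence between the up and
    down lines, which removes common environmental trends; the sampling
    variance of the estimate is dominated by genetic drift."""
    return response / selection_differential
```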

185 citations


Journal Article•DOI•
TL;DR: In this paper, the authors consider the case of a single locus with k alleles, the frequencies of which are assumed to be multinomially distributed with mean p_i and variance p_i(1 - p_i)/n for the ith allele, and covariance -p_i p_j / n for the ith and jth alleles.
Abstract: In connection with distance measures based on attribute data, Balakrishnan and Sanghvi [1968] have criticized the measure of genetic distance 'E' described by Edwards and Cavalli-Sforza [1964], and promoted several other measures as substitutes. Their criticism is, however, based on a misunderstanding of the properties of E, which are in fact precisely of the kind required. Whether one is trying to measure distance between multinomial samples, or between populations with differing gene frequencies supposedly generated by random genetic drift, the problem is to transform the sample or population space so that 'a distance should have the same significance in whatsoever direction and in whatsoever part of the new space it is measured' (Balakrishnan and Sanghvi [1968]; cf. Edwards and Cavalli-Sforza [1964]). Since in both cases the original distribution is multinomial, or approximately so (Kimura [1955]), we may confine our attention to this distribution. In fact, E was suggested for the genetic case, and not, as indicated by Balakrishnan and Sanghvi, for the case of attribute data. Since there is no disagreement as to how to combine distances from independent loci (to use the genetical terminology), we may base the discussion on the case of a single locus with k alleles, the frequencies of which are assumed to be multinomially distributed with mean p_i and variance p_i(1 - p_i)/n for the ith allele, and covariance -p_i p_j / n for the ith and jth alleles. In these expressions n may be regarded simply as a constant whose value depends on the precise application. We consider only the case in which the size of each sample, or, in genetical terms, the effective size of each population, is the same. The property stipulated by Balakrishnan and Sanghvi requires a transformation of the multinomial space that is spherically symmetrical.
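
E is built on the square-root transformation of allele frequencies, under which a population becomes a point on the unit sphere and drift is approximately isotropic. A sketch of the associated chord distance at one locus (the exact scaling used for E in the paper may differ; treat this form as an assumption):

```python
import math

def chord_distance(p, q):
    """Chord distance between two populations at one locus: with
    allele frequencies mapped to (sqrt(p_1), ..., sqrt(p_k)) on the
    unit sphere, cos(theta) = sum_i sqrt(p_i q_i) and the chord is
    sqrt(2 * (1 - cos(theta)))."""
    cos_theta = min(sum(math.sqrt(pi * qi) for pi, qi in zip(p, q)), 1.0)
    return math.sqrt(2.0 * (1.0 - cos_theta))
```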


Journal Article•DOI•
TL;DR: In this article, data-based transformations of multivariate observations to enhance the normality of their distribution and also possibly simplify the model are proposed, where power transformations of the original variables are estimated to effect both marginal and joint normality.
Abstract: SUMMARY Methods, which are extensions of the techniques of Box and Cox [1964], are proposed for obtaining data-based transformations of multivariate observations to enhance the normality of their distribution and also possibly to simplify the model (e.g. improve additivity, homoscedasticity, etc.). Specifically, power transformations of the original variables are estimated to effect both marginal and joint normality. A method for improving directional normality is also described. Examples are included to illustrate some properties of the methods.
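
For orientation, the univariate Box and Cox [1964] step that the paper extends chooses the power by maximizing a profile log-likelihood. A sketch (the grid search and names are my choices; the multivariate extension maximizes a joint-normality analogue over a vector of powers):

```python
import numpy as np

def boxcox_loglik(x, lam):
    """Profile log-likelihood of a Box-Cox power lam for positive data
    x, assuming normality of the transformed values."""
    y = np.log(x) if lam == 0 else (x ** lam - 1.0) / lam
    return -0.5 * len(x) * np.log(y.var()) + (lam - 1.0) * np.log(x).sum()

# choose the power by grid search over the profile likelihood
x = np.random.default_rng(0).gamma(2.0, 1.0, 200)
lam_hat = max(np.linspace(-2, 2, 81), key=lambda l: boxcox_loglik(x, l))
```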


Journal Article•DOI•
TL;DR: In this article, a study involving 121 patients treated for cancer of the breast by surgery and/or X-ray therapy is analyzed, and from the estimates of the parameters of the underlying life distributions (assumed to be Weibull with unequal shape constants) the relevant probabilities in competing risk theory are obtained.
Abstract: Suppose that in a life-testing situation the failure of an individual can be classified into one of k (>1) mutually exclusive classes, usually causes of failure. It is assumed that associated with each cause of failure there is a characteristic life distribution belonging to a specific class of distributions. A general likelihood function is obtained which allows for dependence of the causes, for both censored and uncensored data. The method of maximum likelihood is used to estimate the parameters when the underlying life distributions are independent Weibulls with equal shape constants (the exponential being a special case) and independent Weibulls with unequal shape constants. A study involving 121 patients treated for cancer of the breast by surgery and/or X-ray therapy is analyzed, and from the estimates of the parameters of the underlying life distributions (assumed to be Weibull with unequal shape constants) the relevant probabilities in competing risk theory are obtained.
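
A sketch of the kind of likelihood involved, for two independent Weibull risks with right censoring (the parameterization, names, and the use of a general-purpose optimizer are assumptions; the paper derives its own ML equations):

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, t, cause):
    """Independent Weibull competing risks with two causes and right
    censoring.  t: times; cause: 1 or 2 for an observed failure, 0 if
    censored.  theta = (log a1, log a2, log b1, log b2), shapes a and
    scales b kept positive through the log parameterization."""
    a, b = np.exp(theta[:2]), np.exp(theta[2:])
    ll = 0.0
    for j in (1, 2):                      # log hazard of the failing cause
        tj = t[cause == j]
        ll += np.sum(np.log(a[j - 1] / b[j - 1])
                     + (a[j - 1] - 1.0) * np.log(tj / b[j - 1]))
    for k in (0, 1):                      # every subject survives both risks
        ll -= np.sum((t / b[k]) ** a[k])
    return -ll

# e.g. res = minimize(neg_loglik, np.zeros(4), args=(t, cause))
```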



Journal Article•DOI•
TL;DR: A general iterative formula for the covariance matrix of the adjusted treatment means in block designs is given and its relation to design patterns discussed in this article, where it is shown that the general formula becomes very simple if the design has some degree of balance.
Abstract: SUMMARY A general iterative formula for the covariance matrix of the adjusted treatment means in block designs is given and its relation to design patterns discussed. It is shown that the general formula becomes very simple if the design has some degree of balance. Particularly interesting are those designs that combine features of balance and orthogonal designs. A matrix derived in a simple way from the incidence matrix proves useful in describing some properties of block designs. The simplification of the general formula depends entirely on the pattern of this matrix which also determines the efficiency of the design. Furthermore, its particular relation to treatment contrasts is helpful in designing block experiments of desirable properties. The construction of some useful designs that are simple in analysis and practical in application is illustrated by several examples.
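
A sketch of the central objects in the intra-block analysis (the function name and the use of a Moore-Penrose generalized inverse are my choices):

```python
import numpy as np

def adjusted_mean_covariance(N, r, k, sigma2=1.0):
    """Intra-block analysis: N is the treatment-by-block incidence
    matrix, r the treatment replications, k the block sizes.  The
    information matrix for treatment effects is
    C = diag(r) - N diag(1/k) N', and a generalized inverse of C gives
    the covariance of the adjusted treatment estimates up to sigma^2."""
    N = np.asarray(N, dtype=float)
    C = np.diag(r) - N @ np.diag(1.0 / np.asarray(k, dtype=float)) @ N.T
    return sigma2 * np.linalg.pinv(C)
```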


Journal Article•DOI•
TL;DR: The conditions are defined under which collapsing multidimensional contingency tables, by adding over variables, will affect the apparent interaction between the remaining variables, which leads to a simple method of distinguishing those log-linear models for which the cell estimates may be obtained by direct multiplication, from those requiring iterative fitting.
Abstract: The conditions are defined under which collapsing multidimensional contingency tables, by adding over variables, will affect the apparent interaction between the remaining variables. This leads to a simple method of distinguishing those log-linear models for which the cell estimates may be obtained by direct multiplication, from those requiring iterative fitting. The implications of fitting over-parametrized models are discussed with particular reference to the 'partial association' model used implicitly (a) when information from separate two-dimensional tables is combined to test the association between the two variables, and (b) when rates are adjusted by indirect standardization.
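
The distinction drawn here is between log-linear models whose cell estimates factor directly and those needing iteration. The classic example of the latter is the no-three-factor-interaction model, fitted by proportional scaling through the three two-way margins; a sketch (assuming a strictly positive 3-way array):

```python
import numpy as np

def ipf_no_three_way(n, iters=50):
    """Fit the log-linear model with all two-way interactions but no
    three-way term to a strictly positive 3-way table n, by cycling
    proportional adjustments through the three two-way margins."""
    m = np.ones_like(n, dtype=float)
    for _ in range(iters):
        for axes in ((2,), (1,), (0,)):
            ratio = n.sum(axis=axes) / m.sum(axis=axes)
            m = m * np.expand_dims(ratio, axis=axes)
    return m
```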

Journal Article•DOI•
TL;DR: In this article, the authors presented a double sampling scheme for estimating from binomial data with misclassifications, where a sample of n units was classified by both an expensive true measuring device which made no errors and a relatively inexpensive fallible device which was subject to misclassification errors.
Abstract: Tenenbein [1970] presented a double sampling scheme for estimating from binomial data with misclassifications. At the first stage a sample of n units was classified by both an expensive true measuring device which made no errors and a relatively inexpensive fallible device which was subject to misclassification errors. The estimation of p was discussed, and the optimum sample sizes were derived which minimize the variance of estimation, subject to fixed measurement costs, and the cost, subject to a fixed variance of estimation. The optimum sample sizes depend upon the unknown probabilities of misclassification. In this paper we present practical methods of determining n and N by taking a preliminary sample of m true-fallible data pairs in order to estimate the unknown parameters, and thus to estimate n and N. The resulting 3-stage scheme is discussed and recommendations for determining the value of m are made.
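
A sketch of the natural estimator in this double-sampling setup, combining misclassification rates from the true-fallible pairs with the fallible-only counts (the exact weighting in Tenenbein's estimator may differ from the pooling assumed here):

```python
def double_sample_p(n11, n10, n01, n00, N1, N0):
    """Estimate p = P(true class is 1) from double sampling.
    First stage: n_tf = units with true class t and fallible class f,
    both measured.  Second stage: N1, N0 units with fallible class only.
    p_hat = sum over f of P_hat(true=1 | fallible=f) * P_hat(fallible=f)."""
    n, N = n11 + n10 + n01 + n00, N1 + N0
    q1 = (n11 + n01 + N1) / (n + N)      # P(fallible = 1), pooled
    return q1 * n11 / (n11 + n01) + (1.0 - q1) * n10 / (n10 + n00)
```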

Journal Article•DOI•
TL;DR: The statistical model indicated is thus the bivariate structural relationship, for the slope of which confidence limits can now be obtained from convenient general expressions if assumptions are made concerning the ratio λ of the so-called 'error variances'.
Abstract: SUMMARY Total body weight X1 and standard oxygen consumption X2 of 252 white rats have positively-skewed marginal distributions and heteroscedastic conditional distributions, but Y1 = log10 X1 and Y2 = log10 X2 do not depart from bivariate normality. While log10 (body weight) does not fluctuate rapidly like log10 (oxygen consumption), it cannot necessarily be assumed to have a small or negligible 'error' component; both variates are therefore assumed to be subject to independent random fluctuations in addition to being related to body size. The statistical model indicated is thus the bivariate structural relationship, for the slope of which confidence limits can now be obtained from convenient general expressions if assumptions are made concerning the ratio λ of the so-called 'error variances'. In the present study assumptions ranging from λ = 1/2 to λ = 2 are considered as realistic and, under such assumptions, standard oxygen consumption can be inferred to be proportional either to the 2/3 power or to the 3/4 power of total body weight. The non-linear fitting of allometric regressions directly to untransformed data yields point and interval estimates which differ very little.
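
A sketch of the slope estimate for the structural relationship when the ratio of error variances is assumed known (the closed form below is the standard errors-in-both-variables solution; names are my choices):

```python
import numpy as np

def structural_slope(y1, y2, lam):
    """Slope of the bivariate structural relationship of y2 on y1 when
    the ratio lam = var(error in y2) / var(error in y1) is assumed
    known (errors independent of each other and of true size)."""
    s11, s22 = np.var(y1, ddof=1), np.var(y2, ddof=1)
    s12 = np.cov(y1, y2)[0, 1]
    d = s22 - lam * s11
    return (d + np.sqrt(d * d + 4.0 * lam * s12 * s12)) / (2.0 * s12)
```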

Journal Article•DOI•
TL;DR: In this paper, the authors empirically investigate the relative efficiency of MINQUE over weighted least squares (WLS) estimators (which use the sample variances s_i² to estimate the σ_i²) and maximum likelihood estimators.
Abstract: A recent method, called MINQUE, is applied to two important problems: (1) combining k independent estimators ȳ_i (i = 1, ..., k) of a parameter μ, where ȳ_i is the arithmetic mean of n_i (>1) observations normally and independently distributed with mean μ and variance σ_i²; (2) estimating the parameters α and β in the regression model (with replicates) Y_ij = α + βx_i + e_ij, j = 1, ..., n_i, i = 1, ..., k, where the x_i are known constants and the e_ij are normally and independently distributed with mean 0 and variance σ_i². Two simple modifications of MINQUE which guarantee positive estimates of the σ_i² are given. We empirically investigate the relative efficiency of MINQUE over weighted least squares (WLS) estimators (using the sample variances s_i² to estimate the σ_i²) and maximum likelihood (ML) estimators. A major conclusion is that MINQUE (with modifications) leads to large gains in efficiency over WLS estimators when n_i = m is small and k is relatively large. Another important result is that MINQUE may not lead to substantial gains in efficiency when m is greater than 8, especially for small k. A first-order approximation to the variance of the MINQUE estimator also performed better than the corresponding approximation for the WLS estimator.

Journal Article•DOI•
TL;DR: In this paper, contingency tables analogous to the well-known mixed model in analysis of variance are considered, and hypotheses of equality of the one-dimensional marginal distributions, or of the mean scores over the first-order marginals, are tested.
Abstract: This paper is concerned with contingency tables which are analogous to the well-known mixed model in analysis of variance. The corresponding experimental situation involves exposing each of n subjects to each of the d levels of a given factor and classifying the d responses into one of r categories. The resulting data are represented in an r × r × ... × r contingency table of d dimensions. The hypothesis of principal interest is equality of the one-dimensional marginal distributions. Alternatively, if the r categories may be quantitatively scaled, then attention is directed at the hypothesis of equality of the mean scores over the d first-order marginals. Test statistics are developed in terms of minimum Neyman chi-square or, equivalently, weighted least squares analysis of underlying linear models. As such, they bear a strong resemblance to the Hotelling T² procedures used with continuous data in mixed models. Several numerical examples are given to illustrate the use of the various methods discussed.
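
A closely related classical procedure, not the paper's minimum Neyman chi-square machinery, is the Stuart-Maxwell test of marginal homogeneity for the d = 2 case; a sketch:

```python
import numpy as np
from scipy.stats import chi2

def stuart_maxwell(table):
    """Marginal homogeneity test for an r x r table of paired
    classifications (each subject classified twice).  Chi-squared
    statistic d' V^{-1} d on r-1 df, where d holds the first r-1
    differences between row and column margins."""
    n = np.asarray(table, dtype=float)
    r = n.shape[0]
    d = (n.sum(axis=1) - n.sum(axis=0))[: r - 1]
    V = np.zeros((r - 1, r - 1))
    for i in range(r - 1):
        for j in range(r - 1):
            if i == j:
                V[i, i] = n[i].sum() + n[:, i].sum() - 2.0 * n[i, i]
            else:
                V[i, j] = -(n[i, j] + n[j, i])
    stat = d @ np.linalg.solve(V, d)
    return stat, chi2.sf(stat, r - 1)
```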

Journal Article•DOI•
TL;DR: In this paper, Markov chain methods are applied to chain binomial models in epidemics; in the Greenwood and Reed-Frost models, the susceptibles, and the susceptibles together with the infectives, respectively form Markov chains.
Abstract: In this paper, Markov chain methods are applied to chain binomial models in epidemics. In the Greenwood and Reed-Frost chain binomial models it is shown that, respectively, the susceptibles, and the susceptibles together with the infectives, form Markov chains. These chains are used to obtain probabilities for the duration time and the total number of cases in an epidemic. A study of chain binomial models as Markov chains imbedded in continuous time processes is made. A practical application of the effects of inoculation on an epidemic is carried out, and some numerical results for the mean duration times and mean numbers of cases are given.
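
A sketch of one Reed-Frost chain, whose state (S_t, I_t) is the Markov chain referred to above (the Greenwood variant replaces 1 - (1 - p)**I_t by a constant p whenever infectives are present):

```python
import random

def reed_frost(s0, i0, p, rng=None):
    """One Reed-Frost chain: a susceptible must escape each of the I_t
    infectives independently, so the per-generation infection
    probability is 1 - (1 - p)**I_t.  Returns (duration, total cases)."""
    rng = rng or random.Random(0)
    s, i, total, t = s0, i0, i0, 0
    while i > 0:
        q = 1.0 - (1.0 - p) ** i
        new_i = sum(rng.random() < q for _ in range(s))
        s, i, total, t = s - new_i, new_i, total + new_i, t + 1
    return t, total
```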

Journal Article•DOI•
TL;DR: The period since 1940 abounds with successful applications of mathematical modeling techniques to problems of control in engineering and business, and it is hoped that some of the techniques that have been used so successfully can be translated into health.
Abstract: The period since 1940 abounds with successful applications of mathematical modeling techniques to problems of control in engineering and business. 'Feedback Control' and 'Adaptive Systems' are almost household words. The 1950's saw linear programming techniques sweep industry by storm. Techniques allied to the calculus of variations were used to calculate optimal interplanetary trajectories for rockets. Industrial inventories with part numbers running in the hundreds of thousands were managed by principles derived from mathematical modeling. Since most of us are convinced that a science of medicine exists and that a science of the delivery of health services is emerging, one may wonder why no control theories similar to those developed for engineering and industrial management exist for health services delivery. The emergence of the field 'Management Science' leads us to hope that some of the techniques that have been used so successfully can be translated into health. On the surface, it would seem that the problems associated with the delivery of health services are at least as manageable as those of modern industry. The principal uses that have been made of optimization techniques have been in the control of real-time systems and in the allocation of scarce resources among competing requirements. Both of these problems exist in the health field in abundance. Perhaps the most common example of the first class of problems is the health delivery system geared to the elimination of a particular health problem. Stated somewhat more formally, the health delivery team is charged with the task of achieving preset health goals with the health delivery system, subject to constraints on the total resources that can be expended on this task. With very little change in language, this problem could have been stated as a problem concerned with the control of a chemical processing plant.

Journal Article•DOI•
TL;DR: In this paper, the best quadratic unbiased estimators (BQUEs) of variance components from unbalanced data in the 1-way classification random model are derived under zero mean and normality assumptions.
Abstract: Best quadratic unbiased estimators (BQUEs) of variance components from unbalanced data in the 1-way classification random model are derived under zero mean and normality assumptions. An estimator of the between-class variance is also suggested for the non-zero mean case. These estimators are functions of the ratio of the population variances, ρ = σ_a²/σ_e². Numerical studies indicate that, for badly unbalanced data and for values of ρ larger than 1, estimators of σ_a² having variance less than that of the analysis of variance estimator can be obtained by substituting even a rather inaccurately predetermined value of ρ into the BQUE of σ_a².
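
For context, the benchmark the BQUEs are compared against is the analysis-of-variance estimator for the unbalanced 1-way random model; a sketch (the standard method-of-moments formulae; the function name is mine):

```python
import numpy as np

def anova_variance_components(groups):
    """Analysis-of-variance (method-of-moments) estimators of
    (sigma_e^2, sigma_a^2) in the unbalanced 1-way random model
    y_ij = mu + a_i + e_ij.  groups: list of 1-d arrays, one per class."""
    a, n = len(groups), np.array([len(g) for g in groups])
    N = n.sum()
    grand = np.concatenate(groups).mean()
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ssa = sum(ni * (g.mean() - grand) ** 2 for ni, g in zip(n, groups))
    mse = sse / (N - a)
    k0 = (N - (n ** 2).sum() / N) / (a - 1)   # E[MSA] = sigma_e^2 + k0*sigma_a^2
    return mse, (ssa / (a - 1) - mse) / k0
```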


Journal Article•DOI•
TL;DR: A survival distribution is developed for two-organ systems such as the lungs or kidneys in the human body, and a Monte Carlo simulation of two numerical procedures is conducted to obtain the maximum likelihood estimates of the parameters.
Abstract: A survival distribution is developed for two-organ systems such as the lungs or kidneys in the human body. The assumption is made that if one of the two organs fails (no repair possible) the other organ may be subject to a different failure rate. Both failure rates are assumed constant with respect to time. The maximum likelihood estimation equations are developed along with the large-sample variance-covariance matrix for the parameters. A Monte Carlo simulation of two numerical procedures, the Newton-Raphson procedure and the Method of Scoring, is conducted to obtain the maximum likelihood estimates of the parameters. The results of these procedures are compared.
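
A sketch of the model as a simulation (constant rate lam per organ while both function, a possibly different rate lam_prime for the survivor; names are assumptions):

```python
import random

def two_organ_lifetime(lam, lam_prime, rng=None):
    """One simulated system life: both organs fail independently at
    constant rate lam while both function; after the first failure the
    surviving organ fails at rate lam_prime (no repair), and the system
    fails with the second organ."""
    rng = rng or random.Random(0)
    first = min(rng.expovariate(lam), rng.expovariate(lam))
    return first + rng.expovariate(lam_prime)   # memoryless residual life
```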

Journal Article•DOI•
TL;DR: In this paper, the authors extended Mantel's approach for testing space-time clustering of a single set of points to test for clustering between two such sets of points.
Abstract: SUMMARY Mantel's approach for testing space-time clustering of a single set of points is extended to test for clustering between two such sets of points. Randomization tests are proposed for two situations: (a) one set is considered fixed and the other random, and (b) both sets of points are considered random. For each situation an empirical randomization test and its normal approximation are given. In either case a variety of space and time distance measures may be used. The adequacy of the normal approximation for a contrived example was determined empirically using two favoured measures: the 0-1 indicator function and the reciprocal of distance (time) plus a constant. As examples for actual data, the reciprocal measure is used to test for space-time clustering between a set of dog lymphoma cases and a set of cat lymphoma cases, and the indicator measure is used for underground nuclear tests and earthquakes.
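
A sketch of situation (a) with the reciprocal measure: set a held fixed, the times of set b randomized (the statistic form, the constant c, and the permutation count are assumptions within the paper's general recipe):

```python
import random

def two_sample_spacetime_test(a, b, c=1.0, n_perm=999, rng=None):
    """Randomization test for space-time clustering between point sets
    a and b (points are (x, y, t) triples): a is held fixed and the
    times of b are permuted.  Reciprocal measure 1/(distance + c)."""
    rng = rng or random.Random(0)

    def stat(b_times):
        s = 0.0
        for xa, ya, ta in a:
            for (xb, yb, _), tb in zip(b, b_times):
                d = ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5
                s += 1.0 / ((d + c) * (abs(ta - tb) + c))
        return s

    times = [t for _, _, t in b]
    observed = stat(times)
    exceed = sum(stat(rng.sample(times, len(times))) >= observed
                 for _ in range(n_perm))
    return observed, (exceed + 1) / (n_perm + 1)
```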