
Showing papers in "Biometrics in 1977"


Journal Article•DOI•
TL;DR: A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented; tests for interobserver bias are given in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.
Abstract: This paper presents a general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies. The procedure essentially involves the construction of functions of the observed proportions which are directed at the extent to which the observers agree among themselves and the construction of test statistics for hypotheses involving these functions. Tests for interobserver bias are presented in terms of first-order marginal homogeneity and measures of interobserver agreement are developed as generalized kappa-type statistics. These procedures are illustrated with a clinical diagnosis example from the epidemiological literature.
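The generalized kappa-type statistics described above reduce, for the two-observer case, to Cohen's kappa: chance-corrected agreement computed from the observed agreement table. A minimal sketch (function name and example table are ours, not the paper's):

```python
import numpy as np

def cohens_kappa(table):
    """Cohen's kappa for a k x k agreement table of two observers.

    table[i][j] = number of subjects rated category i by observer 1
    and category j by observer 2.
    """
    t = np.asarray(table, dtype=float)
    n = t.sum()
    p_obs = np.trace(t) / n                         # observed agreement
    p_exp = (t.sum(axis=1) @ t.sum(axis=0)) / n**2  # agreement expected by chance
    return (p_obs - p_exp) / (1.0 - p_exp)

# Hypothetical 2 x 2 table: observed agreement 0.7, chance agreement 0.5.
kappa = cohens_kappa([[20, 5], [10, 15]])  # → 0.4
```

Perfect agreement (an all-diagonal table) gives kappa = 1, and agreement at exactly the chance level gives 0.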

64,109 citations


Journal Article•DOI•

11,905 citations


Journal Article•DOI•
Landis JR, Koch GG•
TL;DR: A subset of observers who demonstrate a high level of interobserver agreement can be identified by using pairwise agreement statistics between each observer and the internal majority standard opinion on each subject.
Abstract: This paper presents a general statistical methodology for the analysis of multivariate categorical data involving agreement among more than two observers. Since these situations give rise to very large contingency tables in which most of the observed cell frequencies are zero, procedures based on indicator variables of the raw data for individual subjects are used to generate first-order margins and main diagonal sums from the conceptual multidimensional contingency table. From these quantities, estimates are generated to reflect the strength of an internal majority decision on each subject. Moreover, a subset of observers who demonstrate a high level of interobserver agreement can be identified by using pairwise agreement statistics between each observer and the internal majority standard opinion on each subject. These procedures are all illustrated within the context of a clinical diagnosis example involving seven pathologists.

2,870 citations


Journal Article•DOI•
TL;DR: This paper is concerned with the analysis of multivariate categorical data which are obtained from repeated measurement experiments and appropriate test statistics are developed through the application of weighted least squares methods.
Abstract: This paper is concerned with the analysis of multivariate categorical data which are obtained from repeated measurement experiments. An expository discussion of pertinent hypotheses for such situations is given, and appropriate test statistics are developed through the application of weighted least squares methods. Special consideration is given to computational problems associated with the manipulation of large tables including the treatment of empty cells. Three applications of the methodology are provided.
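The weighted least squares approach referred to above estimates model parameters by minimizing a quadratic form weighted by the estimated covariance of the response functions. A generic sketch of the estimator (function name and toy data are ours; the paper's response functions and design matrices are problem-specific):

```python
import numpy as np

def weighted_least_squares(X, y, V):
    """Weighted least squares: minimize (y - Xb)' V^{-1} (y - Xb).

    X : design matrix, y : vector of response functions,
    V : estimated covariance matrix of y.
    Returns the estimate b and its covariance (X' V^{-1} X)^{-1}.
    """
    Vinv = np.linalg.inv(V)
    cov_b = np.linalg.inv(X.T @ Vinv @ X)
    b = cov_b @ X.T @ Vinv @ y
    return b, cov_b

# With V = I this reduces to ordinary least squares; here y lies
# exactly on the line 1 + 2x, so b recovers (1, 2).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
b, cov_b = weighted_least_squares(X, y, np.eye(3))
```

Test statistics for linear hypotheses Cb = 0 then follow as quadratic forms in b using cov_b.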

597 citations


Journal Article•DOI•
TL;DR: In this paper, a components-of-variance model for categorical data from unbalanced designs is proposed, which is directly analogous to a one-way random effects ANOVA model for quantitative data.
Abstract: A components of variance model for categorical data from unbalanced designs which is directly analogous to a one-way random effects ANOVA model for quantitative data is proposed. The variance components provide separate reliability measures for each of the response categories and disagreement measures between pairs of response categories in terms of (within subject) intraclass and interclass correlation coefficients. The estimation procedures involve usual MANOVA calculations which can be expressed as compounded functions of the multinomial observations. Thus, the variances of these estimates can be obtained from linearized Taylor series results. These procedures are illustrated with data from a psychiatric diagnosis study.
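For the balanced one-way random effects case, the intraclass correlation referred to above can be computed from the ANOVA mean squares. A small sketch under that balanced-design assumption (function name and data are ours):

```python
import numpy as np

def intraclass_corr(groups):
    """One-way random-effects intraclass correlation from balanced data.

    groups : list of n subjects, each with k ratings.
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are
    the between- and within-subject mean squares.
    """
    g = [np.asarray(x, dtype=float) for x in groups]
    n, k = len(g), len(g[0])
    grand = np.mean([x.mean() for x in g])   # equal group sizes
    msb = k * sum((x.mean() - grand) ** 2 for x in g) / (n - 1)
    msw = sum(((x - x.mean()) ** 2).sum() for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Identical ratings within every subject: all variation is between
# subjects, so the ICC is exactly 1.
icc = intraclass_corr([[1, 1], [2, 2], [3, 3]])
```

The paper's categorical version applies the same idea to indicator variables for each response category.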

418 citations


Journal Article•DOI•
TL;DR: It is found that the test based upon chi2 with continuity correction agrees best overall with Gart's (1971) test and is therefore recommended for use when computational facilities needed for carrying out the latter are not available.
Abstract: In this paper, the role of significance testing in establishing equivalence between treatments is described. In place of the more customary null hypothesis of no difference between treatments, the hypothesis that the true difference is equal to a specified delta is tested. The particular case of comparing two binomial samples is described in detail. Results using a method due to Gart and approximations based on the chi2 and normal distributions are compared. It is found that the test based upon chi2 with continuity correction agrees best overall with Gart's (1971) test and is therefore recommended for use when computational facilities needed for carrying out the latter are not available.

362 citations


Journal Article•DOI•
TL;DR: Increasing doses of the substance are given to groups of animals; a group of control animals receiving a zero dose is also included in the experiment.
Abstract: The toxicity of substances such as drugs, food additives, artificial foods and pesticides is usually assessed at some stage by animal experiments. Increasing doses of the substance are given to groups of animals; a group of control animals receiving a zero dose is also included in the experiment. The experimenter wishes to know if there is evidence of toxicity, and if so,

304 citations


Journal Article•DOI•
TL;DR: In this paper, a multivariate Ornstein-Uhlenbeck diffusion process is proposed for home range studies of wild animals and birds, assuming that such data are generated by a continuous, stationary, Gaussian process possessing the Markov property.
Abstract: In home range studies of wild animals and birds, statistical analysis of radio telemetry data poses special problems due to lack of independence of successive observations along the sample path. Assuming that such data are generated by a continuous, stationary, Gaussian process possessing the Markov property, then a multivariate Ornstein-Uhlenbeck diffusion process is necessarily the source and is proposed here to be a workable model. Its characterization is given in terms of typical descriptive properties of home range such as center of activity and confidence regions. Invariance of the model with respect to choice of an observational coordinate system is established, while data for twin deer are used to illustrate the manner in which the model may be used for study of territorial interaction. An approximate maximum likelihood procedure is proposed with results being reported for deer, coyote, and bird tracking data.
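The Ornstein-Uhlenbeck model above describes motion that is continually pulled back toward a center of activity. A simple Euler-Maruyama simulation of such a path (parameter values and function name are our own illustration, not fitted values from the paper):

```python
import numpy as np

def simulate_ou(n, dt, center, B, sigma, x0, rng):
    """Euler-Maruyama simulation of a 2-D Ornstein-Uhlenbeck process:
    dX = B (center - X) dt + sigma dW, with drift matrix B pulling
    the animal back toward its center of activity.
    """
    x = np.empty((n, 2))
    x[0] = x0
    for t in range(1, n):
        drift = B @ (center - x[t - 1])
        x[t] = x[t - 1] + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(2)
    return x

rng = np.random.default_rng(0)
# Start at (5, 5); the path relaxes toward the center of activity (0, 0)
# and then fluctuates around it, unlike an unrestricted random walk.
path = simulate_ou(5000, 0.1, np.array([0.0, 0.0]),
                   0.5 * np.eye(2), 1.0, np.array([5.0, 5.0]), rng)
```

Successive positions are autocorrelated, which is exactly why the paper argues that ordinary i.i.d. methods are inappropriate for telemetry data.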

237 citations


Journal Article•DOI•
TL;DR: In this paper, a logistic model for grouped observations is introduced and illustrated with a numerical example, leading back to Cox's proportional failure rates when the lengths of the grouping intervals approach zero.
Abstract: Assuming a model of proportional failure rates, Cox (1972) presents a systematic study of the use of covariates in the analysis of life time. The treatment of tied observations is a particularly troublesome point in both theory and application. It appears that grouping rather than discrete time is the right way to handle ties. This paper studies methodology for grouped observations. A logistic model, which makes explicit use of Cox's earlier binary data methods, is introduced and illustrated with a numerical example. The model leads back to Cox's proportional failure rates when the lengths of the grouping intervals approach zero. This limiting process provides some enlightenment on controversial issues such as ignoring intervals in which no failures occur, determining whether the covariates may be functions of time, and treating ties.
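A grouped-time logistic model of the kind described above is typically fit by expanding each subject into one binary record per interval at risk, then running an ordinary logistic regression on the expanded rows. A sketch of that expansion step (our own illustration, not the paper's notation):

```python
def person_period(times, events, covariates):
    """Expand grouped survival data to one record per subject-interval.

    For each subject, emit one row per interval survived, with outcome
    y = 1 in the interval of failure and 0 otherwise; a censored
    subject (event 0) contributes all-zero outcomes.
    """
    rows = []
    for t, d, x in zip(times, events, covariates):
        for interval in range(1, t + 1):
            y = 1 if (interval == t and d == 1) else 0
            rows.append({"interval": interval, "y": y, "x": x})
    return rows

# Subject 1 fails in interval 3; subject 2 is censored after interval 2.
rows = person_period([3, 2], [1, 0], [0.0, 1.0])
```

Interval-specific intercepts in the logistic fit play the role of the baseline hazard, and letting the grouping intervals shrink recovers Cox's proportional failure rates, as the abstract notes.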

207 citations


Journal Article•DOI•
TL;DR: A maximum likelihood (ML) approach is suggested which overcomes some of the limitations of the existing methodology and finds wide application in the shelf-life prediction of biological products where the same statistical methods are appropriate.
Abstract: A high level of stability is essential for any biological standard and is desirable in most other biological products. It is in general impossible to observe directly the rate of degradation of a biological standard since no independent scale of measurement is available. An indirect method is therefore required. The most common approach is the accelerated degradation test in which samples are stored for a time at elevated temperatures and then compared with samples stored continuously at low temperature. The relative degradation rates are used to fit the Arrhenius equation (relating degradation rate to temperature) and hence to predict stability under normal storage conditions. Previous statistical work on this problem is reviewed and a maximum likelihood (ML) approach is suggested which overcomes some of the limitations of the existing methodology. The accelerated degradation test also finds wide application in the shelf-life prediction of biological products where the same statistical methods are appropriate.
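The Arrhenius equation is linear on the scale ln(rate) vs. 1/T, so degradation rates observed at elevated temperatures can be extrapolated to storage temperature by a straight-line fit. A least-squares sketch with synthetic data (the paper's ML approach is more elaborate; this only illustrates the underlying relation):

```python
import numpy as np

def fit_arrhenius(temps_kelvin, rates):
    """Fit ln(rate) = ln(A) - Ea / (R * T) by least squares.

    Returns (A, Ea) with Ea in J/mol; R = 8.314 J/(mol K).
    """
    R = 8.314
    x = 1.0 / np.asarray(temps_kelvin, dtype=float)
    y = np.log(np.asarray(rates, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    return np.exp(intercept), -slope * R

# Synthetic rates generated from known A = 1e7, Ea = 50 kJ/mol,
# so the fit should recover those values.
T = np.array([293.0, 303.0, 313.0, 323.0])
k = 1e7 * np.exp(-50000.0 / (8.314 * T))
A_hat, Ea_hat = fit_arrhenius(T, k)
```

The fitted line, evaluated at the low storage temperature, gives the predicted degradation rate used for shelf-life prediction.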

180 citations


Journal Article•DOI•
TL;DR: In this article, the authors used unbiased estimation theory to obtain a minimum variance unbiased estimator for this family of diversity measures, which is then used to partition the variation in sample diversity between random sampling error and local variation in community diversity.
Abstract: A family of species diversity measures proposed by Hurlbert (1971) is defined as the expected number of species in a random sample of m individuals from a population. For m = 2 this measure is equivalent to Simpson's diversity index. For larger m, the measure is increasingly sensitive to rare species. In this paper we use unbiased estimation theory to obtain a minimum variance unbiased estimator for this family of diversity measures. An unbiased estimator of the sampling variance is also obtained. These results are then used to partition the variation in sample diversity between random sampling error and local variation in community diversity.
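The standard unbiased estimator of Hurlbert's measure sums, over species, the probability that each species appears in a subsample of m individuals drawn without replacement. A minimal sketch (function name is ours; counts are a toy example):

```python
from math import comb

def expected_species(counts, m):
    """Unbiased estimate of Hurlbert's diversity: the expected number
    of species in a random subsample of m individuals drawn without
    replacement from a sample with the given species counts.

    E[S_m] = sum_i [ 1 - C(N - n_i, m) / C(N, m) ],  N = sum of counts.
    """
    N = sum(counts)
    return sum(1 - comb(N - n_i, m) / comb(N, m) for n_i in counts)

# Every individual a different species: a subsample of m individuals
# always contains exactly m species.
s3 = expected_species([1, 1, 1, 1, 1], 3)  # → 3.0
```

Larger m weights rare species more heavily, since even a rare species is increasingly likely to appear in a bigger subsample.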

Journal Article•DOI•
TL;DR: In this paper, the validity of several statistics is investigated for testing whether a small sample comes from a population having the binomial proportions p^2 AA, 2pq Aa, q^2 aa, where q = 1 - p.
Abstract: An investigation is made of the validity of several statistics to test whether a small sample comes from a population having the binomial proportions p^2 AA, 2pq Aa, q^2 aa, where q = 1 - p. In particular the significance levels (P-values) indicated by these tests are compared to those of a well-known exact test (Haldane 1954). It is found that, for sample sizes of 20 or greater and significance level 0.15 or less for the exact test, a useful approximation to the significance level is obtained when the X2-statistic, with Yates' correction, is averaged with a similar statistic that uses conditional expectations, and the result referred to the chi2 distribution. For other situations, or for greater accuracy, a recursive relation is given that reduces the amount of computation necessary to determine the exact significance level.
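The Yates-corrected X2 statistic for Hardy-Weinberg proportions compares observed genotype counts with the expectations p^2, 2pq, q^2, with p estimated from the sample. A sketch of that one ingredient of the paper's comparison (the averaged statistic and the exact recursion are not reproduced here):

```python
def hwe_chisq_yates(n_AA, n_Aa, n_aa):
    """Yates-corrected chi-square statistic for Hardy-Weinberg
    proportions p^2, 2pq, q^2, with p estimated by the allele frequency.
    The correction is clamped at zero so tiny deviations are not inflated.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_AA, n_Aa, n_aa]
    return sum(max(abs(o - e) - 0.5, 0.0) ** 2 / e
               for o, e in zip(observed, expected))

# Counts exactly at Hardy-Weinberg proportions (p = 0.5) give 0.
x2 = hwe_chisq_yates(25, 50, 25)  # → 0.0
```

Because p is estimated, the statistic is referred to a chi-square distribution with one degree of freedom in the usual large-sample treatment.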

Journal Article•DOI•
TL;DR: Three different models, a two-way factorial model for familiarity, an orthogonalizing transform of this model to a diallel model, and a bio model more representative of the biological situation, are interrelated in terms of their components of variance and covariance.
Abstract: Three different models, a two-way factorial model for familiarity, an orthogonalizing transform of this model to a diallel model, and a bio model more representative of the biological situation, are interrelated in terms of their components of variance and covariance. It is clarified that there are five components that can be reckoned with in the analysis of reciprocal crosses, including distinct maternal and paternal variances. Estimation of the components and tests of hypotheses concerning them are outlined for two types of mating designs with reciprocals. One design involves a factorial mating design between two distinct sets of parents or parental lines and the other a diallel of all crosses from a single set of parents or parental lines. Both designs provide the same types of information and similar tests of hypotheses. At least some parts of the analyses corresponding to the factorial model are required to separate the maternal and paternal variances. A least squares partitioning of the sums of squares according to the diallel model, but with expectations expressed in terms of the bio model, provides most of the tests of hypotheses of interest. Worked examples are given.


Journal Article•DOI•
TL;DR: A unified approach for the statistical analysis of a two-period repeated measurements crossover design is presented that clarifies the testing procedures and assumptions employed under different conditions.
Abstract: The two-period repeated measurements crossover design is often employed in clinical trials. This paper presents a unified approach for the statistical analysis of such a design that clarifies the testing procedures and assumptions employed under different conditions. It is shown how the data may be transformed so that they can be analyzed within the framework of a completely randomized repeated measurements design. Applications are given to a comparative bioavailability trial for attainment of steady state levels and to a clinical trial to compare the effects of two hypolipidemics.

Journal Article•DOI•
TL;DR: In this article, the authors compared the performance of three discriminant functions, the quadratic, best linear, and Fisher's linear discriminant function, to classify individuals into two normally distributed populations with unequal covariance matrices.
Abstract: A Monte Carlo study (Wahl 1971) is compared to the study of Marks and Dunn (1974) which investigated the ability of three discriminant functions, the quadratic, best linear, and Fisher's linear discriminant function, to classify individuals into two multivariate normally distributed populations with unequal covariance matrices. Parameters that were varied in all of the studies include the distance between populations, covariance matrices, number of variables, sample size and population proportion. Our results, when related to those of Marks and Dunn, indicate sample size to be a critical factor in choosing between the quadratic and linear functions.

Journal Article•DOI•
TL;DR: An extensive evaluation of why adaptive designs are rarely used in clinical trials is presented, and suggestions are offered for reorienting this area of research into directions that are potentially more useful for clinical trials.
Abstract: Summary: This paper provides a general review of adaptive experimental designs which utilize accumulating information for assigning the best treatment to the most patients in clinical trials. The historical development of such methods is traced. Though the statistical literature on adaptive designs has developed rapidly and continues to grow, the methods are almost totally unused in practice. An extensive evaluation of why adaptive designs are rarely used in clinical trials is presented. It is asserted that most published methods have important deficiencies that render them unsuitable for application. Suggestions are offered for reorienting this area of research into directions that are potentially more useful for clinical trials. The term adaptive treatment assignment as used in this paper in the context of clinical trials refers to methods which utilize accumulating information for assigning the best treatment to the most patients. The literature on such methods has developed rapidly in recent years, but the methods are almost totally unused in practice. A recent paper in a prominent medical journal (Weinstein 1974) concluded that for both statistical and ethical reasons, adaptive designs should be used more often. Why are these adaptive methods not used? This paper is an attempt to answer the question. The methods are briefly reviewed and their historical development traced. A more detailed review of the numerous published

Journal Article•DOI•
TL;DR: In this article, the authors developed confidence intervals and hypothesis tests for dose-response relations based on dichotomous data from animal carcinogenicity experiments, using the Armitage-Doll multistage carcinogenesis model.
Abstract: Confidence intervals and hypothesis tests are developed for dose-response relations based on dichotomous data from animal carcinogenicity experiments. The functional form of the dose-response curve comes from the Armitage-Doll multistage carcinogenesis model and involves a polynomial in the dose-rate, with non-negative coefficients. Asymptotic distributions of the maximum likelihood estimators of these coefficients are used to construct confidence bounds on risk at a given dose and on the dose corresponding to a given risk. Likelihood ratio tests are developed for the presence of a positive dose-related effect and for the existence of a positive slope to the dose-response curve at zero dose. The latter test is of practical importance since a positive slope of the dose-response curve at zero dose rules out any "threshold-like" behavior and would often mean that any concentration low enough to insure a negligibly low cancer risk (e.g., 10(-6)) would be too low to be economically useful for applications such as food additives. Simulation experiments are performed to provide guidelines for applying the theory.
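The multistage dose-response curve described above is a polynomial in dose with non-negative coefficients inside an exponential. A sketch of the risk function and the usual "extra risk over background" quantity (coefficient values are hypothetical):

```python
from math import exp

def multistage_risk(dose, q):
    """Armitage-Doll multistage dose-response:
    P(d) = 1 - exp(-(q0 + q1*d + q2*d^2 + ...)), all q_i >= 0.
    """
    total = sum(qi * dose ** i for i, qi in enumerate(q))
    return 1.0 - exp(-total)

def extra_risk(dose, q):
    """Risk above background, rescaled: (P(d) - P(0)) / (1 - P(0))."""
    p0 = multistage_risk(0.0, q)
    return (multistage_risk(dose, q) - p0) / (1.0 - p0)

# Hypothetical coefficients: q1 > 0 means the dose-response curve has
# a positive slope at zero dose, ruling out threshold-like behaviour.
q = [0.01, 0.5, 0.1]
r = extra_risk(0.2, q)
```

Testing whether q1 > 0 is exactly the "positive slope at zero dose" likelihood ratio test the abstract emphasizes.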

Journal Article•DOI•
TL;DR: A simple nonparametric test is given for testing the equality of two populations when the observations are bivariate and the alternative of interest is that the conditional c.d.f. of Y given X for one population dominates that for the other for every value of X.
Abstract: The purpose of this paper is to present a simple nonparametric procedure which is useful for testing equality of two populations when the observations (X, Y) are bivariate and is applicable in situations where suitable parametric approaches do not exist. A random sample of JK animals is taken from each of two populations. J experimental groups are formed where each group contains K animals from each sample. Two measurements, denoted (X, Y), are taken on each animal. The two populations are to be compared by testing the null hypothesis, HO , that within each group the joint distributions of (X, Y) are the same for both populations vs. the alternative, HA , that the conditional c.d.f. of Y given X for one population dominates the corresponding c.d.f. for the other population for all values of X, uniformly over the groups.

Journal Article•DOI•
TL;DR: In this article, the authors present a statistical methodology for carcinogenic safety testing based on a data-based method of estimating "safe" doses, which is applicable to any specified permissible risk (of exceeding the spontaneous rate) and the latter must of course be specified by F.D.
Abstract: The statistical methodology for carcinogenic safety testing here developed has the following advantages: (1) Rather than making possibly unwarranted assumptions about a minimum slope in a dose-response relationship, the present method represents an objective, data-based method of estimating "safe" doses. It is applicable to any specified permissible risk (of exceeding the spontaneous rate), and the latter must of course be specified by the F.D.A. (2) Although the model which is used for the estimation of safe doses and their lower confidence points is parametric, it comprises an adequately large number of parameters to allow for differences in the idiosyncrasies of suspected carcinogens and host species, and to a limited degree for variations in the experimental protocols. (3) The same computer program will cover the analysis of experiments in which times to tumor have been recorded as well as experiments in which only tumor incidence rates have been recorded, or mixtures of the two. "Better experimentation" is "rewarded" in that the lower confidence limits for the "safe doses" should be higher and more closely approach the true safe dose as the experimental effort increases. (4) The maximum likelihood estimation procedure is sophisticated and has asymptotic optimality properties. It utilizes the latest techniques of "convex programming" and the computer algorithm is straightforward and fast. The methodology proposed here also has the following shortcomings: (1) The model is (multi)parametric. However, it is of the form of the product model for age-specific hazard rates which is now widely accepted. (2) Robustness studies on the effect of model breakdown on the estimated safe doses are as yet limited and should be followed up with more extensive studies. (3) Obviously no estimates of "safe doses" can be made if the spontaneous incidence rate is zero and the experimental dose levels have been chosen too small and no tumors have been observed.
Similarly if the experimental doses are too small and the tumor incidence rates are all comparable with the spontaneous rate the estimation procedure is afflicted by extremely large errors. The situation improves slightly if the incidence for the highest dose level is higher than that of the lower dose levels which are all approximately equal. In such situations more satisfactory experimental data are needed. Some general recommendations are as follows: (1) Whenever possible it is preferable to record times to tumor and not just incidence rates. However, for experiments of sufficiently long duration necropsies following the varying times of death will provide adequate information on the time dependence of tumor incidence. It may also be advisable to deliberately vary the times of sacrifice to two or three different times. (2) Other considerations being equal it is preferable to have a large number of dose levels rather than more animals per dose level...

Journal Article•DOI•
TL;DR: It is shown that if one has data which fall into ordered categories, then the discrete Kolmogorov-Smirnov test is an exact test which uses the information from the ordering and can be used for small sample sizes.
Abstract: We review the advantages and disadvantages of several goodness-of-fit tests which may be used with discrete data: the multinomial test, the likelihood ratio test, the X2 test, the two-stage X2 test and the discrete Kolmogorov-Smirnov test. Although the X2 test is the best known and most widely used of these tests, its use with small sample sizes is controversial. If one has data which fall into ordered categories, then the discrete Kolmogorov-Smirnov test is an exact test which uses the information from the ordering and can be used for small sample sizes. We illustrate these points with an example of several analyses of health impairment data.
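The discrete Kolmogorov-Smirnov statistic compares the empirical and hypothesized cumulative distributions at each ordered-category cut point. A sketch of the statistic (function name is ours; exact p-values require the discrete null distribution, which is not reproduced here):

```python
from itertools import accumulate

def discrete_ks(observed, probs):
    """Kolmogorov-Smirnov statistic for ordered categories:
    D = max |F_n(k) - F_0(k)| over the category cut points,
    where F_n is the empirical CDF and F_0 the hypothesized CDF.
    """
    n = sum(observed)
    emp = [c / n for c in accumulate(observed)]
    theo = list(accumulate(probs))
    return max(abs(a - b) for a, b in zip(emp, theo))

# Four ordered categories against a uniform null: a perfectly
# uniform sample gives D = 0, a fully concentrated one gives D = 0.75.
D = discrete_ks([10, 10, 10, 10], [0.25, 0.25, 0.25, 0.25])  # → 0.0
```

Unlike the X2 test, D depends on the ordering of the categories, which is the advantage the abstract highlights for ordered data.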

Journal Article•DOI•
TL;DR: The asymptotic behaviour of certain estimators for the mean of the offspring distribution of a Galton-Watson process is studied when the true underlying model is in fact a multitype branching process or a branching process with a random environment.
Abstract: Certain estimators for the mean of the offspring distribution of a Galton-Watson process are considered. The asymptotic behaviour of each of these estimators is studied when the true underlying model is in fact a multitype branching process or a branching process with a random environment. It is revealed which of the estimators remain consistent indicators of whether or not the process is subcritical, under these alternative underlying models. It is then indicated how this "robustness" result might influence the choice of an estimator by considering the problem of estimating the level of immunity required in a community in order to prevent major epidemics. The application is illustrated with references to smallpox using data from an outbreak in Sao Paulo, Brazil.
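One classical estimator of the offspring mean from observed generation sizes is the Harris (ratio) estimator, whose value relative to 1 indicates sub- or supercriticality. A minimal sketch (the generation sizes are a toy series, not the Sao Paulo smallpox data):

```python
def harris_estimator(z):
    """Harris estimator of the offspring mean of a Galton-Watson
    process from generation sizes z[0], z[1], ..., z[n]:
    m_hat = (z[1] + ... + z[n]) / (z[0] + ... + z[n-1]).
    A value below 1 suggests a subcritical (dying-out) process.
    """
    return sum(z[1:]) / sum(z[:-1])

# Declining generation sizes: m_hat = 24/30 = 0.8 < 1, subcritical.
m_hat = harris_estimator([10, 8, 7, 5, 4])
```

The paper's question is whether such estimators remain reliable indicators of subcriticality when the true process is multitype or has a random environment.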



Journal Article•DOI•
TL;DR: A simulation procedure is described which provides, for different mortality rates and different patterns of patient enrollment, the correct critical regions corresponding to specified frequencies of looks at the data over the course of the study.
Abstract: When the data from long-term clinical trials are reviewed continually over time for evidence of adverse or beneficial treatment effects, the classical significance tests are not appropriate. A simulation procedure is described which provides, for different mortality rates and different patterns of patient enrollment, the correct critical regions corresponding to specified frequencies of looks at the data over the course of the study. The power of the test and the robustness of the critical regions for differences in pattern of enrollment, length of study, mortality model, and sample size are discussed. An application is made to a drug trial in coronary heart disease.
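The reason classical tests fail under repeated looks can be shown by Monte Carlo: applying the nominal 1.96 cutoff at each interim look inflates the overall type I error well above 0.05. A sketch of that simulation idea (a simple z-test on accumulating normal data; the paper's procedure handles survival data and enrollment patterns):

```python
import numpy as np

def overall_alpha(n_looks, n_per_look, n_sims, z_crit, seed=0):
    """Monte Carlo estimate of the overall type I error when a z-test
    with cutoff z_crit is applied at each of several interim looks at
    accumulating data (null true: observations are N(0, 1))."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        data = rng.standard_normal(n_looks * n_per_look)
        for k in range(1, n_looks + 1):
            chunk = data[: k * n_per_look]
            z = chunk.mean() * np.sqrt(len(chunk))
            if abs(z) > z_crit:          # stop at the first rejection
                rejections += 1
                break
    return rejections / n_sims

# Five looks at nominal 5% level: the overall error rate is roughly
# 0.14, far above 0.05, motivating adjusted critical regions.
alpha = overall_alpha(5, 20, 2000, 1.96)
```

Running the same simulation with a larger cutoff, and inverting, is exactly how corrected critical regions for a specified number of looks can be obtained.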

Journal Article•DOI•
TL;DR: This paper generalized the Bradley-Terry model for paired comparisons to account for the effect of the order of presentation of the objects within a pair and proposed a multiplicative order effect as an alternative to the additive order effect proposed by Beaver and Gokhale.
Abstract: The Bradley-Terry model for paired comparisons is generalized to account for the effect of the order of presentation of the objects within a pair. A multiplicative order effect is suggested as an alternative to the additive order effect proposed by Beaver and Gokhale (1975). The multiplicative order effect is then incorporated into the tie models of Rao and Kupper (1967) and Davidson (1970). The associated estimation and testing procedures are presented with emphasis on likelihood methods as well as on the weighted least squares methods recently applied to paired comparison situations by Imrey, Johnson, and Koch (1976) and by Beaver (1977). Two numerical examples are provided, one with ties and one without.
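The basic Bradley-Terry model being generalized assigns each object a worth p_i with P(i beats j) = p_i / (p_i + p_j); the worths are usually fit by the classical MM (Zermelo) iteration. A sketch of that baseline model, ignoring the order and tie effects the paper adds:

```python
def bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry worths by the classical MM iteration.

    wins[i][j] = number of times object i beat object j.
    Returns worths p (summing to 1) with P(i beats j) = p[i]/(p[i]+p[j]).
    """
    t = len(wins)
    p = [1.0 / t] * t
    for _ in range(n_iter):
        new = []
        for i in range(t):
            w_i = sum(wins[i])                      # total wins of i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(t) if j != i)
            new.append(w_i / denom)
        s = sum(new)                                # renormalize
        p = [x / s for x in new]
    return p

# Object 0 beats object 1 three times out of four: worth ratio 3:1.
p = bradley_terry([[0, 3], [1, 0]])  # → [0.75, 0.25]
```

The paper's multiplicative order effect rescales these winning probabilities according to which object was presented first.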

Journal Article•DOI•
TL;DR: In this article, the authors proposed a bivariate extension of the SB distribution, the SBB, which allows for the generation of bivariate frequencies for diameter and height, whereas the current approach only provides marginal frequency for diameter.
Abstract: Hafley and Schreuder (1977) have shown that the marginal SB distribution fits diameter and height data consistently better than the Weibull, beta, gamma, lognormal, and normal distributions. The bivariate extension of the SB distribution, the SBB, is both more realistic and provides more usable information than the currently accepted approach in describing even-aged forest stand height-diameter data. The SBB allows for the generation of bivariate frequencies for diameter and height, whereas the current approach only provides marginal frequencies for diameter. In addition, the SBB implies a new height-diameter relationship which is comparable in fit to the most commonly used height-diameter regression model. Application of the SBB to two data sets is presented.

Journal Article•DOI•
TL;DR: The problem is reformulated in non-linear programming terms, a new algorithm for seeking the minimum sum of squared distances about the g centroids is described, and an efficient hybrid algorithm is introduced.
Abstract: An analysis of surface pollen samples to discover if they fall naturally into distinct groups of similar samples is an example of a classification problem. In Euclidean classification, a set of n objects can be represented as n points in Euclidean space of p dimensions. The sum of squares criterion defines the optimal partition of the points into g disjoint groups to be the partition which minimizes the total within-group sum of squared distances about the g centroids. It is not usually feasible to examine all possible partitions of the objects into g groups. A critical review is made of algorithms which have been proposed for seeking optimal partitions. The problem is reformulated in non-linear programming terms, and a new algorithm for seeking the minimum sum of squares is described. The performance of this algorithm in analyzing the pollen data is found to compare well with the performance of three of the existing algorithms. An efficient hybrid algorithm is introduced.
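The best-known local optimizer of the sum of squares criterion is Lloyd's k-means iteration: alternately assign points to the nearest centroid and recompute centroids. A sketch (not the paper's new algorithm, just the standard baseline it is compared against in spirit):

```python
import numpy as np

def kmeans(points, g, n_iter=50, seed=0):
    """Lloyd's algorithm: a local minimizer of the total within-group
    sum of squared distances about the g centroids."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    centroids = pts[rng.choice(len(pts), g, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest centroid
        d = ((pts[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # recompute each centroid as the mean of its group
        centroids = np.array([pts[labels == k].mean(axis=0) for k in range(g)])
    wss = ((pts - centroids[labels]) ** 2).sum()
    return labels, centroids, wss

# Two well-separated clusters are recovered exactly.
pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, cents, wss = kmeans(pts, 2)
```

Because such iterations only find local minima, the paper's non-linear programming reformulation and hybrid algorithm aim at more reliable optimization of the same criterion.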

Journal Article•DOI•
TL;DR: The non-central procedure is compared to existing procedures, and it is shown that this procedure yields results in close agreement with those of the widely employed Halperin (two-tailed) procedure.
Abstract: Sample size determination for r x c comparative trials is described, based on existing tables of the non-central chi2 distribution and the expression for the non-centrality parameter of the appropriate chi2 statistic. In addition, the optimal allocation of sample elements among the treatment groups is explored through evaluation of the expression for non-centrality under the comparative trial model. It is shown that a general rule for sample allocation, such as the "square root rule," does not apply uniformly. The non-central procedure is then compared to existing procedures for sample size determination for 2 x 2 trials, and it is shown that this procedure yields results in close agreement with those of the widely employed Halperin (two-tailed) procedure. Finally, sample size determinations for other contingency table models are discussed.

Journal Article•DOI•
TL;DR: It is concluded that pair-matching may not be the optimal choice in many, if not most, research situations.
Abstract: Pair-matching is undoubtedly one of the most popular techniques for controlling variation in both medical and other investigations involving human populations. Given the obvious advantages of ease of implementation and comprehension, apparent efficiency and simplicity of analysis, this popularity is understandable. Despite this appeal, however, pair-matching frequently involves high cost (particularly in the loss of unmatchable units), cannot claim efficiency when appropriate comparisons of precision are made, and suffers some important limitations in the analysis which directly affect inference. The persistence of the technique in the face of these limitations is discussed with reference to the disjunction between theoretical models and practical research constraints. It is concluded that this design may not be the optimal choice in many, if not most, research situations.