
Showing papers in "Biometrics in 1973"









Journal Article•DOI•
TL;DR: Residual-analysis methods developed for loglinear models are applied to quantal response data and to complete and incomplete r × c contingency tables, including quasi-independence models in incomplete tables and logit analysis.
Abstract: In regression analysis and analysis of variance, examination of residuals forms an important part of good statistical practice (see Draper and Smith [1966]). More general uses of residuals have also been explored by Cox and Snell [1968]. In this paper, methods of residual analysis developed for loglinear models for cross-classified tables by Haberman [1972b] are used to examine models for quantal response and for complete and incomplete r × c contingency tables. Section 2 contains a brief summary of results of Haberman [1972b] which are used in this paper. A more complete summary is provided in the appendix. In section 3, residual analysis is used to examine the hypothesis of row and column independence in a complete r × c table. In section 4, methods are developed appropriate for analysis of quasi-independence models in incomplete r × c tables, and in section 5, methods for use in logit analysis are considered.
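As an illustration of the residual methods described (a minimal sketch, not Haberman's exact computations; the function name and example table are invented), adjusted residuals for the independence model in an r × c table can be computed as:

```python
import numpy as np

def adjusted_residuals(table):
    """Adjusted standardized residuals for the independence model in an
    r x c contingency table (a sketch in the spirit of Haberman [1972b]):
    under the model each residual is approximately N(0, 1)."""
    n = np.asarray(table, dtype=float)
    N = n.sum()
    row = n.sum(axis=1, keepdims=True)   # row totals n_i.
    col = n.sum(axis=0, keepdims=True)   # column totals n_.j
    e = row @ col / N                    # expected counts under independence
    # variance correction for estimating the margins
    v = e * (1 - row / N) * (1 - col / N)
    return (n - e) / np.sqrt(v)

# hypothetical 2 x 2 table
table = [[20, 10], [5, 25]]
r = adjusted_residuals(table)
```

In a 2 × 2 table all four adjusted residuals have the same magnitude, which equals the signed square root of the Pearson chi-square statistic.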

914 citations


Book Chapter•DOI•
Donald B. Rubin1•
TL;DR: In this article, several matching methods that match all of one sample from another larger sample on a continuous matching variable are compared with respect to their ability to remove the bias of the matching variable.
Abstract: Several matching methods that match all of one sample from another larger sample on a continuous matching variable are compared with respect to their ability to remove the bias of the matching variable. One method is a simple mean-matching method and three are nearest available pair-matching methods. The methods' abilities to remove bias are also compared with the theoretical maximum given fixed distributions and fixed sample sizes. A summary of advice to an investigator is included.
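A greedy nearest-available pair-matching scheme of the kind compared here can be sketched as follows (hypothetical function and data; the paper's specific variants differ mainly in how the treated units are ordered before matching):

```python
def nearest_available_matching(treated_x, control_x):
    """Nearest-available pair matching on a single continuous matching
    variable: each treated unit, taken in turn, is matched to the closest
    control unit not yet used.  Returns (treated_index, control_index) pairs."""
    controls = list(enumerate(control_x))
    pairs = []
    for i, x in enumerate(treated_x):
        # find the nearest remaining control on the matching variable
        j_best = min(range(len(controls)), key=lambda k: abs(controls[k][1] - x))
        idx, _ = controls.pop(j_best)    # remove it from the available pool
        pairs.append((i, idx))
    return pairs

# hypothetical samples: 3 treated units, 4 controls
t = [0.0, 1.0, 2.0]
c = [2.1, -0.1, 0.9, 5.0]
pairs = nearest_available_matching(t, c)
```

Because controls are consumed as they are used, the quality of later matches depends on the order in which treated units are processed, which is one reason the paper compares several ordering rules.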

867 citations



Book Chapter•DOI•
TL;DR: In this paper, the ability of matched sampling and linear regression adjustment to reduce the bias of an estimate of the treatment effect in two sample observational studies is investigated for a simple matching method and five simple estimates.
Abstract: The ability of matched sampling and linear regression adjustment to reduce the bias of an estimate of the treatment effect in two sample observational studies is investigated for a simple matching method and five simple estimates. Monte Carlo results are given for moderately linear exponential response surfaces and analytic results are presented for quadratic response surfaces. The conclusions are (1) in general both matched sampling and regression adjustment can be expected to reduce bias, (2) in some cases when the variance of the matching variable differs in the two populations both matching and regression adjustment can increase bias, (3) when the variance of the matching variable is the same in the two populations and the distributions of the matching variable are symmetric the usual covariance adjusted estimate based on random samples is almost unbiased, and (4) the combination of regression adjustment in matched samples generally produces the least biased estimate.

574 citations





Journal Article•DOI•
TL;DR: Maximum likelihood (ML) estimation for the beta-binomial distribution (BBD) is considered as a model for the incidence in households of noninfectious disease and alternative modes of infection are discussed.
Abstract: In part I, maximum likelihood (ML) estimation for the beta-binomial distribution (BBD) is considered. The BBD can be used as a model for the incidence in households of noninfectious disease. Typically households in which there are no cases of disease will not be included in the data. It is then necessary to fit a truncated BBD. Alternative modes of infection are discussed in part II. These give rise to a variety of models for the household distribution of the number of cases of a disease. The BBD is fitted to some data on the common cold and influenza. Other models have been fitted by previous authors to the same data. Independent epidemiological evidence would be necessary for choosing among these models.
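Fitting a zero-truncated beta-binomial by maximum likelihood can be sketched along these lines (the household counts are invented and the parameterization is one convenient choice, not necessarily the paper's):

```python
import numpy as np
from scipy.special import betaln, comb
from scipy.optimize import minimize

def truncated_bbd_negloglik(params, counts, n):
    """Negative log-likelihood of a zero-truncated beta-binomial for
    household data: counts[k-1] households observed with k cases
    (k = 1..n); households with zero cases are unobserved."""
    a, b = np.exp(params)                # keep alpha, beta positive
    ks = np.arange(0, n + 1)
    logp = np.log(comb(n, ks)) + betaln(ks + a, n - ks + b) - betaln(a, b)
    p = np.exp(logp)                     # untruncated BBD probabilities
    p_trunc = p[1:] / (1.0 - p[0])       # renormalize after dropping k = 0
    return -np.sum(counts * np.log(p_trunc))

# hypothetical data: households of size 3 with 1, 2, 3 cases observed
counts = np.array([30.0, 12.0, 5.0])
fit = minimize(truncated_bbd_negloglik, x0=[0.0, 0.0],
               args=(counts, 3), method="Nelder-Mead")
alpha, beta = np.exp(fit.x)
```

The log-parameterization avoids boundary problems during the search; standard errors could be obtained from the observed information at the optimum.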

Journal Article•DOI•
TL;DR: A method of generating all minimum mutation fits is described; such a fit is the assignment that permits representation of the data in a minimum number of symbols, and it seems compelling in its own right.
Abstract: SUMMARY A number of objects, such as species, lie at the ends of a known evolutionary tree. A variable taking a finite number of possible values is specified on this set of objects. How can the values of the variable be estimated for the ancestors of the objects? One way is to assign to the ancestors those values which have the minimum number of mutations (or changes) in going from ancestors to their immediate descendants. In this paper, a method of generating all such minimum mutation fits is described. An evolutionary model for a set of objects is a family tree of possibly hypothetical ancestors through which each object may be traced back to the same primordial ancestor. Evolutionary models are used in the classification of plant and animal life, languages, motor cars, cultures, religions. The construction of the family tree is a difficult problem requiring synthesis of many types of knowledge. Suppose that the family tree is given, and that a variable V (such as number of limbs, for animals) is given for the set of objects (such as species, or families) at the ends of the tree. What values will V take for the hypothetical ancestors? A complete answer to this question is a probability distribution over the set of all possible values that the ancestors might take. A more modest answer is to assign values of V to the ancestors in such a way that the minimum number of changes in V occur, between ancestors and their immediate descendants. This "minimum mutation" fit is most likely under some reasonable probability models, but seems compelling in its own right. It is the assignment which permits representation of the data in a minimum number of symbols. Camin and Sokal [1965] consider the problem of finding an evolutionary tree when each variable has an ordered set of values, and mutation can only take place from a lower to a higher value.
Estabrook [1968] extends this structure on the values of the variable to a partial order with tree structure: for each variable, an evolutionary tree is known connecting the values. In both of these formulations, the minimum mutation fit to a given tree is not a serious problem. The optimal value for an ancestor is always the most primitive value in its descendants. Cavalli-Sforza and Edwards [1967] consider minimum mutation fits.
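The bottom-up pass of a minimum mutation fit can be sketched as follows (a Fitch-style set computation on a rooted binary tree; node names and values are hypothetical, and generating *all* optimal fits would require a further top-down pass not shown here):

```python
def fitch_sets(tree, leaf_values):
    """Bottom-up pass of a minimum-mutation fit: for each internal node,
    the set of candidate ancestor values consistent with a minimum number
    of changes.  `tree` maps internal node -> (left, right) children;
    leaves appear only in `leaf_values`.  Returns (sets, mutation_count)."""
    sets, mutations = {}, 0

    def visit(node):
        nonlocal mutations
        if node in leaf_values:
            return {leaf_values[node]}
        left, right = tree[node]
        s1, s2 = visit(left), visit(right)
        inter = s1 & s2
        if inter:
            sets[node] = inter           # children agree: no change forced
        else:
            sets[node] = s1 | s2         # disagreement: one mutation forced
            mutations += 1
        return sets[node]

    visit('root')                        # the root label is assumed to be 'root'
    return sets, mutations

# hypothetical tree: root -> (leaf a, node n1), n1 -> (leaf b, leaf c)
tree = {'root': ('a', 'n1'), 'n1': ('b', 'c')}
leaves = {'a': 4, 'b': 4, 'c': 2}
sets, muts = fitch_sets(tree, leaves)
```

Any assignment choosing an ancestor value from each node's set (top-down, preferring the parent's value when available) achieves the minimum mutation count.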


Journal Article•DOI•
TL;DR: In this article, the authors derived the "asymptotic" biases of the Jolly-Seber estimates arising from a failure of the hypothesis of equal catchability, and discussed the dependence of these biases on the parameters of the model, and of the distribution of catchability.
Abstract: If the number of immigrants per inter-sample period, and the probabilities of survival, capture and death on capture are all assumed constant in time, the "asymptotic" biases of the Jolly-Seber estimates arising from a failure of the hypothesis of equal catchability can be derived analytically. The dependence of these biases on the parameters of the model, and of the distribution of catchability, is discussed for stable populations with no deaths on capture. For smaller populations, simulation leads to conclusions consistent with these results and provides information on the suitability of Jolly's formulae for the estimated variances.
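The direction of the bias from unequal catchability can be checked by a small simulation (a two-sample Petersen analogue rather than the full Jolly-Seber model; all parameter values are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

def petersen_with_heterogeneity(N=1000, reps=2000):
    """Monte Carlo sketch of the bias from unequal catchability: each
    animal has its own capture probability, shared across two capture
    occasions, so easily caught animals are over-represented among
    recaptures and the Petersen estimate n1*n2/m is pulled downward."""
    estimates = []
    for _ in range(reps):
        p = rng.beta(2, 8, size=N)       # individual catchabilities, mean 0.2
        c1 = rng.random(N) < p           # first capture occasion
        c2 = rng.random(N) < p           # second occasion, same p per animal
        n1, n2, m = c1.sum(), c2.sum(), (c1 & c2).sum()
        if m > 0:
            estimates.append(n1 * n2 / m)
    return np.mean(estimates)

est = petersen_with_heterogeneity()      # well below the true N = 1000
```

Under equal catchability (constant p) the same estimator is nearly unbiased, which is the hypothesis whose failure the paper quantifies.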

Journal Article•DOI•
TL;DR: It is brought out that in a large prospective study in which comparatively few cases of disease have occurred, computational problems can be so burdensome as to preclude a comprehensive and imaginative analysis of the data.
Abstract: SUMMARY Prospective and retrospective approaches for estimating the influence of several variables on the occurrence of disease are discussed. The assumptions under which these approaches would tend to yield the same estimates as would be given by an ideal but unattainable experimental design approach are stated. It is then brought out that in a large prospective study in which comparatively few cases of disease have occurred, computational problems can be so burdensome as to preclude a comprehensive and imaginative analysis of the data. The prospective study can be converted into a synthetic retrospective study by selecting a random sample of the cases and a random sample of the noncases, the sampling proportion being small for noncases, but essentially unity for cases. It is demonstrated that such sampling will tend to leave the dependence of the log odds on the variables unaffected except for an additive constant. The use of a discrimination function noniterative method of analysis is noted and is indicated to be not generally appropriate. The reverse suggestion is made that normal data can be analyzed by a log-odds approach, this yielding alternative tests to those ordinarily used for comparing two or several means or mean vectors, or two or several variances or variance-covariance matrices.
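The claim that sampling all cases but only a fraction of noncases shifts the log odds by an additive constant, leaving the dependence on the variables unaffected, can be checked numerically (hypothetical exposure and disease rates; empirical log odds for a single binary covariate rather than a fitted model):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_odds(cases, noncases):
    return np.log(cases / noncases)

# hypothetical population: binary exposure x, fairly rare disease
N = 200_000
x = rng.random(N) < 0.3
logit = -4.0 + 1.2 * x                      # true log odds of disease
disease = rng.random(N) < 1 / (1 + np.exp(-logit))

# full prospective data: log-odds difference between exposed and unexposed
full_slope = (log_odds(disease[x].sum(), (~disease & x).sum())
              - log_odds(disease[~x].sum(), (~disease & ~x).sum()))

# synthetic retrospective study: keep all cases, a 5% sample of noncases
keep = disease | (rng.random(N) < 0.05)
xs, ds = x[keep], disease[keep]
samp_slope = (log_odds(ds[xs].sum(), (~ds & xs).sum())
              - log_odds(ds[~xs].sum(), (~ds & ~xs).sum()))
```

Both slopes estimate the same coefficient (1.2 here); only the intercept absorbs the sampling fractions, shifting by approximately log(1/0.05).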



Journal Article•DOI•
TL;DR: In this article, the best quadratic estimators of the variance components in a general linear model are presented for each of several classes of estimators of the form Y'AY.
Abstract: Best quadratic estimators of the variance components in a general linear model are presented for each of several classes of estimators of the form Y'AY. Of the classes examined, three contain biased estimators and the others contain only unbiased estimators. Attainable lower bounds on mean squared errors of estimators in each class are described. In the classes of unbiased estimators, necessary and sufficient conditions for estimability are established. Normality is assumed throughout.


Journal Article•DOI•
TL;DR: In this article, the problem of maximizing the sustainable yield from a population of constant size by altering the age composition of a crop is investigated using a simple extension of the Leslie matrix model.
Abstract: SUMMARY The problem of maximizing the sustainable yield from a population of constant size by altering the age composition of a crop is investigated using a simple extension of the Leslie matrix model. The cropping strategy that maximizes this yield is found to involve the complete removal of one age class, effectively reducing the age of maximum longevity of the population, and a partial removal of a second age class. A simple algorithm to find the optimum cropping strategy is described, and its application to typical demographic data is given.
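The idea of cropping to hold a Leslie-model population constant can be sketched for a single partially cropped age class (the paper's optimal two-class algorithm is not reproduced here; the matrix entries are invented):

```python
import numpy as np

# hypothetical Leslie matrix: fecundities on the first row,
# survival probabilities on the subdiagonal
L = np.array([[0.0, 1.5, 1.0],
              [0.8, 0.0, 0.0],
              [0.0, 0.7, 0.0]])

def growth_rate(L, age, frac):
    """Dominant eigenvalue of the projection when a fraction `frac` of
    one age class is cropped after each projection step."""
    C = np.eye(L.shape[0])
    C[age, age] = 1.0 - frac
    return np.max(np.abs(np.linalg.eigvals(C @ L)))

def sustainable_fraction(L, age, tol=1e-10):
    """Bisect for the cropping fraction of `age` that holds the population
    constant (growth rate exactly 1), if such a fraction exists."""
    if growth_rate(L, age, 1.0) > 1.0:
        return None                      # cropping this class cannot stabilize
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if growth_rate(L, age, mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

frac = sustainable_fraction(L, age=1)
```

The optimal strategy in the paper goes further, comparing the equilibrium yield across choices of the fully removed and partially removed age classes.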

Journal Article•DOI•
TL;DR: Details of how regression-type arguments can be used in a multi-group experiment to find simple relationships between the treatment applied to each group and the value of the third (treatment dependent) Weibull coefficient for that group are given.
Abstract: In 1966, Pike suggested that continuous-carcinogenesis experiments be analyzed by fitting appropriate Weibull distributions. Unfortunately, the Weibull distribution seems nearly degenerate with respect to the two of its three parameters which do not depend on treatment, and one of these may therefore have to be fixed arbitrarily. Since both degenerate parameters have definite physical meaning, the choice of a sensible pair of values is possible. When this is done, Pike's method is excellent for separating quantitative carcinogenic response from intercurrent mortality. In this paper we also give details of how regression-type arguments can be used in a multi-group experiment to find simple relationships between the treatment applied to each group and the value of the third (treatment dependent) Weibull coefficient for that group.

Journal Article•DOI•
TL;DR: The importance of a control group is demonstrated by distinguishing between the true treatment effect and the regression effect for the case of bivariate normal distribution, truncated with regard to the predictor or "before treatment" measurement.
Abstract: SUMMARY When analyzing data from situations in which before and after treatment measurements have been obtained, without a parallel control group, it is important that the investigator be aware of possible changes in the observed variable due to regression toward the mean. This phenomenon can be misleading and is sometimes overlooked in the interpretation of data. This paper demonstrates the importance of a control group by distinguishing between the true treatment effect and the regression effect for the case of bivariate normal distribution, truncated with regard to the predictor or "before treatment" measurement. The regression effect is noted to be especially significant when the correlation between the before and after treatment measurement is small. A method for estimating the treatment and regression effects is presented and applied to data from an actual study.
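The regression effect with no treatment effect at all can be demonstrated directly for the truncated bivariate normal setting (correlation, cutoff, and sample size are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical before/after measurements with NO treatment effect:
# standardized bivariate normal with correlation rho
rho, n = 0.5, 100_000
before = rng.standard_normal(n)
after = rho * before + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# "treat" only subjects whose before-measurement exceeds a cutoff,
# as when patients are selected for having high readings
selected = before > 1.0
apparent_change = after[selected].mean() - before[selected].mean()
```

Although nothing was done to the subjects, the selected group's mean drops by roughly (rho - 1) times its truncated mean, a pure regression-toward-the-mean effect that a parallel control group would expose; as the abstract notes, the effect grows as the before/after correlation shrinks.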


Journal Article•DOI•
TL;DR: Completely general methods are developed to obtain the average sample size and the probability of accepting the hypothesis p = po for binomial probabilities in multiple-stage sampling procedures.
Abstract: SUMMARY Completely general methods are developed to obtain the average sample size and the probability of accepting the hypothesis p = po for binomial probabilities in multiple-stage sampling procedures. An example illustrating the use of a multiple-stage plan of this type for a drug screen is considered. Data on the actual performance of the screen are also given.
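The acceptance probability and average sample number (ASN) can be computed exactly for a hypothetical two-stage plan (the decision limits below are invented; the paper's recursion covers plans with any number of stages):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_stage_plan(p, n1, a1, r1, n2, a2):
    """Exact acceptance probability and ASN for a hypothetical two-stage
    binomial plan: after n1 trials accept the hypothesis if the number of
    successes is <= a1, reject if >= r1, otherwise take n2 more trials and
    accept if the combined total is <= a2."""
    p_accept, asn = 0.0, 0.0
    for x in range(n1 + 1):
        px = binom_pmf(x, n1, p)
        if x <= a1:                      # early acceptance
            p_accept += px
            asn += px * n1
        elif x >= r1:                    # early rejection
            asn += px * n1
        else:                            # continue to stage two
            asn += px * (n1 + n2)
            for y in range(n2 + 1):
                if x + y <= a2:
                    p_accept += px * binom_pmf(y, n2, p)
    return p_accept, asn

pa, asn = two_stage_plan(p=0.1, n1=10, a1=0, r1=3, n2=10, a2=2)
```

Evaluating `two_stage_plan` over a grid of p values traces out the operating characteristic and ASN curves of the plan.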


Journal Article•DOI•
TL;DR: Weighted least squares, maximum likelihood, and minimum discrimination information approaches to the analysis of multi-dimensional contingency tables are compared, with primary emphasis on the formulation of models and the problems of analysis under various conditions of "no interaction" (see Roy and Kastenbaum [1956] or Bhapkar).
Abstract: One area of application which has become increasingly important to statisticians and other researchers is the analysis of categorical data. Often the principal objective in such investigations is either the testing of appropriate hypotheses or the fitting of simplified models to the multi-dimensional contingency tables which arise when frequency counts are obtained for the respective cross-classifications of specific qualitative variables. Grizzle, Starmer, and Koch [1969] (subsequently abbreviated GSK) have described how linear regression models and weighted least squares can be used for this purpose. The resulting test statistics belong to the class of minimum modified chi-square due to Neyman [1949] which is equivalent to the general quadratic form criteria of Wald [1943]. As such, they have central x2-distributions when the corresponding null hypotheses are true. Two alternative approaches to this methodology are that based on maximum likelihood as formulated by Bishop [1969; 1971] and Goodman [1970; 1971a, b] and that based on minimum discrimination information as formulated by Ku et al. [1971]. In each of the previously mentioned papers, primary emphasis was given to the formulation of models and the problems of analysis under various conditions of "no interaction" (see Roy and Kastenbaum [1956] or Bhapkar



Journal Article•DOI•
TL;DR: A mathematical model utilizing the geometric distribution for analyzing time-to-response data takes into account concomitant information by means of a multivariate logistic function and results in two distinct exponential models involving regressor variables.
Abstract: SUMMARY A mathematical model utilizing the geometric distribution for analyzing time-to-response data is described. This model takes into account concomitant information by means of a multivariate logistic function. A parameter introduced into the model to make the regression estimates independent of choice of time scale also serves as a model specification parameter. Allowing this parameter to take on its limiting values of zero and infinity results in two distinct exponential models involving regressor variables as described by other authors. Data on advanced breast cancer are used to illustrate the usefulness of this model in examining the relationship of sites of involvement and disease-free interval to survival subsequent to diagnosis of dissemination. The basic data are presented so that they may be available for illustrating and testing alternative statistical procedures.
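A geometric time-to-response likelihood with a logistic link can be fitted along these lines (simulated data; the paper's time-scale parameter and its limiting exponential models are not included in this sketch):

```python
import numpy as np
from scipy.optimize import minimize

def geometric_logistic_nll(beta, t, X):
    """Negative log-likelihood for a geometric time-to-response model:
    P(T = t | x) = q (1 - q)^(t-1), with the per-period response
    probability q a logistic function of the covariates."""
    q = 1.0 / (1.0 + np.exp(-X @ beta))
    q = np.clip(q, 1e-12, 1 - 1e-12)         # guard the log terms
    return -np.sum(np.log(q) + (t - 1) * np.log(1.0 - q))

# simulate data from the model with known coefficients
rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.integers(0, 2, n)])  # intercept + binary covariate
q_true = 1.0 / (1.0 + np.exp(-(X @ np.array([-1.0, 0.8]))))
t = rng.geometric(q_true)                    # times in 1, 2, 3, ...

fit = minimize(geometric_logistic_nll, x0=np.zeros(2),
               args=(t, X), method="Nelder-Mead")
```

The fitted coefficient on the covariate should recover the simulating value (0.8 here) up to sampling error, which is a quick check that the likelihood is coded correctly.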