# Showing papers in "Biometrics in 1975"

••

TL;DR: A new general procedure for treatment assignment is described which concentrates on minimizing imbalance in the distributions of treatment numbers within the levels of each individual prognostic factor.

Abstract: In controlled clinical trials there are usually several prognostic factors known or thought to influence the patient's ability to respond to treatment. Therefore, the method of sequential treatment assignment needs to be designed so that treatment balance is simultaneously achieved across all such patient factors. Traditional methods of restricted randomization such as "permuted blocks within strata" prove inadequate once the number of strata, or combinations of factor levels, approaches the sample size. A new general procedure for treatment assignment is described which concentrates on minimizing imbalance in the distributions of treatment numbers within the levels of each individual prognostic factor. The improved treatment balance obtained by this approach is explored using simulation for a simple model of a clinical trial. Further discussion centers on the selection, predictability and practicability of such a procedure.
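The core of such a minimization scheme can be sketched in a few lines. This is a simplified illustration, not the paper's exact procedure: it assumes two arms, equal weight for every factor, the range of treatment counts as the imbalance measure, and deterministic tie-breaking (a real trial would break ties at random).

```python
from collections import defaultdict

def assign_minimizing(patient, counts, treatments=("A", "B")):
    """Assign `patient` (a dict of factor -> level) to the treatment that
    minimizes total imbalance, summed over the factor levels the patient
    falls in. `counts[(factor, level)]` maps treatment -> number of
    patients with that factor level already on that treatment."""
    def imbalance_if(t):
        total = 0
        for factor, level in patient.items():
            c = dict(counts[(factor, level)])
            c[t] = c.get(t, 0) + 1  # hypothetical assignment of t
            ns = [c.get(tr, 0) for tr in treatments]
            total += max(ns) - min(ns)  # range as the imbalance measure
        return total

    best = min(treatments, key=imbalance_if)  # deterministic tie-break
    for factor, level in patient.items():
        counts[(factor, level)][best] = counts[(factor, level)].get(best, 0) + 1
    return best

counts = defaultdict(dict)
patients = [{"sex": "M", "age": "old"},
            {"sex": "M", "age": "young"},
            {"sex": "F", "age": "old"}]
arms = [assign_minimizing(p, counts) for p in patients]
```

After the first patient goes to arm A, the second and third are steered to arm B because that choice leaves the treatment counts most nearly balanced within each sex and age level.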

2,078 citations

••

TL;DR: Most data available to animal breeders do not meet the usual requirements of random sampling, so the usual methods are likely to yield biased estimates and predictions; methods for dealing with such data are presented.

Abstract: Mixed linear models are assumed in most animal breeding applications. Convenient methods for computing BLUE of the estimable linear functions of the fixed elements of the model and for computing best linear unbiased predictions of the random elements of the model have been available. Most data available to animal breeders, however, do not meet the usual requirements of random sampling, the problem being that the data arise either from selection experiments or from breeders' herds which are undergoing selection. Consequently, the usual methods are likely to yield biased estimates and predictions. Methods for dealing with such data are presented in this paper.

1,901 citations

••

TL;DR: Using the binary (positive-negative) case as a model, some of the proposed indexes for measuring agreement between two judges on a categorical scale are presented and critically evaluated.

Abstract: At least a dozen indexes have been proposed for measuring agreement between two judges on a categorical scale. Using the binary (positive-negative) case as a model, this paper presents and critically evaluates some of these proposed measures. The importance of correcting for chance-expected agreement is emphasized, and identities with intraclass correlation coefficients are pointed out.
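The chance-correction idea can be shown concretely for the binary case with Cohen's kappa, one index in this family (a sketch of the correction, not the paper's full survey):

```python
def kappa_binary(a, b, c, d):
    """Chance-corrected agreement (Cohen's kappa) for a 2x2 table:
    a = both judges positive, b = judge 1 positive / judge 2 negative,
    c = judge 1 negative / judge 2 positive, d = both negative."""
    n = a + b + c + d
    p_obs = (a + d) / n  # observed agreement
    # agreement expected by chance from the two judges' marginal rates
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (p_obs - p_exp) / (1 - p_exp)
```

With 40 + 40 agreements out of 100 and balanced marginals, raw agreement is 0.8 but kappa is 0.6, since half of the agreement is expected by chance.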

491 citations

••

TL;DR: In this article, a parametric method for analyzing binary response data from completely randomized experiments in which the experimental units are animal litters is described, where responses within a litter are assumed to form a set of Bernoulli trials whose success probability varies between litters in the same treatment group according to a two-parameter beta distribution.

Abstract: This paper describes a parametric method for analyzing binary response data from completely randomized experiments in which the experimental units are animal litters. Responses within a litter are assumed to form a set of Bernoulli trials whose success probability varies between litters in the same treatment group according to a two-parameter beta distribution. The parameters of the beta distribution for each treatment are estimated by maximum likelihood and treatment differences are tested by asymptotic likelihood ratio tests.

476 citations

••

TL;DR: In this article, the authors approximate the distribution of Fisher's statistic in order to combine one-sided tests of location when the variables are not all jointly independent.

Abstract: Littell and Folks [1971, 1973] show that Fisher's method of combining independent tests of significance is asymptotically optimal among essentially all methods of combining independent tests. By assuming a joint multivariate normal density for the variables, we approximate the distribution of Fisher's statistic in order to combine one-sided tests of location when the variables are not all jointly independent. The probability associated with this test is simpler to evaluate than that of the equivalent likelihood ratio test.
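For reference, Fisher's statistic in the independent baseline case, which the paper's approximation generalizes to dependent tests, can be computed directly; the closed-form chi-square tail used here relies on the degrees of freedom 2k being even:

```python
from math import log, exp

def fisher_combined(pvals):
    """Fisher's combined statistic -2 * sum(ln p_i) and its p-value for
    k independent tests, where the statistic is chi-squared on 2k df.
    For even df 2k: P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!"""
    stat = -2.0 * sum(log(p) for p in pvals)
    k = len(pvals)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (stat / 2.0) / i
        total += term
    return stat, exp(-stat / 2.0) * total
```

With a single p-value the method returns that p-value unchanged, and two p-values of 0.1 combine to roughly 0.056.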

414 citations

••

TL;DR: In this article, the problem of testing a comparison c suggested by the data is first identified with the multiple comparisons problem for the class C of a priori equally plausible comparisons.

Abstract: The problem of testing a comparison c suggested by the data is first identified with the multiple comparisons problem for the class C of a priori equally plausible comparisons. A multiple comparisons dilemma is then illustrated by a simple example E emphasizing the impossibility of developing an acceptable α-level Neyman-Pearson test for c. A recent empirical Bayes additive losses and exchangeable priors (EBALEP) approach is discussed and illustrated in getting a simple solution for E. A brief review is given of the use of the same approach by Waller, Dixon and Duncan in getting the k-ratio t test and interval for the all-pairwise differences problem. These rules are shown to apply to the all contrasts problem as well and several worked examples are given for their application. The k-ratio rules are seen to have considerably increased power and low type-I error probabilities in dealing with heterogeneous and near-homogeneous treatments, respectively.

298 citations

••

270 citations

••

[...]

TL;DR: In this paper, optimal augmented block and optimal augmented row-column designs for estimating certain contrasts of new treatments are presented; these may be minimum variance designs for estimating contrasts of check effects, of new variety effects, of new varieties versus checks, or of all check and new varieties simultaneously.

Abstract: When some treatments (checks) are replicated r times and other treatments (new treatments) are replicated fewer than r times, an augmented design may be used. These designs may be minimum variance designs for estimating contrasts of check effects, of new variety effects, of new varieties versus checks, or of all check and new varieties simultaneously. In this paper optimal augmented block and optimal augmented row-column designs for estimating certain contrasts of new treatments are presented.

256 citations

••

TL;DR: In this article, the performance of six hierarchical clustering methods (given by one algorithm, Wishart [1969]) is compared on bivariate and multivariate normal Monte Carlo samples.

Abstract: The performance of six hierarchical clustering methods (given by one algorithm, Wishart [1969]) is compared on bivariate and multivariate normal Monte Carlo samples. The methods are stopped at the correct number of clusters and compared with respect to correct classification (placing pairs of points in the same or different clusters correctly or incorrectly) and with each other (whether the methods agree or disagree in placing a pair of points in the same or different clusters).

255 citations

••

TL;DR: A notation is introduced which allows one to define competing risk models easily and to examine underlying assumptions; the actuarial model is treated in detail and compared with other models, with useful variance formulae both for the case when times of death are available and for the case when they are not.

Abstract: We have introduced a notation which allows one to define competing risk models easily and to examine underlying assumptions. We have treated the actuarial model for competing risk in detail, comparing it with other models and giving useful variance formulae both for the case when times of death are available and for the case when they are not. The generality of these methods is illustrated by an example treating two dependent competing risks.

••

TL;DR: The statistical efficiency of a study design which matches each case with k controls is compared with that of the standard matched-pairs setup and the Pitman efficiency is compared for large-sample dichotomous situations.

Abstract: The statistical efficiency of a study design which matches each case with k controls is compared with that of the standard matched-pairs setup. For the situation in which the distribution of measurements on the factor under investigation is continuous, the comparison criterion used is the reciprocal of the variance ratio of the corresponding case-control difference estimators; for dichotomous observations, the Pitman efficiency of Miettinen's test [1969] or of Pike and Morrow's test [1970] relative to McNemar's test is used. When the variances for continuously distributed case and control measurements are equal, the same result, 2k/(k + 1), is obtained for both situations and, more generally, the efficiency of k1 controls relative to k2 controls is given by k1(k2 + 1)/k2(k1 + 1). The Pitman efficiency is compared with the "practical efficiency" for large-sample dichotomous situations.
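The two efficiency formulas quoted above are easy to evaluate directly (a sketch; the function names are illustrative):

```python
def efficiency_vs_pairs(k):
    """Efficiency of a 1:k matched design relative to matched pairs
    (continuous-measurement case): 2k / (k + 1)."""
    return 2 * k / (k + 1)

def relative_efficiency(k1, k2):
    """Efficiency of k1 controls per case relative to k2 controls per
    case: k1(k2 + 1) / (k2(k1 + 1))."""
    return k1 * (k2 + 1) / (k2 * (k1 + 1))
```

For example, four controls per case give 2·4/5 = 1.6 times the matched-pairs efficiency, and since 2k/(k + 1) approaches 2, no number of controls can more than double it.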

••

TL;DR: In this paper, the distribution of the reciprocal of von Neumann's ratio is considered under the random walk model, and it is shown how this quantity can be used to test for density dependence in animal populations.

Abstract: The distribution of the reciprocal of von Neumann's ratio is considered under the random walk model x(t+1) = x(t) + e(t), and it is shown how this quantity can be used to test for density dependence in animal populations. Another test is described which is robust under superimposed errors of measurement. The methods are used to analyze data on Canadian fur-bearing mammals.
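The statistic itself is simple to compute; this sketch omits the conventional degrees-of-freedom scaling factors, which is an assumption here:

```python
def von_neumann_reciprocal(x):
    """Reciprocal of von Neumann's ratio: sum of squared deviations from
    the mean divided by the sum of squared successive differences.
    Large values are consistent with a random walk (the series wanders,
    so its spread dwarfs the step-to-step changes); small values suggest
    return toward an equilibrium, i.e. density dependence."""
    n = len(x)
    mean = sum(x) / n
    ss_dev = sum((v - mean) ** 2 for v in x)
    ss_diff = sum((x[t + 1] - x[t]) ** 2 for t in range(n - 1))
    return ss_dev / ss_diff
```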

••

TL;DR: In this article, a family of linear-plateau models is proposed for fitting fertilizer response data which exhibit a plateau effect, and techniques for fitting, parameter estimation, and economic interpretation are described.

Abstract: For many cropping situations, especially in developing countries, quadratic surfaces do not fit the responses of certain crops to fertilizer. Use of second order designs with standard statistical and economic interpretive techniques may result in costly biases in the estimates of the optimal fertilizer rate. Also, there is a potential pollution problem. A family of linear-plateau models, consisting of intersecting straight lines, is proposed for fitting fertilizer response data which exhibit a plateau effect. The regression coefficients are easily computed using a desk calculator or computer, and the economic interpretations are simple. Techniques for fitting, parameter estimation, and economic interpretation are described. For multi-nutrient experiments, a complete factorial experiment with a number of levels of each nutrient is considered to be the best design for both evaluating the model and then estimating the optimal nutrient levels. Preliminary information may provide a basis for deciding which fertilizer nutrients are apt to produce response. In many soil-crop situations, only NP or N experiments are required, because the other nutrients are already at adequate levels; hence, the amount of experimental material may be redistributed by having fewer factors, but more levels of each factor studied. Two currently used fertilizer response designs, based on preliminary information on optimal nutrient levels, are described; a one-factor-at-a-time design has the disadvantage of providing no estimate of interaction. Several other designs are suggested. We recommend concentrating several treatment levels in the vicinity of the anticipated optimum. Since the sloping phase of the response pattern is more important than the plateau phase, it should receive more attention when distributing treatment levels.
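The fitting step for a single nutrient can be sketched as a grid search over candidate join points. This illustrates the intersecting-line-plus-plateau idea, not the authors' exact computations; representing the model through the predictor min(x, xc) is an implementation choice that enforces continuity at the join point:

```python
def fit_linear_plateau(xs, ys, candidates):
    """Fit y = b0 + b1*x up to a join point xc and a flat plateau beyond,
    by least squares over a grid of candidate join points. Continuity at
    xc fixes the plateau height at b0 + b1*xc, so the model is an
    ordinary regression of y on z = min(x, xc)."""
    best = None
    for xc in candidates:
        zs = [min(x, xc) for x in xs]
        n = len(xs)
        zbar = sum(zs) / n
        ybar = sum(ys) / n
        sxy = sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys))
        sxx = sum((z - zbar) ** 2 for z in zs)
        b1 = sxy / sxx
        b0 = ybar - b1 * zbar
        sse = sum((y - (b0 + b1 * z)) ** 2 for z, y in zip(zs, ys))
        if best is None or sse < best[0]:
            best = (sse, b0, b1, xc)
    return best  # (sse, intercept, slope, join point)
```

On data that rise linearly to a plateau, the search recovers the join point and the sloping segment exactly.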

••

TL;DR: An empirical Bayes procedure is employed in the simultaneous estimation of vector parameters from a number of Gauss-Markoff linear models, and it is shown that with respect to a quadratic loss function, empirical Bayes estimators are better than least squares estimators.

Abstract: An empirical Bayes procedure is employed in the simultaneous estimation of vector parameters from a number of Gauss-Markoff linear models. It is shown that with respect to a quadratic loss function, empirical Bayes estimators are better than least squares estimators. While estimating the parameter for a particular linear model, a suggestion has been made for distinguishing between the loss due to the decision maker and the loss due to the individual. A method has been proposed, but not fully studied, to achieve balance between the two losses. Finally, the problem of predicting future observations in a linear model is considered.
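The flavor of the result can be illustrated in the simplest normal-means case with a James-Stein-style positive-part shrinkage estimator. This toy analogue is my construction for illustration, not the paper's estimator:

```python
def eb_shrink(means, sigma2):
    """Empirical Bayes (James-Stein style) shrinkage of p >= 4 group
    means toward their grand mean, assuming each mean has known
    sampling variance sigma2. The shrinkage factor is estimated from
    the between-group spread, which is the empirical Bayes step."""
    p = len(means)
    grand = sum(means) / p
    s = sum((m - grand) ** 2 for m in means)
    shrink = max(0.0, 1 - (p - 3) * sigma2 / s)  # positive-part estimator
    return [grand + shrink * (m - grand) for m in means]
```

Each estimate is pulled toward the grand mean, which is what lowers the total quadratic risk below that of the raw (least squares) means.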

••

TL;DR: This paper suggests a method which utilizes a step-up procedure for choosing the most important variables associated with survival, and maximum likelihood estimates are utilized, and the likelihood ratio is employed as the criterion for adding significant concomitant variables.

Abstract: Multivariate concomitant information on a subject's condition usually accompanies survival time data. Using a model in which each subject's lifetime is exponentially distributed, this paper suggests a method which utilizes a step-up procedure for choosing the most important variables associated with survival. Maximum likelihood (ML) estimates are utilized, and the likelihood ratio is employed as the criterion for adding significant concomitant variables. An example using multiple myeloma survival data and sixteen concomitant variables is discussed in which three variables are chosen to predict survival.

••

TL;DR: Starting from a review of four major experimental areas in medicine, some of the special statistical problems arising in the design and analysis of clinical experiments are reviewed and an alternative formulation of the inference/decision problem is proposed.

Abstract: Starting from a review of four major experimental areas in medicine (Polio, Coronary Surgery, Diabetes, Breast Cancer), some of the special statistical problems arising in the design and analysis of clinical experiments are reviewed, and the limitations of current formulations are emphasized. Particular attention is given to the ethical dilemma and an alternative formulation of the inference/decision problem is proposed.

••

TL;DR: The study of the dependence of response-time data on a multivariate regressor variable in the presence of arbitrary censoring has been approached in a number of ways; the logistic regression methods that have been proposed allow the underlying hazard to be a function of time, but the relative effects of the covariates are assumed independent of time.

Abstract: The study of the dependence of response-time data on a multivariate regressor variable in the presence of arbitrary censoring has been approached in a number of ways. The exponential regression model proposed by Feigl and Zelen [1965] and extended by Zippin and Armitage [1966] and by Mantel and Myers [1971] to the case of arbitrarily right censored data relates the reciprocal of the exponential parameter, i.e. the expected survival time, to a linear function of the regressor variables. Later, Glasser [1967] proposed an exponential model in which the logarithm of the exponential parameter was assumed to be a linear function of the regressor variables. In both formulations the rather stringent assumption of a constant hazard may be dropped by the assumption of a more general response-time distribution such as the Weibull, gamma or Gompertz, each of which contains the exponential as a special case. The nonparametric model proposed by Cox [1972] admits an arbitrary response-time distribution and, for discrete data, becomes a logistic regression model. An alternative version of Cox's discrete model has been proposed by Kalbfleisch and Prentice [1973]. These approaches have the advantage of not specifying the hazard function in advance and, as such, are more robust than the above parametric methods. Their major drawback, however, is the computational difficulty in the presence of tied response times. In many practical situations the data are recorded in such a way as to make this a very real problem, serious enough to imply that an alternative procedure may be desirable. This logistic regression model was also used by Myers et al. [1973] in conjunction with the assumption of a constant hazard. The model they considered incorporated concomitant information by assuming that the probability of responding within a unit time period followed a logistic regression function, while the actual time to response followed a particular distributional form.
They chose a form which assumed a time-independent risk of responding: the exponential for a continuous time process or the geometric for discrete time. This approach was extended by Hankey and Mantel [1974] by the addition of a time function to the logistic regression function. This time function was approximated by a low order polynomial. Inherent in these exponential and logistic regression models is the assumption that the effects of the covariates are independent of time. The exponential model of Feigl and Zelen relates the expected survival time to the concomitant information and, since the exponential distribution is "without memory," the expected remaining survival time given survival up to some time point T has the same relationship to the concomitant information no matter what the value of T. The logistic regression methods that have been proposed allow the underlying hazard to be a function of time, but the relative effects of the covariates are independent of time.

••

TL;DR: A new technique for making inferences about finite population "parameters" is developed and shown to be applicable for any survey design; example applications include the estimation of strata and population means in stratified sampling.

Abstract: Frequently it is reasonable for a sample surveyor to view the finite population of interest as an independent sample of size N from an infinite super-population. This super-population viewpoint is contrasted to the classical frequentist theory of finite population sampling and the classical theory of infinite population sampling. A new technique for making inferences about finite population "parameters" is developed and shown to be applicable for any survey design. Two example applications are given: the estimation of strata and population means in stratified sampling, and the use of the so-called regression estimators for the same purpose.

••

TL;DR: In this article, a new technique called the "generalized gap test" is presented for the detection of multivariate outliers, based on the observation that the distribution of the lengths of the edges of minimum spanning trees (based on a matrix of distances between all pairs of points) is quite sensitive to the presence of observations separated from the main multivariate cloud of points.

Abstract: A new technique called the "generalized gap test" for the detection of multivariate outliers is presented. It is based on the observation that the distribution of the lengths of the edges of minimum spanning trees (based on a matrix of distances between all pairs of points) is quite sensitive to the presence of observations separated from the main multivariate cloud of points. If the data are multivariate normal, then the distribution of squared edge lengths follows the gamma distribution quite closely. Thus departure from expectation can be detected using gamma quantile plots. A table of critical values is also given for testing whether the maximum squared edge length divided by the mean squared edge length is too large.
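The statistic tabulated in the paper can be sketched directly: build the minimum spanning tree (Prim's algorithm here, an implementation choice), then take the maximum squared edge length over the mean squared edge length:

```python
def mst_edge_gap_statistic(points):
    """Build the minimum spanning tree of the complete graph on `points`
    (Prim's algorithm; squared Euclidean distances, which give the same
    tree as distances) and return the maximum squared edge length
    divided by the mean squared edge length. Large values flag a point
    separated from the main cloud."""
    n = len(points)

    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    in_tree = {0}
    # cheapest squared distance from each out-of-tree vertex to the tree
    best = {i: d2(points[0], points[i]) for i in range(1, n)}
    edges = []
    while len(in_tree) < n:
        j = min(best, key=best.get)
        edges.append(best.pop(j))
        in_tree.add(j)
        for i in best:
            w = d2(points[j], points[i])
            if w < best[i]:
                best[i] = w
    return max(edges) / (sum(edges) / len(edges))
```

Three collinear points one unit apart plus one point far away give MST squared edge lengths [1, 1, 64], so the statistic is 64/22 ≈ 2.9, reflecting the outlying point.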

••

TL;DR: In this article, the problem is examined of removing the maximum harvest, in terms of number of animals, biomass, or any linear combination of the proportions harvested from the separate groups, subject to the condition that a fixed population size and age structure be maintained after every time interval.

Abstract: Under favourable conditions, natural animal populations have a tendency to increase in numbers. It is, however, possible to remove some animals and maintain a constant population size. If the population is divided into groups by age or stage in the life cycle, different proportions of the various groups can be harvested. This paper examines the problem of removing the maximum harvest in terms of number of animals or biomass or any linear combination of the proportions harvested from the separate groups, subject to the condition that a fixed population size and age structure be maintained after every time interval. Harvesting before and after reproduction and seasonal variation in vital rates are considered. Also, some attention is given to the problem of maximizing a fisheries yield where nets of fixed mesh size take a constant proportion of all individuals larger and no individuals smaller than a certain minimum size.

••

TL;DR: In this article, the authors compare eleven tests of the null hypothesis H0: ρ12 = ρ13, where ρij is the population correlation coefficient between the ith variable Xi and the jth variable Xj; X1 is the dependent variable, X2 and X3 are the independent variables, and the data consist of a sample from a trivariate normal distribution.

Abstract: This study compares eleven tests of the null hypothesis H0: ρ12 = ρ13, where ρij is the population correlation coefficient between the ith variable Xi and the jth variable Xj; X1 is the dependent variable, X2 and X3 are the independent variables, and the data consist of a sample from a trivariate normal distribution. The methods include the likelihood ratio test, Aitken's test, Hotelling's test, and Williams' test. Asymptotic comparisons of level of significance are made; small sample simulation results are given on level of significance and power. For small or moderate samples, Williams' statistic emerges as the best choice; for very large samples, the likelihood ratio test would be preferred except for computational difficulties.
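For concreteness, here is one commonly cited form of Williams' statistic (in the version later popularized by Steiger); the exact denominator should be treated as an assumption to verify against the original sources:

```python
from math import sqrt

def williams_t(r12, r13, r23, n):
    """Williams' statistic for testing H0: rho12 = rho13 from a single
    sample of size n, referred to a t distribution on n - 3 df.
    r12, r13 are the correlations being compared; r23 is the
    correlation between the two competing predictors."""
    detR = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23  # |R|
    rbar = (r12 + r13) / 2
    denom = 2 * ((n - 1) / (n - 3)) * detR + rbar**2 * (1 - r23) ** 3
    return (r12 - r13) * sqrt((n - 1) * (1 + r23) / denom)
```

The statistic is zero when the two sample correlations coincide and positive when X2 is the more strongly correlated predictor.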

••

TL;DR: In this article, the two maximum likelihood solutions of Day [1969] and Scott and Symons [1971] to a well-known cluster analysis problem are compared, with discussion restricted, for simplicity, to the case of two groups of data points drawn from a population that consists of a known number of k-variate normal distributions with the same, unknown, dispersion matrix but different means.

Abstract: Suppose a set of data is believed to have been drawn from a population that consists of a known number of k-variate normal distributions, with the same, unknown, dispersion matrix but different means. It is desired to estimate the parameters of the distributions, and assign the data points to them with minimum probability of misclassification. This is a well-known problem in cluster analysis. Recently, two solutions have been proposed, by Day [1969] and Scott and Symons [1971]. Both are based on ML estimation, but they are not, in general, the same. In this note, the two solutions will be compared, discussion being restricted, for simplicity, to the case of two groups.

••

TL;DR: Methods are outlined for analyzing data on genotype frequencies at several codominant loci in random mating diploid populations; a succession of models of assumed independence of gene frequencies is fitted, based on those used in multi-dimensional contingency tables, and tests for association are made using likelihood ratios.

Abstract: Methods are outlined for analyzing data on genotype frequencies at several codominant loci in random mating diploid populations. Maximum likelihood (ML) methods are given for estimating chromosomal frequencies. Using these, a succession of models of assumed independence of gene frequencies is fitted. These are based on those used in multi-dimensional contingency tables, and tests for association (linkage disequilibrium) are made using likelihood ratios. The methods are illustrated with an example.