
Showing papers in "Biometrics in 1978"


Journal Article•DOI•
TL;DR: This text introduces statistics and data analysis, probability and the standard discrete and continuous distributions, estimation and hypothesis testing, simple and multiple linear regression, factorial experiments, nonparametric methods, statistical quality control and Bayesian statistics.
Abstract: 1. Introduction to Statistics and Data Analysis 2. Probability 3. Random Variables and Probability Distributions 4. Mathematical Expectations 5. Some Discrete Probability Distributions 6. Some Continuous Probability Distributions 7. Functions of Random Variables (optional) 8. Fundamental Distributions and Data Description 9. One and Two Sample Estimation Problems 10. One and Two Sided Tests of Hypotheses 11. Simple Linear Regression 12. Multiple Linear Regression 13. One Factor Experiments: General 14. Factorial Experiments (Two or More Factors) 15. 2^k Factorial Experiments and Fractions 16. Nonparametric Statistics 17. Statistical Quality Control 18. Bayesian Statistics

1,984 citations


Journal Article•DOI•
TL;DR: It is argued that the problem of estimation of failure rates under the removal of certain causes is not well posed until a mechanism for cause removal is specified, and a method involving the estimation of parameters that relate time-dependent risk indicators for some causes to cause-specific hazard functions for other causes is proposed for the study of interrelations among failure types.
Abstract: Distinct problems in the analysis of failure times with competing causes of failure include the estimation of treatment or exposure effects on specific failure types, the study of interrelations among failure types, and the estimation of failure rates for some causes given the removal of certain other failure types. The usual formulation of these problems is in terms of conceptual or latent failure times for each failure type. This approach is criticized on the basis of unwarranted assumptions, lack of physical interpretation and identifiability problems. An alternative approach utilizing cause-specific hazard functions for observable quantities, including time-dependent covariates, is proposed. Cause-specific hazard functions are shown to be the basic estimable quantities in the competing risks framework. A method, involving the estimation of parameters that relate time-dependent risk indicators for some causes to cause-specific hazard functions for other causes, is proposed for the study of interrelations among failure types. Further, it is argued that the problem of estimation of failure rates under the removal of certain causes is not well posed until a mechanism for cause removal is specified. Following such a specification, one will sometimes be in a position to make sensible extrapolations from available data to situations involving cause removal. A clinical program in bone marrow transplantation for leukemia provides a setting for discussion and illustration of each of these ideas. Failure due to censoring in a survivorship study leads to further discussion.
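
A minimal numerical sketch (not the authors' code; the data below are invented) of the estimable quantity the paper emphasises: a Nelson-Aalen type estimate of a cumulative cause-specific hazard, in which failures from other causes simply leave the risk set.

```python
import numpy as np

def cause_specific_cum_hazard(times, causes, cause):
    """Nelson-Aalen estimate of the cumulative hazard for one failure cause.

    times  : observed failure/censoring times
    causes : integer cause labels; 0 denotes an ordinarily censored observation
    cause  : the cause of interest
    """
    times = np.asarray(times, dtype=float)
    causes = np.asarray(causes)
    cum, out = 0.0, []
    for t in np.unique(times[causes == cause]):
        at_risk = np.sum(times >= t)                   # subjects still under observation
        d = np.sum((times == t) & (causes == cause))   # failures of this cause at t
        cum += d / at_risk
        out.append((t, cum))
    return out

# hypothetical data: cause 1 vs cause 2 failures plus censored (0) observations
t = [2, 3, 3, 5, 7, 8, 10, 12, 14, 15]
c = [1, 2, 1, 0, 1, 2, 0, 1, 2, 0]
for time, H in cause_specific_cum_hazard(t, c, cause=1):
    print(f"t = {time:4.1f}   cumulative cause-1 hazard = {H:.3f}")
```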

1,429 citations


Journal Article•DOI•
TL;DR: Application to breast cancer data, from the National Cancer Institute-sponsored End Results Group, indicates that previously noted race differences in breast cancer survival times are explained to a large extent by differences in disease extent and other demographic characteristics at diagnosis.
Abstract: Use of the proportional hazards regression model (Cox 1972) substantially liberalized the analysis of censored survival data with covariates. Available procedures for estimation of the relative risk parameter, however, do not adequately handle grouped survival data, or large data sets with many tied failure times. The grouped data version of the proportional hazards model is proposed here for such estimation. Asymptotic likelihood results are given, both for the estimation of the regression coefficient and the survivor function. Some special results are given for testing the hypothesis of a zero regression coefficient which leads, for example, to a generalization of the log-rank test for the comparison of several survival curves. Application to breast cancer data, from the National Cancer Institute-sponsored End Results Group, indicates that previously noted race differences in breast cancer survival times are explained to a large extent by differences in disease extent and other demographic characteristics at diagnosis.
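
For orientation, a small sketch (invented data, not the paper's grouped-data estimator) of the ordinary two-sample log-rank statistic that the proposed test generalises:

```python
import numpy as np

def logrank_statistic(times, events, group):
    """Two-sample log-rank chi-square statistic (1 df under the null)."""
    times, events, group = map(np.asarray, (times, events, group))
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(times[events == 1]):
        at_risk = times >= t
        n, n1 = at_risk.sum(), (at_risk & (group == 1)).sum()
        d = ((times == t) & (events == 1)).sum()
        d1 = ((times == t) & (events == 1) & (group == 1)).sum()
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var

times  = [3, 5, 7, 2, 8, 9, 11, 4, 6, 10]
events = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
group  = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print("log-rank chi-square:", round(logrank_statistic(times, events, group), 3))
```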

1,332 citations


Journal Article•DOI•

796 citations


Journal Article•DOI•
TL;DR: In this paper, the authors examined some general models leading to weighted distributions with weight functions not necessarily bounded by unity, including probability sampling in sample surveys, additive damage models, visibility bias dependent on the nature of data collection and two-stage sampling.
Abstract: When an investigator records an observation by nature according to a certain stochastic model, the recorded observation will not have the original distribution unless every observation is given an equal chance of being recorded. A number of papers have appeared during the last ten years implicitly using the concepts of weighted and size-biased sampling distributions. In this paper, we examine some general models leading to weighted distributions with weight functions not necessarily bounded by unity. The examples include: probability sampling in sample surveys, additive damage models, visibility bias dependent on the nature of data collection and two-stage sampling. Several important distributions and their size-biased forms are recorded. A few theorems are given on the inequalities between the mean values of two weighted distributions. The results are applied to the analysis of data relating to human populations and wildlife management. For human populations, the following is raised and discussed: Let us ascertain from each male student in a class the number of brothers, including himself, and sisters he has and denote by k the number of students and by B and S the total numbers of brothers and sisters. What would be the approximate values of B/(B+S), the ratio of brothers to the total number of children, and (B+S)/k, the average number of children per family? It is shown that B/(B+S) will be an overestimate of the proportion of boys among the children per family in the general population which is about half, and similarly (B+S)/k is biased upwards as an estimate of the average number of children per family in the general population. Some suggestions are offered for the estimation of these population parameters. Lastly, for the purpose of estimating wildlife population density, certain results are formulated within the framework of quadrat sampling involving visibility bias.
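
The brothers-and-sisters example can be illustrated with a short simulation; the family-size distribution and sampling scheme below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_fam = 100_000
sizes = rng.poisson(2.3, n_fam) + 1        # hypothetical family-size distribution
boys = rng.binomial(sizes, 0.5)            # each child is a boy with probability 1/2
girls = sizes - boys

# a family enters the class once per male student, so it is sampled with
# probability proportional to its number of boys (size-biased sampling)
weights = boys / boys.sum()
idx = rng.choice(n_fam, size=20_000, replace=True, p=weights)
B, S, k = boys[idx].sum(), girls[idx].sum(), len(idx)

print("true proportion of boys :", round(boys.sum() / sizes.sum(), 3))
print("size-biased B/(B+S)     :", round(B / (B + S), 3))
print("true mean family size   :", round(sizes.mean(), 3))
print("size-biased (B+S)/k     :", round((B + S) / k, 3))
```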

492 citations


Journal Article•DOI•

436 citations


Journal Article•DOI•
TL;DR: In this article, a modification of the multinomial model is proposed and a new estimator is developed, in which the likelihood density of the probability of capture is weighted with a beta prior; the new method, unlike previous methods, does not fail for any catch vector, thus avoiding the substitution of the total catch for the estimate of N when infinite estimates occur.
Abstract: Summary The theory leading to the maximum likelihood (ML) estimation of population size from removal data is reviewed. The assumptions of the removal method are that changes in population size occur only through capture, and that the probability of capture is equal for all individuals in the population during the removal sequence. A modification of the multinomial model is proposed and a new estimator developed. In the new model the likelihood density of the probability of capture is weighted with a beta prior. The case where α = β = 1 (uniform prior) is compared with ML estimation and found to have lower bias and variance. The new method, unlike previous methods, does not fail for any catch vector, thus avoiding the substitution of the total catch for the estimate of N when infinite estimates occur. The assumptions that result from applying large-sample theory while estimating the variance of ML estimates are reviewed, and a condition is presented for the inadequacy of asymptotic variance formulae when using the weighted estimator (α = β = 1). Examples illustrating the use of the new method are given; one example illustrates the use of the new method when previous methods fail. Various assumption violations are investigated and the new method is found to be more robust against the violation of assumptions than previous methods.
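
As a point of comparison, a hedged sketch of plain maximum likelihood estimation of N and the capture probability p from a removal catch vector (this is the standard multinomial ML fit over a grid, not the paper's beta-weighted estimator):

```python
import numpy as np
from scipy.special import gammaln

def removal_loglik(N, p, catches):
    """Multinomial log-likelihood of a removal catch vector for population size N."""
    catches = np.asarray(catches)
    total = catches.sum()
    if N < total or not (0 < p < 1):
        return -np.inf
    # expected removal pattern: period i removes a fraction p*(1-p)^(i-1) of N
    probs = p * (1 - p) ** np.arange(len(catches))
    rest = 1 - probs.sum()                      # never-caught fraction
    ll = gammaln(N + 1) - gammaln(N - total + 1) - gammaln(catches + 1).sum()
    ll += (catches * np.log(probs)).sum() + (N - total) * np.log(rest)
    return ll

catches = [33, 19, 12, 7, 4]                    # hypothetical removal sequence
best = max(((removal_loglik(N, p, catches), N, p)
            for N in range(sum(catches), 300)
            for p in np.linspace(0.05, 0.95, 91)), key=lambda t: t[0])
print("ML estimates:  N =", best[1], "  p =", round(best[2], 3))
```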

404 citations


Journal Article•DOI•
TL;DR: Fisher's "exact" (one-sided) test may be used to test the null hypothesis p1 = p2 against the alternative hypothesis that p1 > p2.
Abstract: If x and y are each binomially distributed with index n and parameters p1 and p2 respectively, then the comparison of these two binomial distributions is usually displayed as a 2 x 2 table and Fisher's "exact" (one-sided) test may be used to test the null hypothesis p1 = p2 against the alternative hypothesis that p1 > p2. The exact test is based on arguing conditionally on the observed number of "successes", i.e., x + y; see Yates (1934) and Fisher (1935). The distribution of x with x + y = m fixed depends on p1 and p2 only through the odds ratio θ = (p1 q2)/(q1 p2), where qi = 1 - pi. The conditional distribution is Pr(x | θ) = C(n, x) C(n, y) θ^x / Σ_i C(n, i) C(n, m-i) θ^i, where i takes the values L = max(0, m-n) to U = min(n, m). Let x_c be the critical value of x for the exact test of p1 > p2 (i.e., θ > 1) against the null hypothesis θ = 1 with type I error α, so that
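
A quick way to reproduce the conditional one-sided test in practice (with assumed counts) is scipy's implementation of Fisher's exact test:

```python
from scipy.stats import fisher_exact

# 2 x 2 table: rows are the two binomial samples, columns are successes/failures
table = [[9, 1],
         [4, 6]]
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print("conditional one-sided p-value:", round(p_value, 4))
```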

397 citations


Journal Article•DOI•

388 citations


Journal Article•DOI•
TL;DR: In this paper, three methods of obtaining confidence intervals for the risk ratio offailure in two independent binomial samples are compared, and it is concluded that Method A is reasonable but conservative, Method B is erratic and should not be used, aszd Method C is reasonable and less conservative than Method A.
Abstract: Three methods of obtaining confidence intervals for the risk ratio offailure in two independent binomial samples are compared. The three are (A) the method of Thomas and Gart (1977), (B) an adaptation of the method of Fieller using the normal distribution, and (C) a proposed method using a logarithmic transformation. On the basis of extensive simulations we have concluded that Method A is reasonable but conservative, Method B is erratic and should not be used, aszd Method C is reasonable and less conservative than Method A. Method C, cotnputationally the simplest, is recommended.
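
A minimal sketch of a logarithmic-transformation interval of the kind Method C describes (the exact formula used in the paper may differ; the counts are hypothetical):

```python
import math

def risk_ratio_ci(x1, n1, x2, n2, z=1.96):
    """Risk ratio with a normal-theory interval on the log scale."""
    rr = (x1 / n1) / (x2 / n2)
    se_log = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)   # SE of log(RR)
    lo, hi = (rr * math.exp(s * z * se_log) for s in (-1, 1))
    return rr, lo, hi

rr, lo, hi = risk_ratio_ci(15, 120, 8, 130)                 # hypothetical counts
print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```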

378 citations



Journal Article•DOI•
TL;DR: A type of correlated binomial model is proposed for use in certain toxicological experiments with laboratory animals where the outcome of interest is the occurrence of dead or malformed fetuses in a litter.
Abstract: In certain toxicological experiments with laboratory animals, the outcome of interest is the occurrence of dead or malformed fetuses in a litter. Previous investigations have shown that the simple one-parameter binomial and Poisson models generally provide poor fits to this type of binary data. In this paper, a type of correlated binomial model is proposed for use in this situation. First, the model is described in detail and is compared to a beta-binomial model proposed by Williams (1975). These two-parameter models are then contrasted for goodness of fit to some real-life data. Finally, numerical examples are given in which likelihood ratio tests based on these models are employed to assess the significance of treatment-control differences.
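
For comparison, a short sketch of fitting the beta-binomial model of Williams (1975), against which the proposed correlated-binomial model is contrasted, by direct likelihood maximisation on hypothetical litter data:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln, gammaln

def neg_loglik(params, x, m):
    """Negative beta-binomial log-likelihood for x affected out of m per litter."""
    a, b = np.exp(params)                 # keep the beta parameters positive
    ll = (gammaln(m + 1) - gammaln(x + 1) - gammaln(m - x + 1)
          + betaln(x + a, m - x + b) - betaln(a, b))
    return -ll.sum()

# hypothetical litter data: affected fetuses x out of litter size m
x = np.array([0, 1, 0, 2, 5, 0, 1, 3, 0, 0])
m = np.array([10, 12, 9, 11, 13, 8, 10, 12, 9, 11])
fit = minimize(neg_loglik, x0=[0.0, 1.0], args=(x, m), method="Nelder-Mead")
a_hat, b_hat = np.exp(fit.x)
print(f"alpha = {a_hat:.3f}, beta = {b_hat:.3f}, max log-lik = {-fit.fun:.3f}")
```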

Journal Article•DOI•
TL;DR: In this article, the probability that the estimated between-group covariance matrix is not positive definite is computed for the balanced single-classification multivariate analysis of variance with random effects.
Abstract: The probability (Q) that the estimated between-group covariance matrix is not positive definite is computed for the balanced single-classification multivariate analysis of variance with random effects. It is shown that Q depends only on the roots of the matrix product of the inverse of the true within-group and the true between-group covariance matrices which, for independent variables, reduces to expressions in intra-class correlations. Values of Q are computed for ranges of size of experiment, intra-class correlation and number of variables. Even for large experiments, Q can approach 100% if there are many variables, for example with 160 groups of size 10 and either 8 independent variables each with intra-class correlation 0.025 or 14 variables each with intra-class correlation 0.0625. Some rationalization of the results is given in terms of the bias in the roots of the sample between-group covariance matrix. In genetic applications, the between-group covariance matrix is proportional to the genetic covariance matrix; if it is non-positive definite, heritabilities and ordinary or partial genetic correlations are outside their valid limits, and the effect on selection index construction is discussed.
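
The quantity Q can also be approximated by simulation; the sketch below uses assumed settings (40 groups of 10, six independent variables, intra-class correlation 0.05), not the configurations tabulated in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
g, n, p, icc = 40, 10, 6, 0.05        # groups, group size, variables, intra-class corr.
sig_b, sig_w = icc, 1 - icc           # independent variables with unit total variance

def not_positive_definite():
    effects = rng.normal(0, np.sqrt(sig_b), (g, p))
    y = effects[:, None, :] + rng.normal(0, np.sqrt(sig_w), (g, n, p))
    gm = y.mean(axis=1)                                        # group means
    grand = gm.mean(axis=0)
    ms_b = n * (gm - grand).T @ (gm - grand) / (g - 1)         # between mean squares
    resid = y - gm[:, None, :]
    ms_w = np.einsum('gni,gnj->ij', resid, resid) / (g * (n - 1))  # within mean squares
    sigma_b_hat = (ms_b - ms_w) / n          # ANOVA estimator of the between covariance
    return np.linalg.eigvalsh(sigma_b_hat).min() < 0

Q = np.mean([not_positive_definite() for _ in range(500)])
print("estimated probability of a non-positive-definite estimate:", Q)
```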

Journal Article•DOI•
TL;DR: In this paper, the authors consider some nonnormal regression situations in which there are many regressor variables, and it is desired to determine good fitting models, according to the value of the likelihood ratio statistic for tests of submodels against the full model.
Abstract: We consider some nonnormal regression situations in which there are many regressor variables, and it is desired to determine good-fitting models, according to the value of the likelihood ratio statistic for tests of submodels against the full model. Efficient computational algorithms for the normal linear model are adopted for use with nonnormal models. Even with as many as 10-15 regressor variables present, we find it is often possible to determine all of the better-fitting models with relatively small amounts of computer time. The use of the procedures is illustrated on exponential, Poisson and binary regression models.
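
A rough modern analogue of the search described above (assumed data; statsmodels rather than the authors' algorithms): fit the full Poisson regression and rank all submodels by the likelihood ratio statistic against it.

```python
from itertools import combinations
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, names = 200, ["x1", "x2", "x3", "x4"]
X = rng.normal(size=(n, 4))
y = rng.poisson(np.exp(0.3 + 0.8 * X[:, 0] - 0.5 * X[:, 1]))   # x3, x4 are noise

full = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()
results = []
for k in range(len(names) + 1):
    for subset in combinations(range(len(names)), k):
        Xs = sm.add_constant(X[:, subset]) if subset else np.ones((n, 1))
        sub = sm.GLM(y, Xs, family=sm.families.Poisson()).fit()
        lr = 2 * (full.llf - sub.llf)            # likelihood ratio vs the full model
        results.append((lr, [names[i] for i in subset]))

for lr, vars_in in sorted(results)[:5]:          # the better-fitting submodels
    print(f"LR = {lr:6.2f}   variables: {vars_in or ['intercept only']}")
```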


Journal Article•DOI•
TL;DR: The conclusion is that the heredity-IQ controversy has been a "tale full of sound and fury, signifying nothing" and the idea that there are racial-genetic differences in mental abilities and behavioral traits of humans is, at best, no more than idle speculation.
Abstract: In this paper the nature of the reasoning processes applied to the nature-nurture question is discussed in general and with particular reference to mental and behavioral traits. The nature of data analysis and analysis of variance is discussed. Necessarily, the nature of causation is considered. The notion that mere data analysis can establish "real" causation is attacked. Logic of quantitative genetic theory is reviewed briefly. The idea that heritability is meaningful in the human mental and behavioral arena is attacked. The conclusion is that the heredity-IQ controversy has been a "tale full of sound and fury, signifying nothing". To suppose that one can establish effects of an intervention process when it does not occur in the data is plainly ludicrous. Mere observational studies can easily lead to stupidities, and it is suggested that this has happened in the heredity-IQ arena. The idea that there are racial-genetic differences in mental abilities and behavioral traits of humans is, at best, no more than idle speculation.

Book•DOI•

Journal Article•DOI•
TL;DR: A method is presented for evaluating the power of statistical tests (based on the F-ratio) for the detection of linkage in a segregating population, between a marker locus and a locus affecting a quantitative trait.
Abstract: A method is presented for evaluating the power of statistical tests (based on the F-ratio) for the detection of linkage, in a segregating population, between a marker locus and a locus affecting a quantitative trait. For a quantitative locus generating 0.01 of the total phenotypic variance, at least 10,000-20,000 offspring divided among 10-100 families are required for a power of 0.90. A given decrease in family size generally requires a more than equivalent increase in the number of families for equal power. Power is drastically reduced if the probability of recombination between marker and quantitative locus exceeds 0.10-0.15. Gene frequency and dominance at the quantitative locus have little effect on power, except when the number of families is small. Dominance at the marker locus, or marker gene frequencies other than 0.50, will decrease power for a given family size.
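
A simplified simulation of the power calculation (a single pooled backcross-type population, ignoring the paper's family structure; all settings are illustrative assumptions) shows the effect of recombination on power:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(3)

def power(n_offspring=10_000, h2_qtl=0.01, rec=0.0, reps=200, alpha=0.05):
    a = 2 * np.sqrt(h2_qtl / (1 - h2_qtl))   # QTL effect giving the desired variance share
    hits = 0
    for _ in range(reps):
        marker = rng.integers(0, 2, n_offspring)
        qtl = np.where(rng.random(n_offspring) < rec, 1 - marker, marker)
        y = a * qtl + rng.normal(size=n_offspring)
        # one-way F test of the two marker classes
        y1, y0 = y[marker == 1], y[marker == 0]
        ssb = len(y1) * (y1.mean() - y.mean()) ** 2 + len(y0) * (y0.mean() - y.mean()) ** 2
        ssw = ((y1 - y1.mean()) ** 2).sum() + ((y0 - y0.mean()) ** 2).sum()
        F = ssb / (ssw / (n_offspring - 2))
        hits += F > f_dist.ppf(1 - alpha, 1, n_offspring - 2)
    return hits / reps

print("approx. power, 10,000 offspring, tight linkage:", power())
print("approx. power, recombination fraction 0.15    :", power(rec=0.15))
```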

Journal Article•DOI•
TL;DR: It is shown that the standard slope ratio and parallel line models for bioassay can be considered as approximations to the logistic in the extreme dose regions, while the parallel line model can be expected to fit in the middle region.
Abstract: Bioassays with a quantitative response showing a sigmoid log-dose relationship can be analysed by fitting a non-linear dose-response model directly to the data. It is demonstrated that the four-parameter logistic model, previously applied to immunoassay (Healy 1972), is applicable to the free fat cell bioassay of insulin (Moody, Stan, Stan and Gliemann 1974). It is shown that the standard slope ratio and parallel line models for bioassay can be considered as approximations to the logistic in the extreme dose regions, while the parallel line model can be expected to fit in the middle region. The full statistical analysis of the four-parameter logistic model applied to a general assay design is described. An APL computer program has been developed to facilitate the calculations, which include non-linear curve-fitting, tests of goodness of fit and parallelity, as well as point and interval estimates of the relative potency. Examples of free fat cell bioassays of insulin that have been analysed according to these methods are given. Efficient estimation of the potency calls for concentrating the doses in the region with the steepest slope of the dose-response curve. With respect to testing the parallelity and to allow for assay-to-assay variability and unpredictable potencies, it may be preferable to use an assay design with doses distributed over a wide range and to apply a dose-response model which, like the four-parameter logistic, is capable of fitting over the whole feasible dose range.
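
A minimal sketch of fitting the four-parameter logistic curve to a single dose-response series with scipy (invented data; the paper's full potency and parallelism analysis is not reproduced here):

```python
import numpy as np
from scipy.optimize import curve_fit

def four_pl(x, lower, upper, log_ec50, slope):
    """Response as a function of log-dose x under the four-parameter logistic."""
    return lower + (upper - lower) / (1 + np.exp(slope * (log_ec50 - x)))

log_dose = np.log10([0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0])
response = np.array([2.1, 2.4, 3.5, 6.0, 9.2, 10.8, 11.1])     # hypothetical assay
p0 = [response.min(), response.max(), np.median(log_dose), 1.0]
params, _ = curve_fit(four_pl, log_dose, response, p0=p0)
print("lower, upper, log EC50, slope:", np.round(params, 3))
```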




Journal Article•DOI•
TL;DR: In a multidimensional contingency table strategies have been proposed to build log-linear models using either stepwise methods or standardized estimates of the parameters of the saturated model.
Abstract: In a multidimensional contingency table strategies have been proposed to build log-linear models using either stepwise methods or standardized estimates of the parameters of the saturated model. Brown (1976) proposed a two-step procedure to screen effects and then test a subset of models. Alternate methods of model building are discussed with respect to the final choice of model and with respect to intermediate information available to the data analyst during the selection process.

Journal Article•DOI•
TL;DR: A comparison is made between two different approaches to the linear logistic regression analysis of retrospective study data: the prospective model wherein the dependent variable is a dichotomous indicator of case/control status; and the retrospective model wherein it is a binary or polychotomous classification of exposure.
Abstract: A comparison is made between two different approaches to the linear logistic regression analysis of retrospective study data: the prospective model wherein the dependent variable is a dichotomous indicator of case/control status; and the retrospective model wherein the dependent variable is a binary or polychotomous classification of exposure. The two models yield increasingly similar estimates of the relative risk with increasing degrees of covariate adjustment. When the covariate effects are saturated with parameters, the relative risk estimates are identical. The prospective model is generally to be preferred for studies involving multiple quantitative risk factors.
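
A small synthetic illustration of the prospective formulation: logistic regression of case/control status on exposure and a covariate, whose exposure coefficient estimates the log odds ratio discussed above (data and effect sizes are assumptions).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000
exposure = rng.binomial(1, 0.3, n)
covariate = rng.normal(size=n)
logit = -2.0 + 0.9 * exposure + 0.5 * covariate
case = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([exposure, covariate]))
fit = sm.Logit(case, X).fit(disp=0)
print("estimated log odds ratio for exposure:", round(fit.params[1], 3))
```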

Journal Article•DOI•
TL;DR: Gail and Gart as discussed by the authors presented tables showing the required sample size, n, for the Fisher-Irwin exact conditional test for 2 X 2 tables to achieve a power of at least 0.50, 0.80 and 0.90 against one-sided alternatives at nominal.05 and.01 significance levels.
Abstract: Gail and Gart (1973) present tables showing the required sample size, n, for the Fisher-Irwin exact conditional test for 2 X 2 tables to achieve a power of at least 0.50, 0.80 and 0.90 against one-sided alternatives at the nominal .05 and .01 significance levels. However, calculations for n > 35 were based on the Arc sine approximation, which was found to underestimate the actual required sample size by as much as 34%. The purpose of this note is to revise the Gail-Gart tables, giving exact sample sizes in all cases. The problem of nominal versus actual significance levels when the underlying probability of response is low is also briefly discussed.
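
The exact power of the one-sided Fisher-Irwin test for given n, p1 and p2 can be computed directly by enumerating all tables, as in this brief sketch (illustrative values, not the revised tables):

```python
from scipy.stats import binom, fisher_exact

def exact_power(n, p1, p2, alpha=0.05):
    """Exact power of the one-sided Fisher-Irwin test for two samples of size n."""
    power = 0.0
    for x in range(n + 1):
        for y in range(n + 1):
            _, p = fisher_exact([[x, n - x], [y, n - y]], alternative="greater")
            if p <= alpha:
                power += binom.pmf(x, n, p1) * binom.pmf(y, n, p2)
    return power

print("power for n = 25, p1 = 0.6, p2 = 0.2:", round(exact_power(25, 0.6, 0.2), 3))
```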

Report•DOI•
TL;DR: The work involved in attempting all possible orderings of the variance components is usually prohibitive; the method proposed in this paper achieves optimality properties while remaining computationally simple.
Abstract: : An evaluation is attempted of the ever growing methodology in the estimation of variance components. Optimality properties are sometimes achieved at considerable computational efforts. Certain methods depend on a subjective ordering of the components, and if the ordering is unfortunate the method may fail to yield estimates for certain components while with a different ordering all components may well be estimable. The work involved in attempting all possible orderings of the variance components is usually prohibitive. The present method achieves optimality properties and is nevertheless computationally simple. In fact it possesses Minque optimality for a particular choice of norm, but also various other optimality properties and necessary and sufficient conditions for estimability associated with Minque simplify considerably. Moreover we are able to derive sufficient conditions for consistency which also provide estimability conditions of a simpler structure. The consistency of our estimators makes them convenient as starting points for a single ML cycle to obtain asymptotically fully efficient estimates.



Journal Article•DOI•
TL;DR: A likelihood ratio statistic is proposed for testing goodness of fit with grouped data which are subject to random right censoring and it is shown that, under appropriate conditions, this statistic has an asymptotic chi-square distribution which is non-central under contiguous alternatives.
Abstract: A likelihood ratio statistic is proposed for testing goodness of fit with grouped data which are subject to random right censoring. It is shown that, under appropriate conditions, this statistic has an asymptotic chi-square distribution which is non-central under contiguous alternatives. A formula is given for the non-centrality parameter. Two examples are given. The first concerns data from a large scale animal survival study with serial sacrifice where it is attempted to fit the Weibull, Gompertz and exponential power distributions to life length. The second example concerns marijuana usage and this needs an extension of the test to the doubly censored case. Another use of the statistic is to provide a quantitative method of ranking the fit of various proposed models to survival or reliability data.

Journal Article•DOI•
TL;DR: In this paper, the authors compare Simpson's index Y and the Information index H with a new measure Q based on the inter-quartile slope of the cumulative species abundance curve.
Abstract: Summary Two popular diversity indices, Simpson's index Y and the Information index H, are compared with a new measure Q based on the inter-quartile slope of the cumulative species abundance curve. It is assumed that interest extends to characterizing the site environment: the population present at the instant of sampling is considered to be only one of a range of possible populations which the site could support. Expressions are derived for the expectations and variances of the three sample statistics Ŷ, Ĥ and Q̂ when the species abundances are gamma variates. Q is a more informative measure than H or Y. Both H and Y depend greatly on the abundances of the commonest species, which may fluctuate widely from year to year. Q depends on the more stable species with median abundance and discriminates better between sites than H or Y: it can be recommended when sites are to be compared. The expected value of Q̂ is expressed in terms of the parameters of the gamma, lognormal and log-series models and is shown to be much more robust than H or Y to the particular choice of model. For the log-series model, E(Q) is represented by the parameter α, while for the lognormal E(Q) = 0.371 T/σ, where T is the number of species in the population and σ is the standard deviation of logged abundances. Q̂ may be biased in small samples, though the bias should be fairly small when more than 50% of the species are present in the sample. Absence of small-sample bias should not be an overriding criterion in selecting a diversity index, since small samples will at best only allow the crudest comparisons between communities. The use of a single index to characterise the pattern of the abundances of different species in a community has obvious appeal and several such measures have been formulated. In practice the diversity is measured for a sample drawn from the community, so it is important that any proposed index is independent of sample size, at least for large samples; this is achieved if the index is based on the species relative abundances. We here study the behaviour of the two most popular measures of diversity, Simpson's index and the Information index, and propose an alternative index which provides a better characterisation of the community. To study the behavior of the diversity sample statistics theoretically, we must make assumptions about the mathematical form of the distribution of species abundances in the community and the nature of the sampling variability. Our assumptions can of course be no more than approximations to reality, but we expect our main conclusions to be fairly robust to deviations from the chosen model. We shall assume the population contains T species (denoted by S* in many ecological
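
For reference, Simpson's index and the Information index H are easily computed from a vector of species counts; the sketch below (hypothetical abundances) does not attempt to reproduce the Q statistic itself.

```python
import numpy as np

def simpson_index(counts):
    p = np.asarray(counts, dtype=float)
    p /= p.sum()
    return (p ** 2).sum()          # probability that two random individuals are conspecific

def shannon_index(counts):
    p = np.asarray(counts, dtype=float)
    p /= p.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

abundances = [120, 60, 30, 15, 8, 4, 2, 1, 1, 1]   # hypothetical community
print("Simpson's index :", round(simpson_index(abundances), 4))
print("Shannon's H     :", round(shannon_index(abundances), 4))
```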