
Showing papers in "Biometrics in 1983"


Journal ArticleDOI
TL;DR: In this article, the authors present an overview of the basic concepts of multivariate analysis, including matrix algebra and random vectors, as well as a strategy for analyzing multivariate models.
Abstract: (NOTE: Each chapter begins with an Introduction, and concludes with Exercises and References.) I. GETTING STARTED. 1. Aspects of Multivariate Analysis. Applications of Multivariate Techniques. The Organization of Data. Data Displays and Pictorial Representations. Distance. Final Comments. 2. Matrix Algebra and Random Vectors. Some Basics of Matrix and Vector Algebra. Positive Definite Matrices. A Square-Root Matrix. Random Vectors and Matrices. Mean Vectors and Covariance Matrices. Matrix Inequalities and Maximization. Supplement 2A Vectors and Matrices: Basic Concepts. 3. Sample Geometry and Random Sampling. The Geometry of the Sample. Random Samples and the Expected Values of the Sample Mean and Covariance Matrix. Generalized Variance. Sample Mean, Covariance, and Correlation as Matrix Operations. Sample Values of Linear Combinations of Variables. 4. The Multivariate Normal Distribution. The Multivariate Normal Density and Its Properties. Sampling from a Multivariate Normal Distribution and Maximum Likelihood Estimation. The Sampling Distribution of X̄ and S. Large-Sample Behavior of X̄ and S. Assessing the Assumption of Normality. Detecting Outliers and Data Cleaning. Transformations to Near Normality. II. INFERENCES ABOUT MULTIVARIATE MEANS AND LINEAR MODELS. 5. Inferences About a Mean Vector. The Plausibility of μ0 as a Value for a Normal Population Mean. Hotelling's T2 and Likelihood Ratio Tests. Confidence Regions and Simultaneous Comparisons of Component Means. Large Sample Inferences about a Population Mean Vector. Multivariate Quality Control Charts. Inferences about Mean Vectors When Some Observations Are Missing. Difficulties Due To Time Dependence in Multivariate Observations. Supplement 5A Simultaneous Confidence Intervals and Ellipses as Shadows of the p-Dimensional Ellipsoids. 6. Comparisons of Several Multivariate Means. Paired Comparisons and a Repeated Measures Design. Comparing Mean Vectors from Two Populations.
Comparison of Several Multivariate Population Means (One-Way MANOVA). Simultaneous Confidence Intervals for Treatment Effects. Two-Way Multivariate Analysis of Variance. Profile Analysis. Repeated Measures Designs and Growth Curves. Perspectives and a Strategy for Analyzing Multivariate Models. 7. Multivariate Linear Regression Models. The Classical Linear Regression Model. Least Squares Estimation. Inferences About the Regression Model. Inferences from the Estimated Regression Function. Model Checking and Other Aspects of Regression. Multivariate Multiple Regression. The Concept of Linear Regression. Comparing the Two Formulations of the Regression Model. Multiple Regression Models with Time-Dependent Errors. Supplement 7A The Distribution of the Likelihood Ratio for the Multivariate Regression Model. III. ANALYSIS OF A COVARIANCE STRUCTURE. 8. Principal Components. Population Principal Components. Summarizing Sample Variation by Principal Components. Graphing the Principal Components. Large-Sample Inferences. Monitoring Quality with Principal Components. Supplement 8A The Geometry of the Sample Principal Component Approximation. 9. Factor Analysis and Inference for Structured Covariance Matrices. The Orthogonal Factor Model. Methods of Estimation. Factor Rotation. Factor Scores. Perspectives and a Strategy for Factor Analysis. Structural Equation Models. Supplement 9A Some Computational Details for Maximum Likelihood Estimation. 10. Canonical Correlation Analysis. Canonical Variates and Canonical Correlations. Interpreting the Population Canonical Variables. The Sample Canonical Variates and Sample Canonical Correlations. Additional Sample Descriptive Measures. Large Sample Inferences. IV. CLASSIFICATION AND GROUPING TECHNIQUES. 11. Discrimination and Classification. Separation and Classification for Two Populations. Classifications with Two Multivariate Normal Populations. Evaluating Classification Functions.
Fisher's Discriminant Function: Separation of Populations. Classification with Several Populations. Fisher's Method for Discriminating among Several Populations. Final Comments. 12. Clustering, Distance Methods and Ordination. Similarity Measures. Hierarchical Clustering Methods. Nonhierarchical Clustering Methods. Multidimensional Scaling. Correspondence Analysis. Biplots for Viewing Sample Units and Variables. Procrustes Analysis: A Method for Comparing Configurations. Appendix. Standard Normal Probabilities. Student's t-Distribution Percentage Points. χ2 Distribution Percentage Points. F-Distribution Percentage Points. F-Distribution Percentage Points (α = .10). F-Distribution Percentage Points (α = .05). F-Distribution Percentage Points (α = .01). Data Index. Subject Index.

10,148 citations


Journal ArticleDOI
TL;DR: In this article, the authors present an overview of key issues in epidemiologic research and describe methods for controlling extraneous factors in epidemiologic studies, including stratified analysis, matching, and logistic regression modeling with interaction, effect modification, and synergism.
Abstract: Key Issues in Epidemiologic Research: An Overview. OBJECTIVES AND METHODS OF EPIDEMIOLOGIC RESEARCH. Fundamentals of Epidemiologic Research. Types of Epidemiologic Research. Design Options in Observational Studies. Typology of Observational Study Designs. Measures of Disease Frequency: Incidence. Other Measures of Disease Frequency. Measures of Association. Measures of Potential Impact and Summary of the Measures. VALIDITY OF EPIDEMIOLOGIC RESEARCH. Validity: General Considerations. Selection Bias. Information Bias. Confounding. Confounding Involving Several Risk Factors. PRINCIPLES AND PROCEDURES OF EPIDEMIOLOGIC ANALYSIS. Statistical Inferences About Effect Measures: Simple Analysis. Overview of Options for Control of Extraneous Factors. Stratified Analysis. Matching in Epidemiologic Studies. Interaction, Effect Modification, and Synergism. Modeling: Theoretical Considerations. Modeling: Analysis Strategy. Applications of Modeling with No Interaction. Applications of Logistic Regression with Interaction, Using Unconditional ML Estimation. Applications of Modeling: Conditional Likelihood Estimation. Appendices. Index.

3,179 citations




Journal ArticleDOI

1,818 citations


Journal ArticleDOI
TL;DR: This paper presents a meta-modelling framework for estimating the probabilities of different types of population sizes using a simple linear regression model, and some examples show how this model can be modified to accommodate diverse population sizes.

1,406 citations




Journal ArticleDOI
TL;DR: A formula is derived for determining the number of observations necessary to test the equality of two survival distributions when concomitant information is incorporated, and this formula should be useful in designing clinical trials with a heterogeneous patient population.
Abstract: A formula is derived for determining the number of observations necessary to test the equality of two survival distributions when concomitant information is incorporated. This formula should be useful in designing clinical trials with a heterogeneous patient population. Schoenfeld (1981, Biometrika 68, 316-319) derived the asymptotic power of a class of statistics used to test the equality of two survival distributions. That result is extended to the case where concomitant information is available for each individual and where the proportional-hazards model holds. The loss of efficiency caused by ignoring concomitant variables is also computed.
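The classical event-count version of this calculation (the Schoenfeld-type approximation without the paper's covariate adjustment) can be sketched in a few lines. The function name and the 1:1 allocation assumption below are illustrative, not taken from the paper:

```python
import math
from statistics import NormalDist

def required_events(hazard_ratio, alpha=0.05, power=0.8):
    """Events needed for a two-sided log-rank comparison with 1:1
    allocation (Schoenfeld-type approximation; the covariate-adjusted
    extension from the paper is not reproduced here)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_b = NormalDist().inv_cdf(power)           # power quantile
    return math.ceil(4 * (z_a + z_b) ** 2 / math.log(hazard_ratio) ** 2)

print(required_events(0.6))  # events needed to detect a hazard ratio of 0.6
```

Note the formula depends on the hazard ratio only through its logarithm, so smaller (more extreme) ratios require fewer events.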

852 citations


Journal ArticleDOI

698 citations


Journal ArticleDOI
TL;DR: In the assessment of the statistical properties of a diagnostic test, for example the sensitivity and specificity of the test, it is common to derive estimates from a sample limited to those cases for whom subsequent definitive disease verification is obtained.
Abstract: SUMMARY In the assessment of the statistical properties of a diagnostic test, for example the sensitivity and specificity of the test, it is common to derive estimates from a sample limited to those cases for whom subsequent definitive disease verification is obtained. Omission of nonverified cases can seriously bias the estimates. In order to adjust the estimates it is necessary to make assumptions about the mechanism for selecting cases for verification. Methods for making the necessary adjustments can then be derived.
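Under the assumption that selection for verification depends only on the test result, the standard correction can be sketched as follows. The counts and function name are illustrative, and this is the textbook form of the adjustment rather than a transcription of the paper:

```python
def corrected_accuracy(n_pos, n_neg, ver_pos_d, ver_pos_nd, ver_neg_d, ver_neg_nd):
    """Corrected sensitivity/specificity when only some cases are verified,
    assuming verification depends only on the test result (illustrative names).
    n_pos, n_neg: all test-positive / test-negative subjects.
    ver_*_d, ver_*_nd: verified subjects with / without disease, by test result."""
    p_t = n_pos / (n_pos + n_neg)                     # P(T+)
    p_d_tp = ver_pos_d / (ver_pos_d + ver_pos_nd)     # P(D+ | T+) from verified subset
    p_d_tn = ver_neg_d / (ver_neg_d + ver_neg_nd)     # P(D+ | T-) from verified subset
    sens = p_t * p_d_tp / (p_t * p_d_tp + (1 - p_t) * p_d_tn)
    spec = ((1 - p_t) * (1 - p_d_tn)) / (
        (1 - p_t) * (1 - p_d_tn) + p_t * (1 - p_d_tp))
    return sens, spec

# 400 of 1000 test positive; 200 positives and 100 negatives verified:
sens, spec = corrected_accuracy(400, 600, 160, 40, 10, 90)
print(sens, spec)  # corrected sensitivity ~0.842 vs naive 160/170 ~0.941
```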

Journal ArticleDOI
TL;DR: An exact expression is given for the jackknife estimate of the number of species in a community and for the variance of this number when quadrat sampling procedures are used.
Abstract: An exact expression is given for the jackknife estimate of the number of species in a community and for the variance of this number when quadrat sampling procedures are used. The jackknife estimate is a function of the number of species that occur in one and only one quadrat. The variance of the number of species can be constructed, as can approximate two-sided confidence intervals. The behavior of the jackknife estimate, as affected by quadrat size, sample size and sampling area, is investigated by simulation.
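The point estimate described here, a first-order jackknife driven by the species that occur in exactly one quadrat, can be sketched as follows (variance and confidence-interval formulas omitted; names illustrative):

```python
def jackknife_richness(quadrats):
    """First-order jackknife estimate of species richness from a list of
    per-quadrat species sets. Point estimate only."""
    n = len(quadrats)
    observed = set().union(*quadrats)
    counts = {s: sum(s in q for q in quadrats) for s in observed}
    r = sum(1 for c in counts.values() if c == 1)  # species in exactly one quadrat
    return len(observed) + r * (n - 1) / n

quads = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
print(jackknife_richness(quads))  # 4 observed + 2*(2/3) since c, d each occur once
```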

Journal ArticleDOI
TL;DR: In models for vital rates which include effects due to age, period and cohort, there is aliasing due to a linear dependence among these three factors both when age and period intervals are equal and when they are not.
Abstract: In models for vital rates which include effects due to age, period and cohort, there is aliasing due to a linear dependence among these three factors. This dependence arises both when age and period intervals are equal and when they are not. One solution to the dependence is to set an arbitrary constraint on the parameters. Estimable functions of the parameters are invariant to the particular constraint applied. For evenly spaced intervals, deviations from linearity are estimable but only a linear function of the three slopes is estimable. When age and period intervals have different widths, further aliasing occurs. It is assumed that the number of deaths in the numerator of the rate equation has a Poisson distribution. The calculations are illustrated with data on mortality from prostate cancer among nonwhites in the US.
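The linear dependence behind the aliasing is easy to demonstrate numerically: since cohort = period − age, a design matrix containing all three linear terms loses a rank. A minimal sketch with illustrative indices:

```python
import numpy as np

# Intercept plus linear age, period and cohort terms for a 3x3 table of rates:
rows = []
for age in range(3):
    for period in range(3):
        cohort = period - age  # cohort index is determined by the other two
        rows.append([1.0, age, period, cohort])
X = np.array(rows)
print(np.linalg.matrix_rank(X))  # 3, not 4: the three linear effects are aliased
```

This is why only deviations from linearity, and a single linear combination of the three slopes, are estimable without an arbitrary constraint.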

Journal ArticleDOI
TL;DR: The method is illustrated by using a nonlinear model, derived from the multistage theory of carcinogenesis, to analyze lung cancer death rates among British physicians who were regular cigarette smokers.
Abstract: Models are considered in which the underlying rate at which events occur can be represented by a regression function that describes the relation between the predictor variables and the unknown parameters. Estimates of the parameters can be obtained by means of iteratively reweighted least squares (IRLS). When the events of interest follow the Poisson distribution, the IRLS algorithm is equivalent to using the method of scoring to obtain maximum likelihood (ML) estimates. The general Poisson regression models include log-linear, quasilinear and intrinsically nonlinear models. The approach considered enables one to concentrate on describing the relation between the dependent variable and the predictor variables through the regression model. Standard statistical packages that support IRLS can then be used to obtain ML estimates, their asymptotic covariance matrix, and diagnostic measures that can be used to aid the analyst in detecting outlying responses and extreme points in the model space. Applications of these methods to epidemiologic follow-up studies with the data organized into a life-table type of format are discussed. The method is illustrated by using a nonlinear model, derived from the multistage theory of carcinogenesis, to analyze lung cancer death rates among British physicians who were regular cigarette smokers.
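A minimal sketch of the IRLS/scoring iteration for a log-linear Poisson model, assuming a full-rank design matrix and strictly positive fitted means (variable names are illustrative):

```python
import numpy as np

def poisson_irls(X, y, iters=25):
    """ML fit of a log-linear Poisson model by iteratively reweighted
    least squares (equivalent to the method of scoring)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)                # fitted means
        z = X @ beta + (y - mu) / mu         # working response
        WX = X * mu[:, None]                 # weights W = diag(mu)
        beta = np.linalg.solve(X.T @ WX, X.T @ (mu * z))
    return beta

# Recover known coefficients from simulated counts:
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, 500)
X = np.column_stack([np.ones(500), x])
y = rng.poisson(np.exp(0.5 + 1.0 * x))
print(poisson_irls(X, y))  # close to the true values (0.5, 1.0)
```

In practice, as the abstract notes, any package supporting IRLS also yields the asymptotic covariance matrix (the inverse of X'WX at convergence) and regression diagnostics for free.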

Journal ArticleDOI
TL;DR: In this paper, the results of Aitchison (1955, Journal of the American Statistical Association 50, 901-908) on the estimation of the mean and variance of a distribution with a discrete probability mass at zero are applied and extended to give an estimate of the variance associated with the estimate.
Abstract: The data from marine surveys often contain a large proportion of zeros. Treating the zeros separately can lead, in some cases, to more efficient estimators of abundance. To this end, the results of Aitchison (1955, Journal of the American Statistical Association 50, 901-908) on the estimation of the mean and variance of a distribution with a discrete probability mass at zero are applied and extended to give an estimate of the variance associated with the estimate of the mean. It is shown that under some conditions these are minimum variance unbiased estimators. The case in which the nonzero values are lognormally distributed is examined in detail and applied to an ichthyoplankton survey.
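A simple plug-in version of the zero-mass lognormal mean estimate can be sketched as follows. Note this uses the naive exp(s²/2) factor; the minimum-variance unbiased estimator discussed in the paper replaces it with a finite-series correction that is not reproduced here:

```python
import math
import statistics

def delta_lognormal_mean(values):
    """Plug-in estimate of the mean for data with a probability mass at
    zero and lognormally distributed nonzero values (simple version)."""
    n = len(values)
    nonzero = [v for v in values if v > 0]
    m = len(nonzero)
    if m == 0:
        return 0.0
    logs = [math.log(v) for v in nonzero]
    ybar = statistics.fmean(logs)
    s2 = statistics.variance(logs) if m > 1 else 0.0
    # (m/n) estimates the nonzero probability; the rest estimates the
    # lognormal mean of the nonzero part.
    return (m / n) * math.exp(ybar + s2 / 2)
```

Treating the zeros separately in this way is what yields the efficiency gains over the plain sample mean when zeros are common, as in the ichthyoplankton application.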


Journal ArticleDOI
TL;DR: The properties of this method are described, including an approach based on likelihood, which leads to a generalization to the cases where the groups have a factorial structure or where covariates are available for each individual.
Abstract: The observed mortality of a group of individuals often needs to be compared with that expected from the death rates of the national population, with allowance made for age and period. Expected deaths are usually calculated by the subject-years method (Case and Lea, 1955, British Journal of Preventive and Social Medicine 9, 62-72), in which each person is assumed at risk up to the date of the analysis, the date of death, or the date the person was lost to follow-up, whichever is first. Some of the properties of this method are described, including an approach based on likelihood. For this purpose the observed number of deaths may be treated as though it were a Poisson variable. The likelihood approach leads to a generalization to the cases where the groups have a factorial structure or where covariates are available for each individual. The calculations are readily carried out by use of GLIM or GENSTAT.
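Treating the observed deaths O as Poisson with mean E·SMR leads to the usual ratio estimate and a log-scale interval. A minimal sketch (exact Poisson limits would be an alternative; names illustrative):

```python
import math
from statistics import NormalDist

def smr_with_ci(observed, expected, level=0.95):
    """Standardized mortality ratio O/E with a normal-approximation CI on
    the log scale, treating O as Poisson with mean E * SMR."""
    smr = observed / expected
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se_log = 1 / math.sqrt(observed)   # SE of log(O/E) under the Poisson model
    return smr, smr * math.exp(-z * se_log), smr * math.exp(z * se_log)

print(smr_with_ci(50, 40))  # 50 observed vs 40 expected deaths
```

The factorial and covariate generalizations mentioned in the abstract amount to fitting a Poisson regression with log(E) as an offset, which is exactly what GLIM or GENSTAT provided.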

Journal ArticleDOI
TL;DR: An approach that seeks to combine data from a group of individuals in order to improve the estimates of individual growth parameters is extended, incorporating growth-related covariates into the model.
Abstract: SUMMARY The analysis of growth curves has long been important in biostatistics. Work has focused on two problems: the estimation of individual curves based on many data points, and the estimation of the mean growth curve for a group of individuals. This paper extends a recent approach that seeks to combine data from a group of individuals in order to improve the estimates of individual growth parameters. Growth is modeled as polynomial in time, and the group model is also linear, incorporating growth-related covariates into the model. The estimation used is empirical Bayes. The estimation formulas are illustrated with a set of data on rat growth, originally presented by Box (1950, Biometrics 6, 362-389).



Journal ArticleDOI
TL;DR: A new class of group sequential procedures for clinical trials is introduced, and the use of these procedures is illustrated by reference to a recently completed comparative study.
Abstract: In this paper a new class of group sequential procedures for clinical trials is introduced, and the use of these procedures is illustrated by reference to a recently completed comparative study. In a group sequential trial the decision to stop or to continue is made at regular intervals throughout the trial, but not as frequently as after every patient response. This more practical formulation retains most of the advantages of sequential analysis, particularly the economy in sample size. Comparisons are made with group sequential designs derived from the repeated significance test.

Journal ArticleDOI
TL;DR: There are three definitions of heritability and it is important to distinguish among these carefully in order to avoid misinterpretations.
Abstract: The term 'heritability', which evokes the image of transmission from parents to children, is used in biology to characterize the resemblance of related individuals in terms of a given characteristic, and to analyse the genetic and environmental causes of this resemblance. In fact, there are three definitions of heritability and it is important to distinguish among these carefully in order to avoid misinterpretations. Various techniques for measuring associated parameters are linked to these definitions. A rigorous analysis of the assumptions which permit the interpretation of parameter estimates is necessary to avoid false conclusions.

Journal ArticleDOI
TL;DR: In this article, a growth model is presented that consists of a stochastic differential equation related to the Bertalanffy-Richards growth model and a measurement-error component.
Abstract: SUMMARY A brief review of the uses and methods of height-growth prediction in forestry is given in §2. In §3 the proposed growth model is presented; it consists of a stochastic differential equation related to the Bertalanffy-Richards growth model, and a measurement-error component. Explicit expressions and an efficient computational procedure for the likelihood function are obtained. In §4 a method for the simultaneous maximum likelihood estimation of global and local parameters is outlined. The log likelihood function is maximized by a modified Newton method. The special structure of the problem is exploited in order to handle the very large number of variables involved in the optimization. The approach presented has been successfully implemented, and some computational experience is reported in §5.

Journal ArticleDOI
TL;DR: In this paper, two alternative models relating pollen counts to relative tree abundances are compared by using surface pollen data and forest inventory data from Wisconsin and Upper Michigan; the models explain 70-80% of the variance in the pollen data.
Abstract: SUMMARY Past forest composition may, in principle, be reconstructed from the spectra of pollen found in peat or lake sediment, but pollen analysts normally use semiquantitative methods, and quantitative approaches need development. Models must be devised for the pollen-vegetation relationship, and appropriate statistical methods must be used to fit such models to data on modern pollen deposition (pollen spectra from surface samples) and contemporary forest composition. Each taxon's pollen deposition rate is assumed to be an independent linear function of its abundance within a fixed distance of the depositional site. Pollen data are assumed to consist of relative counts, rather than estimates of absolute deposition rates. Different approximations lead to two alternative models relating pollen counts to relative tree abundances. Maximum likelihood estimates and standard deviations are obtained numerically for the parameters of both models. The models are illustrated and compared by using surface pollen data and forest inventory data from Wisconsin and Upper Michigan; the models explain 70-80% of the variance in the pollen data and give interpretable and congruent results.

Journal ArticleDOI
TL;DR: The multiprocess Kalman filter offers a powerful general framework for the modelling and analysis of noisy time series which are subject to abrupt changes in pattern and can be used to provide on-line probabilities of whether changes have occurred as well as to identify the type of change that is involved.
Abstract: SUMMARY The multiprocess Kalman filter offers a powerful general framework for the modelling and analysis of noisy time series which are subject to abrupt changes in pattern. It has considerable potential application to many forms of biological series used in clinical monitoring. In particular, the approach can be used to provide on-line probabilities of whether changes have occurred, as well as to identify the type of change that is involved. In this paper, we extend and illustrate the methodology within the context of a particular case study. The general features of the problem, and the approach adopted, will be seen to have wide application.

Journal ArticleDOI
TL;DR: The problem of finding robust estimators of population size in closed K-sample capture-recapture experiments is considered and a general estimation procedure is given which does not depend on any assumptions about the form of the distribution of capture probabilities.
Abstract: In this paper the problem of finding robust estimators of population size in closed K-sample capture-recapture experiments is considered. Particular attention is paid to models where heterogeneity of capture probabilities is allowed. First, a general estimation procedure is given which does not depend on any assumptions about the form of the distribution of capture probabilities. This is followed by a detailed discussion of the usefulness of the generalized jackknife technique to reduce bias. Numerical comparisons of the bias and variance of various estimators are given. Finally, a general discussion is given with several recommendations on estimators to be used in practice.
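One well-known member of this family is the first-order generalized jackknife, which inflates the number of distinct animals seen by the number caught exactly once. A sketch with illustrative names (higher-order versions and the bias/variance trade-off discussed in the paper are omitted):

```python
def jackknife_population(capture_histories, K):
    """First-order jackknife estimate of closed-population size under
    heterogeneous capture probabilities.
    capture_histories: capture count for each distinct animal seen.
    K: number of capture occasions."""
    S = len(capture_histories)                       # distinct animals captured
    f1 = sum(1 for c in capture_histories if c == 1) # captured exactly once
    return S + f1 * (K - 1) / K

# 4 distinct animals over 5 occasions, two of them caught only once:
print(jackknife_population([1, 1, 2, 3], K=5))  # 4 + 2*(4/5) = 5.6
```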



Journal ArticleDOI
Abstract: SUMMARY A theory for unbiased estimation of the total of arbitrary particle characteristics in line-intercept sampling, for transects of fixed and of random length, is presented. This theory unifies present line-intercept sampling results. Examples are given and variance estimation is discussed.

1. Introduction and Literature Review
Line-intercept sampling (LIS) is a method of sampling particles in a region whereby, roughly, a particle is sampled if a chosen line segment, called a 'transect', intersects the particle. It has the advantage over 'quadrat sampling' in that there is no need to delineate the quadrats and determine which objects are in each quadrat. Examples of the economics of LIS versus quadrat sampling can be found in Canfield (1941), Bauer (1943), Warren and Olsen (1964), and Bailey (1970). The particles may represent plants, shrubs, tree crowns, nearly stationary animals, animal dens or signs, roads, logs, forest debris, particles on a microscope slide, particles in a plane section of a rock or metal sample, etc.

In early biological applications, sampling with a transect appears to have been a purposive sampling technique for studying how vegetation varies with changing environment, with the transect running perpendicular to the zonation (Weaver and Clements, 1929). In the study of range vegetation, Canfield (1941) incorporated random placement of the transect and, by taking the proportion of the sampled transect intercepted by the vegetation, obtained an unbiased estimator of coverage, that is, the ratio of the area covered by the vegetation to the area of the region of interest. However, he did not prove the unbiasedness of this estimator. Canfield called this method the 'line-interception method'. He also discussed such design questions as how many lines of what length are required and whether or not the area of interest should be stratified.
Bauer (1943) compared transect sampling to quadrat sampling in an area of dense chaparral vegetation and in a laboratory experiment. He concluded that '. . . transect sampling deserves much wider use. . .'. McIntyre (1953) investigated the possibility of using data on intercept lengths in order to estimate not only coverage, but also density, that is, the ratio of the number of particles to the area of the region of interest. He was able to do this for populations which consisted of particles that were all magnifications of a known shape. Lucas and Seber (1977) presented and proved the unbiasedness of estimators of particle density and coverage for arbitrarily shaped and located particles when the transect is randomly placed. Their estimator of coverage is the same as that of Canfield (1941). Eberhardt (1978) reviewed three transect methods for use in ecological sampling: LIS, and two methods, 'line-transect sampling' and 'strip-transect sampling', in which the particles are points and the probability of observing a particle is a function of its perpendicular distance

Journal ArticleDOI
TL;DR: A way of finding a parameterized form for the correlated error structure by examining the residuals from an ordinary least squares regression is suggested and such a model is then fitted by using maximum likelihood.
Abstract: In order to provide clues to the aetiology of a disease, mortality indices for different areas are often related to explanatory variables by using multiple regression. However, mortality in nearby areas may be similar for reasons not attributable to the covariates, so the errors will not be independent. This paper suggests a way of finding a parameterized form for the correlated error structure by examining the residuals from an ordinary least squares regression. Such a model is then fitted by using maximum likelihood. An example based on cardiovascular mortality in British towns is used to illustrate the problems and the solution.
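Once a parameterized covariance matrix for the correlated errors has been estimated, the regression can be refitted by generalized least squares. A sketch that takes the fitted covariance matrix V as given (in the paper's setting V would come from a distance-based correlation model fitted to the OLS residuals; names here are illustrative):

```python
import numpy as np

def gls(X, y, V):
    """Generalized least squares estimate (X'V^-1 X)^-1 X'V^-1 y,
    given an error covariance matrix V."""
    Vi = np.linalg.inv(V)
    return np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)

# With V = I (uncorrelated errors), GLS reduces to ordinary least squares:
rng = np.random.default_rng(1)
x = rng.normal(size=30)
X = np.column_stack([np.ones(30), x])
y = 2.0 + 3.0 * x + rng.normal(size=30)
print(gls(X, y, np.eye(30)))  # matches the OLS fit
```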