
Showing papers in "Biometrics in 1997"


Journal ArticleDOI
TL;DR: A scaled Wald statistic is presented, together with an F approximation to its sampling distribution, that is shown to perform well in a range of small sample settings and has the advantage that it reproduces both the statistics and F distributions in those settings where the latter is exact.
Abstract: Restricted maximum likelihood (REML) is now well established as a method for estimating the parameters of the general Gaussian linear model with a structured covariance matrix, in particular for mixed linear models. Conventionally, estimates of precision and inference for fixed effects are based on their asymptotic distribution, which is known to be inadequate for some small-sample problems. In this paper, we present a scaled Wald statistic, together with an F approximation to its sampling distribution, that is shown to perform well in a range of small sample settings. The statistic uses an adjusted estimator of the covariance matrix that has reduced small sample bias. This approach has the advantage that it reproduces both the statistics and F distributions in those settings where the latter is exact, namely for Hotelling T2 type statistics and for analysis of variance F-ratios. The performance of the modified statistics is assessed through simulation studies of four different REML analyses and the methods are illustrated using three examples.
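
For readers who want the shape of the method, the display below sketches the general form of such a scaled Wald statistic; the precise scale factor λ and denominator degrees of freedom m in the paper are obtained by matching approximate moments, and the notation here is illustrative rather than the authors' own. For a hypothesis L'β = 0 of rank ℓ,
\[
W \;=\; \hat\beta^{\top} L\,\big(L^{\top}\hat\Phi_A L\big)^{-1} L^{\top}\hat\beta,
\qquad
F^{*} \;=\; \frac{\lambda}{\ell}\,W \;\approx\; F_{\ell,\,m},
\]
where \(\hat\Phi_A\) is the small-sample bias-adjusted covariance matrix of the fixed-effect estimates; in the balanced settings mentioned above, \(F^{*}\) reduces to the exact Hotelling T2 and ANOVA F statistics.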

3,862 citations


Journal ArticleDOI
TL;DR: Van Der Heijden et al. as discussed by the authors apply correspondence analysis to longitudinal categorical data: transitional analyses decompose departures from independence in transition matrices for two or more time points (the expected counts under independence being the product of the table margins divided by the total sample size), while trend analyses compare the trajectories of different groups.
Abstract: Correspondence analysis is an exploratory tool for the analysis of associations between categorical variables, the results of which may be displayed graphically. For longitudinal data, two types of analysis can be distinguished: the first focuses on transitions, whereas the second investigates trends. For transitional analysis with two time points, an analysis of the transition matrix (showing the relative frequencies for pairs of categories) provides insight into the structure of departures from independence in the transitions. Transitions between more than two time points can also be studied simultaneously. In trend analyses, the trajectories of different groups are often compared. Examples of all these analyses are provided. There are many interpretations of correspondence analysis; here we make use of two of them. The first is that the observed categorical data are collected in a matrix, and correspondence analysis approximates this matrix by a matrix of lower rank[1]. This lower rank approximation of, say, rank M + 1 is then displayed graphically in an M-dimensional representation in which each row and each column of the matrix is displayed as a point. The difference between the rank M + 1 approximation and the rank M representation is a matrix of rank 1; this rank-1 matrix is the product of the marginal counts of the matrix and is most often considered uninteresting. This brings us to the second interpretation: when the two-way matrix is a contingency table, correspondence analysis decomposes the departure from a matrix in which the row and column variables are independent[2,3]. Thus, correspondence analysis is a tool for residual analysis. This interpretation holds because, for a contingency table, the estimates under the independence model are obtained as the product of the margins of the table divided by the total sample size. Longitudinal data are data in which observations (e.g., individuals) are measured at least twice on the same variables. We consider here only categorical (i.e., nominal or ordinal) variables, as only this kind of variable is analyzed in standard applications of correspondence analysis[4]. We first discuss correspondence analysis for the analysis of transitions and then consider the analysis of trends with canonical correspondence analysis.
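
As a concrete illustration of the low-rank/residual-decomposition view described above, here is a minimal correspondence analysis sketch (a standard SVD-based computation in Python with NumPy; the function and variable names are ours, not the article's).

```python
import numpy as np

def correspondence_analysis(N):
    """Correspondence analysis of a two-way contingency table N (array of counts)."""
    P = N / N.sum()                        # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)                      # row masses
    c = P.sum(axis=0)                      # column masses
    E = np.outer(r, c)                     # expected proportions under independence
    S = (P - E) / np.sqrt(E)               # standardized residuals (departure from independence)
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * d) / np.sqrt(r)[:, None]      # principal coordinates of the rows
    G = (Vt.T * d) / np.sqrt(c)[:, None]   # principal coordinates of the columns
    return F, G, d                         # d**2 are the principal inertias

# e.g., for a transition matrix of counts between categories at two time points:
# F, G, d = correspondence_analysis(np.array([[20, 5, 2], [4, 30, 6], [1, 7, 25]]))
```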

2,104 citations


Journal ArticleDOI

2,043 citations


BookDOI
TL;DR: Item response theory has become an essential component in the toolkit of every researcher in the behavioral sciences, as mentioned in this paper; it provides a powerful means to study individual responses to a variety of stimuli, and the methodology has been extended and developed to cover many different models of interaction.
Abstract: Item response theory has become an essential component in the toolkit of every researcher in the behavioral sciences. It provides a powerful means to study individual responses to a variety of stimuli, and the methodology has been extended and developed to cover many different models of interaction. This volume presents a wide-ranging handbook to item response theory - and its applications to educational and psychological testing. It will serve as both an introduction to the subject and also as a comprehensive reference volume for practitioners and researchers. It is organized into six major sections: the nominal categories model, models for response time or multiple attempts on items, models for multiple abilities or cognitive components, nonparametric models, models for nonmonotone items, and models with special assumptions. Each chapter in the book has been written by an expert of that particular topic, and the chapters have been carefully edited to ensure that a uniform style of notation and presentation is used throughout. As a result, all researchers whose work uses item response theory will find this an indispensable companion to their work and it will be the subject's reference volume for many years to come.

1,878 citations



Journal ArticleDOI
TL;DR: The authors argue that model selection uncertainty should be fully incorporated into statistical inference whenever estimation is sensitive to model choice and that choice is made with reference to the data; they consider different philosophies for achieving this goal and suggest strategies for data analysis.
Abstract: We argue that model selection uncertainty should be fully incorporated into statistical inference whenever estimation is sensitive to model choice and that choice is made with reference to the data. We consider different philosophies for achieving this goal and suggest strategies for data analysis. We illustrate our methods through three examples. The first is a Poisson regression of bird counts in which a choice is to be made between inclusion of one or both of two covariates. The second is a line transect data set for which different models yield substantially different estimates of abundance. The third is a simulated example in which truth is known.
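
The abstract does not commit to one mechanism for propagating model selection uncertainty; a widely used device in this spirit is model averaging with information-criterion weights, sketched below purely as an illustration (not necessarily the authors' exact proposal).

```python
import numpy as np

def akaike_weights(aic_values):
    """Turn AIC values for a set of candidate models into normalized model weights."""
    aic = np.asarray(aic_values, dtype=float)
    delta = aic - aic.min()
    w = np.exp(-0.5 * delta)
    return w / w.sum()

def model_averaged_estimate(estimates, aic_values):
    """Average per-model estimates of the same quantity using Akaike weights."""
    return float(np.sum(akaike_weights(aic_values) * np.asarray(estimates, dtype=float)))

# e.g., abundance estimates of 812, 934, and 1050 from three candidate models:
# model_averaged_estimate([812, 934, 1050], aic_values=[310.2, 309.1, 312.7])
```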

1,584 citations


Journal ArticleDOI
TL;DR: This paper considers the analysis of genetic case-control data and recommends that analyses that treat alleles rather than people as observations should not be used, and that such data should be analyzed by genotype.
Abstract: This paper considers the analysis of genetic case-control data. One approach considers the allele frequency in cases and controls. Because each individual has two alleles at any autosomal locus, there will be twice as many alleles as people. Another approach considers the risk of the disease in those who do not have the allele of interest (A), those who have a single copy (heterozygous), and those who are homozygous for A. A third approach does not differentiate between individuals with one or two copies of A. This was common when alleles were determined serologically and one could not distinguish between homozygotes and those with one copy of A and one of an unknown allele. All three approaches have been used in the literature, but this is the first systematic comparison of them. The different interpretations of the odds ratios from such analyses are explored and conditions are given under which the first two approaches are asymptotically equivalent. The chi-squared statistics from the three approaches are discussed. Both the odds ratio and the chi-squared statistic from the analysis that treats alleles rather than genotypes as individual entities are appropriate only when the Hardy-Weinberg equilibrium holds. When the equilibrium holds, the allele-based test statistic is asymptotically equivalent to the test for trend using the genotype data. Thus, analyses that treat alleles rather than people as observations should not be used. Instead, we recommend that such data should be analyzed by genotype.
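
A small sketch of the two analyses being contrasted, assuming genotype counts (AA, Aa, aa) for cases and controls (names and layout are ours): the allele-based chi-square, valid only under Hardy-Weinberg equilibrium, and the Cochran-Armitage trend test on genotypes to which it is asymptotically equivalent when the equilibrium holds.

```python
import numpy as np
from scipy.stats import chi2_contingency

def allele_based_chisq(case_geno, control_geno):
    """Chi-square on the 2 x 2 allele table built by counting each person's two alleles.
    Genotype counts are given as (AA, Aa, aa)."""
    def alleles(g):
        aa, het, bb = g
        return [2 * aa + het, 2 * bb + het]          # copies of A, copies of a
    table = np.array([alleles(case_geno), alleles(control_geno)])
    chi2, p, _, _ = chi2_contingency(table, correction=False)
    return chi2, p

def armitage_trend_chisq(case_geno, control_geno, scores=(0, 1, 2)):
    """Cochran-Armitage test for trend on the 2 x 3 genotype table (1 df chi-square)."""
    r = np.asarray(case_geno, dtype=float)           # cases in each genotype column
    n = r + np.asarray(control_geno, dtype=float)    # column totals
    N, R = n.sum(), r.sum()
    x = np.asarray(scores, dtype=float)
    t = np.sum(x * (r - R * n / N))
    var_t = (R / N) * (1 - R / N) * (np.sum(n * x**2) - np.sum(n * x)**2 / N)
    return t**2 / var_t
```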

941 citations


Journal ArticleDOI
TL;DR: This work argues that jointly maximizing the likelihood of the covariate process and the failure time process is superior to naive methods in which one maximizes the partial likelihood of the Cox model using the observed covariate values, and that it improves on two-stage methods in which empirical Bayes estimates of the covariate process are computed and then used as time-dependent covariates.
Abstract: The relationship between a longitudinal covariate and a failure time process can be assessed using the Cox proportional hazards regression model. We consider the problem of estimating the parameters in the Cox model when the longitudinal covariate is measured infrequently and with measurement error. We assume a repeated measures random effects model for the covariate process. Estimates of the parameters are obtained by maximizing the joint likelihood for the covariate process and the failure time process. This approach uses the available information optimally because we use both the covariate and survival data simultaneously. Parameters are estimated using the expectation-maximization algorithm. We argue that such a method is superior to naive methods where one maximizes the partial likelihood of the Cox model using the observed covariate values. It also improves on two-stage methods where, in the first stage, empirical Bayes estimates of the covariate process are computed and then used as time-dependent covariates in a second stage to find the parameters in the Cox model that maximize the partial likelihood.
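
Schematically, and in notation of our own choosing, the joint likelihood being maximized couples a random effects model for the repeated covariate measurements with the Cox hazard evaluated at the true (unobserved) covariate trajectory:
\[
L(\theta) \;=\; \prod_{i=1}^{n} \int \Bigg[\prod_{j} f\big(W_{ij} \mid b_i; \theta\big)\Bigg]
\lambda\big(T_i \mid b_i; \theta\big)^{\delta_i}
\exp\!\Big\{-\!\int_0^{T_i}\!\lambda\big(u \mid b_i; \theta\big)\,du\Big\}\, f(b_i; \theta)\, db_i ,
\]
where the \(W_{ij}\) are the error-prone covariate measurements, \(b_i\) are the subject-level random effects, \(\delta_i\) is the failure indicator, and \(\lambda(t \mid b_i;\theta) = \lambda_0(t)\exp\{\beta\, m_i(t; b_i)\}\) with \(m_i\) the covariate trajectory implied by the random effects. The EM algorithm treats the \(b_i\) as missing data.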

911 citations


Journal ArticleDOI
TL;DR: In this article, a class of capture-recapture (CR) models is proposed to describe the presence of transients in natural populations; the bias of traditional survival estimators in the presence of transients is examined, along with the relative efficiency of an ad hoc approach that simply leaves out the first observation of each animal.
Abstract: The presence of transient animals, common enough in natural populations, invalidates the estimation of survival by traditional capture-recapture (CR) models designed for the study of residents only. Also, the study of transit is interesting in itself. We thus develop here a class of CR models to describe the presence of transients. In order to assess the merits of this approach we examine the bias of the traditional survival estimators in the presence of transients in relation to the power of different tests for detecting transients. We also compare the relative efficiency of an ad hoc approach to dealing with transients that leaves out the first observation of each animal. We then study a real example using lazuli bunting (Passerina amoena) and, in conclusion, discuss the design of an experiment aiming at the estimation of transience. In practice, the presence of transients is easily detected whenever the risk of bias is high. The ad hoc approach, which yields unbiased estimates for residents only, is satisfactory in a time-dependent context but poorly efficient when parameters are constant. The example shows that intermediate situations between strict 'residence' and strict 'transience' may exist in certain studies. Yet, most of the time, if the study design takes into account the expected length of stay of a transient, it should be possible to efficiently separate the two categories of animals.

616 citations



Journal ArticleDOI
TL;DR: This note points out that the intraclass correlation refers not to a single coefficient but to a group of coefficients, and that Lin's concordance correlation coefficient, developed as an alternative for evaluating the reproducibility of measurements between two trials of an assay or instrument, is nearly identical to a subset of the coefficients in this group.
Abstract: Lin (1989, Biometrics 45, 255-268) objected to the use of the intraclass correlation coefficient as a way to evaluate the reproducibility of measurements between two trials of an assay or instrument and developed an alternative called the concordance correlation coefficient. It is noted that intraclass correlation refers not to a single coefficient but to a group of coefficients and that Lin's alternative is nearly identical to a subset of the coefficients in this group.
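
For reference (standard definitions, not reproduced from the article), Lin's concordance correlation coefficient for paired measurements \((X_1, X_2)\) is
\[
\rho_c \;=\; \frac{2\,\sigma_{12}}{\sigma_1^{2} + \sigma_2^{2} + (\mu_1 - \mu_2)^{2}},
\]
which penalizes both poor precision (low correlation) and poor accuracy (location or scale shift); the note's point is that this quantity is nearly identical to one member of the family of intraclass correlation coefficients rather than something fundamentally new.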

Journal ArticleDOI
TL;DR: Extensive numerical studies show that the asymptotic approximations are adequate for practical use and the biases of the proposed estimators are small even when censoring may occur in the interiors of the intervals.
Abstract: Estimation of the average total cost for treating patients with a particular disease is often complicated by the fact that the survival times are censored on some study subjects and their subsequent costs are unknown. The naive sample average of the observed costs from all study subjects or from the uncensored cases only can be severely biased, and the standard survival analysis techniques are not applicable. To minimize the bias induced by censoring, we partition the entire time period of interest into a number of small intervals and estimate the average total cost either by the sum of the Kaplan-Meier estimator for the probability of dying in each interval multiplied by the sample mean of the total costs from the observed deaths in that interval or by the sum of the Kaplan-Meier estimator for the probability of being alive at the start of each interval multiplied by an appropriate estimator for the average cost over the interval conditional on surviving to the start of the interval. The resultant estimators are consistent if censoring occurs solely at the boundaries of the intervals. In addition, the estimators are asymptotically normal with easily estimated variances. Extensive numerical studies show that the asymptotic approximations are adequate for practical use and the biases of the proposed estimators are small even when censoring may occur in the interiors of the intervals. An ovarian cancer study is provided.
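
In symbols (our transcription of the estimators described above), with the time axis partitioned at \(0 = t_0 < t_1 < \cdots < t_K\) and \(\hat S\) the Kaplan-Meier survival estimator, the two proposed estimators of the average total cost are
\[
\hat\mu_1 \;=\; \sum_{k=1}^{K} \big\{\hat S(t_{k-1}) - \hat S(t_k)\big\}\,\bar C_k ,
\qquad
\hat\mu_2 \;=\; \sum_{k=1}^{K} \hat S(t_{k-1})\,\hat E_k ,
\]
where \(\bar C_k\) is the sample mean of the total costs among observed deaths in the kth interval and \(\hat E_k\) estimates the average cost accrued over the kth interval conditional on being alive at its start.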

Journal ArticleDOI
TL;DR: The structural components method is extended to the estimation of the Receiver Operating Characteristics (ROC) curve area for clustered data, incorporating the concepts of design effect and effective sample size used by Rao and Scott (1992, Biometrics 48, 577-585) for clustered binary data.
Abstract: Current methods for estimating the accuracy of diagnostic tests require independence of the test results in the sample. However, cases in which there are multiple test results from the same patient are quite common. In such cases, estimation and inference of the accuracy of diagnostic tests must account for intracluster correlation. In the present paper, the structural components method of DeLong, DeLong, and Clarke-Pearson (1988, Biometrics 44, 837-844) is extended to the estimation of the Receiver Operating Characteristics (ROC) curve area for clustered data, incorporating the concepts of design effect and effective sample size used by Rao and Scott (1992, Biometrics 48, 577-585) for clustered binary data. Results of a Monte Carlo simulation study indicate that the size of statistical tests that assume independence is inflated in the presence of intracluster correlation. The proposed method, on the other hand, appropriately handles a wide variety of intracluster correlations, e.g., correlations between true disease statuses and between test results. In addition, the method can be applied to both continuous and ordinal test results. A strategy for estimating sample size requirements for future studies using clustered data is discussed.


Journal ArticleDOI
TL;DR: As discussed by the authors, this thoroughly updated and expanded version of their successful textbook on geological factor analysis draws on examples from botany, zoology, ecology, and oceanography, as well as geology.
Abstract: This graduate-level text aims to introduce students of the natural sciences to the powerful technique of factor analysis and to provide them with the background necessary to be able to undertake analyses on their own. A thoroughly updated and expanded version of the authors' successful textbook on geological factor analysis, this book draws on examples from botany, zoology, ecology, and oceanography, as well as geology. Applied multivariate statistics has grown into a research area of almost unlimited potential in the natural sciences. The methods introduced in this book, such as classical principal components, principal component factor analysis, principal coordinate analysis, and correspondence analysis, can reduce masses of data to manageable and interpretable form. Q-mode and Q-R-mode methods are also presented. Special attention is given to methods of robust estimation and the identification of atypical and influential observations. Throughout the book, the emphasis is on application rather than theory.

Journal ArticleDOI
TL;DR: A model is proposed for a mark-recapture experiment with resightings obtained from marked animals at any time between capture periods and throughout the geographic range of the animals; the resulting estimator is shown to be equivalent to that suggested by Jolly but is valid only under the random emigration assumption.
Abstract: We propose a model for a mark-recapture experiment with resightings obtained from marked animals any time between capture periods and throughout the geographic range of the animals. The likelihood is described for random movement of animals in and out of the capture site, and closed-form maximum likelihood estimators are reported. We show that the estimator is equivalent to that suggested by Jolly (1965, Biometrika 52, 239) but is only valid under the random emigration assumption. The model provides a common framework for most of the widely used mark-recapture models including live-recapture, tag-recovery, and tag-resight models and allows simultaneous analysis of data obtained in all three ways.

Journal ArticleDOI
TL;DR: This paper presents a method to compute sample sizes and statistical powers for studies involving correlated observations, and appeals to a statistic based on the generalized estimating equation method for correlated data.
Abstract: Correlated data occur frequently in biomedical research. Examples include longitudinal studies, family studies, and ophthalmologic studies. In this paper, we present a method to compute sample sizes and statistical powers for studies involving correlated observations. This is a multivariate extension of the work by Self and Mauritsen (1988, Biometrics 44, 79-86), who derived a sample size and power formula for generalized linear models based on the score statistic. For correlated data, we appeal to a statistic based on the generalized estimating equation method (Liang and Zeger, 1986, Biometrika 73, 13-22). We highlight the additional assumptions needed to deal with correlated data. Some special cases that are commonly seen in practice are discussed, followed by simulation studies.
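
As a simple special case of the kind of calculation the paper generalizes (exchangeable correlation, equal cluster sizes; this is the textbook design-effect shortcut, not the authors' full GEE-based formula), correlated observations inflate the usual two-sample size by 1 + (m - 1)ρ:

```python
import math
from scipy.stats import norm

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two proportions under independence."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return num / (p1 - p2) ** 2

def n_per_group_clustered(p1, p2, cluster_size, icc, alpha=0.05, power=0.80):
    """Inflate the independent-data answer by the design effect 1 + (m - 1) * rho."""
    deff = 1 + (cluster_size - 1) * icc
    return n_per_group_proportions(p1, p2, alpha, power) * deff

# e.g., n_per_group_clustered(0.30, 0.20, cluster_size=2, icc=0.5)  # paired-eye data
```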

Journal ArticleDOI
TL;DR: Potential applications of the P-value distribution under the alternative hypothesis to the design, analysis, and interpretation of results of clinical trials are considered.
Abstract: The P-value is a random variable derived from the distribution of the test statistic used to analyze a data set and to test a null hypothesis. Under the null hypothesis, the P-value based on a continuous test statistic has a uniform distribution over the interval [0, 1], regardless of the sample size of the experiment. In contrast, the distribution of the P-value under the alternative hypothesis is a function of both sample size and the true value or range of true values of the tested parameter. The characteristics, such as mean and percentiles, of the P-value distribution can give valuable insight into how the P-value behaves for a variety of parameter values and sample sizes. Potential applications of the P-value distribution under the alternative hypothesis to the design, analysis, and interpretation of results of clinical trials are considered.
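
For the simplest concrete case, a one-sided z-test of H0: mu = 0 against mu > 0 (our illustration; the paper treats the general setting), the distribution of the P-value under the alternative is easy to write down and compute:

```python
from scipy.stats import norm

def pvalue_cdf_under_alternative(p, effect, n, sigma=1.0):
    """P(P-value <= p) for a one-sided z-test when the true mean is `effect`."""
    theta = effect * n ** 0.5 / sigma            # noncentrality of the z statistic
    return norm.cdf(theta - norm.ppf(1 - p))

def pvalue_median_under_alternative(effect, n, sigma=1.0):
    """Median of the P-value under the alternative (P = 1 - Phi(Z), Z ~ N(theta, 1))."""
    theta = effect * n ** 0.5 / sigma
    return 1 - norm.cdf(theta)

# e.g., pvalue_median_under_alternative(effect=0.3, n=50)  # median P for a moderate effect
```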

Journal ArticleDOI
TL;DR: This work reviews the joint regression model for genotype-by-environment interaction first suggested by Yates and Cochran (1938) and elaborated by Finlay and Wilkinson (1963) and Eberhart and Russell (1966), starting from the two-way model that is appropriate for the analysis of means from equally replicated data with homoscedastic errors.
Abstract: The analysis starts from the two-way model \(Y_{ij} = \mu + g_i + e_j + (ge)_{ij} + \varepsilon_{ij}\), where \(Y_{ij}\) is the mean yield of the \(i\)th genotype in the \(j\)th environment, \(\mu\) is the general mean, \(g_i\) is the effect of the \(i\)th genotype, \(e_j\) is the effect of the \(j\)th environment, \((ge)_{ij}\) is the interaction of the \(i\)th genotype and the \(j\)th environment, and \(\varepsilon_{ij}\) is the random error associated with the mean \(Y_{ij}\), assumed to be distributed as \(N(0, \sigma^2)\). This model is appropriate for the analysis of means from equally replicated data with homoscedastic errors. In an analysis of G x E data, it may be useful to fit a more specific model describing the interaction. The most common of such models is the regression model first suggested by Yates and Cochran (1938), which was further elaborated by Finlay and Wilkinson (1963) and Eberhart and Russell (1966). It may be written as shown in the display below.
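
The displayed regression equation did not survive extraction; in the joint-regression literature cited above it is conventionally written as follows (standard form, reconstructed rather than copied from the article):
\[
(ge)_{ij} \;=\; \beta_i\, e_j + d_{ij},
\qquad\text{so that}\qquad
Y_{ij} \;=\; \mu + g_i + (1 + \beta_i)\, e_j + d_{ij} + \varepsilon_{ij},
\]
where \(\beta_i\) is the regression coefficient of the \(i\)th genotype's interaction on the environmental effects and \(d_{ij}\) is the residual (deviation-from-regression) interaction.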

Journal ArticleDOI
TL;DR: Results for two-level random effects probit and logistic regression models to the three-level case are generalized and parameter estimation is based on full-information maximum marginal likelihood estimation (MMLE) using numerical quadrature to approximate the multiple random effects.
Abstract: In analysis of binary data from clustered and longitudinal studies, random effect models have been recently developed to accommodate two-level problems such as subjects nested within clusters or repeated classifications within subjects. Unfortunately, these models cannot be applied to three-level problems that occur frequently in practice. For example, multicenter longitudinal clinical trials involve repeated assessments within individuals and individuals are nested within study centers. This combination of clustered and longitudinal data represents the classic three-level problem in biometry. Similarly, in prevention studies, various educational programs designed to minimize risk taking behavior (e.g., smoking prevention and cessation) may be compared where randomization to various design conditions is at the level of the school and the intervention is performed at the level of the classroom. Previous statistical approaches to the three-level problem for binary response data have either ignored one level of nesting, treated it as a fixed effect, or used first- and second-order Taylor series expansions of the logarithm of the conditional likelihood to linearize these models and estimate model parameters using more conventional procedures for measurement data. Recent studies indicate that these approximate solutions exhibit considerable bias and provide little advantage over use of traditional logistic regression analysis ignoring the hierarchical structure. In this paper, we generalize earlier results for two-level random effects probit and logistic regression models to the three-level case. Parameter estimation is based on full-information maximum marginal likelihood estimation (MMLE) using numerical quadrature to approximate the multiple random effects. The model is illustrated using data from 135 classrooms from 28 schools on the effects of two smoking cessation interventions.
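
To make "numerical quadrature to approximate the random effects" concrete, here is a two-level random-intercept logistic sketch using Gauss-Hermite quadrature (our simplification; the three-level model in the paper nests a second random effect and a second quadrature in the same way).

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def marginal_loglik(beta, sigma, y, X, cluster, n_quad=20):
    """Marginal log-likelihood of a random-intercept logistic regression,
    integrating the cluster-level effect out by Gauss-Hermite quadrature.
    y, X, cluster are NumPy arrays; beta is the fixed-effect vector."""
    nodes, weights = hermgauss(n_quad)
    b = np.sqrt(2.0) * sigma * nodes                 # quadrature points on the random-effect scale
    loglik = 0.0
    for c in np.unique(cluster):
        yc, Xc = y[cluster == c], X[cluster == c]
        eta = Xc @ beta                              # fixed-effect linear predictor
        p = 1.0 / (1.0 + np.exp(-(eta[:, None] + b[None, :])))
        cond = np.prod(p ** yc[:, None] * (1 - p) ** (1 - yc[:, None]), axis=0)
        loglik += np.log(np.sum(weights * cond) / np.sqrt(np.pi))
    return loglik
```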

Journal ArticleDOI
TL;DR: This paper proposes two new methods to overcome deficiencies in standard methods of using the t-test and the Wilcoxon test for comparing the means of two skewed log-normal samples: a likelihood-based and a bootstrap-based approach.
Abstract: Standard methods of using the t-test and the Wilcoxon test have deficiencies for comparing the means of two skewed log-normal samples. In this paper, we propose two new methods to overcome these deficiencies: (1) a likelihood-based approach and (2) a bootstrap-based approach. Our simulation study shows that the likelihood-based approach is the best in terms of the type I error rate and power when data follow a log-normal distribution.
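
A minimal sketch of the bootstrap-based idea (a generic percentile bootstrap of the log-normal mean exp(mu + sigma^2/2) in each group; the authors' exact procedure may differ):

```python
import numpy as np

def lognormal_mean(sample):
    """Estimate E[X] = exp(mu + sigma^2 / 2) from a log-normal sample."""
    logs = np.log(sample)
    return np.exp(logs.mean() + logs.var(ddof=1) / 2)

def bootstrap_mean_difference(x, y, n_boot=2000, seed=0):
    """Percentile bootstrap interval for the difference of two log-normal means."""
    rng = np.random.default_rng(seed)
    diffs = [lognormal_mean(rng.choice(x, size=len(x), replace=True))
             - lognormal_mean(rng.choice(y, size=len(y), replace=True))
             for _ in range(n_boot)]
    return np.percentile(diffs, [2.5, 97.5])
```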

Journal ArticleDOI
TL;DR: A family of random walk rules for the sequential allocation of dose levels to patients in a dose-response study, or phase I clinical trial, is described and the small sample properties of this rule compare favorably to those of the continual reassessment method, determined by simulation.
Abstract: We describe a family of random walk rules for the sequential allocation of dose levels to patients in a dose-response study, or phase I clinical trial. Patients are sequentially assigned the next higher, same, or next lower dose level according to some probability distribution, which may be determined by ethical considerations as well as the patient's response. It is shown that one can choose these probabilities in order to center dose level assignments unimodally around any target quantile of interest. Estimation of the quantile is discussed; the maximum likelihood estimator and its variance are derived under a two-parameter logistic distribution, and the maximum likelihood estimator is compared with other nonparametric estimators. Random walk rules have clear advantages: they are simple to implement, and finite and asymptotic distribution theory is completely worked out. For a specific random walk rule, we compute finite and asymptotic properties and give examples of its use in planning studies. Having the finite distribution theory available and tractable obviates the need for elaborate simulation studies to analyze the properties of the design. The small sample properties of our rule, as determined by exact theory, compare favorably to those of the continual reassessment method, determined by simulation.
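
One member of such a family is the biased-coin up-and-down rule for a target toxicity quantile Gamma below one half: de-escalate after a toxicity, otherwise escalate with probability Gamma/(1 - Gamma). The sketch below illustrates that rule and is not a transcription of the paper's notation.

```python
import random

def next_dose(current_level, toxicity, target, n_levels):
    """One step of a biased-coin random walk: move down after a toxicity,
    otherwise move up with probability target / (1 - target), else stay."""
    if toxicity:
        return max(current_level - 1, 0)
    if random.random() < target / (1 - target):
        return min(current_level + 1, n_levels - 1)
    return current_level

# e.g., in a simulated trial:
# level = next_dose(level, toxicity=observed_toxicity, target=0.25, n_levels=6)
```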

Journal ArticleDOI
TL;DR: Although the Breslow approximation is the default in many standard software packages, the Efron method for handling ties is to be preferred, particularly when the sample size is small either from the outset or due to heavy censoring.
Abstract: Survival-time studies sometimes do not yield distinct failure times. Several methods have been proposed to handle the resulting ties. The goal of this paper is to compare these methods. Simulations were conducted, in which failure times were generated for a two-sample problem with an exponential hazard, a constant hazard ratio, and no censoring. Failure times were grouped to produce heavy, moderate, and light ties, corresponding to a mean of 10.0, 5.0, and 2.5 failures per interval. Cox proportional hazards models were fit using each of three approximations for handling ties with each interval size for sample sizes of n = 25, 50, 250, and 500 in each group. The Breslow (1974, Biometrics 30, 89-99) approximation tends to underestimate the true beta, while the Kalbfleisch-Prentice (1973, Biometrika 60, 267-279) approximation tends to overestimate beta. As the ties become heavier, the bias of these approximations increases. The Efron (1977, Journal of the American Statistical Association 72, 557-565) approximation performs far better than the other two, particularly with moderate or heavy ties; even with n = 25 in each group, the bias is under 2%, and for sample sizes larger than 50 per group, it is less than 1%. Except for the heaviest ties in the smallest sample, confidence interval coverage for all three estimators fell in the range of 94-96%. However, the tail probabilities were asymmetric with the Breslow and Kalbfleisch-Prentice formulas; using the Efron approximation, they were closer to the nominal 2.5%. Although the Breslow approximation is the default in many standard software packages, the Efron method for handling ties is to be preferred, particularly when the sample size is small either from the outset or due to heavy censoring.
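
For reference, the two partial-likelihood approximations being compared handle a set \(D_j\) of \(d_j\) tied failures at time \(t_j\) (with risk set \(R_j\) and \(s_j = \sum_{i \in D_j} x_i\)) as follows; these are the standard textbook forms, not quoted from the paper:
\[
L_{\text{Breslow}}(\beta) \;=\; \prod_j \frac{\exp(\beta^{\top} s_j)}{\big[\sum_{l \in R_j} \exp(\beta^{\top} x_l)\big]^{d_j}},
\qquad
L_{\text{Efron}}(\beta) \;=\; \prod_j \frac{\exp(\beta^{\top} s_j)}{\prod_{k=0}^{d_j - 1}\big[\sum_{l \in R_j} \exp(\beta^{\top} x_l) - \tfrac{k}{d_j}\sum_{l \in D_j} \exp(\beta^{\top} x_l)\big]} .
\]
Efron's correction subtracts an increasing fraction of the tied subjects' risk from the denominator as each tied failure is processed, which is why its bias stays small even with heavy ties, as the simulations above report.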

Journal ArticleDOI
TL;DR: This paper studies a regression calibration method for failure time regression analysis when data on some covariates are missing or mismeasured; the method is compared with an estimated partial likelihood estimator via simulation studies, in which the proposed method performs well even though it is technically inconsistent.
Abstract: In this paper we study a regression calibration method for failure time regression analysis when data on some covariates are missing or mismeasured. The method estimates the missing data based on the data structure estimated from a validation data set, a random subsample of the study cohort in which covariates are always observed. Ordinary Cox (1972; Journal of the Royal Statistical Society, Series B 34, 187-220) regression is then applied to estimate the regression coefficients, using the observed covariates in the validation data set and the estimated covariates in the nonvalidation data set. The method can be easily implemented. We present the asymptotic theory of the proposed estimator. Finite sample performance is examined and compared with an estimated partial likelihood estimator and other related methods via simulation studies, where the proposed method performs well even though it is technically inconsistent. Finally, we illustrate the method with a mouse leukemia data set.
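
A highly simplified sketch of the regression calibration idea (single error-prone covariate, linear calibration model, and lifelines' CoxPHFitter for the final Cox fit; the paper's estimation of the covariate data structure from the validation set is more general):

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def regression_calibration_cox(df, valid_mask):
    """df columns: time, event, w (surrogate, always observed), x (true covariate,
    observed only in the validation subsample indicated by valid_mask)."""
    v = df[valid_mask]
    slope, intercept = np.polyfit(v["w"], v["x"], 1)          # calibration model E[X | W]
    x_hat = df["x"].where(valid_mask, intercept + slope * df["w"])
    fit_df = pd.DataFrame({"time": df["time"], "event": df["event"], "x": x_hat})
    cph = CoxPHFitter()
    cph.fit(fit_df, duration_col="time", event_col="event")   # ordinary Cox on calibrated data
    return cph
```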

Journal ArticleDOI
TL;DR: General formulas for deriving the maximum likelihood estimates and the asymptotic variance-covariance matrix of the positions and effects of quantitative trait loci (QTLs) in a finite normal mixture model when the EM algorithm is used for mapping QTLs are presented.
Abstract: We present in this paper general formulas for deriving the maximum likelihood estimates and the asymptotic variance-covariance matrix of the positions and effects of quantitative trait loci (QTLs) in a finite normal mixture model when the EM algorithm is used for mapping QTLs. The general formulas are based on two matrices D and Q, where D is the genetic design matrix, characterizing the genetic effects of the QTLs, and Q is the conditional probability matrix of QTL genotypes given flanking marker genotypes, containing the information on QTL positions. With the general formulas, it is relatively easy to extend QTL mapping analysis to using multiple marker intervals simultaneously for mapping multiple QTLs, for analyzing QTL epistasis, and for estimating the heritability of quantitative traits. Simulations were performed to evaluate the performance of the estimates of the asymptotic variances of QTL positions and effects.
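
Schematically (our notation, following the abstract's description of D and Q), the likelihood contribution of individual i with trait value \(y_i\) is a finite normal mixture over the possible QTL genotypes:
\[
f(y_i) \;=\; \sum_{j} Q_{ij}\; \phi\!\big(y_i;\ \mu + D_j E,\ \sigma^2\big),
\]
where \(D_j\) is the row of the genetic design matrix for genotype j, \(E\) is the vector of QTL effects, and \(Q_{ij}\) is the conditional probability of genotype j given the flanking marker genotypes; the EM algorithm maximizes this mixture likelihood over \(\mu\), the QTL effects, the QTL positions (through Q), and \(\sigma^2\).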

Journal ArticleDOI
TL;DR: This book presents the theory and application of linear models, covering estimation, testing, one-way ANOVA and multiple comparison techniques, regression analysis, multifactor analysis of variance, experimental design and analysis of covariance models, general Gauss-Markov models, split plot models, mixed models and variance components, model diagnostics, variable selection, and collinearity.
Abstract: Contents: Introduction; Estimation; Testing; One-Way ANOVA; Multiple Comparison Techniques; Regression Analysis; Multifactor Analysis of Variance; Experimental Design Models; Analysis of Covariance; General Gauss-Markov Models; Split Plot Models; Mixed Models and Variance Components; Model Diagnostics; Variable Selection; Collinearity and Alternative Estimates.

Journal ArticleDOI
TL;DR: The models are parameterized so that the sensitivities and specificities of the diagnostic tests are simple functions of model parameters, and the usual latent class model obtains as a special case.
Abstract: Latent class analysis has been applied in medical research to assessing the sensitivity and specificity of diagnostic tests/diagnosticians. In these applications, a dichotomous latent variable corresponding to the unobserved true disease status of the patients is assumed. Associations among multiple diagnostic tests are attributed to the unobserved heterogeneity induced by the latent variable, and inferences for the sensitivities and specificities of the diagnostic tests are made possible even though the true disease status is unknown. However, a shortcoming of this approach to analyses of diagnostic tests is that the standard assumption of conditional independence among the diagnostic tests given a latent class is contraindicated by the data in some applications. In the present paper, models incorporating dependence among the diagnostic tests given a latent class are proposed. The models are parameterized so that the sensitivities and specificities of the diagnostic tests are simple functions of model parameters, and the usual latent class model obtains as a special case. Marginal models are used to account for the dependencies within each latent class. An accelerated EM gradient algorithm is demonstrated to obtain maximum likelihood estimates of the parameters of interest, as well as estimates of the precision of the estimates.
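
The "usual latent class model" that arises as a special case is, in standard notation (not the paper's), the conditional-independence model for J binary tests and a dichotomous true disease status:
\[
P(Y_1 = y_1, \ldots, Y_J = y_J) \;=\; \sum_{c \in \{0,1\}} \pi_c \prod_{j=1}^{J} p_{jc}^{\,y_j}\,(1 - p_{jc})^{1 - y_j},
\]
where \(\pi_1\) is the disease prevalence, the sensitivity of test j is \(p_{j1}\), and its specificity is \(1 - p_{j0}\); the models proposed in the paper keep this parameterization but add marginal-model terms for dependence among the tests within each latent class.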

Journal ArticleDOI
TL;DR: It is demonstrated that the inverse Gaussian mixture distribution gives a significantly better fit for a data set on the frequency of epileptic seizures than the traditional Poisson distribution.
Abstract: Count data often show overdispersion compared to the Poisson distribution. Overdispersion is typically modeled by a random effect for the mean, based on the gamma distribution, leading to the negative binomial distribution for the count. This paper considers a larger family of mixture distributions, including the inverse Gaussian mixture distribution. It is demonstrated that it gives a significantly better fit for a data set on the frequency of epileptic seizures. The same approach can be used to generate counting processes from Poisson processes, where the rate or the time is random. A random rate corresponds to variation between patients, whereas a random time corresponds to variation within patients.
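
The mixture family in question replaces the gamma mixing density in the usual negative binomial construction with other choices, e.g. the inverse Gaussian (standard formulation, not quoted from the paper):
\[
P(Y = y) \;=\; \int_0^{\infty} \frac{(\lambda \nu)^{y} e^{-\lambda \nu}}{y!}\, g(\nu)\, d\nu ,
\]
where \(\nu\) is a unit-mean random effect: a gamma density g gives the negative binomial distribution, while an inverse Gaussian g gives the Poisson-inverse Gaussian distribution used for the epileptic seizure data.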

Journal ArticleDOI
TL;DR: The Cox regression model with a shared frailty factor allows for unobserved heterogeneity or for statistical dependence between the observed survival times, and the problem of obtaining variance estimates for regression coefficients, frailty parameter, and cumulative baseline hazards using the observed nonparametric information matrix is addressed.
Abstract: The Cox regression model with a shared frailty factor allows for unobserved heterogeneity or for statistical dependence between the observed survival times. Estimation in this model when the frailties are assumed to follow a gamma distribution is reviewed, and we address the problem of obtaining variance estimates for regression coefficients, frailty parameter, and cumulative baseline hazards using the observed nonparametric information matrix. A number of examples are given comparing this approach with fully parametric inference in models with piecewise constant baseline hazards.
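
In the standard formulation of this model (our notation), subjects j in cluster i share an unobserved frailty \(Z_i\) that multiplies the Cox hazard:
\[
\lambda_{ij}(t \mid Z_i) \;=\; Z_i\, \lambda_0(t)\, \exp\!\big(x_{ij}^{\top} \beta\big),
\qquad Z_i \sim \text{Gamma}(\text{mean } 1,\ \text{variance } \theta),
\]
so that \(\theta\) measures the within-cluster dependence (\(\theta = 0\) recovers the ordinary Cox model); the paper's contribution is variance estimation for the regression coefficients, the frailty parameter, and the cumulative baseline hazard via the observed nonparametric information matrix.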

Journal ArticleDOI
TL;DR: Building on results of Burnham and of Kendall and colleagues showing that completely random temporary emigration influences only estimates of capture probability, leaving estimates of abundance or survival for the entire population (including the temporary emigrants) unaffected, this article generalizes the robust-design estimation of temporary emigration to allow animals to enter and leave the population during the secondary samples.
Abstract: One of the basic assumptions central to the analysis of capture-recapture experiments is that all marked animals remain in the population under study for the duration of the sampling, or if they migrate out of the population they do so permanently. Burnham (1993, in Marked Individuals in the Study of Bird Populations, 199-213), Kendall and Nichols (1995, Applied Statistics 22, 751-762), and Kendall, Nichols, and Hines (in press) showed that completely random temporary emigration influences only estimates of the probability of capture, these now estimating the product of the temporary emigration rate and the conditional probability of capture given the animal remains in the population. Estimates of abundance or survival that refer to the entire population, including the temporary emigrants, remain unaffected. Kendall et al. (in press) further showed that Pollock's (1982, Journal of Wildlife Management 46, 757-760) robust design could be used to estimate the temporary emigration rate when the population was assumed closed during the secondary samples. We generalize this result to allow animals to enter and leave the population during the secondary samples. We apply the results to a study of Grey Seals and perform simulation experiments to assess the robustness of our estimator to errors in field identification of brands and other violations of our assumptions.