
Showing papers in "Biometrics in 2003"


Journal ArticleDOI
TL;DR: The regression framework is described, which is used to compare two prostate‐specific antigen biomarkers and to evaluate the dependence of biomarker accuracy on the time prior to clinical diagnosis of prostate cancer.
Abstract: Accurate diagnosis of disease is a critical part of health care. New diagnostic and screening tests must be evaluated based on their abilities to discriminate diseased from nondiseased states. The partial area under the receiver operating characteristic (ROC) curve is a measure of diagnostic test accuracy. We present an interpretation of the partial area under the curve (AUC), which gives rise to a nonparametric estimator. This estimator is more robust than existing estimators, which make parametric assumptions. We show that the robustness is gained with only a moderate loss in efficiency. We describe a regression modeling framework for making inference about covariate effects on the partial AUC. Such models can refine knowledge about test accuracy. Model parameters can be estimated using binary regression methods. We use the regression framework to compare two prostate-specific antigen biomarkers and to evaluate the dependence of biomarker accuracy on the time prior to clinical diagnosis of prostate cancer.

323 citations
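
As a rough illustration of the partial-area interpretation, the empirical partial AUC over a restricted false-positive range can be computed from pairwise comparisons of case and control marker values. This is a minimal sketch under simulated data, not the authors' estimator or their regression framework; the function name and inputs are illustrative only.

```python
import numpy as np

def partial_auc(cases, controls, fpr_max=0.2):
    """Empirical partial AUC over false-positive rates in [0, fpr_max].

    Counts case/control pairs in which the case score exceeds the control
    score, restricted to controls above the empirical (1 - fpr_max)
    quantile of the control scores; ties count 1/2.
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    thresh = np.quantile(controls, 1.0 - fpr_max)
    top = controls[controls >= thresh]        # controls in the low-FPR region
    greater = (cases[:, None] > top[None, :]).sum()
    ties = (cases[:, None] == top[None, :]).sum()
    return (greater + 0.5 * ties) / (len(cases) * len(controls))

rng = np.random.default_rng(0)
marker_cases = rng.normal(1.0, 1.0, 200)     # diseased subjects
marker_controls = rng.normal(0.0, 1.0, 300)  # nondiseased subjects
print(partial_auc(marker_cases, marker_controls, fpr_max=0.2))
```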


Journal ArticleDOI
TL;DR: Even with very large samples, the analyst will not be able to distinguish among reasonable models of heterogeneity, even though these yield quite distinct inferences about population size; the problem is illustrated with models for closed and open populations.
Abstract: Heterogeneity in detection probabilities has long been recognized as problematic in mark-recapture studies, and numerous models have been developed to accommodate its effects. Individual heterogeneity is especially problematic, in that reasonable alternative models may predict essentially identical observations from populations of substantially different sizes. Thus, even with very large samples, the analyst will not be able to distinguish among reasonable models of heterogeneity, even though these yield quite distinct inferences about population size. The problem is illustrated with models for closed and open populations.

306 citations


Journal ArticleDOI
TL;DR: This work reparameterizes the mixed model so that functions of the covariance parameters of the random effects distribution are incorporated as regression coefficients on standard normal latent variables and allows random effects to effectively drop out of the model.
Abstract: We address the important practical problem of selecting the random effects in a linear mixed model. A hierarchical Bayesian model is used to identify any random effect with zero variance. The proposed approach reparameterizes the mixed model so that functions of the covariance parameters of the random effects distribution are incorporated as regression coefficients on standard normal latent variables. We allow random effects to effectively drop out of the model by choosing mixture priors with point mass at zero for the random effects variances. Due to the reparameterization, the model takes a conditionally linear structure that facilitates the use of normal conjugate priors. We demonstrate that posterior computation can be carried out via a simple and efficient Markov chain Monte Carlo algorithm. The methods are illustrated using simulated data and real data from a study of prenatal exposure to polychlorinated biphenyls and psychomotor development of children.

255 citations


Journal ArticleDOI
TL;DR: A test for the JMV model, a simple generalization of the Arnason‐Schwarz (AS) model, is proposed in the form of interpretable contingency tables, together with a partitioning that emphasizes the role of the memory model of Brownie et al. (1993, Biometrics 49, 1173–1187) as a biologically more plausible alternative to the AS model.
Abstract: In an analysis of capture-recapture data, the identification of a model that fits is a critical step. For the multisite (also called multistate) models used to analyze data gathered at several sites, no reliable test for assessing fit is currently available. We propose a test for the JMV model, a simple generalization of the Arnason-Schwarz (AS) model, in the form of interpretable contingency tables. For the AS model, we suggest complementing the test for the JMV model with a likelihood ratio test of AS vs. JMV. The examination of an example leads us to propose further a partitioning that emphasizes the role of the memory model of Brownie et al. (1993 Biometrics 49, 1173-1187) as a biologically more plausible alternative to the AS model.

243 citations


Journal ArticleDOI
TL;DR: A general methodology is developed which allows the effects of multiple covariates to be directly incorporated into the estimation procedure using a conditional likelihood approach, and is applied to eastern tropical Pacific dolphin sightings data.
Abstract: Summary. An implicit assumption of standard line transect methodology is that detection probabilities depend solely on the perpendicular distance of detected objects to the transect line. Heterogeneity in detection probabilities is commonly minimized using stratification, but this may be precluded by small sample sizes. We develop a general methodology that allows the effects of multiple covariates to be directly incorporated into the estimation procedure using a conditional likelihood approach. Small sample size properties of estimators are examined via simulations. As an example, the method is applied to eastern tropical Pacific dolphin sightings data.

241 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show that classical score test statistics, frequently advocated in practice, cannot be used in this context, but that well-chosen one-sided counterparts could be used instead.
Abstract: Whenever inference for variance components is required, the choice between one-sided and two-sided tests is crucial. This choice is usually driven by whether or not negative variance components are permitted. For two-sided tests, classical inferential procedures can be followed, based on likelihood ratios, score statistics, or Wald statistics. For one-sided tests, however, one-sided test statistics need to be developed, and their null distribution derived. While this has received considerable attention in the context of the likelihood ratio test, there appears to be much confusion about the related problem for the score test. The aim of this paper is to illustrate that classical (two-sided) score test statistics, frequently advocated in practice, cannot be used in this context, but that well-chosen one-sided counterparts could be used instead. The relation with likelihood ratio tests will be established, and all results are illustrated in an analysis of continuous longitudinal data using linear mixed models.

223 citations
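
The boundary problem the abstract describes is easiest to see for the likelihood ratio test of a single variance component: under the null hypothesis the variance sits on the boundary of the parameter space, so the LR statistic follows a 50:50 mixture of a point mass at zero and a chi-square with 1 df (the standard Self and Liang, 1987, result), and the naive chi-square(1) p-value is too large. A minimal sketch, assuming the LR statistic has already been computed from a fitted mixed model:

```python
from scipy import stats

def boundary_lr_pvalue(lr_stat):
    """P-value for H0: variance component = 0 using the 50:50 mixture
    of chi-square(0) and chi-square(1) null distribution."""
    if lr_stat <= 0:
        return 1.0
    return 0.5 * stats.chi2.sf(lr_stat, df=1)

lr = 2.71
print(stats.chi2.sf(lr, df=1))   # naive two-sided chi-square(1): ~0.10
print(boundary_lr_pvalue(lr))    # boundary-corrected one-sided: ~0.05
```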


Journal ArticleDOI
TL;DR: This work considers statistical methods to rank genes (or proteins) with regard to differential expression between tissues, proposes that sampling variability in the gene rankings be quantified, and suggests using the "selection probability function," the probability distribution of rankings for each gene.
Abstract: High-throughput technologies, such as DNA microarrays and mass spectrometry, allow thousands of potential biomarkers that distinguish different tissue types to be evaluated simultaneously. These techniques are particularly interesting for comparing cancerous and normal tissues. We consider statistical methods to rank genes (or proteins) with regard to differential expression between tissues. We study several statistical ranking measures and argue that two measures related to the receiver operating characteristic (ROC) curve are particularly suitable for this purpose. We also propose that sampling variability in the gene rankings be quantified, and we suggest using the "selection probability function," the probability distribution of rankings for each gene, estimated via bootstrap. We analyze a real data set of gene expression results from 23 normal and 30 cancerous ovarian tissues. Simulation studies are also used to investigate the behavior of the various statistical ranking measures and our quantification of sampling variability. Our approach leads naturally to a sample-size calculation procedure appropriate for exploratory studies that aim to identify differentially expressed genes.

221 citations
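
A minimal sketch of the bootstrap "selection probability function" described above: genes are ranked by a simple stand-in statistic (difference in group means rather than the ROC-based measures studied in the paper), and the probability that each gene lands in the top k is estimated across bootstrap resamples of the arrays. All names and the simulated expression matrices are illustrative.

```python
import numpy as np

def selection_probability(x_normal, x_cancer, top_k=10, n_boot=500, seed=1):
    """Estimate P(gene ranks in top k) over bootstrap resamples of samples."""
    rng = np.random.default_rng(seed)
    n_genes = x_normal.shape[0]
    hits = np.zeros(n_genes)
    for _ in range(n_boot):
        # Resample tissue samples (columns) with replacement in each group.
        bn = x_normal[:, rng.integers(0, x_normal.shape[1], x_normal.shape[1])]
        bc = x_cancer[:, rng.integers(0, x_cancer.shape[1], x_cancer.shape[1])]
        score = bc.mean(axis=1) - bn.mean(axis=1)   # stand-in ranking statistic
        hits[np.argsort(score)[::-1][:top_k]] += 1
    return hits / n_boot

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, (1000, 23))   # 1000 genes x 23 normal tissues
cancer = rng.normal(0, 1, (1000, 30))   # 1000 genes x 30 cancer tissues
cancer[:5] += 1.5                       # five truly upregulated genes
print(selection_probability(normal, cancer)[:10])
```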


Journal ArticleDOI
TL;DR: A parameterization of the beta-binomial mixture is developed that provides sensible inferences about the size of a closed population when probabilities of capture or detection vary among individuals.
Abstract: We develop a parameterization of the beta-binomial mixture that provides sensible inferences about the size of a closed population when probabilities of capture or detection vary among individuals. Three classes of mixture models (beta-binomial, logistic-normal, and latent-class) are fitted to recaptures of snowshoe hares for estimating abundance and to counts of bird species for estimating species richness. In both sets of data, rates of detection appear to vary more among individuals (animals or species) than among sampling occasions or locations. The estimates of population size and species richness are sensitive to model-specific assumptions about the latent distribution of individual rates of detection. We demonstrate using simulation experiments that conventional diagnostics for assessing model adequacy, such as deviance, cannot be relied on for selecting classes of mixture models that produce valid inferences about population size. Prior knowledge about sources of individual heterogeneity in detection rates, if available, should be used to help select among classes of mixture models that are to be used for inference.

218 citations


Journal ArticleDOI
TL;DR: A new approach to fitting marginal models to clustered data when cluster size is informative is proposed; it uses a generalized estimating equation that is weighted inversely with the cluster size and is asymptotically equivalent to within-cluster resampling.
Abstract: We propose a new approach to fitting marginal models to clustered data when cluster size is informative. This approach uses a generalized estimating equation (GEE) that is weighted inversely with the cluster size. We show that our approach is asymptotically equivalent to within-cluster resampling (WCR; Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134), a computationally intensive approach in which replicate data sets containing a randomly selected observation from each cluster are analyzed, and the resulting estimates averaged. Using simulated data and an example involving dental health, we show the superior performance of our approach compared to unweighted GEE, the equivalence of our approach with WCR for large sample sizes, and the superior performance of our approach compared with WCR when sample sizes are small.

212 citations
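
For a marginal mean with an independence working correlation, the inverse-cluster-size-weighted GEE reduces to averaging the cluster means, while the unweighted GEE reduces to the pooled mean, so a toy simulation with informative cluster size shows why the two differ. This is a sketch of the weighting idea only, not the general GEE machinery, and the simulated data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Informative cluster size: clusters with a high cluster effect get 2
# members, clusters with a low effect get 5, so pooled data
# over-represent low-effect clusters.
clusters = []
for _ in range(2000):
    b = rng.normal()
    size = 2 if b > 0 else 5
    clusters.append(b + rng.normal(size=size))

pooled = np.mean(np.concatenate(clusters))        # unweighted GEE solution
weighted = np.mean([c.mean() for c in clusters])  # 1/n_i-weighted GEE solution
print(pooled, weighted)  # pooled is pulled toward the over-represented clusters
```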


Journal ArticleDOI
TL;DR: This article presents a flexible framework of likelihood-based models which allow for individual heterogeneity in survival and capture rates, and includes as a special case the Cormack-Jolly-Seber model.
Abstract: In open-population capture-recapture studies, it is usually assumed that similar animals (i.e., of the same sex and age group) have similar survival rates and capture probabilities. These assumptions are generally perceived to be oversimplifications, and they can lead to incorrect model selection and biased parameter estimates. Allowing individual variability in capture probabilities among apparently similar animals has now become feasible, thanks to advances in closed-population models and increased computing power. This article presents a flexible framework of likelihood-based models that allow for individual heterogeneity in survival and capture rates. Heterogeneity is modeled using finite mixtures, which have enough flexibility of distributional shape to fit a wide variety of different patterns of individual variation. The models condition on the first capture of each animal and include the Cormack-Jolly-Seber model as a special case. Model selection is done as usual, by Akaike's information criterion or likelihood ratio tests, which also allow different influences on survival rates to be examined. Bias in parameter estimates is reduced by the inclusion of individual heterogeneity. Model selection and bias reduction are important in population studies and in making appropriate management decisions.

211 citations


Journal ArticleDOI
TL;DR: The basic principles of design (randomization, replication, and blocking) as they pertain to microarrays are discussed and some general guidelines for statisticians designing microarray studies are provided.
Abstract: This article describes theoretical and practical issues in the experimental design of gene expression microarray studies. Specifically, it first discusses the basic principles of design (randomization, replication, and blocking) as they pertain to microarrays, and second provides some general guidelines for statisticians designing microarray studies.

Journal ArticleDOI
TL;DR: It is demonstrated that ICC and CCC are the same measure of agreement estimated in two ways: by the variance components procedure and by the moment method.
Abstract: The intraclass correlation coefficient (ICC) and the concordance correlation coefficient (CCC) are two of the most popular measures of agreement for variables measured on a continuous scale. We demonstrate here that the ICC and the CCC are the same measure of agreement estimated in two ways: by the variance components procedure and by the moment method. We propose estimating the CCC using the variance components of a mixed effects model instead of the usual method of moments. With the variance components approach, the CCC can easily be extended to more than two observers and adjusted for confounding covariates by incorporating them in the mixed model. A simulation study is carried out to compare the variance components approach with the moment method. The importance of adjusting for confounding covariates is illustrated with a case example.
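
A minimal sketch of the two estimation routes the abstract compares, for the two-rater case: Lin's moment-based CCC and the absolute-agreement ICC computed from ANOVA variance components, which should nearly coincide. The formulas are the standard moment CCC and ICC(A,1); the simulated rater data are illustrative.

```python
import numpy as np

def ccc_moment(x, y):
    """Lin's CCC by the method of moments (1/n variance convention)."""
    sxy = np.cov(x, y, bias=True)[0, 1]
    return 2.0 * sxy / (np.var(x) + np.var(y) + (np.mean(x) - np.mean(y)) ** 2)

def icc_agreement(x, y):
    """Two-way ICC for absolute agreement, ICC(A,1), from ANOVA mean squares."""
    data = np.column_stack([x, y])
    n, k = data.shape
    grand = data.mean()
    msr = k * np.sum((data.mean(axis=1) - grand) ** 2) / (n - 1)  # subjects
    msc = n * np.sum((data.mean(axis=0) - grand) ** 2) / (k - 1)  # raters
    resid = (data - data.mean(axis=1, keepdims=True)
             - data.mean(axis=0, keepdims=True) + grand)
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

rng = np.random.default_rng(3)
subject = rng.normal(0, 2, 100)
rater1 = subject + rng.normal(0, 1, 100)
rater2 = subject + 0.3 + rng.normal(0, 1, 100)   # rater 2 has a small shift
print(ccc_moment(rater1, rater2), icc_agreement(rater1, rater2))  # nearly equal
```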

Journal ArticleDOI
TL;DR: A new semiparametric Bayesian hierarchical model for the joint modeling of longitudinal and survival data is proposed using Dirichlet process priors on the parameters defining the longitudinal model, resulting in more robust estimates.
Abstract: This article proposes a new semiparametric Bayesian hierarchical model for the joint modeling of longitudinal and survival data. We relax the distributional assumptions for the longitudinal model using Dirichlet process priors on the parameters defining the longitudinal model. The resulting posterior distribution of the longitudinal parameters is free of parametric constraints, resulting in more robust estimates. This type of approach is becoming increasingly essential in many applications, such as HIV and cancer vaccine trials, where patients' responses are highly diverse and may not be easily modeled with known distributions. An example will be presented from a clinical trial of a cancer vaccine where the survival outcome is time to recurrence of a tumor. Immunologic measures believed to be predictive of tumor recurrence were taken repeatedly during follow-up. We will present an analysis of this data using our new semiparametric Bayesian hierarchical joint modeling methodology to determine the association of these longitudinal immunologic measures with time to tumor recurrence.

Journal ArticleDOI
TL;DR: An adaptive two‐stage Bayesian design for finding one or more acceptable dose combinations of two cytotoxic agents used together in a Phase I clinical trial is proposed and a simulation study is presented.
Abstract: We propose an adaptive two-stage Bayesian design for finding one or more acceptable dose combinations of two cytotoxic agents used together in a Phase I clinical trial. The method requires that each of the two agents has been studied previously as a single agent, which is almost always the case in practice. A parametric model is assumed for the probability of toxicity as a function of the two doses. Informative priors for the parameters characterizing the single-agent toxicity probability curves are either elicited from the clinician(s) planning the trial or obtained from historical data, while only vague priors are specified for the parameters characterizing interactions between the agents. A method for eliciting the priors is described. The design is applied to a trial of gemcitabine and cyclophosphamide, and a simulation study is also presented.

Journal ArticleDOI
TL;DR: In this paper, the causal effect of vaccination on secondary transmission and disease was evaluated using a principal stratification framework developed by Frangakis and Rubin (2002, Biometrics 58, 21-29).
Abstract: Vaccines with limited ability to prevent HIV infection may positively impact the HIV/AIDS pandemic by preventing secondary transmission and disease in vaccine recipients who become infected. To evaluate the impact of vaccination on secondary transmission and disease, efficacy trials assess vaccine effects on HIV viral load and other surrogate endpoints measured after infection. A standard test that compares the distribution of viral load between the infected subgroups of vaccine and placebo recipients does not assess a causal effect of vaccine, because the comparison groups are selected after randomization. To address this problem, we formulate clinically relevant causal estimands using the principal stratification framework developed by Frangakis and Rubin (2002, Biometrics 58, 21-29), and propose a class of logistic selection bias models whose members identify the estimands. Given a selection model in the class, procedures are developed for testing and estimation of the causal effect of vaccination on viral load in the principal stratum of subjects who would be infected regardless of randomization assignment. We show how the procedures can be used for a sensitivity analysis that quantifies how the causal effect of vaccination varies with the presumed magnitude of selection bias.

Journal ArticleDOI
TL;DR: A correction to the variance of the Wilcoxon rank sum statistic is proposed that accounts for clustering effects and can be used for both balanced and unbalanced data, in the presence or absence of ties, with the p-value adjusted accordingly.
Abstract: The Wilcoxon rank sum test is frequently used in statistical practice for the comparison of measures of location when the underlying distributions are far from normal or not known in advance. An assumption of the ordinary rank sum test is that individual sampling units are independent. In many ophthalmologic clinical trials, the Early Treatment for Diabetic Retinopathy Scale (ETDRS) is a principal endpoint used for measuring the level of diabetic retinopathy. This is an ordinal scale, and it is natural to consider the Wilcoxon rank sum test for the comparison of the level of diabetic retinopathy between treatment groups. However, under this design, unlike the usual Wilcoxon rank sum test, the subject is the unit of randomization, but the eye is the unit of analysis. Furthermore, a person will tend to have different, but correlated, ETDRS scores for fellow eyes. Thus, we propose a correction to the variance of the Wilcoxon rank sum statistic that accounts for clustering effects and that can be used for both balanced (same number of subunits per cluster) and unbalanced (different numbers of subunits per cluster) data, in either the presence or absence of ties, with the p-value adjusted accordingly. In this article, we present large-sample theory and simulation results for this test procedure and apply it to diabetic retinopathy data from type I diabetics in the Sorbinil Retinopathy Trial.
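
A cluster-robust flavor of this idea can be sketched as follows. This is a design-effect-style approximation built from centered cluster rank sums, not the exact variance correction derived in the paper; the function name and the simulated two-eye data are illustrative.

```python
import numpy as np
from scipy import stats

def clustered_ranksum(values, groups, clusters):
    """Rank sum test when subunits (e.g., eyes) are nested in subjects.

    With whole clusters randomized to one group, W - E[W] is a sum of
    centered cluster rank sums that are roughly independent across
    clusters; their empirical squares give a cluster-robust variance
    that accommodates ties (midranks) and unbalanced cluster sizes.
    """
    r = stats.rankdata(values)              # midranks handle ties
    cent = r - (len(values) + 1) / 2.0      # centered ranks
    g1 = groups == 1
    s = np.array([cent[(clusters == c) & g1].sum()
                  for c in np.unique(clusters[g1])])
    z = s.sum() / np.sqrt(np.sum(s ** 2))   # (W - E[W]) / robust SE
    return z, 2 * stats.norm.sf(abs(z))

rng = np.random.default_rng(4)
subj = np.repeat(np.arange(60), 2)             # 60 subjects, two eyes each
grp = np.repeat(rng.integers(0, 2, 60), 2)     # randomization by subject
score = rng.integers(0, 5, 120) + np.repeat(rng.normal(0, 1, 60), 2)
print(clustered_ranksum(score, grp, subj))
```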

Journal ArticleDOI
TL;DR: This work proposes an alternative approach to pattern-mixture models with dropout, which is a latent-class model, where the dropout time is assumed to be related to the unobserved (latent) class membership; a regression model for the response is specified conditional on the latent variable.
Abstract: Summary. In longitudinal studies with dropout, pattern-mixture models form an attractive modeling framework to account for nonignorable missing data. However, pattern-mixture models assume that the components of the mixture distribution are entirely determined by the dropout times. That is, two subjects with the same dropout time have the same distribution for their response with probability one. As that is unlikely to be the case, this assumption may lead to classification error. In addition, if there are certain dropout patterns with very few subjects, which often occurs when the number of observation times is relatively large, pattern-specific parameters may be weakly identified or require identifying restrictions. We propose an alternative approach, which is a latent-class model. The dropout time is assumed to be related to the unobserved (latent) class membership, where the number of classes is less than the number of observed patterns; a regression model for the response is specified conditional on the latent variable. This is a type of shared-parameter model, where the shared “parameter” is discrete. Parameter estimates are obtained using the method of maximum likelihood. Averaging the estimates of the conditional parameters over the distribution of the latent variable yields estimates of the marginal regression parameters. The methodology is illustrated using longitudinal data on depression from a study of HIV in women.

Journal ArticleDOI
TL;DR: The proposed methodology, incorporating shrinkage and data-adaptive features, is seen to be well suited for describing population kinetics of 14C-folate-specific activity and random effects, and can also be applied to other functional data analysis problems.
Abstract: We present the application of a nonparametric method to performing functional principal component analysis for functional curve data that consist of measurements of a random trajectory for a sample of subjects. This design typically consists of an irregular grid of time points on which repeated measurements are taken for a number of subjects. We introduce shrinkage estimates for the functional principal component scores that serve as the random effects in the model. Scatterplot smoothing methods are used to estimate the mean function and covariance surface of this model. We propose improved estimation in the neighborhood of and at the diagonal of the covariance surface, where the measurement errors are reflected. The presence of additive measurement errors motivates shrinkage estimates for the functional principal component scores. Shrinkage estimates are developed through best linear prediction and in a generalized version, aiming at minimizing one-curve-leave-out prediction error. The estimation of individual trajectories combines data obtained from that individual as well as all other individuals. We apply our methods to new data regarding the analysis of the level of 14C-folate in plasma as a function of time since dosing of healthy adults with a small tracer dose of 14C-folic acid. A time transformation was incorporated to handle design irregularity concerning the time points on which the measurements were taken. The proposed methodology, incorporating shrinkage and data-adaptive features, is seen to be well suited for describing population kinetics of 14C-folate-specific activity and random effects, and can also be applied to other functional data analysis problems.

Journal ArticleDOI
TL;DR: The pFDR, cFDR, and mFDR are shown to be equivalent under the Bayesian framework, in which the number of true null hypotheses is modeled as a random variable.
Abstract: Testing for significance with gene expression data from DNA microarray experiments involves simultaneous comparisons of hundreds or thousands of genes. If R denotes the number of rejections (declared significant genes) and V denotes the number of false rejections, then V/R, if R > 0, is the proportion of falsely rejected hypotheses. This paper proposes a model for the distribution of the number of rejections and the conditional distribution of V given R. Under the independence assumption, the distribution of R is a convolution of two binomials, and the distribution of V given R has a noncentral hypergeometric distribution. Under an equicorrelated model, the distributions are more complex and are also derived. Five false discovery rate probability error measures are considered: FDR = E(V/R), pFDR = E(V/R | R > 0) (positive FDR), cFDR = E(V/R | R = r) (conditional FDR), mFDR = E(V)/E(R) (marginal FDR), and eFDR = E(V)/r (empirical FDR). The pFDR, cFDR, and mFDR are shown to be equivalent under the Bayesian framework, in which the number of true null hypotheses is modeled as a random variable. We present a parametric and a bootstrap procedure to estimate the FDRs. Monte Carlo simulations were conducted to evaluate the performance of these two methods. The bootstrap procedure appears to perform reasonably well, even when the alternative hypotheses are correlated (rho = 0.25). An example from a toxicogenomic microarray experiment is presented for illustration.
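
The error measures defined above are easy to compare by Monte Carlo: simulate many data sets, record V and R at a fixed cutoff, and average. A minimal sketch under independent normal test statistics, with all parameters illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
m, m1, alpha, reps = 1000, 100, 0.05, 2000   # genes, true effects, cutoff, runs
V_list, R_list = [], []
for _ in range(reps):
    z = rng.normal(0, 1, m)
    z[:m1] += 3.0                            # shifted alternatives
    reject = 2 * stats.norm.sf(np.abs(z)) < alpha
    R_list.append(reject.sum())
    V_list.append(reject[m1:].sum())         # rejections among true nulls
V, R = np.array(V_list), np.array(R_list)
ratio = np.where(R > 0, V / np.maximum(R, 1), 0.0)
print("FDR  =", ratio.mean())                # E(V/R), with V/R := 0 when R = 0
print("pFDR =", ratio[R > 0].mean())         # E(V/R | R > 0)
print("mFDR =", V.mean() / R.mean())         # E(V)/E(R)
```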

Journal ArticleDOI
TL;DR: This article weakly parameterizes the log-hazard function with a piecewise-linear spline and provides a smoothed estimate of the hazard function by maximizing the penalized likelihood through a mixed model-based approach.
Abstract: This article introduces a new approach for estimating the hazard function for possibly interval- and right-censored survival data. We weakly parameterize the log-hazard function with a piecewise-linear spline and provide a smoothed estimate of the hazard function by maximizing the penalized likelihood through a mixed model-based approach. We also provide a method to estimate the amount of smoothing from the data. We illustrate our approach with two well-known interval-censored data sets. Extensive numerical studies are conducted to evaluate the efficacy of the new procedure.

Journal ArticleDOI
TL;DR: A Bayesian sequential optimal design scheme comprising a pilot study on a small number of patients followed by the allocation of patients to doses one at a time is developed and its properties explored by simulation.
Abstract: A broad approach to the design of Phase I clinical trials for the efficient estimation of the maximum tolerated dose is presented. The method is rooted in formal optimal design theory and involves the construction of constrained Bayesian c- and D-optimal designs. The imposed constraint incorporates the optimal design points and their weights and ensures that the probability that an administered dose exceeds the maximum acceptable dose is low. Results relating to these constrained designs for log doses on the real line are described and the associated equivalence theorem is given. The ideas are extended to more practical situations, specifically to those involving discrete doses. In particular, a Bayesian sequential optimal design scheme comprising a pilot study on a small number of patients followed by the allocation of patients to doses one at a time is developed and its properties explored by simulation.

Journal ArticleDOI
TL;DR: A Bayesian framework for jointly modeling cluster size and multiple categorical and continuous outcomes measured on each subunit is proposed, using a continuation-ratio probit model for the cluster size and underlying normal regression models for each of the subunit-specific outcomes.
Abstract: In applications with clustered data, such as longitudinal studies or toxicology experiments, the values of the outcomes measured on the individual subunits of a cluster are often related to the number of subunits in that cluster. Analyses that ignore this dependence can yield biased inferences. This article proposes a Bayesian framework for jointly modeling cluster size and the multiple outcomes, categorical or continuous, measured on each subunit of the cluster. We use a continuation-ratio probit model for the cluster size and normal regression models for each of the subunit-specific outcomes. Dependence between the cluster size and the outcomes is accommodated through a latent variable structure. This choice of model facilitates computation via a simple and efficient Gibbs sampler. We illustrate the approach with a toxicology study and discuss its use for the joint modeling of longitudinal and event-time data.

Journal ArticleDOI
TL;DR: Boschloo's test, in which the p-value from Fisher's test is used as the test statistic in an exact unconditional test, is uniformly more powerful than Fisher's test, and is also recommended.
Abstract: When used to compare proportions in randomized trials, Fisher's exact test can be quite conservative, mainly when the sample sizes are small or the observed proportions are close to 0 or 1. This undesirable property is largely explained by the overly discrete null distribution of the test statistic, a consequence of conditioning the inference on the total number of responders. Exact unconditional procedures have therefore gained popularity, on the idea that less discrete null distributions of the test statistics should yield greater power. However, we caution researchers against a casual choice of an exact unconditional test, because in some cases such a test can be markedly less powerful than Fisher's test. To support this point, we discuss a real example and compute the type I error rates and powers of several tests, for comparisons of groups of equal and unequal sizes. Our results show that Fisher's exact test generally outperforms exact unconditional tests based on the difference of the proportions, and also, in the case of unequal sample sizes, exact unconditional tests based on the difference of the proportions divided by its standard error estimated under the alternative hypothesis. In contrast, the exact unconditional test based on the difference of the proportions divided by its standard error estimated under the null hypothesis (i.e., the score statistic) outperforms Fisher's test and is recommended. Boschloo's test, in which the p-value from Fisher's test is used as the test statistic in an exact unconditional test, is uniformly more powerful than Fisher's test, and is also recommended.
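
Both Fisher's exact test and Boschloo's exact unconditional test are available in SciPy (`boschloo_exact` requires SciPy 1.7 or later), so the comparison in the abstract can be tried directly on a small hypothetical 2x2 table:

```python
from scipy import stats

# Hypothetical table from a small randomized trial:
# rows = treatment arms, columns = (responders, nonresponders).
table = [[7, 3],
         [2, 8]]

_, p_fisher = stats.fisher_exact(table, alternative="two-sided")
boschloo = stats.boschloo_exact(table, alternative="two-sided")
print("Fisher exact p   =", p_fisher)
print("Boschloo exact p =", boschloo.pvalue)  # never exceeds Fisher's p
```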

Journal ArticleDOI
TL;DR: Applications to angular data, p‐values, vector parameters, Bayesian inference, genetics data, and random cluster sizes are discussed, and asymptotic normality of estimates based on all possible outputations, as well as on a finite number of outputations, is proven under weak conditions.
Abstract: This article presents a simple reduction method for clustered data when the available statistical methods are valid only for independent data. We assume the statistical method provides a normally distributed estimate, θ̂, and an estimate of its variance, σ̂². We randomly select one datum from each cluster and apply our statistical method to these independent data. We repeat this many times and use the average of the associated θ̂s as our estimate. An estimate of the variance is given by the average of the σ̂²s minus the sample variance of the θ̂s. We call this procedure multiple outputation, as all excess data within each cluster are thrown out multiple times. Hoffman, Sen, and Weinberg (2001) introduced this approach for generalized linear models when cluster size is related to the outcome. In this article we demonstrate the broad applicability of the approach. Applications to angular data, p-values, vector parameters, Bayesian inference, genetics data, and random cluster sizes are discussed. In addition, asymptotic normality of estimates based on all possible outputations, as well as on a finite number of outputations, is proven under weak conditions. Multiple outputation provides a simple and broadly applicable method for the analysis of clustered data. It is especially well suited to reductions where methods for clustered data are unavailable, and it can also be used generally as a quick and simple tool.
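
The procedure translates almost line-for-line into code. A minimal sketch for a clustered one-sample mean, where the independent-data method is the ordinary sample mean with its usual variance estimate; the function name and simulated clusters are illustrative:

```python
import numpy as np

def multiple_outputation(clusters, n_out=1000, seed=6):
    """Multiple outputation for a clustered one-sample mean.

    Keep one randomly chosen observation per cluster, apply the
    independent-data method (sample mean plus its usual variance
    estimate), repeat, then combine as described in the abstract:
    average the estimates, and estimate the variance by the average
    within-outputation variance minus the sampling variance of the
    estimates across outputations.
    """
    rng = np.random.default_rng(seed)
    thetas, variances = [], []
    for _ in range(n_out):
        x = np.array([c[rng.integers(len(c))] for c in clusters])
        thetas.append(x.mean())
        variances.append(x.var(ddof=1) / len(x))
    thetas = np.asarray(thetas)
    return thetas.mean(), np.mean(variances) - thetas.var(ddof=1)

rng = np.random.default_rng(0)
clusters = [rng.normal(b, 1.0, rng.integers(1, 6))   # 200 clusters, sizes 1-5
            for b in rng.normal(0.0, 1.0, 200)]
est, var = multiple_outputation(clusters)
print(est, var)
```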

Journal ArticleDOI
TL;DR: A PH-model of treatment effect on the treated subgroup is examined and an estimating equation for the Compliers PROPortional Hazards Effect of Treatment (C-PROPHET) is derived, using the jackknife for bias correction and variance estimation.
Abstract: Survival data from randomized trials are most often analyzed in a proportional hazards (PH) framework that follows the intention-to-treat (ITT) principle. When not all the patients on the experimental arm actually receive the assigned treatment, the ITT-estimator mixes its effect on treatment compliers with its absence of effect on noncompliers. The structural accelerated failure time (SAFT) models of Robins and Tsiatis are designed to consistently estimate causal effects on the treated, without direct assumptions about the compliance selection mechanism. The traditional PH-model, however, has not yet led to such causal interpretation. In this article, we examine a PH-model of treatment effect on the treated subgroup. While potential treatment compliance is unobserved in the control arm, we derive an estimating equation for the Compliers PROPortional Hazards Effect of Treatment (C-PROPHET). The jackknife is used for bias correction and variance estimation. The method is applied to data from a recently finished clinical trial in cancer patients with liver metastases.

Journal ArticleDOI
TL;DR: It is demonstrated, with a simulation and an application, that penalized GEE potentially improves the performance of the GEE estimator and enjoys the same properties as penalized linear models.
Abstract: Penalized models, such as the ridge estimator, the Stein estimator, the bridge estimator, and the lasso, have been proposed to deal with collinearity in regression. The lasso has been applied to the linear model, logistic regression, the Cox proportional hazards model, and neural networks. This article studies the bridge penalized model, with penalty Σ_j |β_j|^γ applied to estimating equations, and applies this penalized model to generalized estimating equations (GEE) in longitudinal studies. The lack of a joint likelihood in GEE is overcome by penalized estimating equations, for which no joint likelihood is required. Asymptotic results for the penalized estimator are given. It is demonstrated, with a simulation and an application, that penalized GEE potentially improves the performance of the GEE estimator and enjoys the same properties as penalized linear models.

Journal ArticleDOI
TL;DR: A new meta-analytic method to evaluate test accuracy and arrive at a summary receiver operating characteristic (ROC) curve for a collection of studies evaluating diagnostic tests, even when test results are reported in an unequal number of nonnested ordered categories.
Abstract: Current meta-analytic methods for diagnostic test accuracy are generally applicable to a selection of studies reporting only estimates of sensitivity and specificity, or at most, to studies whose results are reported using an equal number of ordered categories. In this article, we propose a new meta-analytic method to evaluate test accuracy and arrive at a summary receiver operating characteristic (ROC) curve for a collection of studies evaluating diagnostic tests, even when test results are reported in an unequal number of nonnested ordered categories. We discuss both non-Bayesian and Bayesian formulations of the approach. In the Bayesian setting, we propose several ways to construct summary ROC curves and their credible bands. We illustrate our approach with data from a recently published meta-analysis evaluating a single serum progesterone test for diagnosing pregnancy failure.

Journal ArticleDOI
TL;DR: Numerical studies show that the new procedures for hypothesis testing and interval estimation of the common mean of several normal populations are accurate and perform better than the existing methods when the sample sizes are moderate and the number of populations is four or less.
Abstract: This article presents procedures for hypothesis testing and interval estimation of the common mean of several normal populations. The methods are based on the concepts of the generalized p-value and the generalized confidence interval. The merits of the proposed methods are evaluated numerically and compared with those of existing methods. The numerical comparisons show that the new procedures are accurate and perform better than the existing methods when the sample sizes are moderate and the number of populations is four or less. If the number of populations is five or more, the new method clearly outperforms the existing methods regardless of the sample sizes. The new method and the existing methods are illustrated using data from two examples.

Journal ArticleDOI
TL;DR: It is argued that the sophistication of the statistical analysis should not outweigh the quality of the data, and that finessing models for spatial dependence will often not be merited in the context of ecological regression.
Abstract: In many ecological regression studies investigating associations between environmental exposures and health outcomes, the observed relative risks are in the range 1.0-2.0. The interpretation of such small relative risks is difficult due to a variety of biases--some of which are unique to ecological data, since they arise from within-area variability in exposures/confounders. The potential for residual spatial dependence, due to unmeasured confounders and/or data anomalies with spatial structure, must also be considered, though it often will be of secondary importance when compared to the likely effects of unmeasured confounding and within-area variability in exposures/confounders. Methods for addressing sensitivity to these issues are described, along with an approach for assessing the implications of spatial dependence. An ecological study of the association between myocardial infarction and magnesium is critically reevaluated to determine potential sources of bias. It is argued that the sophistication of the statistical analysis should not outweigh the quality of the data, and that finessing models for spatial dependence will often not be merited in the context of ecological regression.

Journal ArticleDOI
TL;DR: A general Bayesian approach for inference on order‐constrained parameters in generalized linear models is proposed using an isotonic regression transformation, which allows flat regions over which increases in the level of a predictor have no effect.
Abstract: SUMMARY. In biomedical studies, there is often interest in assessing the association between one or more ordered categorical predictors and an outcome variable, adjusting for covariates. For a k-level predictor, one typically uses either a k - 1 degree of freedom (df) test or a single df trend test, which requires scores for the different levels of the predictor. In the absence of knowledge of a parametric form for the response function, one can incorporate monotonicity constraints to improve the efficiency of tests of association. This article proposes a general Bayesian approach for inference on order-constrained parameters in generalized linear models. Instead of choosing a prior distribution with support on the constrained space, which can result in major computational difficulties, we propose to map draws from an unconstrained posterior density using an isotonic regression transformation. This approach allows flat regions over which increases in the level of a predictor have no effect. Bayes factors for assessing ordered trends can be computed based on the output from a Gibbs sampling algorithm. Results from a simulation study are presented and the approach is applied to data from a time-to-pregnancy study.
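
The core computational trick, mapping each unconstrained posterior draw onto the monotone cone with an isotonic regression, can be sketched with the pool-adjacent-violators algorithm. This illustrates only the transformation step, not the full Gibbs sampler or the Bayes factor computation; the simulated "posterior draws" stand in for MCMC output.

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: least squares monotone increasing fit."""
    vals, wts = [], []
    for v in y:
        vals.append(float(v))
        wts.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w2, w1 = wts[-2], wts[-1]
            merged = (vals[-2] * w2 + vals[-1] * w1) / (w2 + w1)
            vals[-2:] = [merged]          # pool the violating pair
            wts[-2:] = [w2 + w1]
    return np.repeat(vals, wts)

# Stand-in for Gibbs output: unconstrained posterior draws of the effects
# of a 4-level ordered predictor. The true effects violate monotonicity at
# levels 2-3, so the transformation produces a flat region there.
rng = np.random.default_rng(7)
draws = rng.normal([0.0, 0.3, 0.2, 0.8], 0.15, size=(5000, 4))
monotone = np.apply_along_axis(pava, 1, draws)
print(monotone.mean(axis=0))  # order-constrained posterior means
```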