
Showing papers in "Biometrics in 1998"


Journal Article•DOI•
TL;DR: Introduction.
Abstract: Introduction. Aspects of Interpretation. Technical Considerations. Statistical Analysis. Special Methods for Joint Responses. Some Examples. Strategical Aspects. More Specialized Topics. Appendices.

3,913 citations


Journal Article•DOI•
TL;DR: A general approach to time series modelling and to modelling and prediction with ARMA processes, covering prediction of a stationary process in terms of infinitely many past values and the autocorrelation function.
Abstract: Preface 1 INTRODUCTION 1.1 Examples of Time Series 1.2 Objectives of Time Series Analysis 1.3 Some Simple Time Series Models 1.3.3 A General Approach to Time Series Modelling 1.4 Stationary Models and the Autocorrelation Function 1.4.1 The Sample Autocorrelation Function 1.4.2 A Model for the Lake Huron Data 1.5 Estimation and Elimination of Trend and Seasonal Components 1.5.1 Estimation and Elimination of Trend in the Absence of Seasonality 1.5.2 Estimation and Elimination of Both Trend and Seasonality 1.6 Testing the Estimated Noise Sequence 1.7 Problems 2 STATIONARY PROCESSES 2.1 Basic Properties 2.2 Linear Processes 2.3 Introduction to ARMA Processes 2.4 Properties of the Sample Mean and Autocorrelation Function 2.4.2 Estimation of $\gamma(\cdot)$ and $\rho(\cdot)$ 2.5 Forecasting Stationary Time Series 2.5.3 Prediction of a Stationary Process in Terms of Infinitely Many Past Values 2.6 The Wold Decomposition Problems 3 ARMA MODELS 3.1 ARMA($p,q$) Processes 3.2 The ACF and PACF of an ARMA$(p,q)$ Process 3.2.1 Calculation of the ACVF 3.2.2 The Autocorrelation Function 3.2.3 The Partial Autocorrelation Function 3.3 Forecasting ARMA Processes Problems 4 SPECTRAL ANALYSIS 4.1 Spectral Densities 4.2 The Periodogram 4.3 Time-Invariant Linear Filters 4.4 The Spectral Density of an ARMA Process Problems 5 MODELLING AND PREDICTION WITH ARMA PROCESSES 5.1 Preliminary Estimation 5.1.1 Yule-Walker Estimation 5.1.3 The Innovations Algorithm 5.1.4 The Hannan-Rissanen Algorithm 5.2 Maximum Likelihood Estimation 5.3 Diagnostic Checking 5.3.1 The Graph of $\{\hat{W}_t,\ t=1,\ldots,n\}$ 5.3.2 The Sample ACF of the Residuals
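The sample autocorrelation function listed in Section 1.4.1 is straightforward to compute directly; the sketch below (not from the book) estimates it for a simulated AR(1) series.

```python
import numpy as np

def sample_acf(x, max_lag=20):
    """Sample autocorrelation function rho_hat(h) for lags 0, ..., max_lag."""
    x = np.asarray(x, float)
    x = x - x.mean()
    n = len(x)
    # gamma_hat(h) = (1/n) * sum_{t=1}^{n-h} x_t * x_{t+h} on the mean-corrected series
    acvf = np.array([np.sum(x[:n - h] * x[h:]) / n for h in range(max_lag + 1)])
    return acvf / acvf[0]

# AR(1) series X_t = 0.7 X_{t-1} + Z_t; its theoretical ACF decays like 0.7**h.
rng = np.random.default_rng(0)
z = rng.normal(size=1000)
x = np.zeros(1000)
for t in range(1, 1000):
    x[t] = 0.7 * x[t - 1] + z[t]
print(np.round(sample_acf(x, max_lag=5), 2))
```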

3,732 citations


Journal Article•DOI•
TL;DR: A book-length treatment of matrix algebra, covering topics including generalized inverses, the minimization of a second-degree polynomial (in n variables) subject to linear constraints, the Moore-Penrose inverse, eigenvalues and eigenvectors, and linear transformations.
Abstract: Preface. - Matrices. - Submatrices and partitioned matrices. - Linear dependence and independence. - Linear spaces: row and column spaces. - Trace of a (square) matrix. - Geometrical considerations. - Linear systems: consistency and compatibility. - Inverse matrices. - Generalized inverses. - Idempotent matrices. - Linear systems: solutions. - Projections and projection matrices. - Determinants. - Linear, bilinear, and quadratic forms. - Matrix differentiation. - Kronecker products and the vec and vech operators. - Intersections and sums of subspaces. - Sums (and differences) of matrices. - Minimization of a second-degree polynomial (in n variables) subject to linear constraints. - The Moore-Penrose inverse. - Eigenvalues and Eigenvectors. - Linear transformations. - References. - Index.
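As a quick numerical companion to the chapters on generalized inverses, here is a short check (not from the book) that numpy's pinv satisfies the four Penrose conditions defining the Moore-Penrose inverse.

```python
import numpy as np

# Moore-Penrose inverse of a rank-deficient matrix, checked against the four
# Penrose conditions that define it: AGA = A, GAG = G, (AG)' = AG, (GA)' = GA.
a = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])          # rank 1, so no ordinary inverse exists
g = np.linalg.pinv(a)
print(np.allclose(a @ g @ a, a),
      np.allclose(g @ a @ g, g),
      np.allclose((a @ g).T, a @ g),
      np.allclose((g @ a).T, g @ a))
```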

1,987 citations


Journal Article•DOI•
TL;DR: This paper presents a general approach for assessing the sensitivity of the point and interval estimates of the primary exposure effect in an observational study to the residual confounding effects of an unmeasured variable after adjusting for measured covariates.
Abstract: This paper presents a general approach for assessing the sensitivity of the point and interval estimates of the primary exposure effect in an observational study to the residual confounding effects of an unmeasured variable after adjusting for measured covariates. The proposed method assumes that the true exposure effect can be represented in a regression model that includes the exposure indicator as well as the measured and unmeasured confounders. One can use the corresponding reduced model that omits the unmeasured confounder to make statistical inferences about the true exposure effect by specifying the distributions of the unmeasured confounder in the exposed and unexposed groups along with the effects of the unmeasured confounder on the outcome variable. Under certain conditions, there exists a simple algebraic relationship between the true exposure effect in the full model and the apparent exposure effect in the reduced model. One can then estimate the true exposure effect by making a simple adjustment to the point and interval estimates of the apparent exposure effect obtained from standard software or published reports. The proposed method handles both binary response and censored survival time data, accommodates any study design, and allows the unmeasured confounder to be discrete or normally distributed. We describe applications to two major medical studies.
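As a rough sketch of the kind of adjustment described above, the code below assumes a binary unmeasured confounder acting multiplicatively on risk, with prevalences p1 and p0 among the exposed and unexposed and log effect gamma on the outcome; the bias-factor formula used here is one standard form of such a correction and is not quoted from the paper.

```python
import math

def adjusted_exposure_effect(beta_hat, ci_low, ci_high, gamma, p1, p0):
    """Adjust an apparent log relative-risk estimate for a binary unmeasured
    confounder (hypothetical illustration, not the paper's exact formula).

    beta_hat, ci_low, ci_high : apparent log effect and confidence limits from the reduced model
    gamma                     : log effect of the unmeasured confounder on the outcome
    p1, p0                    : confounder prevalence among the exposed / unexposed
    """
    # Multiplicative bias factor induced by omitting the confounder.
    bias = math.log((p1 * math.exp(gamma) + 1 - p1) /
                    (p0 * math.exp(gamma) + 1 - p0))
    # The same additive correction applies to the point estimate and both limits.
    return beta_hat - bias, ci_low - bias, ci_high - bias

# Example: apparent RR = 2.0 (95% CI 1.5-2.7); the confounder doubles the risk
# (gamma = log 2) and is twice as common among the exposed (0.4 vs 0.2).
print([round(math.exp(v), 2)
       for v in adjusted_exposure_effect(math.log(2.0), math.log(1.5),
                                         math.log(2.7), math.log(2.0),
                                         0.4, 0.2)])
```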

731 citations


Journal Article•DOI•
TL;DR: Standard methods for the regression analysis of clustered data postulate models relating covariates to the response without regard to between- and within-cluster covariate effects, but it is shown that conditional likelihood methods estimate purely within-cluster covariate effects, whereas mixture model approaches estimate a weighted average of between- and within-cluster covariate effects.
Abstract: Standard methods for the regression analysis of clustered data postulate models relating covariates to the response without regard to between- and within-cluster covariate effects. Implicit in these analyses is the assumption that these effects are identical. Example data show that this is frequently not the case and that analyses that ignore differential between- and within-cluster covariate effects can be misleading. Consideration of between- and within-cluster effects also helps to explain observed and theoretical differences between mixture model analyses and those based on conditional likelihood methods. In particular, we show that conditional likelihood methods estimate purely within-cluster covariate effects, whereas mixture model approaches estimate a weighted average of between- and within-cluster covariate effects.
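A minimal sketch (not the authors' code) of the covariate decomposition that lets between- and within-cluster effects differ, fitted here with statsmodels' linear mixed model on simulated data; all variable names are made up.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_clusters, m = 200, 5
cluster = np.repeat(np.arange(n_clusters), m)
u = rng.normal(size=n_clusters)                       # cluster-level trait
x = rng.normal(size=n_clusters * m) + u[cluster]      # covariate varies within and between clusters
y = 1.0 + 0.5 * x + 1.5 * u[cluster] + rng.normal(size=n_clusters * m)
# The cluster-level trait confounds the between-cluster relationship, so the
# between-cluster slope is larger than the within-cluster slope (0.5).

df = pd.DataFrame({"y": y, "x": x, "cluster": cluster})
df["x_between"] = df.groupby("cluster")["x"].transform("mean")   # cluster mean
df["x_within"] = df["x"] - df["x_between"]                        # deviation from the mean

# Entering both components separately allows the two effects to differ,
# instead of forcing a single common covariate effect.
fit = smf.mixedlm("y ~ x_between + x_within", df, groups=df["cluster"]).fit()
print(fit.params[["x_between", "x_within"]])
```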

488 citations



Journal Article•DOI•
TL;DR: This book gives a broad and up-to-date coverage of bootstrap methods, with numerous applied examples, developed in a coherent way with the necessary theoretical basis, including improved Monte Carlo simulation.
Abstract: This book gives a broad and up-to-date coverage of bootstrap methods, with numerous applied examples, developed in a coherent way with the necessary theoretical basis. Applications include stratified data; finite populations; censored and missing data; linear, nonlinear, and smooth regression models; classification; time series and spatial problems. Special features of the book include: extensive discussion of significance tests and confidence intervals; material on various diagnostic methods; and methods for efficient computation, including improved Monte Carlo simulation. Each chapter includes both practical and theoretical exercises. Included with the book is a disk of purpose-written S-Plus programs for implementing the methods described in the text. Computer algorithms are clearly described, and computer code is included on a 3.5-inch, 1.4 MB disk for use with IBM computers and compatible machines. Users must have the S-Plus computer application. Author resource page: http://statwww.epfl.ch/davison/BMA/
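The book's software is in S-Plus; as a language-neutral point of reference, a minimal nonparametric percentile-bootstrap confidence interval (a sketch, not the book's code) looks like this:

```python
import numpy as np

def bootstrap_ci(data, stat=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap percentile confidence interval for a statistic."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    reps = np.array([stat(rng.choice(data, size=n, replace=True))
                     for _ in range(n_boot)])
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return lo, hi

sample = np.random.default_rng(1).exponential(scale=2.0, size=50)
print(bootstrap_ci(sample))   # percentile interval for the mean
```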

464 citations


Journal Article•DOI•
TL;DR: It is argued that two other quantities must be considered in the validation of a surrogate endpoint: RE, the effect of Z on T relative to that of Z on S, and gamma Z, the association between S and T after adjustment for Z.
Abstract: The validation of surrogate endpoints has been studied by Prentice (1989, Statistics in Medicine 8, 431-440) and Freedman, Graubard, and Schatzkin (1992, Statistics in Medicine 11, 167-178). We extended their proposals in the cases where the surrogate and the final endpoints are both binary or normally distributed. Letting T and S be random variables that denote the true and surrogate endpoint, respectively, and Z be an indicator variable for treatment, Prentice's criteria are fulfilled if Z has a significant effect on T and on S, if S has a significant effect on T, and if Z has no effect on T given S. Freedman relaxed the latter criterion by estimating PE, the proportion of the effect of Z on T that is explained by S, and by requiring that the lower confidence limit of PE be larger than some proportion, say 0.5 or 0.75. This condition can only be verified if the treatment has a massively significant effect on the true endpoint, a rare situation. We argue that two other quantities must be considered in the validation of a surrogate endpoint: RE, the effect of Z on T relative to that of Z on S, and gamma Z, the association between S and T after adjustment for Z. A surrogate is said to be perfect at the individual level when there is a perfect association between the surrogate and the final endpoint after adjustment for treatment. A surrogate is said to be perfect at the population level if RE is 1. A perfect surrogate fulfills both conditions, in which case S and T are identical up to a deterministic transformation. Fieller's theorem is used for the estimation of PE, RE, and their respective confidence intervals. Logistic regression models and the global odds ratio model studied by Dale (1986, Biometrics, 42, 909-917) are used for binary endpoints. Linear models are employed for continuous endpoints. In order to be of practical value, the validation of surrogate endpoints is shown to require large numbers of observations.
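In one common notation (assumed here, not quoted from the paper), with alpha the effect of Z on S, beta the effect of Z on T without adjusting for S, and beta_S the effect of Z on T after adjusting for S, the two validation quantities can be written as:

```latex
PE = \frac{\beta - \beta_S}{\beta}, \qquad RE = \frac{\beta}{\alpha}
```

Perfect surrogacy at the population level then corresponds to RE = 1, as described above.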

410 citations


Journal Article•DOI•

333 citations


Journal Article•DOI•
TL;DR: McLachlan and Krishnan present a unified account of the theory, methodology, and applications of the Expectation-Maximization (EM) algorithm and its extensions, illustrating applications in many statistical contexts.
Abstract: The first unified account of the theory, methodology, and applications of the EM algorithm and its extensions. Since its inception in 1977, the Expectation-Maximization (EM) algorithm has been the subject of intense scrutiny, dozens of applications, numerous extensions, and thousands of publications. The algorithm and its extensions are now standard tools applied to incomplete data problems in virtually every field in which statistical methods are used. Until now, however, no single source offered a complete and unified treatment of the subject. The EM Algorithm and Extensions describes the formulation of the EM algorithm, details its methodology, discusses its implementation, and illustrates applications in many statistical contexts. Employing numerous examples, Geoffrey McLachlan and Thriyambakam Krishnan examine applications both in evidently incomplete data situations - where data are missing, distributions are truncated, or observations are censored or grouped - and in a broad variety of situations in which incompleteness is neither natural nor evident. They point out the algorithm's shortcomings and explain how these are addressed in the various extensions. Areas of application discussed include: regression, medical imaging, categorical data analysis, finite mixture analysis, factor analysis, robust statistical modeling, variance-components estimation, survival analysis, and repeated-measures designs. For theoreticians, practitioners, and graduate students in statistics as well as researchers in the social and physical sciences, The EM Algorithm and Extensions opens the door to the tremendous potential of this remarkably versatile statistical tool.
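As a concrete reminder of the E- and M-steps the book formalizes, here is a small self-contained sketch (not taken from the book) for a two-component univariate Gaussian mixture.

```python
import numpy as np

def em_two_gaussians(x, n_iter=200):
    """EM for a two-component univariate Gaussian mixture (textbook example)."""
    x = np.asarray(x, float)
    # Crude initialization: means at the extremes, common spread, equal weights.
    pi, mu, sigma = 0.5, np.array([x.min(), x.max()]), np.array([x.std(), x.std()])
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1
        # (the 1/sqrt(2*pi) constant cancels in the ratio).
        d0 = np.exp(-0.5 * ((x - mu[0]) / sigma[0]) ** 2) / sigma[0]
        d1 = np.exp(-0.5 * ((x - mu[1]) / sigma[1]) ** 2) / sigma[1]
        r = pi * d1 / ((1 - pi) * d0 + pi * d1)
        # M-step: update mixing weight, means, and standard deviations.
        pi = r.mean()
        mu = np.array([np.sum((1 - r) * x) / np.sum(1 - r),
                       np.sum(r * x) / np.sum(r)])
        sigma = np.array([np.sqrt(np.sum((1 - r) * (x - mu[0]) ** 2) / np.sum(1 - r)),
                          np.sqrt(np.sum(r * (x - mu[1]) ** 2) / np.sum(r))])
    return pi, mu, sigma

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1, 200)])
print(em_two_gaussians(data))
```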

328 citations


Journal Article•DOI•
TL;DR: This book covers restriction maps, cloning and clone libraries, physical genome maps, sequence assembly and comparison, and the challenges faced in mapping with real data.
Abstract: Preface Introduction Molecular Biology Mathematics, Statistics, and Computer Science Some Molecular Biology DNA and Proteins The Central Dogma The Genetic Code Transfer RNA and Protein Sequences Genes Are Not Simple Biological Chemistry Restriction Maps Introduction Graphs Interval Graphs Measuring Fragment Sizes Multiple Maps Double Digest Problem Classifying Multiple Solutions Algorithms for DDP Algorithms and Complexity DDP is NP-Complete Approaches to DDP Simulated Annealing: TSP and DDP Mapping with Real Data Cloning and Clone Libraries A Finite Number of Random Clones Libraries by Complete Digestion Libraries by Partial Digestion Genomes per Microgram Physical Genome Maps: Oceans, Islands, and Anchors Mapping by Fingerprinting Mapping by Anchoring An Overview of Clone Overlap Putting It Together Sequence Assembly Shotgun Sequencing Sequencing by Hybridization Shotgun Sequencing Revisited Databases and Rapid Sequence Analysis DNA and Protein Sequence Databases A Tree Representation of a Sequence Hashing a Sequence Repeats in a Sequence Sequence Comparison by Hashing Sequence Comparison with at most l Mismatches Sequence Comparison by Statistical Content Dynamic Programming Alignment of Two Sequences The Number of Alignments Shortest and Longest Paths in a Network Global Distance Alignment Global Similarity Alignment Fitting One Sequence into Another Local Alignment and Clumps Linear Space Algorithms Tracebacks Inversions Map Alignment Parametric Sequence Comparisons Multiple Sequence Alignment The Cystic Fibrosis Gene Dynamic Programming in r-Dimensions Weighted-Average Sequences Profile Analysis Alignment by Hidden Markov Models Consensus Word Analysis Probability and Statistics for Sequence Alignment Global Alignment Local Alignment Extreme Value Distributions The Chen-Stein Method Poisson Approximation and Long Matches Sequence Alignment with Scores Probability and Statistics for Sequence Patterns A Central Limit Theorem Nonoverlapping Pattern Counts Poisson Approximation Site Distributions RNA Secondary Structure Combinatorics Minimum Free-energy Structures Consensus folding Trees and Sequences Trees Distance Parsimony Maximum Likelihood Trees Sources and Perspectives Molecular Biology Physical Maps and Clone Libraries Sequence Assembly Sequence Comparisons Probability and Statistics RNA Secondary Structure Trees and Sequences References Problem Solutions and Hints Mathematical Notation Algorithm Index Author Index Subject Index
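Several of the chapters above (dynamic programming alignment of two sequences, global similarity alignment) center on a single recurrence; a compact sketch of that global-alignment dynamic program, with arbitrary illustrative scores, is:

```python
import numpy as np

def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch dynamic-programming score for globally aligning a and b."""
    n, m = len(a), len(b)
    s = np.zeros((n + 1, m + 1))
    s[:, 0] = gap * np.arange(n + 1)      # aligning a prefix of a against gaps
    s[0, :] = gap * np.arange(m + 1)      # aligning a prefix of b against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = s[i - 1, j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            s[i, j] = max(diag, s[i - 1, j] + gap, s[i, j - 1] + gap)
    return s[n, m]

print(global_alignment_score("GATTACA", "GCATGCA"))
```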

Journal Article•DOI•
TL;DR: The problem of detecting influential subjects in the context of longitudinal data is considered, following the approach of local influence proposed by Cook.
Abstract: The linear mixed model has become an important tool in modelling, partially due to the introduction of the SAS procedure MIXED, which made the method widely available to practising statisticians. Its growing popularity calls for data-analytic methods to check the underlying assumptions and robustness. Here, the problem of detecting influential subjects in the context of longitudinal data is considered, following the approach of local influence proposed by Cook.

Journal Article•DOI•
TL;DR: A design strategy for single-arm clinical trials in which the goals are to find a dose of an experimental treatment satisfying both safety and efficacy requirements, treat a sufficiently large number of patients to estimate the rates of these events at the selected dose, and stop the trial early if it is likely that no dose is both safe and efficacious.
Abstract: We propose a design strategy for single-arm clinical trials in which the goals are to find a dose of an experimental treatment satisfying both safety and efficacy requirements, treat a sufficiently large number of patients to estimate the rates of these events at the selected dose with a given reliability, and stop the trial early if it is likely that no dose is both safe and efficacious. Patient outcome is characterized by a trinary ordinal variable accounting for both efficacy and toxicity. Like Thall, Simon, and Estey (1995, Statistics in Medicine 14, 357-379), we use Bayesian criteria to generate decision rules while relying on frequentist criteria obtained via simulation to determine a design parameterization with good operating characteristics. The strategy is illustrated by application to a bone marrow transplantation trial for hematologic malignancies and a trial of a biologic agent for malignant melanoma.
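Purely as an illustration of this style of Bayesian screening rule, and simplifying the trinary outcome to independent binary efficacy and toxicity indicators with made-up priors and cutoffs (so this is not the paper's actual criterion), one might write:

```python
from scipy.stats import beta

def dose_acceptable(n, n_eff, n_tox, eff_target, tox_limit,
                    prior=(0.5, 0.5), cut_eff=0.8, cut_tox=0.8):
    """Illustrative Bayesian acceptability check for one dose: the dose passes
    if the posterior probabilities of adequate efficacy and of acceptable
    toxicity both exceed their cutoffs (Beta-binomial posteriors)."""
    a, b = prior
    p_eff = 1 - beta.cdf(eff_target, a + n_eff, b + n - n_eff)   # P(efficacy rate > target)
    p_tox = beta.cdf(tox_limit, a + n_tox, b + n - n_tox)        # P(toxicity rate < limit)
    return p_eff > cut_eff and p_tox > cut_tox, round(p_eff, 3), round(p_tox, 3)

print(dose_acceptable(n=24, n_eff=12, n_tox=3, eff_target=0.3, tox_limit=0.3))
```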


Journal Article•DOI•
TL;DR: In this article, general definitions of demographic and environmental variances as well as demographic covariance are given from first principles, which are consistent with use of these terms in models with additive effects and in diffusion approximations to population processes.
Abstract: SUMMARY General definitions of demographic and environmental variances as well as demographic covariance are given from first principles. The sum of the environmental variance and the demographic covariance is the covariance between two individuals' contributions to the population growth within a year and corresponds to the coefficient of x^2 in the expression for the variance of the change in population size. Hence, this coefficient may actually be negative. The demographic variance is the variance among individuals. It is shown that the definitions are consistent with use of these terms in models with additive effects and in diffusion approximations to population processes. The connection to classical birth and death processes, in which the environmental variance and demographic covariance are always zero, is discussed. The concepts are illustrated by some stochastic simulations.
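Writing Delta N as the sum of the x individuals' contributions w_1, ..., w_x (notation assumed here, not taken from the paper), the decomposition described above follows from the usual variance-of-a-sum identity:

```latex
\operatorname{Var}(\Delta N \mid N = x)
  = x\,\operatorname{Var}(w_i) + x(x-1)\,\operatorname{Cov}(w_i, w_j)
```

so the demographic variance Var(w_i) enters through the term linear in x, while the covariance between two individuals' contributions, the sum of the environmental variance and the demographic covariance, drives the x^2 term.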

Journal Article•DOI•
TL;DR: A bidirectional case-crossover design in which exposures at failure are compared with exposures both before and after failure is described, and relative risk estimates are resistant to confounding by time trend.
Abstract: In the case-crossover design (Maclure, 1991, American Journal of Epidemiology 133, 144-153), only cases are sampled, and risk estimates are based on within-subject comparisons of exposures at failure times with exposures at times prior to failure, using matched case-control methods. While the design provides considerable advantages, unidirectional retrospective control sampling (selecting control times only prior to failure) can cause risk estimates to be confounded by time trends in exposure. However, when subsequent exposures are not influenced by failures, as in studies of environmental exposures such as air pollutants, it is possible to determine at times postfailure what a subject's level of exposure would have been had the subject not failed. We describe a bidirectional case-crossover design in which exposures at failure are compared with exposures both before and after failure. Simulation analyses show that relative risk estimates are resistant to confounding by time trend. We also extend the method to studies involving multiple failure times.

Journal Article•DOI•
TL;DR: A Bayesian model is provided that allows the random effects in longitudinal random effects models to have a nonparametric prior distribution, and a Dirichlet process prior is proposed for the distribution of the random effects.
Abstract: In longitudinal random effects models, the random effects are typically assumed to have a normal distribution in both Bayesian and classical models. We provide a Bayesian model that allows the random effects to have a nonparametric prior distribution. We propose a Dirichlet process prior for the distribution of the random effects; computation is made possible by the Gibbs sampler. An example using marker data from an AIDS study is given to illustrate the methodology.
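The Dirichlet process prior on the random-effects distribution can be pictured through its stick-breaking construction; the sketch below (not the paper's Gibbs sampler) draws an approximate realization of G ~ DP(alpha, G0) and samples random effects from it.

```python
import numpy as np

def stick_breaking_dp(alpha, base_draw, n_atoms=200, rng=None):
    """Draw an (approximate, truncated) realization G ~ DP(alpha, G0) by stick-breaking."""
    rng = rng or np.random.default_rng()
    v = rng.beta(1.0, alpha, size=n_atoms)                       # stick-breaking proportions
    w = v * np.concatenate([[1.0], np.cumprod(1 - v[:-1])])      # atom weights
    atoms = base_draw(n_atoms, rng)                              # atom locations from G0
    return atoms, w

# Base measure G0 = N(0, 1); random effects b_i are then drawn from the discrete G.
atoms, w = stick_breaking_dp(alpha=2.0,
                             base_draw=lambda n, r: r.normal(0, 1, n),
                             rng=np.random.default_rng(0))
b = np.random.default_rng(1).choice(atoms, size=10, p=w / w.sum())
print(b)   # ties among the b_i reflect the discreteness of G
```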

Journal Article•DOI•
TL;DR: This paper proposes and compares three very different regression analysis methods for receiver operating characteristic (ROC) curves and elucidates the correspondence between regression parameters in the different models.
Abstract: The accuracy of a medical diagnostic test is typically summarized by the sensitivity and specificity when the test result is dichotomous. Receiver operating characteristic (ROC) curves are measures of test accuracy that are used when test results are continuous and are considered the analogs of sensitivity and specificity for continuous tests. ROC regression analysis allows one to evaluate effects of factors that may influence test accuracy. Such factors might include characteristics of study subjects or operating conditions for the test. Unfortunately, regression analysis methods for ROC curves are not well developed and methods that do exist have received little use to date. In this paper, we propose and compare three very different regression analysis methods. Two are modifications of methods previously proposed for radiology settings. The third is a special case of a general method recently proposed by us. The three approaches are compared with regard to settings in which they can be applied and distributional assumptions they require. In the setting where test results are normally distributed, we elucidate the correspondence between regression parameters in the different models. The methods are applied to simulated data and to data from a study of a new diagnostic test for hearing impairment. It is hoped that the presentation in this paper will both encourage the use of regression analysis for evaluating diagnostic tests and help guide the choice of the most appropriate regression analysis approach in applications.
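None of this is the paper's regression machinery, but as a point of reference the empirical ROC curve those models act on can be computed directly:

```python
import numpy as np

def empirical_roc(scores_diseased, scores_healthy):
    """Empirical ROC curve: (FPR, TPR) pairs over all observed thresholds."""
    thresholds = np.concatenate(
        [np.unique(np.concatenate([scores_diseased, scores_healthy])), [np.inf]])
    tpr = np.array([(scores_diseased >= t).mean() for t in thresholds])
    fpr = np.array([(scores_healthy >= t).mean() for t in thresholds])
    return fpr, tpr

rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, 200)    # test results for unaffected subjects
diseased = rng.normal(1.5, 1.0, 200)   # shifted distribution for affected subjects
fpr, tpr = empirical_roc(diseased, healthy)
# Area under the curve via its Mann-Whitney form.
auc = (diseased[:, None] > healthy[None, :]).mean()
print(round(auc, 3))
```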

Journal Article•DOI•
TL;DR: A shared parameter model with logistic link is presented for longitudinal binary response data to accommodate informative drop-out. Comparisons are made to an approximate conditional logit model using a clinical trial dataset and simulations, which may provide evidence that the shared parameter model holds for the pain data.
Abstract: A shared parameter model with logistic link is presented for longitudinal binary response data to accommodate informative drop-out. The model consists of observed longitudinal and missing response components that share random effects parameters. To our knowledge, this is the first presentation of such a model for longitudinal binary response data. Comparisons are made to an approximate conditional logit model in terms of a clinical trial dataset and simulations. The naive mixed effects logit model that does not account for informative drop-out is also compared. The simulation-based differences among the models with respect to coverage of confidence intervals, bias, and mean squared error (MSE) depend on at least two factors: whether an effect is a between- or within-subject effect and the amount of between-subject variation as exhibited by variance components of the random effects distributions. When the shared parameter model holds, the approximate conditional model provides confidence intervals with good coverage for within-cluster factors but not for between-cluster factors. The converse is true for the naive model. Under a different drop-out mechanism, when the probability of drop-out is dependent only on the current unobserved observation, all three models behave similarly by providing between-subject confidence intervals with good coverage and comparable MSE and bias but poor within-subject confidence intervals, MSE, and bias. The naive model does more poorly with respect to the within-subject effects than do the shared parameter and approximate conditional models. The data analysis, which entails a comparison of two pain relievers and a placebo with respect to pain relief, conforms to the simulation results based on the shared parameter model but not on the simulation based on the outcome-driven drop-out process. This comparison between the data analysis and simulation results may provide evidence that the shared parameter model holds for the pain data.

Journal Article•DOI•
TL;DR: In this article, a fully efficient approach for the analysis of multi-environment early stage variety trials is considered that accommodates a general spatial covariance structure for the errors of each trial, simultaneously producing best linear unbiased predictors of the genotype and genotype by environment interaction effects and residual maximum likelihood estimates of the spatial parameters and variance components.
Abstract: A fully efficient approach for the analysis of multi-environment early stage variety trials is considered that accommodates a general spatial covariance structure for the errors of each trial. The analysis simultaneously produces best linear unbiased predictors of the genotype and genotype by environment interaction effects and residual maximum likelihood estimates of the spatial parameters and variance components. Two motivating examples are presented and analyzed, and the results suggest that the previous approximate analyses can seriously affect estimation of the genetic merit of breeding lines, particularly for models with more complex variance structures.

Journal Article•DOI•
TL;DR: A proportional hazards (PH) model is modified to take account of long-term survivors by assuming the cumulative hazard to be bounded but otherwise unspecified to yield an improper survival function.
Abstract: A proportional hazards (PH) model is modified to take account of long-term survivors by assuming the cumulative hazard to be bounded but otherwise unspecified to yield an improper survival function. A marginal likelihood is derived under the restriction for type I censoring patterns. For a PH model with cure, the marginal and the partial likelihood are not the same. In the absence of covariate information, the estimate of the cure rate based on the marginal likelihood reduces to the value of the Kaplan-Meier estimate at the end of the study. An example of low asymptotic efficiency of the partial likelihood as compared to the marginal, profile, and parametric likelihoods is given. An algorithm is suggested to fit the full PH model with cure.
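A minimal illustration (with made-up data) of the no-covariate special case noted above, where the cure-rate estimate is the Kaplan-Meier estimate at the end of the study:

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival curve; returns (event time, S(t)) pairs."""
    order = np.argsort(times)
    times, events = np.asarray(times)[order], np.asarray(events)[order]
    s, curve = 1.0, []
    for t in np.unique(times[events == 1]):
        at_risk = np.sum(times >= t)                  # subjects still under observation
        d = np.sum((times == t) & (events == 1))      # events at time t
        s *= 1.0 - d / at_risk
        curve.append((t, s))
    return curve

times = [2, 3, 3, 5, 8, 10, 12, 12, 12, 12]   # follow-up times (arbitrary units)
events = [1, 1, 0, 1, 1, 1, 0, 0, 0, 0]        # 1 = event, 0 = censored
curve = kaplan_meier(times, events)
# Without covariates, the estimated cure fraction is S(t) at the end of the study.
print("estimated cure fraction:", round(curve[-1][1], 3))
```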

Journal Article•DOI•
TL;DR: In this article, models are developed for line transect surveys with two independent observers in which detection of animals on the trackline is not certain and the probability of detection may depend on perpendicular distance and additional covariates; a survey of Antarctic minke whales is used to illustrate the methods.
Abstract: SUMMARY One of the key assumptions of conventional line transect (LT) theory is that all animals in the observer's path are detected. When this assumption fails, simultaneous survey by two independent observers can be used to estimate detection probabilities and abundance. Models are developed for such surveys for both grouped and ungrouped perpendicular distance data. The models unify and generalize existing line transect and mark-recapture models. They provide a general framework for the estimation of abundance from LT surveys in which detection of animals on the trackline is not certain and/or the probability of detection depends on perpendicular distance and additional covariates. Existing LT models in the literature are obtained as special cases of the general models. We use data from a shipboard line transect survey of Antarctic minke whales to illustrate use of the methods.

Journal Article•DOI•
TL;DR: In this paper, the authors explore using autologistic regression models for spatial binary data with covariates and find that the MCMC MLEs are approximately normally distributed and that MCMC estimates of Fisher information may be used to estimate the variance of MCMCMLEs and to construct confidence intervals.
Abstract: SUMMARY In this paper, we explore using autologistic regression models for spatial binary data with covariates. Autologistic regression models can handle binary responses exhibiting both spatial correlation and dependence on covariates. We use Markov chain Monte Carlo (MCMC) to estimate the parameters in these models. The distributional behavior of the MCMC maximum likelihood estimates (MCMC MLEs) is studied via simulation. We find that the MCMC MLEs are approximately normally distributed and that the MCMC estimates of Fisher information may be used to estimate the variance of the MCMC MLEs and to construct confidence intervals. Finally, we illustrate by example how our studies may be applied to model the distribution of plant species.
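As context for the simulation-based estimation described above, here is a small Gibbs sampler (a sketch, not the authors' code) that simulates one autologistic field with a covariate; repeated simulations of this kind are the building block of MCMC maximum likelihood for such models.

```python
import numpy as np

def gibbs_autologistic(shape, beta0, beta1, x, eta, n_sweeps=200, rng=None):
    """Gibbs sampler for a lattice autologistic model:
    logit P(y_ij = 1 | rest) = beta0 + beta1 * x_ij + eta * (sum of 4-neighbour y's)."""
    rng = rng or np.random.default_rng()
    n, m = shape
    y = rng.integers(0, 2, size=shape)
    for _ in range(n_sweeps):
        for i in range(n):
            for j in range(m):
                s = (y[i - 1, j] if i > 0 else 0) + (y[i + 1, j] if i < n - 1 else 0) \
                  + (y[i, j - 1] if j > 0 else 0) + (y[i, j + 1] if j < m - 1 else 0)
                p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x[i, j] + eta * s)))
                y[i, j] = rng.random() < p
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 20))          # a covariate surface
y = gibbs_autologistic((20, 20), beta0=-0.5, beta1=1.0, x=x, eta=0.4, rng=rng)
print(y.mean())                         # overall prevalence on the lattice
```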

Journal Article•DOI•
TL;DR: This paper reports the results of an extensive Monte Carlo study of the distribution of the likelihood ratio test statistic using the value of the restricted likelihood for testing random components in the linear mixed-effects model when the number of fixed components remains constant.
Abstract: This paper reports the results of an extensive Monte Carlo study of the distribution of the likelihood ratio test statistic using the value of the restricted likelihood for testing random components in the linear mixed-effects model when the number of fixed components remains constant. The distribution of this test statistic is considered when one additional random component is added. The distribution of the likelihood ratio test statistic computed using restricted maximum likelihood is compared to the likelihood ratio test statistic computed from the usual maximum likelihood. The rejection proportion is computed under the null hypothesis using a mixture of chi-square distributions. The restricted likelihood ratio statistic has a reasonable agreement with the maximum likelihood test statistic. For the parameter combinations considered, the rejection proportions are, in most cases, less than the nominal 5% level for both test statistics, though, on average, the rejection proportions for REML are closer to the nominal level than for ML.
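For the simplest case of adding a single uncorrelated random effect, the null reference distribution is commonly taken to be a 50:50 mixture of chi-square(0) and chi-square(1); a tiny helper (not from the paper) for the corresponding p-value:

```python
from scipy.stats import chi2

def mixture_pvalue(lrt_stat):
    """P-value for testing one additional variance component against the
    50:50 mixture of chi-square(0) and chi-square(1) reference distribution."""
    # chi-square(0) is a point mass at zero, so it contributes nothing when lrt_stat > 0.
    return 0.5 * chi2.sf(lrt_stat, df=1) if lrt_stat > 0 else 1.0

print(mixture_pvalue(3.2))   # compare with chi2.sf(3.2, 1), the naive p-value
```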

Journal Article•DOI•
TL;DR: Confidence intervals and bands are constructed for the cumulative incidence function under the Cox model for future patients with given covariates, in the presence of dependent competing risks in survival analysis.
Abstract: In the presence of dependent competing risks in survival analysis, the Cox model can be utilized to examine the covariate effects on the cause-specific hazard function for the failure type of interest. For this situation, the cumulative incidence function provides an intuitively appealing summary curve for marginal probabilities of this particular event. In this paper, we show how to construct confidence intervals and bands for such a function under the Cox model for future patients with certain covariates. Our proposals are illustrated with data from a prostate cancer trial.
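The quantity being banded here is the cumulative incidence function; in the covariate-free case it reduces to the nonparametric estimator sketched below (illustrative data, not from the prostate cancer trial).

```python
import numpy as np

def cumulative_incidence(times, causes, cause_of_interest=1):
    """Nonparametric cumulative incidence for one failure type under competing
    risks (causes: 0 = censored, 1, 2, ... = failure types)."""
    times, causes = np.asarray(times, float), np.asarray(causes, int)
    order = np.argsort(times)
    times, causes = times[order], causes[order]
    surv, cif, out = 1.0, 0.0, []
    for t in np.unique(times):
        at_risk = np.sum(times >= t)
        d_any = np.sum((times == t) & (causes > 0))
        d_k = np.sum((times == t) & (causes == cause_of_interest))
        cif += surv * d_k / at_risk           # hazard of the cause of interest, weighted by S(t-)
        surv *= 1.0 - d_any / at_risk         # overall survival just after t
        out.append((t, cif))
    return out

times = [1, 2, 2, 3, 4, 5, 6, 7, 8, 9]
causes = [1, 2, 1, 0, 1, 2, 0, 1, 0, 0]
print(cumulative_incidence(times, causes)[-1])
```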

Journal Article•DOI•
TL;DR: A likelihood-based approach, referred to as the semiparametric method, is introduced for handling covariate measurement error in the Cox proportional hazards model and is shown to be an appealing alternative; the methods are applied to analyze the relationship between survival and CD4 count in patients with AIDS.
Abstract: The Cox proportional hazards model is commonly used to model survival data as a function of covariates. Because of the measuring mechanism or the nature of the environment, covariates are often measured with error and are not directly observable. A naive approach is to use the observed values of the covariates in the Cox model, which usually produces biased estimates of the true association of interest. An alternative strategy is to take into account the error in measurement, which may be carried out for the Cox model in a number of ways. We examine several such approaches and compare and contrast them through several simulation studies. We introduce a likelihood-based approach, which we refer to as the semiparametric method, and show that this method is an appealing alternative. The methods are applied to analyze the relationship between survival and CD4 count in patients with AIDS.

Journal Article•DOI•
TL;DR: The penalized likelihood approach gives a solution to problems involving general censoring and truncation in survival data; the age-specific incidence of dementia is estimated and risk factors for dementia are studied.
Abstract: The Cox model is the model of choice when analyzing survival data presenting only right censoring and left truncation. There is a need for methods that can accommodate more complex observation schemes involving general censoring and truncation. In addition, it is important in many epidemiological applications to have a smooth estimate of the hazard function. We show that the penalized likelihood approach gives a solution to these problems. The solution of the maximum of the penalized likelihood is approximated on a basis of splines. The smoothing parameter is estimated using approximate cross-validation; confidence bands can be given. A simulation study shows that this approach gives better results than the smoothed Nelson-Aalen estimator. We apply this method to the analysis of data from a large cohort study on cerebral aging. The age-specific incidence of dementia is estimated and risk factors for dementia are studied.

Journal Article•DOI•
TL;DR: In this article, the authors developed a mark-recapture estimator based on estimated inclusion probabilities for line transect surveys in which some of the central assumptions of conventional line-transect theory fail.
Abstract: SUMMARY Horvitz-Thompson estimators based on estimated inclusion probabilities are developed for line transect surveys in which some of the central assumptions of conventional line transect theory fail. The estimators are designed to accommodate uncertain detection of animals in the observers' path simultaneously with dependence of detection probabilities on explanatory variables other than perpendicular distance. They require that data be collected by two observer teams simultaneously searching the survey region, recording for each detection both positional data and details of any other variables that may affect detectability. One of the estimators is designed to correct for responsive animal movement. This method requires that the primary observer team searches independently of the secondary team, whereas the second estimator requires both observer teams to search independently of each other. The methods can be viewed as specific forms of mark-recapture experiments, in which individual capture probabilities are modelled as functions of observable explanatory variables and in which animals detected by both observer teams correspond to recaptures. They differ from existing mark-recapture Horvitz-Thompson estimators primarily in that they use an assumption about the distribution of perpendicular distances that is implicit in conventional
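A bare-bones sketch of the Horvitz-Thompson idea behind these estimators; the detection-probability values below are placeholders for illustration, not the fitted model from the paper.

```python
import numpy as np

def horvitz_thompson_abundance(p_detect):
    """Horvitz-Thompson estimate: each detected animal represents 1/p animals
    in the covered region, where p is its estimated detection (inclusion) probability."""
    p_detect = np.asarray(p_detect, float)
    return np.sum(1.0 / p_detect)

# Toy example: probability of being seen by at least one of two independently
# searching teams, p = p1 + p2 - p1*p2, evaluated for each detected animal.
p1 = np.array([0.7, 0.5, 0.3, 0.8])
p2 = np.array([0.6, 0.4, 0.5, 0.7])
p = p1 + p2 - p1 * p2
print(round(horvitz_thompson_abundance(p), 1))
```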

Journal Article•DOI•
TL;DR: In this paper, the authors propose a two-step method: first estimate the (co)variance matrix of the traits and use this estimate to obtain the canonical variables associated with the traits, then apply a single-trait maximum likelihood method to each canonical variable and combine the results.
Abstract: Statistical methods for the detection of genes influencing quantitative traits (QTLs) with the aid of genetic markers are well developed for the analysis of a single trait. In practice, many experimental data contain observations on multiple correlated traits and methods that permit joint analysis of all traits are now required. Generalization of the maximum likelihood method to a multitrait analysis is a good approach, but the increase in complexity, due to the number of parameters to be estimated simultaneously, could restrain its practical use when the number of traits is large. We propose an alternative method based on two separate steps. The first step is to estimate the (co)variance matrix of the traits and use this estimate to obtain the canonical variables associated with the traits. The second step is to apply a single-trait maximum likelihood method to each of the canonical variables and to combine the results. Working in a local asymptotic framework for the effects of the putative pleiotropic QTL, i.e., for a pleiotropic QTL whose effect is too small to be detected with certainty, we prove that the combined analysis with canonical variables is asymptotically equivalent to a multitrait maximum likelihood analysis. A threshold for the mapping of the pleiotropic QTL is also given. The probability of detecting a QTL is not always increased by the addition of more correlated traits. As an example, a theoretical comparison between the power of a multitrait analysis with two variables and the power of a single-trait analysis is presented. Experimental data collected to study the polygenic resistance of tomato plants to bacterial wilt are used to illustrate the combined analysis with canonical variables.
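One simple reading of the first step, transforming the traits to uncorrelated canonical variables via the eigendecomposition of the estimated covariance matrix, can be sketched as follows; the paper's exact construction for the QTL setting may differ.

```python
import numpy as np

def canonical_variables(traits):
    """Transform correlated traits into uncorrelated, unit-variance variables
    using the eigendecomposition of their estimated covariance matrix."""
    traits = np.asarray(traits, float)
    centered = traits - traits.mean(axis=0)
    cov = np.cov(centered, rowvar=False)          # estimated (co)variance matrix
    eigval, eigvec = np.linalg.eigh(cov)
    # Whitening transform: rotate onto the eigenvectors, then scale to unit variance.
    return centered @ eigvec / np.sqrt(eigval)

rng = np.random.default_rng(0)
raw = rng.multivariate_normal([0, 0, 0],
                              [[1.0, 0.6, 0.3],
                               [0.6, 1.0, 0.5],
                               [0.3, 0.5, 1.0]],
                              size=500)
z = canonical_variables(raw)
print(np.round(np.cov(z, rowvar=False), 2))   # approximately the identity matrix
```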