
Showing papers in "Journal of The Royal Statistical Society Series C-applied Statistics in 1999"


Journal ArticleDOI
TL;DR: In this paper, the cubic smoothing spline is used in conjunction with fixed and random effects, random coefficients and variance modelling to provide simultaneous modelling of trends and covariance structure, which allows coherent and flexible empirical model building in complex situations.
Abstract: In designed experiments and in particular longitudinal studies, the aim may be to assess the effect of a quantitative variable such as time on treatment effects. Modelling treatment effects can be complex in the presence of other sources of variation. Three examples are presented to illustrate an approach to analysis in such cases. The first example is a longitudinal experiment on the growth of cows under a factorial treatment structure where serial correlation and variance heterogeneity complicate the analysis. The second example involves the calibration of optical density and the concentration of a protein DNase in the presence of sampling variation and variance heterogeneity. The final example is a multienvironment agricultural field experiment in which a yield-seeding rate relationship is required for several varieties of lupins. Spatial variation within environments, heterogeneity between environments and variation between varieties all need to be incorporated in the analysis. In this paper, the cubic smoothing spline is used in conjunction with fixed and random effects, random coefficients and variance modelling to provide simultaneous modelling of trends and covariance structure. The key result that allows coherent and flexible empirical model building in complex situations is the linear mixed model representation of the cubic smoothing spline. An extension is proposed in which trend is partitioned into smooth and nonsmooth components. Estimation and inference, the analysis of the three examples and a discussion of extensions and unresolved issues are also presented.
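The key result mentioned in this abstract, the linear mixed model representation of the cubic smoothing spline, is standard in the smoothing literature; the sketch below states it in generic notation (mine, not necessarily the authors'), assuming a spline in a single variable t.

```latex
% Penalized least squares definition of the cubic smoothing spline:
%   min_g  sum_i {y_i - g(t_i)}^2 + lambda * int {g''(t)}^2 dt.
% Its fit can be obtained as the BLUP in a linear mixed model of the form
\[
  y = X\beta + Zu + e, \qquad u \sim N(0,\ \sigma_u^2 G), \qquad e \sim N(0,\ \sigma^2 I),
\]
% where X carries the linear (fixed) part of the trend, Z and G are built from
% the spline basis, and the smoothing parameter is the variance ratio
\[
  \lambda = \sigma^2 / \sigma_u^2 ,
\]
% so that lambda can be estimated by REML alongside the other variance parameters.
```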

594 citations


Journal ArticleDOI
TL;DR: In this article, a non-homogeneous hidden Markov model is proposed for relating precipitation occurrences at multiple rain-gauge stations to broad scale atmospheric circulation patterns (the so-called "downscaling problem").
Abstract: Summary. A non-homogeneous hidden Markov model is proposed for relating precipitation occurrences at multiple rain-gauge stations to broad scale atmospheric circulation patterns (the so-called 'downscaling problem'). We model a 15-year sequence of winter data from 30 rain stations in south-western Australia. The first 10 years of data are used for model development and the remaining 5 years are used for model evaluation. The fitted model accurately reproduces the observed rainfall statistics in the reserved data despite a shift in atmospheric circulation (and, consequently, rainfall) between the two periods. The fitted model also provides some useful insights into the processes driving rainfall in this region.
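As background, a generic non-homogeneous hidden Markov factorization can be written as follows; the notation is illustrative and is not claimed to match the authors' exact parameterization.

```latex
% S_t = hidden weather state, X_t = broad-scale atmospheric covariates,
% R_t = vector of rain/no-rain indicators at the stations on day t.
\[
  P(R_{1:T}, S_{1:T} \mid X_{1:T})
    = P(S_1 \mid X_1)\, P(R_1 \mid S_1)
      \prod_{t=2}^{T} P(S_t \mid S_{t-1}, X_t)\, P(R_t \mid S_t),
\]
% where the state transition probabilities depend on the atmospheric covariates
% (e.g. through a multinomial-logit form), making the chain non-homogeneous.
```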

417 citations


Journal ArticleDOI
TL;DR: In this paper, a computationally simple non-iterative algorithm for fitting a particular dynamic paired comparison model is proposed; it improves over the commonly used algorithm of Elo by incorporating the variability in parameter estimates and can be performed regularly even for large populations of competitors.
Abstract: Summary. Paired comparison data in which the abilities or merits of the objects being compared may be changing over time can be modelled as a non-linear state space model. When the population of objects being compared is large, likelihood-based analyses can be too computationally cumbersome to carry out regularly. This presents a problem for rating populations of chess players and other large groups which often consist of tens of thousands of competitors. This problem is overcome through a computationally simple non-iterative algorithm for fitting a particular dynamic paired comparison model. The algorithm, which improves over the commonly used algorithm of Elo by incorporating the variability in parameter estimates, can be performed regularly even for large populations of competitors. The method is evaluated on simulated data and is applied to ranking the best chess players of all time, and to ranking the top current tennis-players.
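The abstract does not spell out the update equations, so the sketch below is a hypothetical Elo-style update that also carries a rating variance, in the spirit of "incorporating the variability in parameter estimates"; the function names, the k_base and tau constants, and the variance recursion are illustrative assumptions, not the paper's algorithm.

```python
def expected_score(r_a, r_b, scale=400.0):
    """Elo-style expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))

def update(r_a, v_a, r_b, v_b, score_a, k_base=32.0, tau=50.0):
    """Hypothetical variance-aware rating update (illustrative only).

    r_*: ratings, v_*: rating variances, score_a: 1 win / 0.5 draw / 0 loss.
    The step size grows with the player's own uncertainty relative to the
    opponent's, and the variance shrinks after each observed result.
    """
    e_a = expected_score(r_a, r_b)
    w = v_a / (v_a + v_b + tau ** 2)          # more uncertain -> bigger step
    r_a_new = r_a + k_base * w * (score_a - e_a)
    v_a_new = v_a * (1.0 - w) + tau ** 2      # shrink, then inflate for time passing
    return r_a_new, v_a_new

# Example: an uncertain newcomer beats an established player.
print(update(r_a=1500.0, v_a=200.0 ** 2, r_b=1700.0, v_b=50.0 ** 2, score_a=1.0))
```

The point of the sketch is only that a closed-form, non-iterative update can be applied after each game, so an entire population of competitors can be re-rated cheaply and regularly.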

347 citations


Journal ArticleDOI
TL;DR: The problems of replication stability, model complexity, selection bias and an overoptimistic estimate of the predictive value of a model are discussed together with several proposals based on resampling methods, whose results favour greater simplicity of the final regression model.
Abstract: Summary. The number of variables in a regression model is often too large and a more parsimonious model may be preferred. Selection strategies (e.g. all-subset selection with various penalties for model complexity, or stepwise procedures) are widely used, but there are few analytical results about their properties. The problems of replication stability, model complexity, selection bias and an overoptimistic estimate of the predictive value of a model are discussed together with several proposals based on resampling methods. The methods are applied to data from a case-control study on atopic dermatitis and a clinical trial to compare two chemotherapy regimes by using a logistic regression and a Cox model. A recent proposal to use shrinkage factors to reduce the bias of parameter estimates caused by model building is extended to parameterwise shrinkage factors and is discussed as a further possibility to illustrate problems of models which are too complex. The results from the resampling approaches favour greater simplicity of the final regression model.

294 citations


Journal ArticleDOI
TL;DR: In this paper, the authors demonstrate the practical importance of two-phase stratified sampling and nonparametric maximum likelihood analysis by an analysis of, and simulations based on, data from the US National Wilms Tumor Study.
Abstract: Two-phase stratified sampling is used to select subjects for the collection of additional data, e.g. validation data in measurement error problems. Stratification jointly by outcome and covariates, with sampling fractions chosen to achieve approximately equal numbers per stratum at the second phase of sampling, enhances efficiency compared with stratification based on the outcome or covariates alone. Nonparametric maximum likelihood may result in substantially more efficient estimates of logistic regression coefficients than weighted or pseudolikelihood procedures. Software to implement all three procedures is available. We demonstrate the practical importance of these design and analysis principles by an analysis of, and simulations based on, data from the US National Wilms Tumor Study.
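To illustrate the "approximately equal numbers per stratum" design rule mentioned above, here is a minimal sketch with invented stratum sizes (they are not the Wilms tumour data):

```python
# Choose second-phase sampling fractions so that each outcome-by-covariate
# stratum contributes roughly the same number of phase-two subjects.
# Stratum sizes below are invented for illustration.
phase1_counts = {"case/exposed": 120, "case/unexposed": 480,
                 "control/exposed": 900, "control/unexposed": 3500}

target_per_stratum = 100
fractions = {h: min(1.0, target_per_stratum / n) for h, n in phase1_counts.items()}
expected_n = {h: round(n * fractions[h]) for h, n in phase1_counts.items()}

print(fractions)   # small strata sampled completely, large strata subsampled
print(expected_n)  # roughly equal phase-two counts per stratum
```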

222 citations


Journal ArticleDOI
TL;DR: Multilevel models are applied to spatially distributed health data, including a multiple-cause model in which deaths from cancer and cardiovascular disease in Glasgow are examined simultaneously in a spatial model, and an analysis of small area mortality in which spatial autocorrelation between residuals is examined.
Abstract: Summary. Multilevel modelling is used on problems arising from the analysis of spatially distributed health data. We use three applications to demonstrate the use of multilevel modelling in this area. The first concerns small area all-cause mortality rates from Glasgow where spatial autocorrelation between residuals is examined. The second analysis is of prostate cancer cases in Scottish counties where we use a range of models to examine whether the incidence is higher in more rural areas. The third develops a multiple-cause model in which deaths from cancer and cardiovascular disease in Glasgow are examined simultaneously in a spatial model. We discuss some of the issues surrounding the use of complex spatial models and the potential for future developments.

179 citations


Journal ArticleDOI
TL;DR: In this article, a semiparametric method is proposed for estimating regression quantiles, modelling the mean and variance as flexible regression splines, from cross-sectional or longitudinal data; it is applied to data on weight, height and age for females under 3 years of age.
Abstract: Summary. The appropriate interpretation of measurements often requires standardization for concomitant factors. For example, standardization of weight for both height and age is important in obesity research and in failure-to-thrive research in children. Regression quantiles from a reference population afford one intuitive and popular approach to standardization. Current methods for the estimation of regression quantiles can be classified as nonparametric with respect to distributional assumptions or as fully parametric. We propose a semiparametric method where we model the mean and variance as flexible regression spline functions and allow the unspecified distribution to vary smoothly as a function of covariates. Similarly to Cole and Green, our approach provides separate estimates and summaries for location, scale and distribution. However, similarly to Koenker and Bassett, we do not assume any parametric form for the distribution. Estimation for either cross-sectional or longitudinal samples is obtained by using estimating equations for the location and scale functions and through local kernel smoothing of the empirical distribution function for standardized residuals. Using this technique with data on weight, height and age for females under 3 years of age, we find that there is a close relationship between quantiles of weight for height and age and quantiles of body mass index (BMI = weight/height^2) for age in this cohort.
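A hedged way to write the location-scale structure described in the abstract (notation mine):

```latex
% Illustrative location-scale form of the semiparametric model:
\[
  Y = \mu(x) + \sigma(x)\,\varepsilon, \qquad
  \varepsilon \mid x \sim F_x \ \ \text{(unspecified, varying smoothly in } x\text{)},
\]
\[
  Q_\tau(Y \mid x) = \mu(x) + \sigma(x)\, F_x^{-1}(\tau),
\]
% with mu(.) and sigma(.) modelled as regression splines fitted by estimating
% equations, and F_x estimated by local kernel smoothing of the empirical
% distribution of the standardized residuals.
```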

103 citations


Journal ArticleDOI
TL;DR: In this article, the authors considered a set of data from 80 stations in the Venezuelan state of Guarico consisting of accumulated monthly rainfall in a time span of 16 years and considered a model based on a full second degree polynomial over the spatial co-ordinates as well as the first two Fourier harmonics to describe the variability during the year.
Abstract: We consider a set of data from 80 stations in the Venezuelan state of Guarico consisting of accumulated monthly rainfall in a time span of 16 years. The problem of modelling rainfall accumulated over fixed periods of time and recorded at meteorological stations at different sites is studied by using a model based on the assumption that the data follow a truncated and transformed multivariate normal distribution. The spatial correlation is modelled by using an exponentially decreasing correlation function and an interpolating surface for the means. Missing data and dry periods are handled within a Markov chain Monte Carlo framework using latent variables. We estimate the amount of rainfall as well as the probability of a dry period by using the predictive density of the data. We considered a model based on a full second-degree polynomial over the spatial co-ordinates as well as the first two Fourier harmonics to describe the variability during the year. Predictive inferences on the data show very realistic results, capturing the typical rainfall variability in time and space for that region. Important extensions of the model are also discussed.
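One way to write down a truncated and transformed Gaussian rainfall model of the kind described above; the power transform and exponential correlation are illustrative choices consistent with the abstract, not necessarily the authors' exact specification.

```latex
% Latent Gaussian field W at site s and month t; observed rainfall Y:
\[
  Y_{st} =
  \begin{cases}
    W_{st}^{\,\beta}, & W_{st} > 0,\\[2pt]
    0, & W_{st} \le 0 \ \ \text{(dry)},
  \end{cases}
  \qquad
  W \sim N\!\big(\mu(s,t),\, \Sigma\big),
\]
\[
  \mu(s,t) = \text{second-degree polynomial in } s
             \;+\; \text{first two Fourier harmonics in } t,
  \qquad
  \operatorname{corr}\{W_{st}, W_{s't}\} = \exp\!\big(-\phi\,\|s - s'\|\big).
\]
% Missing values and dry periods are then handled as latent W values within MCMC.
```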

90 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examine the structure of distance matrices in the presence of a priori grouping of units and show how the total squared distance among the units of a multivariate data set can be partitioned according to the factors of an external classification.
Abstract: Some multivariate data sets are not consonant with MANOVA assumptions. One particular such data set from economics is described. This set has a 2^4 factorial design with eight variables measured on each individual, but the application of MANOVA seems inadvisable given the highly skewed nature of the data. To establish a basis for analysis, we examine the structure of distance matrices in the presence of a priori grouping of units and show how the total squared distance among the units of a multivariate data set can be partitioned according to the factors of an external classification. The partitioning is exactly analogous to that in the univariate analysis of variance. It therefore provides a framework for the analysis of any data set whose structure conforms to that of MANOVA, but which for various reasons cannot be analysed by this technique. Descriptive aspects of the technique are considered in detail, and inferential questions are tackled via randomization tests. This approach provides a satisfactory analysis of the economics data.
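The partition described above can be computed directly from a distance matrix using Gower's identity; the sketch below is an illustrative implementation (not the authors' code), with a toy Euclidean example in which the partition coincides with the usual ANOVA decomposition.

```python
import numpy as np

def partition_distance(D, groups):
    """Partition total squared distance into within- and between-group parts.

    Uses the identity that, for any group of m points, the sum of squared
    distances to the group centroid equals (1/m) * the sum of pairwise squared
    distances within the group. D is an n x n distance matrix, groups a
    length-n array of group labels.
    """
    D2 = np.asarray(D, dtype=float) ** 2
    groups = np.asarray(groups)
    n = len(groups)
    ss_total = D2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        sub = D2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    return ss_total, ss_within, ss_total - ss_within

# Toy example with Euclidean distances and two groups of 10 points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 3)), rng.normal(2, 1, (10, 3))])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(partition_distance(D, [0] * 10 + [1] * 10))
```

Inference on the between-group part would then proceed by randomization, as the abstract describes, rather than by normal-theory MANOVA tests.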

76 citations


Journal ArticleDOI
TL;DR: In this paper, a model for the blackgrouse population in the high eastern region of Belgium is built from climatic information alone, with population levels in the previous 2 years included to account for the number of grouse reaching maturity. The model deliberately omits habitat factors, such as changes in methods of rabies control, the activity of poachers and evolution of the plant habitat in the region over the observation period.
Abstract: Blackgrouse (Tetrao tetrix) in the high eastern region of Belgium form a very small population that is close to extinction. Biologists have followed them closely for many years. The numbers of cocks on their mating grounds are counted each spring, with data available for 30 years. In modelling this population, we are interested to see whether we can obtain an adequate model for the available data by using only climatic information. If a population is in an ecologically viable equilibrium, it should be able to adjust to a changing habitat; short-term variations in weather should be the only major influence on the size of the population. This can have an immediate effect on the survival of the young, but also a more delayed effect through the numbers of grouse reaching maturation. To account for the latter, population levels in the previous 2 years are included in the model because the cocks take 2 years to reach maturity. We know that habitat factors, especially variations in the number of foxes as influenced by changes in methods of rabies control, the activity of poachers and evolution of the plant habitat in the region over the observation period, have an effect. However, the question is not whether such variables are missing; we know that they are. The question concerns the adequacy and appropriateness of a climatic model for describing the observed variations. For a further discussion of these assumptions, and details on the models and the conclusions, see

72 citations


Journal ArticleDOI
TL;DR: In this article, it is shown that discrete state series such as DNA sequences can often be modelled by Markov chains, and their analysis is discussed in the context of log-linear models.
Abstract: Discrete state series such as DNA sequences can often be modelled by Markov chains. The analysis of such series is discussed in the context of log-linear models. The data produce contingency tables with similar margins due to the dependence of the observations. However, despite the unusual structure of the tables, the analysis is equivalent to that for data from multinomial sampling. The reason why the standard number of degrees of freedom is correct is explained by using theoretical arguments and the asymptotic distribution of the deviance is verified empirically. Problems involved with fitting high order Markov chain models, such as reduced power and computational expense, are also discussed.
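As a concrete illustration of treating a first-order Markov chain through a contingency table of transition counts, the sketch below computes the likelihood ratio (deviance) statistic for first-order dependence against independence, with the standard (k-1)^2 degrees of freedom for k states; it is a generic illustration, not the authors' code.

```python
from collections import Counter
from math import log

def transition_counts(seq, states="ACGT"):
    """First-order transition counts from a discrete state sequence."""
    pairs = Counter(zip(seq[:-1], seq[1:]))
    return {(a, b): pairs.get((a, b), 0) for a in states for b in states}

def order1_vs_order0_deviance(seq, states="ACGT"):
    """Likelihood ratio (G^2) statistic comparing a first-order Markov chain
    with an independence (order-0) model; df = (k-1)^2 for k states."""
    n = transition_counts(seq, states)
    row = {a: sum(n[a, b] for b in states) for a in states}
    col = {b: sum(n[a, b] for a in states) for b in states}
    total = sum(n.values())
    g2 = 0.0
    for a in states:
        for b in states:
            if n[a, b] > 0:
                expected = row[a] * col[b] / total
                g2 += 2.0 * n[a, b] * log(n[a, b] / expected)
    return g2, (len(states) - 1) ** 2

print(order1_vs_order0_deviance("ACGTACGTTTTGCATTACG"))
```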

Journal ArticleDOI
TL;DR: In this article, the effect of the non-stationarity of the sea-level on the estimation of extreme sea-level distributions is assessed, and the traditional approach is compared with a recently proposed alternative that incorporates the knowledge of the tidal component and its associated interactions, by applying both methods to 22 UK data sites.
Abstract: The sea-level is the composition of astronomical tidal and meteorological surge processes. It exhibits temporal non-stationarity due to a combination of long-term trend in the mean level, the deterministic tidal component, surge seasonality and interactions between the tide and surge. We assess the effect of these non-stationarities on the estimation of the distribution of extreme sea-levels. This is important for coastal flood assessment as the traditional method of analysis assumes that, once the trend has been removed, extreme sea-levels are from a stationary sequence. We compare the traditional approach with a recently proposed alternative that incorporates the knowledge of the tidal component and its associated interactions, by applying them to 22 UK data sites and through a simulation study. Our main finding is that if the tidal non-stationarity is ignored then a substantial underestimation of extreme sea-levels results for most sites. In contrast, if surge seasonality and the tide–surge interaction are not modelled the traditional approach produces little additional bias. The alternative method is found to perform well but requires substantially more statistical modelling and better data quality.
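The decomposition implicit in the abstract can be written, in illustrative notation, as:

```latex
\[
  Z_t \;=\; m_t \;+\; X_t \;+\; Y_t,
\]
% Z_t = observed sea-level, m_t = long-term mean-level trend,
% X_t = deterministic astronomical tide, Y_t = meteorological surge,
% with seasonality in Y_t and interaction between X_t and Y_t supplying the
% remaining sources of non-stationarity discussed above.
```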

Journal ArticleDOI
TL;DR: The analysis of extreme values is often required from short series which are biasedly sampled or contain outliers; data for sea-levels at two UK east coast sites and data on athletics records for women's 3000 m track races are shown to exhibit such characteristics, and bivariate extreme value methods are shown to provide substantial benefits over univariate methods.
Abstract: The analysis of extreme values is often required from short series which are biasedly sampled or contain outliers. Data for sea-levels at two UK east coast sites and data on athletics records for women's 3000 m track races are shown to exhibit such characteristics. Univariate extreme value methods provide a poor quantification of the extreme values for these data. By using bivariate extreme value methods we analyse jointly these data with related observations, from neighbouring coastal sites and 1500 m races respectively. We show that using bivariate methods provides substantial benefits, both in these applications and more generally with the amount of information gained being determined by the degree of dependence, the lengths and the amount of overlap of the two series, the homogeneity of the marginal characteristics of the variables and the presence and type of the outlier.

Journal ArticleDOI
TL;DR: In this paper, the daily evolution of the price of Abbey National shares over a 10-week period is analysed by using regression models based on possibly non-symmetric stable distributions, which can be used in practice for interactive modelling of heavy-tailed processes.
Abstract: The daily evolution of the price of Abbey National shares over a 10-week period is analysed by using regression models based on possibly non-symmetric stable distributions. These distributions, which are only known through their characteristic function, can be used in practice for interactive modelling of heavy-tailed processes. A regression model for the location parameter is proposed and shown to induce a similar model for the mode. Finally, regression models for the other three parameters of the stable distribution are introduced. The model found to fit best allows the skewness of the distribution, rather than the location or scale parameters, to vary over time. The most likely share return is thus changing over time although the region where most returns are observed is stationary.
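For reference, one standard parameterization of the stable characteristic function, together with an illustrative skewness regression of the kind described above; the tanh link is my assumption, used only to keep the skewness parameter in its admissible range.

```latex
% One standard parameterization (alpha != 1), with index alpha, skewness beta,
% scale gamma and location delta:
\[
  \varphi(t) = \exp\!\Big\{ i\delta t - \gamma^{\alpha} |t|^{\alpha}
     \big[\, 1 - i\beta\,\operatorname{sign}(t)\tan(\pi\alpha/2) \,\big] \Big\}.
\]
% A regression structure of the kind described lets one parameter vary over
% time, e.g. a skewness regression with an illustrative tanh link:
\[
  \beta_t = \tanh\!\big(z_t^{\top}\theta\big) \in (-1, 1).
\]
```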

Journal ArticleDOI
TL;DR: In this article, the authors suggest the logit rank plot as a good way of summarizing the effectiveness of a risk score, where the slope of this plot gives an overall measure of effectiveness.
Abstract: Summary. A risk score s for event E is a function of covariates with the property that P(E | s) is an increasing function of s. Motivated by applications in medicine and in criminology, we suggest the logit rank plot as a good way of summarizing the effectiveness of such a score. Explicitly, plot logit{P(E | s)} against logit(r), where r is the proportional rank of s in a sample or population. The slope of this plot gives an overall measure of effectiveness, and the logit rank transformation provides a common basis on which different risk scores can be compared. Some practical and theoretical aspects are discussed.
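A minimal sketch of how the logit rank plot and its slope can be computed; the binning scheme, the smoothed probability estimate and the simulated data are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def logit_rank_plot_points(scores, events, n_bins=10):
    """Points for a logit rank plot (illustrative implementation).

    scores: risk scores s; events: 0/1 indicators of E.
    Returns (logit of proportional rank, logit of estimated P(E | s)) for
    score bins; plotting these and fitting a line gives the summary slope.
    """
    scores = np.asarray(scores, float)
    events = np.asarray(events, float)
    order = np.argsort(scores)
    ranks = (np.arange(len(scores)) + 0.5) / len(scores)   # proportional ranks
    ev_sorted = events[order]
    x, y = [], []
    for chunk_r, chunk_e in zip(np.array_split(ranks, n_bins),
                                np.array_split(ev_sorted, n_bins)):
        p = (chunk_e.sum() + 0.5) / (len(chunk_e) + 1.0)    # smoothed estimate
        x.append(logit(chunk_r.mean()))
        y.append(logit(p))
    return np.array(x), np.array(y)

# Toy usage: a score that is genuinely informative about the event.
rng = np.random.default_rng(1)
s = rng.normal(size=2000)
e = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * s - 1.0))))
x, y = logit_rank_plot_points(s, e)
slope = np.polyfit(x, y, 1)[0]   # overall measure of effectiveness
print(round(slope, 2))
```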

Journal ArticleDOI
TL;DR: In this paper, the authors examined the sensitivity of additive treatment effect to hidden bias in two observational studies, one investigating the effects of minimum wage laws on employment and the other of exposure to lead.
Abstract: In two observational studies, one investigating the effects of minimum wage laws on employment and the other of the effects of exposures to lead, an estimated treatment effect's sensitivity to hidden bias is examined. The estimate uses the combined quantile averages that were introduced in 1981 by B. M. Brown as simple, efficient, robust estimates of location admitting both exact and approximate confidence intervals and significance tests. Closely related to Gastwirth's estimate and Tukey's trimean, the combined quantile average has asymptotic efficiency for normal data that is comparable with that of a 15% trimmed mean, and higher efficiency than the trimean, but it has resistance to extreme observations or breakdown comparable with that of the trimean and better than the 15% trimmed mean. Combined quantile averages provide consistent estimates of an additive treatment effect in a matched randomized experiment. Sensitivity analyses are discussed for combined quantile averages when used in a matched observational study in which treatments are not randomly assigned. In a sensitivity analysis in an observational study, subjects are assumed to differ with respect to an unobserved covariate that was not adequately controlled by the matching, so that treatments are assigned within pairs with probabilities that are unequal and unknown. The sensitivity analysis proposed here uses significance levels, point estimates and confidence intervals based on combined quantile averages and examines how these inferences change under a range of assumptions about biases due to an unobserved covariate. The procedures are applied in the studies of minimum wage laws and exposures to lead. The first example is also used to illustrate sensitivity analysis with an instrumental variable.

Journal ArticleDOI
TL;DR: In this article, a frequency domain estimation approach for log-linear models that accounts for both overdispersion and autocorrelation was proposed to estimate the association between counts of mortality and the concentration of airborne particles in Philadelphia, USA.
Abstract: Motivated by a study of the association between counts of daily mortality and air pollution, we present a frequency domain estimation approach for log-linear models that accounts for both overdispersion and autocorrelation. The methods also allow for the discounting or downweighting of information at particular frequencies at which, for example, confounding variables are likely to have greatest influence. This allows flexible sensitivity analyses to be carried out to assess the possible effect of confounders on the estimated effect. We apply the methods to estimate the association between counts of mortality and the concentration of airborne particles in Philadelphia, USA, for the years 1974–1988. We obtain an estimated effect of particulate air pollution on mortality that is significantly greater than zero but less than that obtained by a standard log-linear analysis.

Journal ArticleDOI
TL;DR: In this paper, it is shown that drop-out often reduces the efficiency of longitudinal experiments considerably, and a general, computationally simple method is provided for designing longitudinal studies when drop-out is to be expected, so that there is little risk of large losses of efficiency due to the missing data.
Abstract: It is shown that drop-out often reduces the efficiency of longitudinal experiments considerably. In the framework of linear mixed models, a general, computationally simple method is provided, for designing longitudinal studies when drop-out is to be expected, such that there is little risk of large losses of efficiency due to the missing data. All the results are extensively illustrated using data from a randomized experiment with rats.

Journal ArticleDOI
TL;DR: A nonparametric method based on the approach of Pettitt is developed for testing for the occurrence of a changed segment in a sequence and for estimating its end points.
Abstract: Non-coding deoxyribonucleic acid (DNA) can typically be modelled by a sequence of Bernoulli random variables by coding one base, e.g. T, as 1 and other bases as 0. If a segment of a sequence is functionally important, the probability of a 1 will be different in this changed segment from that in the surrounding DNA. It is important to be able to see whether such a segment occurs in a particular DNA sequence and to pin-point it so that a molecular biologist can investigate its possible function. Here we discuss methods for testing the occurrence of such a changed segment and how to estimate the end points of it. Maximum-likelihood-based methods are not very tractable and so a nonparametric method based on the approach of Pettitt has been developed. The problem and its solution are illustrated by a specific DNA example.
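For intuition, the sketch below scans all candidate segments of a 0/1 sequence with a simple standardized cumulative-sum statistic; it is not the Pettitt-based statistic developed in the paper, and in practice significance would be assessed by permutation or by the methods the authors describe.

```python
import numpy as np

def changed_segment_scan(x):
    """Scan statistic for a changed segment in a 0/1 sequence (illustrative).

    For every candidate segment (i, j] it compares the segment sum with what
    the overall proportion of 1s would predict, standardized by a binomial
    variance term. This is a simple cumulative-sum-type scan, not the
    statistic used in the paper.
    """
    x = np.asarray(x, float)
    n = len(x)
    p_hat = x.mean()
    cum = np.concatenate([[0.0], np.cumsum(x)])
    best, best_seg = 0.0, (0, n)
    for i in range(n):
        for j in range(i + 1, n + 1):
            m = j - i
            if m == n:
                continue                      # skip the whole sequence
            z = abs(cum[j] - cum[i] - m * p_hat) / np.sqrt(m * p_hat * (1 - p_hat) + 1e-12)
            if z > best:
                best, best_seg = z, (i, j)
    return best, best_seg                     # assess significance by permutation

# Toy sequence with an enriched middle segment.
rng = np.random.default_rng(2)
seq = np.concatenate([rng.binomial(1, 0.2, 100),
                      rng.binomial(1, 0.6, 40),
                      rng.binomial(1, 0.2, 100)])
print(changed_segment_scan(seq))
```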

Journal ArticleDOI
TL;DR: In this paper, the difference variance dispersion graph (DVDG) is introduced to help in the choice of a response surface design, and two examples from food technology show how the DVDG can be used in practice.
Abstract: Variance dispersion graphs have become a popular tool in aiding the choice of a response surface design. Often differences in response from some particular point, such as the expected position of the optimum or standard operating conditions, are more important than the response itself. We describe two examples from food technology. In the first, an experiment was conducted to find the levels of three factors which optimized the yield of valuable products enzymatically synthesized from sugars and to discover how the yield changed as the levels of the factors were changed from the optimum. In the second example, an experiment was conducted on a mixing process for pastry dough to discover how three factors affected a number of properties of the pastry, with a view to using these factors to control the process. We introduce the difference variance dispersion graph (DVDG) to help in the choice of a design in these circumstances. The DVDG for blocked designs is developed and the examples are used to show how the DVDG can be used in practice. In both examples a design was chosen by using the DVDG, as well as other properties, and the experiments were conducted and produced results that were useful to the experimenters. In both cases the conclusions were drawn partly by comparing responses at different points on the response surface.
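The quantity that a difference variance dispersion graph summarizes can be sketched as follows, in generic response-surface notation that is assumed here rather than taken from the paper:

```latex
% For a fitted surface y(x) = f(x)'b with model matrix X and N runs, the
% scaled variance of the difference in predictions between x and a reference
% point x0 (e.g. the expected optimum or standard operating conditions) is
\[
  v_D(x) \;=\; \frac{N}{\sigma^2}\,\operatorname{Var}\{\hat y(x) - \hat y(x_0)\}
  \;=\; N\,\{f(x) - f(x_0)\}^{\top} (X^{\top}X)^{-1} \{f(x) - f(x_0)\},
\]
% and the DVDG plots the minimum, average and maximum of v_D(x) over shells of
% increasing distance from x0 within the design region.
```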

Journal ArticleDOI
TL;DR: In this article, a statistical testing methodology for closely spaced spectral lines based on multitaper spectrum estimates was developed and applied to a time series of solar magnetic field magnitude data recorded by the Ulysses spacecraft.
Abstract: We develop statistical testing methodology for closely spaced spectral lines based on multitaper spectrum estimates and apply it to a time series of solar magnetic field magnitude data recorded by the Ulysses spacecraft. The test is formulated through complex-valued weighted least squares. Using this test in combination with the test for well-separated lines, we can accurately detect the equatorial rotation frequency of the Sun, and its harmonics, from data recorded by Ulysses's magnetometer at solar mid-latitudes; this is a potentially important result for solar physicists, since it has implications for the structure and dynamics of magnetic fields in the solar corona.

Journal ArticleDOI
TL;DR: In this paper, the authors apply some log-linear modelling methods, which have been proposed for treating non-ignorable non-response, to some data on voting intention from the British General Election Survey.
Abstract: We apply some log-linear modelling methods, which have been proposed for treating non-ignorable non-response, to some data on voting intention from the British General Election Survey. We find that, although some non-ignorable non-response models fit the data very well, they may generate implausible point estimates and predictions. Some explanation is provided for the extreme behaviour of the maximum likelihood estimates for the most parsimonious model. We conclude that point estimates for such models must be treated with great caution. To allow for the uncertainty about the non-response mechanism we explore the use of profile likelihood inference and find the likelihood surfaces to be very flat and the interval estimates to be very wide. To reduce the width of these intervals we propose constraining confidence regions to values where the parameters governing the non-response mechanism are plausible and study the effect of such constraints on inference. We find that the widths of these intervals are reduced but remain wide.

Journal ArticleDOI
TL;DR: In this article, the authors developed a model for a data set of 15 variables measured on a set of 14,000 applications for unsecured personal loans, and the resulting global model of behaviour enabled them to identify several previously unsuspected relationships of considerable interest to the bank.
Abstract: A bank offering unsecured personal loans may be interested in several related outcome variables, including defaulting on the repayments, early repayment or failing to take up an offered loan. Current predictive models used by banks typically consider such variables individually. However, the fact that they are related to each other, and to many interrelated potential predictor variables, suggests that graphical models may provide an attractive alternative solution. We developed such a model for a data set of 15 variables measured on a set of 14000 applications for unsecured personal loans. The resulting global model of behaviour enabled us to identify several previously unsuspected relationships of considerable interest to the bank. For example, we discovered important but obscure relationships between taking out insurance, prior delinquency with a credit card and delinquency with the loan.

Journal ArticleDOI
TL;DR: This article investigates the properties of models that allow the non-response probability to depend on the intended response, in the context of a survey of the sexual behaviour of university students, and finds that Bayesian results may be highly sensitive to the prior specification; the preferred estimate is in line with other studies on response bias in the reports of young people's sexual behaviour that suggest that the respondents overrepresent the sexually active.
Abstract: Summary. When tables contain missing values, statistical models that allow the non-response probability to be a function of the intended response have been proposed by several researchers. We investigate the properties of these methods in the context of a survey of the sexual behaviour of university students. Profile likelihoods can be computed, even when models are not identified and saturated profile likelihoods (making no assumptions about the non-response mechanism) are derived. Bayesian approaches are investigated and it is shown that their results may be highly sensitive to the prior specification. The proportion of responders answering 'yes' to the question 'have you ever had sexual intercourse?' was 73%. However, different assumptions about the nonresponders gave proportions as low as 46% or as high as 83%. Our preferred estimate, derived from the response-saturated profile likelihood, is 67% with a 95% confidence interval of 58-74%. This is in line with other studies on response bias in the reports of young people's sexual behaviour that suggest that the respondents overrepresent the sexually active.

Journal ArticleDOI
TL;DR: This work reanalyses data from a 5-year trial of two oral cholera vaccines in Matlab, Bangladesh, and compares two approaches based on the Cox model in terms of their strategies for detecting time-varying vaccine effects and their estimation techniques for obtaining a time-dependent RR(t) estimate.
Abstract: Summary. We consider the statistical evaluation and estimation of vaccine efficacy when the protective effect wanes with time. We reanalyse data from a 5-year trial of two oral cholera vaccines in Matlab, Bangladesh. In this field trial, one vaccine appears to confer better initial protection than the other, but neither appears to offer protection for a period longer than about 3 years. Time-dependent vaccine effects are estimated by obtaining smooth estimates of a time-varying relative risk RR(t) using survival analysis. We compare two approaches based on the Cox model in terms of their strategies for detecting time-varying vaccine effects, and their estimation techniques for obtaining a time-dependent RR(t) estimate. These methods allow an exploration of time-varying vaccine effects while making minimal parametric assumptions about the functional form of RR(t) for vaccinated compared with unvaccinated subjects.
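In illustrative notation, the time-varying vaccine efficacy referred to above is related to the relative risk through a Cox model with a time-varying coefficient:

```latex
\[
  \mathrm{VE}(t) \;=\; 1 - \mathrm{RR}(t),
  \qquad
  \lambda(t \mid V) \;=\; \lambda_0(t)\,\exp\{\beta(t)\,V\},
  \qquad
  \mathrm{RR}(t) \;=\; \exp\{\beta(t)\},
\]
% where V indicates vaccination and beta(t) is a smoothly time-varying log
% relative risk estimated with minimal assumptions about its functional form.
```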

Journal ArticleDOI
TL;DR: In this paper, a method for describing the distribution of the prediction variance within the region R by using quantile plots was proposed, and the utility of these plots is illustrated with a four component fertilizer experiment that was initiated in Sao Paulo, Brazil.
Abstract: Summary. Vining and co-workers have used plots of the prediction variance trace (PVT) along the so-called prediction rays to compare mixture designs in a constrained region R. In the present paper, we propose a method for describing the distribution of the prediction variance within the region R by using quantile plots. More comprehensive comparisons between mixture designs are possible through the proposed plots than with the PVT plots. The utility of the quantile plots is illustrated with a four-component fertilizer experiment that was initiated in Sao Paulo, Brazil.
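A sketch of how quantiles of the scaled prediction variance over a region R can be computed by sampling; for simplicity the toy example uses a first-order model on a small factorial design rather than a constrained mixture region, so the function names, model and region are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def spv(x, design, model_terms):
    """Scaled prediction variance N * f(x)' (F'F)^{-1} f(x) (illustrative)."""
    F = np.array([model_terms(row) for row in design])
    f = model_terms(x)
    return F.shape[0] * f @ np.linalg.inv(F.T @ F) @ f

def quantile_plot_values(design, model_terms, sample_region, n_samples=5000, seed=0):
    """Empirical quantiles of the scaled prediction variance over points drawn
    uniformly from the region of interest R (a sketch only)."""
    rng = np.random.default_rng(seed)
    pts = sample_region(rng, n_samples)
    v = np.array([spv(p, design, model_terms) for p in pts])
    qs = np.linspace(0.0, 1.0, 21)
    return qs, np.quantile(v, qs)

# Toy usage: first-order model in two factors on a 2^2 factorial with centre point.
terms = lambda x: np.array([1.0, x[0], x[1]])
design = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1], [0, 0]], float)
region = lambda rng, n: rng.uniform(-1, 1, size=(n, 2))
qs, vq = quantile_plot_values(design, terms, region)
print(vq[[0, 10, 20]])   # min, median and max of the sampled SPV values
```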

Journal ArticleDOI
TL;DR: In this article, the authors investigate the effect of human immunodeficiency virus status on the course of neurological impairment, conducted by the HIV Center at Columbia University, followed a cohort of HIV positive and negative gay men for 5 years and assessed the presence or absence of neurological impairments every 6 months.
Abstract: Summary. A study to investigate the effect of human immunodeficiency virus (HIV) status on the course of neurological impairment, conducted by the HIV Center at Columbia University, followed a cohort of HIV positive and negative gay men for 5 years and assessed the presence or absence of neurological impairment every 6 months. Almost half of the subjects dropped out before the end of the study for reasons that might have been related to the missing neurological data. We propose likelihood-based methods for analysing such binary longitudinal data under informative and noninformative drop-out. A transition model is assumed for the binary response, and several models for the drop-out processes are considered which are functions of the response variable (neurological impairment). The likelihood ratio test is used to compare models with informative and noninformative drop-out mechanisms. Using simulations, we investigate the percentage bias and mean-squared error (MSE) of the parameter estimates in the transition model under various assumptions for the drop-out. We find evidence for informative drop-out in the study, and we illustrate that the bias and MSE for the parameters of the transition model are not directly related to the observed drop-out or missing data rates. The effect of HIV status on the neurological impairment is found to be statistically significant under each of the models considered for the drop-out, although the regression coefficient may be biased in certain cases. The presence and relative magnitude of the bias depend on factors such as the probability of drop-out conditional on the presence of neurological impairment and the prevalence of neurological impairment in the population under study.
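Illustrative forms for the two model components described above, a transition model for the response and a drop-out model; the particular linear predictors are assumptions for exposition, not the authors' exact specification.

```latex
% Transition model for the binary response Y_it (neurological impairment):
\[
  \operatorname{logit} P(Y_{it} = 1 \mid Y_{i,t-1}, \mathrm{HIV}_i)
     \;=\; \alpha_0 + \alpha_1 Y_{i,t-1} + \gamma\, \mathrm{HIV}_i .
\]
% Drop-out model that may depend on the possibly unobserved current response:
\[
  \operatorname{logit} P(D_{it} = 1 \mid Y_{i,t-1}, Y_{it})
     \;=\; \psi_0 + \psi_1 Y_{i,t-1} + \psi_2 Y_{it},
\]
% with psi_2 = 0 giving a non-informative (random) drop-out mechanism and
% psi_2 != 0 an informative one; the two are compared by a likelihood ratio test.
```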

Journal ArticleDOI
TL;DR: In this paper, the authors present models and algorithms for the stratigraphic analysis of earth core samples collected at archaeological sites to separate the occupation of the site into distinct periods, by dividing the earth core into well-defined blocks of uniform magnetic susceptibility.
Abstract: Summary. Models and algorithms are presented for the stratigraphic analysis of earth core samples collected at archaeological sites. The aim is to separate the occupation of the site into distinct periods, by dividing the earth core into well-defined blocks of uniform magnetic susceptibility. The models describe the response of detector equipment by using both a spread function and an error process, and they incorporate prior beliefs regarding the nature of the true susceptibility values. The prior parameters are estimated by using pseudolikelihood and the susceptibilities by maximum a posteriori methods via the one-step-late algorithm. These procedures are illustrated with data from synthetic and real core specimens. The new procedures prove to be far superior to other approaches, producing reconstructions which clearly show distinct periods of uniform magnetic susceptibility separated by sharp discontinuities.
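A hedged sketch of the measurement model implied by "a spread function and an error process" (notation mine, not the authors'):

```latex
\[
  y_i \;=\; \sum_j k(d_i - d_j)\, s_j \;+\; \varepsilon_i ,
\]
% where s_j is the true magnetic susceptibility in depth cell j, k(.) is the
% detector spread function and epsilon_i the error process; the prior favours
% s being piecewise constant (blocks separated by sharp discontinuities), and
% s is estimated by maximum a posteriori methods via the one-step-late
% algorithm, with the prior parameters estimated by pseudolikelihood.
```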

Journal ArticleDOI
TL;DR: Changes in survival rates during 1940–1992 for patients with Hodgkin's disease are studied by using population‐based data to identify when the breakthrough in clinical trials of chemotherapy treatments started to increase population survival rates, and how long it took for the increase to level off, indicating that the full population effect of the breakthrough had been realized.
Abstract: Summary. Changes in survival rates during 1940-1992 for patients with Hodgkin's disease are studied by using population-based data. The aim of the analysis is to identify when the breakthrough in clinical trials of chemotherapy treatments started to increase population survival rates, and to find how long it took for the increase to level off, indicating that the full population effect of the breakthrough had been realized. A Weibull relative survival model is used because the model parameters are easily interpretable when assessing the effect of advances in clinical trials. However, the methods apply to any relative survival model that falls within the generalized linear models framework. The model is fitted by using modifications of existing software (SAS, GLIM) and profile likelihood methods. The results are similar to those from a cause-specific analysis of the data by Feuer and co-workers. Survival started to improve around the time that a major chemotherapy breakthrough (nitrogen mustard, Oncovin, prednisone and procarbazine) was publicized in the mid-1960s but did not level off for 11 years. For the analysis of data where the cause of death is obtained from death certificates, the relative survival approach has the advantage of providing the necessary adjustment for expected mortality from causes other than the disease without requiring information on the causes of death.
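In illustrative notation, a relative survival model of the kind described separates expected mortality from the excess, disease-related component; the Weibull form below for the relative survival function is a sketch consistent with the abstract, not necessarily the authors' exact parameterization.

```latex
\[
  S(t \mid x) \;=\; S^{*}(t)\; r(t \mid x),
  \qquad
  r(t \mid x) \;=\; \exp\!\big\{ -(t/\rho_x)^{\kappa} \big\},
\]
% where S*(t) is the expected survival of a comparable disease-free population,
% r(t | x) is the relative survival given a Weibull form, and covariates such
% as calendar period of diagnosis enter through the Weibull parameters within
% a generalized linear models framework.
```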

Journal ArticleDOI
TL;DR: Stochastic models based on Markov birth processes are constructed to describe the process of invasion of a fly larva by entomopathogenic nematodes; the precise form of the non-linear birth rates leads to different patterns of invasion being identified for the three populations of nematodes considered.
Abstract: Stochastic models based on Markov birth processes are constructed to describe the process of invasion of a fly larva by entomopathogenic nematodes. Various forms for the birth (invasion) rates are proposed. These models are then fitted to data sets describing the observed numbers of nematodes that have invaded a fly larva after a fixed period of time. Non-linear birth rates are required to achieve good fits to these data, with their precise form leading to different patterns of invasion being identified for the three populations of nematodes considered. One of these (Nemasys) showed the greatest propensity for invasion. This form of modelling may be useful more generally for analysing data that show variation which is different from that expected from a binomial distribution.
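A minimal simulation sketch of a Markov birth (invasion) process with a non-linear rate; the particular rate function and constants are hypothetical, chosen only to illustrate how such a process generates non-binomial variation in the number of invaders by a fixed time.

```python
import numpy as np

def simulate_invasion(rate_fn, n_available, t_max, rng):
    """Simulate one larva's invasion as a Markov birth process (sketch).

    rate_fn(n) gives the total invasion (birth) rate when n nematodes have
    already invaded; n_available is the number of nematodes applied.
    Returns the number that have invaded by time t_max.
    """
    t, n = 0.0, 0
    while n < n_available:
        rate = rate_fn(n)
        if rate <= 0.0:
            break
        t += rng.exponential(1.0 / rate)     # exponential waiting time
        if t > t_max:
            break
        n += 1
    return n

# Hypothetical non-linear rate: invasion slows as more nematodes are inside.
rate = lambda n: 0.05 * (30 - n) / (1.0 + 0.5 * n)

rng = np.random.default_rng(3)
counts = [simulate_invasion(rate, n_available=30, t_max=24.0, rng=rng)
          for _ in range(1000)]
print(np.mean(counts), np.var(counts))   # compare dispersion with a binomial model
```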