
Showing papers in "Journal of The Royal Statistical Society Series C-applied Statistics in 1999"


Journal ArticleDOI
TL;DR: In this paper, the cubic smoothing spline is used in conjunction with fixed and random effects, random coefficients and variance modelling to provide simultaneous modelling of trends and covariance structure, which allows coherent and flexible empirical model building in complex situations.
Abstract: In designed experiments and in particular longitudinal studies, the aim may be to assess the effect of a quantitative variable such as time on treatment effects. Modelling treatment effects can be complex in the presence of other sources of variation. Three examples are presented to illustrate an approach to analysis in such cases. The first example is a longitudinal experiment on the growth of cows under a factorial treatment structure where serial correlation and variance heterogeneity complicate the analysis. The second example involves the calibration of optical density and the concentration of a protein DNase in the presence of sampling variation and variance heterogeneity. The final example is a multienvironment agricultural field experiment in which a yield-seeding rate relationship is required for several varieties of lupins. Spatial variation within environments, heterogeneity between environments and variation between varieties all need to be incorporated in the analysis. In this paper, the cubic smoothing spline is used in conjunction with fixed and random effects, random coefficients and variance modelling to provide simultaneous modelling of trends and covariance structure. The key result that allows coherent and flexible empirical model building in complex situations is the linear mixed model representation of the cubic smoothing spline. An extension is proposed in which trend is partitioned into smooth and nonsmooth components. Estimation and inference, the analysis of the three examples and a discussion of extensions and unresolved issues are also presented.
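The key result mentioned in this abstract, the linear mixed model representation of the cubic smoothing spline, is standard in the smoothing literature; the sketch below states it in generic notation (mine, not necessarily the authors'), assuming a spline in a single variable t.

```latex
% Penalized least squares definition of the cubic smoothing spline:
%   min_g  sum_i {y_i - g(t_i)}^2 + lambda * int {g''(t)}^2 dt.
% Its fit can be obtained as the BLUP in a linear mixed model of the form
\[
  y = X\beta + Zu + e, \qquad u \sim N(0,\ \sigma_u^2 G), \qquad e \sim N(0,\ \sigma^2 I),
\]
% where X carries the linear (fixed) part of the trend, Z and G are built from
% the spline basis, and the smoothing parameter is the variance ratio
\[
  \lambda = \sigma^2 / \sigma_u^2 ,
\]
% so that lambda can be estimated by REML alongside the other variance parameters.
```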

594 citations


Journal ArticleDOI
TL;DR: In this article, a non-homogeneous hidden Markov model is proposed for relating precipitation occurrences at multiple rain-gauge stations to broad scale atmospheric circulation patterns (the so-called "downscaling problem").
Abstract: Summary. A non-homogeneous hidden Markov model is proposed for relating precipitation occurrences at multiple rain-gauge stations to broad scale atmospheric circulation patterns (the so-called 'downscaling problem'). We model a 15-year sequence of winter data from 30 rain stations in south-western Australia. The first 10 years of data are used for model development and the remaining 5 years are used for model evaluation. The fitted model accurately reproduces the observed rainfall statistics in the reserved data despite a shift in atmospheric circulation (and, consequently, rainfall) between the two periods. The fitted model also provides some useful insights into the processes driving rainfall in this region.
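As background, a generic non-homogeneous hidden Markov factorization can be written as follows; the notation is illustrative and is not claimed to match the authors' exact parameterization.

```latex
% S_t = hidden weather state, X_t = broad-scale atmospheric covariates,
% R_t = vector of rain/no-rain indicators at the stations on day t.
\[
  P(R_{1:T}, S_{1:T} \mid X_{1:T})
    = P(S_1 \mid X_1)\, P(R_1 \mid S_1)
      \prod_{t=2}^{T} P(S_t \mid S_{t-1}, X_t)\, P(R_t \mid S_t),
\]
% where the state transition probabilities depend on the atmospheric covariates
% (e.g. through a multinomial-logit form), making the chain non-homogeneous.
```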

417 citations


Journal ArticleDOI
TL;DR: In this paper, a computationally simple non-iterative algorithm for fitting a particular dynamic paired comparison model is proposed; it improves over the commonly used algorithm of Elo by incorporating the variability in parameter estimates and can be performed regularly even for large populations of competitors.
Abstract: Summary. Paired comparison data in which the abilities or merits of the objects being compared may be changing over time can be modelled as a non-linear state space model. When the population of objects being compared is large, likelihood-based analyses can be too computationally cumbersome to carry out regularly. This presents a problem for rating populations of chess players and other large groups which often consist of tens of thousands of competitors. This problem is overcome through a computationally simple non-iterative algorithm for fitting a particular dynamic paired comparison model. The algorithm, which improves over the commonly used algorithm of Elo by incorporating the variability in parameter estimates, can be performed regularly even for large populations of competitors. The method is evaluated on simulated data and is applied to ranking the best chess players of all time, and to ranking the top current tennis-players.
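The abstract does not spell out the update equations, so the sketch below is a hypothetical Elo-style update that also carries a rating variance, in the spirit of "incorporating the variability in parameter estimates"; the function names, the k_base and tau constants, and the variance recursion are illustrative assumptions, not the paper's algorithm.

```python
def expected_score(r_a, r_b, scale=400.0):
    """Elo-style expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))

def update(r_a, v_a, r_b, v_b, score_a, k_base=32.0, tau=50.0):
    """Hypothetical variance-aware rating update (illustrative only).

    r_*: ratings, v_*: rating variances, score_a: 1 win / 0.5 draw / 0 loss.
    The step size grows with the player's own uncertainty relative to the
    opponent's, and the variance shrinks after each observed result.
    """
    e_a = expected_score(r_a, r_b)
    w = v_a / (v_a + v_b + tau ** 2)          # more uncertain -> bigger step
    r_a_new = r_a + k_base * w * (score_a - e_a)
    v_a_new = v_a * (1.0 - w) + tau ** 2      # shrink, then inflate for time passing
    return r_a_new, v_a_new

# Example: an uncertain newcomer beats an established player.
print(update(r_a=1500.0, v_a=200.0 ** 2, r_b=1700.0, v_b=50.0 ** 2, score_a=1.0))
```

The point of the sketch is only that a closed-form, non-iterative update can be applied after each game, so an entire population of competitors can be re-rated cheaply and regularly.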

347 citations


Journal ArticleDOI
TL;DR: The problems of replication stability, model complexity, selection bias and an overoptimistic estimate of the predictive value of a model are discussed together with several proposals based on resampling methods, whose results favour greater simplicity of the final regression model.
Abstract: Summary. The number of variables in a regression model is often too large and a more parsimonious model may be preferred. Selection strategies (e.g. all-subset selection with various penalties for model complexity, or stepwise procedures) are widely used, but there are few analytical results about their properties. The problems of replication stability, model complexity, selection bias and an overoptimistic estimate of the predictive value of a model are discussed together with several proposals based on resampling methods. The methods are applied to data from a case-control study on atopic dermatitis and a clinical trial to compare two chemotherapy regimes by using a logistic regression and a Cox model. A recent proposal to use shrinkage factors to reduce the bias of parameter estimates caused by model building is extended to parameterwise shrinkage factors and is discussed as a further possibility to illustrate problems of models which are too complex. The results from the resampling approaches favour greater simplicity of the final regression model.

294 citations


Journal ArticleDOI
TL;DR: In this paper, the authors demonstrate the practical importance of two-phase stratified sampling and nonparametric maximum likelihood analysis by an analysis of, and simulations based on, data from the US National Wilms Tumor Study.
Abstract: Two-phase stratified sampling is used to select subjects for the collection of additional data, e.g. validation data in measurement error problems. Stratification jointly by outcome and covariates, with sampling fractions chosen to achieve approximately equal numbers per stratum at the second phase of sampling, enhances efficiency compared with stratification based on the outcome or covariates alone. Nonparametric maximum likelihood may result in substantially more efficient estimates of logistic regression coefficients than weighted or pseudolikelihood procedures. Software to implement all three procedures is available. We demonstrate the practical importance of these design and analysis principles by an analysis of, and simulations based on, data from the US National Wilms Tumor Study.
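To illustrate the "approximately equal numbers per stratum" design rule mentioned above, here is a minimal sketch with invented stratum sizes (they are not the Wilms tumour data):

```python
# Choose second-phase sampling fractions so that each outcome-by-covariate
# stratum contributes roughly the same number of phase-two subjects.
# Stratum sizes below are invented for illustration.
phase1_counts = {"case/exposed": 120, "case/unexposed": 480,
                 "control/exposed": 900, "control/unexposed": 3500}

target_per_stratum = 100
fractions = {h: min(1.0, target_per_stratum / n) for h, n in phase1_counts.items()}
expected_n = {h: round(n * fractions[h]) for h, n in phase1_counts.items()}

print(fractions)   # small strata sampled completely, large strata subsampled
print(expected_n)  # roughly equal phase-two counts per stratum
```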

222 citations


Journal ArticleDOI
TL;DR: Multilevel models are applied to spatially distributed health data, including a multiple-cause model in which deaths from cancer and cardiovascular disease in Glasgow are examined simultaneously in a spatial model, and an analysis of small area mortality in which spatial autocorrelation between residuals is examined.
Abstract: Summary. Multilevel modelling is used on problems arising from the analysis of spatially distributed health data. We use three applications to demonstrate the use of multilevel modelling in this area. The first concerns small area all-cause mortality rates from Glasgow where spatial autocorrelation between residuals is examined. The second analysis is of prostate cancer cases in Scottish counties where we use a range of models to examine whether the incidence is higher in more rural areas. The third develops a multiple-cause model in which deaths from cancer and cardiovascular disease in Glasgow are examined simultaneously in a spatial model. We discuss some of the issues surrounding the use of complex spatial models and the potential for future developments.

179 citations


Journal ArticleDOI
TL;DR: In this article, a semiparametric method is proposed for estimating regression quantiles, modelling the mean and variance as flexible regression splines, from cross-sectional or longitudinal data; it is applied to data on weight, height and age for females under 3 years of age.
Abstract: Summary. The appropriate interpretation of measurements often requires standardization for concomitant factors. For example, standardization of weight for both height and age is important in obesity research and in failure-to-thrive research in children. Regression quantiles from a reference population afford one intuitive and popular approach to standardization. Current methods for the estimation of regression quantiles can be classified as nonparametric with respect to distributional assumptions or as fully parametric. We propose a semiparametric method where we model the mean and variance as flexible regression spline functions and allow the unspecified distribution to vary smoothly as a function of covariates. Similarly to Cole and Green, our approach provides separate estimates and summaries for location, scale and distribution. However, similarly to Koenker and Bassett, we do not assume any parametric form for the distribution. Estimation for either cross-sectional or longitudinal samples is obtained by using estimating equations for the location and scale functions and through local kernel smoothing of the empirical distribution function for standardized residuals. Using this technique with data on weight, height and age for females under 3 years of age, we find that there is a close relationship between quantiles of weight for height and age and quantiles of body mass index (BMI = weight/height^2) for age in this cohort.
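A hedged way to write the location-scale structure described in the abstract (notation mine):

```latex
% Illustrative location-scale form of the semiparametric model:
\[
  Y = \mu(x) + \sigma(x)\,\varepsilon, \qquad
  \varepsilon \mid x \sim F_x \ \ \text{(unspecified, varying smoothly in } x\text{)},
\]
\[
  Q_\tau(Y \mid x) = \mu(x) + \sigma(x)\, F_x^{-1}(\tau),
\]
% with mu(.) and sigma(.) modelled as regression splines fitted by estimating
% equations, and F_x estimated by local kernel smoothing of the empirical
% distribution of the standardized residuals.
```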

103 citations


Journal ArticleDOI
TL;DR: In this article, the authors considered a set of data from 80 stations in the Venezuelan state of Guarico consisting of accumulated monthly rainfall in a time span of 16 years and considered a model based on a full second degree polynomial over the spatial co-ordinates as well as the first two Fourier harmonics to describe the variability during the year.
Abstract: We consider a set of data from 80 stations in the Venezuelan state of Guarico consisting of accumulated monthly rainfall in a time span of 16 years. The problem of modelling rainfall accumulated over fixed periods of time and recorded at meteorological stations at different sites is studied by using a model based on the assumption that the data follow a truncated and transformed multivariate normal distribution. The spatial correlation is modelled by using an exponentially decreasing correlation function and an interpolating surface for the means. Missing data and dry periods are handled within a Markov chain Monte Carlo framework using latent variables. We estimate the amount of rainfall as well as the probability of a dry period by using the predictive density of the data. We considered a model based on a full second-degree polynomial over the spatial co-ordinates as well as the first two Fourier harmonics to describe the variability during the year. Predictive inferences on the data show very realistic results, capturing the typical rainfall variability in time and space for that region. Important extensions of the model are also discussed.
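One way to write down a truncated and transformed Gaussian rainfall model of the kind described above; the power transform and exponential correlation are illustrative choices consistent with the abstract, not necessarily the authors' exact specification.

```latex
% Latent Gaussian field W at site s and month t; observed rainfall Y:
\[
  Y_{st} =
  \begin{cases}
    W_{st}^{\,\beta}, & W_{st} > 0,\\[2pt]
    0, & W_{st} \le 0 \ \ \text{(dry)},
  \end{cases}
  \qquad
  W \sim N\!\big(\mu(s,t),\, \Sigma\big),
\]
\[
  \mu(s,t) = \text{second-degree polynomial in } s
             \;+\; \text{first two Fourier harmonics in } t,
  \qquad
  \operatorname{corr}\{W_{st}, W_{s't}\} = \exp\!\big(-\phi\,\|s - s'\|\big).
\]
% Missing values and dry periods are then handled as latent W values within MCMC.
```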

90 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examine the structure of distance matrices in the presence of a priori grouping of units and show how the total squared distance among the units of a multivariate data set can be partitioned according to the factors of an external classification.
Abstract: Some multivariate data sets are not consonant with MANOVA assumptions. One particular such data set from economics is described. This set has a 2^4 factorial design with eight variables measured on each individual, but the application of MANOVA seems inadvisable given the highly skewed nature of the data. To establish a basis for analysis, we examine the structure of distance matrices in the presence of a priori grouping of units and show how the total squared distance among the units of a multivariate data set can be partitioned according to the factors of an external classification. The partitioning is exactly analogous to that in the univariate analysis of variance. It therefore provides a framework for the analysis of any data set whose structure conforms to that of MANOVA, but which for various reasons cannot be analysed by this technique. Descriptive aspects of the technique are considered in detail, and inferential questions are tackled via randomization tests. This approach provides a satisfactory analysis of the economics data.
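The partition described above can be computed directly from a distance matrix using Gower's identity; the sketch below is an illustrative implementation (not the authors' code), with a toy Euclidean example in which the partition coincides with the usual ANOVA decomposition.

```python
import numpy as np

def partition_distance(D, groups):
    """Partition total squared distance into within- and between-group parts.

    Uses the identity that, for any group of m points, the sum of squared
    distances to the group centroid equals (1/m) * the sum of pairwise squared
    distances within the group. D is an n x n distance matrix, groups a
    length-n array of group labels.
    """
    D2 = np.asarray(D, dtype=float) ** 2
    groups = np.asarray(groups)
    n = len(groups)
    ss_total = D2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        sub = D2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    return ss_total, ss_within, ss_total - ss_within

# Toy example with Euclidean distances and two groups of 10 points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 3)), rng.normal(2, 1, (10, 3))])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(partition_distance(D, [0] * 10 + [1] * 10))
```

Inference on the between-group part would then proceed by randomization, as the abstract describes, rather than by normal-theory MANOVA tests.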

76 citations


Journal ArticleDOI
TL;DR: In this paper, a model for the blackgrouse population in the high eastern region of Belgium is built from climatic information alone, with population levels in the previous 2 years included to account for the number of grouse reaching maturity. The model deliberately omits habitat factors, such as changes in methods of rabies control, the activity of poachers and evolution of the plant habitat in the region over the observation period.
Abstract: Blackgrouse (Tetrao tetrix) in the high eastern region of Belgium form a very small population that is close to extinction. Biologists have followed them closely for many years. The numbers of cocks on their mating grounds are counted each spring, with data available for 30 years. In modelling this population, we are interested to see whether we can obtain an adequate model for the available data by using only climatic information. If a population is in an ecologically viable equilibrium, it should be able to adjust to a changing habitat; short-term variations in weather should be the only major influence on the size of the population. This can have an immediate effect on the survival of the young, but also a more delayed effect through the numbers of grouse reaching maturation. To account for the latter, population levels in the previous 2 years are included in the model because the cocks take 2 years to reach maturity. We know that habitat factors, especially variations in the number of foxes as influenced by changes in methods of rabies control, the activity of poachers and evolution of the plant habitat in the region over the observation period, have an effect. However, the question is not whether such variables are missing; we know that they are. The question concerns the adequacy and appropriateness of a climatic model for describing the observed variations. For a further discussion of these assumptions, and details on the models and the conclusions, see

72 citations


Journal ArticleDOI
TL;DR: In this article, it is shown that discrete state series such as DNA sequences can often be modelled by Markov chains, and their analysis is discussed in the context of log-linear models.
Abstract: Discrete state series such as DNA sequences can often be modelled by Markov chains. The analysis of such series is discussed in the context of log-linear models. The data produce contingency tables with similar margins due to the dependence of the observations. However, despite the unusual structure of the tables, the analysis is equivalent to that for data from multinomial sampling. The reason why the standard number of degrees of freedom is correct is explained by using theoretical arguments and the asymptotic distribution of the deviance is verified empirically. Problems involved with fitting high order Markov chain models, such as reduced power and computational expense, are also discussed.
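As a concrete illustration of treating a first-order Markov chain through a contingency table of transition counts, the sketch below computes the likelihood ratio (deviance) statistic for first-order dependence against independence, with the standard (k-1)^2 degrees of freedom for k states; it is a generic illustration, not the authors' code.

```python
from collections import Counter
from math import log

def transition_counts(seq, states="ACGT"):
    """First-order transition counts from a discrete state sequence."""
    pairs = Counter(zip(seq[:-1], seq[1:]))
    return {(a, b): pairs.get((a, b), 0) for a in states for b in states}

def order1_vs_order0_deviance(seq, states="ACGT"):
    """Likelihood ratio (G^2) statistic comparing a first-order Markov chain
    with an independence (order-0) model; df = (k-1)^2 for k states."""
    n = transition_counts(seq, states)
    row = {a: sum(n[a, b] for b in states) for a in states}
    col = {b: sum(n[a, b] for a in states) for b in states}
    total = sum(n.values())
    g2 = 0.0
    for a in states:
        for b in states:
            if n[a, b] > 0:
                expected = row[a] * col[b] / total
                g2 += 2.0 * n[a, b] * log(n[a, b] / expected)
    return g2, (len(states) - 1) ** 2

print(order1_vs_order0_deviance("ACGTACGTTTTGCATTACG"))
```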

Journal ArticleDOI
TL;DR: In this article, the effect of the non-stationarity of the sea-level on the estimation of extreme sea-level distributions is assessed, and the traditional approach is compared with a recently proposed alternative that incorporates the knowledge of the tidal component and its associated interactions, by applying both methods to 22 UK data sites.
Abstract: The sea-level is the composition of astronomical tidal and meteorological surge processes. It exhibits temporal non-stationarity due to a combination of long-term trend in the mean level, the deterministic tidal component, surge seasonality and interactions between the tide and surge. We assess the effect of these non-stationarities on the estimation of the distribution of extreme sea-levels. This is important for coastal flood assessment as the traditional method of analysis assumes that, once the trend has been removed, extreme sea-levels are from a stationary sequence. We compare the traditional approach with a recently proposed alternative that incorporates the knowledge of the tidal component and its associated interactions, by applying them to 22 UK data sites and through a simulation study. Our main finding is that if the tidal non-stationarity is ignored then a substantial underestimation of extreme sea-levels results for most sites. In contrast, if surge seasonality and the tide–surge interaction are not modelled the traditional approach produces little additional bias. The alternative method is found to perform well but requires substantially more statistical modelling and better data quality.
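The decomposition implicit in the abstract can be written, in illustrative notation, as:

```latex
\[
  Z_t \;=\; m_t \;+\; X_t \;+\; Y_t,
\]
% Z_t = observed sea-level, m_t = long-term mean-level trend,
% X_t = deterministic astronomical tide, Y_t = meteorological surge,
% with seasonality in Y_t and interaction between X_t and Y_t supplying the
% remaining sources of non-stationarity discussed above.
```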

Journal ArticleDOI
TL;DR: The analysis of extreme values is often required from short series which are biasedly sampled or contain outliers; data for sea-levels at two UK east coast sites and data on athletics records for women's 3000 m track races are shown to exhibit such characteristics, and bivariate extreme value methods are shown to provide substantial benefits over univariate methods.
Abstract: The analysis of extreme values is often required from short series which are biasedly sampled or contain outliers. Data for sea-levels at two UK east coast sites and data on athletics records for women's 3000 m track races are shown to exhibit such characteristics. Univariate extreme value methods provide a poor quantification of the extreme values for these data. By using bivariate extreme value methods we analyse jointly these data with related observations, from neighbouring coastal sites and 1500 m races respectively. We show that using bivariate methods provides substantial benefits, both in these applications and more generally with the amount of information gained being determined by the degree of dependence, the lengths and the amount of overlap of the two series, the homogeneity of the marginal characteristics of the variables and the presence and type of the outlier.

Journal ArticleDOI
TL;DR: In this paper, the daily evolution of the price of Abbey National shares over a 10-week period is analysed by using regression models based on possibly non-symmetric stable distributions, which can be used in practice for interactive modelling of heavy-tailed processes.
Abstract: The daily evolution of the price of Abbey National shares over a 10-week period is analysed by using regression models based on possibly non-symmetric stable distributions. These distributions, which are only known through their characteristic function, can be used in practice for interactive modelling of heavy-tailed processes. A regression model for the location parameter is proposed and shown to induce a similar model for the mode. Finally, regression models for the other three parameters of the stable distribution are introduced. The model found to fit best allows the skewness of the distribution, rather than the location or scale parameters, to vary over time. The most likely share return is thus changing over time although the region where most returns are observed is stationary.
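For reference, one standard parameterization of the stable characteristic function, together with an illustrative skewness regression of the kind described above; the tanh link is my assumption, used only to keep the skewness parameter in its admissible range.

```latex
% One standard parameterization (alpha != 1), with index alpha, skewness beta,
% scale gamma and location delta:
\[
  \varphi(t) = \exp\!\Big\{ i\delta t - \gamma^{\alpha} |t|^{\alpha}
     \big[\, 1 - i\beta\,\operatorname{sign}(t)\tan(\pi\alpha/2) \,\big] \Big\}.
\]
% A regression structure of the kind described lets one parameter vary over
% time, e.g. a skewness regression with an illustrative tanh link:
\[
  \beta_t = \tanh\!\big(z_t^{\top}\theta\big) \in (-1, 1).
\]
```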

Journal ArticleDOI
TL;DR: In this article, the authors suggest the logit rank plot as a good way of summarizing the effectiveness of a risk score, where the slope of this plot gives an overall measure of effectiveness.
Abstract: Summary. A risk score s for event E is a function of covariates with the property that P(E | s) is an increasing function of s. Motivated by applications in medicine and in criminology, we suggest the logit rank plot as a good way of summarizing the effectiveness of such a score. Explicitly, plot logit{P(E | s)} against logit(r), where r is the proportional rank of s in a sample or population. The slope of this plot gives an overall measure of effectiveness, and the logit rank transformation provides a common basis on which different risk scores can be compared. Some practical and theoretical aspects are discussed.
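A minimal sketch of how the logit rank plot and its slope can be computed; the binning scheme, the smoothed probability estimate and the simulated data are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def logit_rank_plot_points(scores, events, n_bins=10):
    """Points for a logit rank plot (illustrative implementation).

    scores: risk scores s; events: 0/1 indicators of E.
    Returns (logit of proportional rank, logit of estimated P(E | s)) for
    score bins; plotting these and fitting a line gives the summary slope.
    """
    scores = np.asarray(scores, float)
    events = np.asarray(events, float)
    order = np.argsort(scores)
    ranks = (np.arange(len(scores)) + 0.5) / len(scores)   # proportional ranks
    ev_sorted = events[order]
    x, y = [], []
    for chunk_r, chunk_e in zip(np.array_split(ranks, n_bins),
                                np.array_split(ev_sorted, n_bins)):
        p = (chunk_e.sum() + 0.5) / (len(chunk_e) + 1.0)    # smoothed estimate
        x.append(logit(chunk_r.mean()))
        y.append(logit(p))
    return np.array(x), np.array(y)

# Toy usage: a score that is genuinely informative about the event.
rng = np.random.default_rng(1)
s = rng.normal(size=2000)
e = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * s - 1.0))))
x, y = logit_rank_plot_points(s, e)
slope = np.polyfit(x, y, 1)[0]   # overall measure of effectiveness
print(round(slope, 2))
```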

Journal ArticleDOI
TL;DR: In this paper, the authors examined the sensitivity of additive treatment effect to hidden bias in two observational studies, one investigating the effects of minimum wage laws on employment and the other of exposure to lead.
Abstract: In two observational studies, one investigating the effects of minimum wage laws on employment and the other of the effects of exposures to lead, an estimated treatment effect's sensitivity to hidden bias is examined. The estimate uses the combined quantile averages that were introduced in 1981 by B. M. Brown as simple, efficient, robust estimates of location admitting both exact and approximate confidence intervals and significance tests. Closely related to Gastwirth's estimate and Tukey's trimean, the combined quantile average has asymptotic efficiency for normal data that is comparable with that of a 15% trimmed mean, and higher efficiency than the trimean, but it has resistance to extreme observations or breakdown comparable with that of the trimean and better than the 15% trimmed mean. Combined quantile averages provide consistent estimates of an additive treatment effect in a matched randomized experiment. Sensitivity analyses are discussed for combined quantile averages when used in a matched observational study in which treatments are not randomly assigned. In a sensitivity analysis in an observational study, subjects are assumed to differ with respect to an unobserved covariate that was not adequately controlled by the matching, so that treatments are assigned within pairs with probabilities that are unequal and unknown. The sensitivity analysis proposed here uses significance levels, point estimates and confidence intervals based on combined quantile averages and examines how these inferences change under a range of assumptions about biases due to an unobserved covariate. The procedures are applied in the studies of minimum wage laws and exposures to lead. The first example is also used to illustrate sensitivity analysis with an instrumental variable.

Journal ArticleDOI
TL;DR: In this article, a frequency domain estimation approach for log-linear models that accounts for both overdispersion and autocorrelation was proposed to estimate the association between counts of mortality and the concentration of airborne particles in Philadelphia, USA.
Abstract: Motivated by a study of the association between counts of daily mortality and air pollution, we present a frequency domain estimation approach for log-linear models that accounts for both overdispersion and autocorrelation. The methods also allow for the discounting or downweighting of information at particular frequencies at which, for example, confounding variables are likely to have greatest influence. This allows flexible sensitivity analyses to be carried out to assess the possible effect of confounders on the estimated effect. We apply the methods to estimate the association between counts of mortality and the concentration of airborne particles in Philadelphia, USA, for the years 1974–1988. We obtain an estimated effect of particulate air pollution on mortality that is significantly greater than zero but less than that obtained by a standard log-linear analysis.

Journal ArticleDOI
TL;DR: In this paper, it is shown that drop-out often reduces the efficiency of longitudinal experiments considerably, and a general, computationally simple method is provided for designing longitudinal studies when drop-out is to be expected, so that there is little risk of large losses of efficiency due to the missing data.
Abstract: It is shown that drop-out often reduces the efficiency of longitudinal experiments considerably. In the framework of linear mixed models, a general, computationally simple method is provided, for designing longitudinal studies when drop-out is to be expected, such that there is little risk of large losses of efficiency due to the missing data. All the results are extensively illustrated using data from a randomized experiment with rats.

Journal ArticleDOI
TL;DR: A nonparametric method based on the approach of Pettitt is developed for testing for the occurrence of a changed segment in a sequence and for estimating its end points.
Abstract: Non-coding deoxyribonucleic acid (DNA) can typically be modelled by a sequence of Bernoulli random variables by coding one base, e.g. T, as 1 and other bases as 0. If a segment of a sequence is functionally important, the probability of a 1 will be different in this changed segment from that in the surrounding DNA. It is important to be able to see whether such a segment occurs in a particular DNA sequence and to pin-point it so that a molecular biologist can investigate its possible function. Here we discuss methods for testing the occurrence of such a changed segment and how to estimate the end points of it. Maximum-likelihood-based methods are not very tractable and so a nonparametric method based on the approach of Pettitt has been developed. The problem and its solution are illustrated by a specific DNA example.
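For intuition, the sketch below scans all candidate segments of a 0/1 sequence with a simple standardized cumulative-sum statistic; it is not the Pettitt-based statistic developed in the paper, and in practice significance would be assessed by permutation or by the methods the authors describe.

```python
import numpy as np

def changed_segment_scan(x):
    """Scan statistic for a changed segment in a 0/1 sequence (illustrative).

    For every candidate segment (i, j] it compares the segment sum with what
    the overall proportion of 1s would predict, standardized by a binomial
    variance term. This is a simple cumulative-sum-type scan, not the
    statistic used in the paper.
    """
    x = np.asarray(x, float)
    n = len(x)
    p_hat = x.mean()
    cum = np.concatenate([[0.0], np.cumsum(x)])
    best, best_seg = 0.0, (0, n)
    for i in range(n):
        for j in range(i + 1, n + 1):
            m = j - i
            if m == n:
                continue                      # skip the whole sequence
            z = abs(cum[j] - cum[i] - m * p_hat) / np.sqrt(m * p_hat * (1 - p_hat) + 1e-12)
            if z > best:
                best, best_seg = z, (i, j)
    return best, best_seg                     # assess significance by permutation

# Toy sequence with an enriched middle segment.
rng = np.random.default_rng(2)
seq = np.concatenate([rng.binomial(1, 0.2, 100),
                      rng.binomial(1, 0.6, 40),
                      rng.binomial(1, 0.2, 100)])
print(changed_segment_scan(seq))
```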

Journal ArticleDOI
TL;DR: In this paper, the difference variance dispersion graph (DVDG) is introduced to help in the choice of a response surface design, and two examples from food technology show how the DVDG can be used in practice.
Abstract: Variance dispersion graphs have become a popular tool in aiding the choice of a response surface design. Often differences in response from some particular point, such as the expected position of the optimum or standard operating conditions, are more important than the response itself. We describe two examples from food technology. In the first, an experiment was conducted to find the levels of three factors which optimized the yield of valuable products enzymatically synthesized from sugars and to discover how the yield changed as the levels of the factors were changed from the optimum. In the second example, an experiment was conducted on a mixing process for pastry dough to discover how three factors affected a number of properties of the pastry, with a view to using these factors to control the process. We introduce the difference variance dispersion graph (DVDG) to help in the choice of a design in these circumstances. The DVDG for blocked designs is developed and the examples are used to show how the DVDG can be used in practice. In both examples a design was chosen by using the DVDG, as well as other properties, and the experiments were conducted and produced results that were useful to the experimenters. In both cases the conclusions were drawn partly by comparing responses at different points on the response surface.
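The quantity that a difference variance dispersion graph summarizes can be sketched as follows, in generic response-surface notation that is assumed here rather than taken from the paper:

```latex
% For a fitted surface y(x) = f(x)'b with model matrix X and N runs, the
% scaled variance of the difference in predictions between x and a reference
% point x0 (e.g. the expected optimum or standard operating conditions) is
\[
  v_D(x) \;=\; \frac{N}{\sigma^2}\,\operatorname{Var}\{\hat y(x) - \hat y(x_0)\}
  \;=\; N\,\{f(x) - f(x_0)\}^{\top} (X^{\top}X)^{-1} \{f(x) - f(x_0)\},
\]
% and the DVDG plots the minimum, average and maximum of v_D(x) over shells of
% increasing distance from x0 within the design region.
```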

Journal ArticleDOI
TL;DR: In this article, a statistical testing methodology for closely spaced spectral lines based on multitaper spectrum estimates was developed and applied to a time series of solar magnetic field magnitude data recorded by the Ulysses spacecraft.
Abstract: We develop statistical testing methodology for closely spaced spectral lines based on multitaper spectrum estimates and apply it to a time series of solar magnetic field magnitude data recorded by the Ulysses spacecraft. The test is formulated through complex-valued weighted least squares. Using this test in combination with the test for well-separated lines, we can accurately detect the equatorial rotation frequency of the Sun, and its harmonics, from data recorded by Ulysses's magnetometer at solar mid-latitudes; this is a potentially important result for solar physicists, since it has implications for the structure and dynamics of magnetic fields in the solar corona.

Journal ArticleDOI
TL;DR: In this paper, the authors apply some log-linear modelling methods, which have been proposed for treating non-ignorable non-response, to some data on voting intention from the British General Election Survey.
Abstract: We apply some log-linear modelling methods, which have been proposed for treating non-ignorable non-response, to some data on voting intention from the British General Election Survey. We find that, although some non-ignorable non-response models fit the data very well, they may generate implausible point estimates and predictions. Some explanation is provided for the extreme behaviour of the maximum likelihood estimates for the most parsimonious model. We conclude that point estimates for such models must be treated with great caution. To allow for the uncertainty about the non-response mechanism we explore the use of profile likelihood inference and find the likelihood surfaces to be very flat and the interval estimates to be very wide. To reduce the width of these intervals we propose constraining confidence regions to values where the parameters governing the non-response mechanism are plausible and study the effect of such constraints on inference. We find that the widths of these intervals are reduced but remain wide.

Journal ArticleDOI
TL;DR: In this article, the authors developed a model for a data set of 15 variables measured on a set of 14,000 applications for unsecured personal loans, and the resulting global model of behaviour enabled them to identify several previously unsuspected relationships of considerable interest to the bank.
Abstract: A bank offering unsecured personal loans may be interested in several related outcome variables, including defaulting on the repayments, early repayment or failing to take up an offered loan. Current predictive models used by banks typically consider such variables individually. However, the fact that they are related to each other, and to many interrelated potential predictor variables, suggests that graphical models may provide an attractive alternative solution. We developed such a model for a data set of 15 variables measured on a set of 14000 applications for unsecured personal loans. The resulting global model of behaviour enabled us to identify several previously unsuspected relationships of considerable interest to the bank. For example, we discovered important but obscure relationships between taking out insurance, prior delinquency with a credit card and delinquency with the loan.

Journal ArticleDOI
TL;DR: This article investigates the properties of models that allow the non-response probability to depend on the intended response, in the context of a survey of the sexual behaviour of university students, and finds that Bayesian results may be highly sensitive to the prior specification; the preferred estimate is in line with other studies on response bias in the reports of young people's sexual behaviour that suggest that the respondents overrepresent the sexually active.
Abstract: Summary. When tables contain missing values, statistical models that allow the non-response probability to be a function of the intended response have been proposed by several researchers. We investigate the properties of these methods in the context of a survey of the sexual behaviour of university students. Profile likelihoods can be computed, even when models are not identified and saturated profile likelihoods (making no assumptions about the non-response mechanism) are derived. Bayesian approaches are investigated and it is shown that their results may be highly sensitive to the prior specification. The proportion of responders answering 'yes' to the question 'have you ever had sexual intercourse?' was 73%. However, different assumptions about the nonresponders gave proportions as low as 46% or as high as 83%. Our preferred estimate, derived from the response-saturated profile likelihood, is 67% with a 95% confidence interval of 58-74%. This is in line with other studies on response bias in the reports of young people's sexual behaviour that suggest that the respondents overrepresent the sexually active.

Journal ArticleDOI
TL;DR: This work reanalyses data from a 5-year trial of two oral cholera vaccines in Matlab, Bangladesh, and compares two approaches based on the Cox model in terms of their strategies for detecting time-varying vaccine effects and their estimation techniques for obtaining a time-dependent RR(t) estimate.
Abstract: Summary. We consider the statistical evaluation and estimation of vaccine efficacy when the protective effect wanes with time. We reanalyse data from a 5-year trial of two oral cholera vaccines in Matlab, Bangladesh. In this field trial, one vaccine appears to confer better initial protection than the other, but neither appears to offer protection for a period longer than about 3 years. Time-dependent vaccine effects are estimated by obtaining smooth estimates of a time-varying relative risk RR(t) using survival analysis. We compare two approaches based on the Cox model in terms of their strategies for detecting time-varying vaccine effects, and their estimation techniques for obtaining a time-dependent RR(t) estimate. These methods allow an exploration of time-varying vaccine effects while making minimal parametric assumptions about the functional form of RR(t) for vaccinated compared with unvaccinated subjects.
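In illustrative notation, the time-varying vaccine efficacy referred to above is related to the relative risk through a Cox model with a time-varying coefficient:

```latex
\[
  \mathrm{VE}(t) \;=\; 1 - \mathrm{RR}(t),
  \qquad
  \lambda(t \mid V) \;=\; \lambda_0(t)\,\exp\{\beta(t)\,V\},
  \qquad
  \mathrm{RR}(t) \;=\; \exp\{\beta(t)\},
\]
% where V indicates vaccination and beta(t) is a smoothly time-varying log
% relative risk estimated with minimal assumptions about its functional form.
```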

Journal ArticleDOI
TL;DR: In this paper, a method for describing the distribution of the prediction variance within the region R by using quantile plots was proposed, and the utility of these plots is illustrated with a four component fertilizer experiment that was initiated in Sao Paulo, Brazil.
Abstract: Summary. Vining and co-workers have used plots of the prediction variance trace (PVT) along the so-called prediction rays to compare mixture designs in a constrained region R. In the present paper, we propose a method for describing the distribution of the prediction variance within the region R by using quantile plots. More comprehensive comparisons between mixture designs are possible through the proposed plots than with the PVT plots. The utility of the quantile plots is illustrated with a four-component fertilizer experiment that was initiated in Sao Paulo, Brazil.
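A sketch of how quantiles of the scaled prediction variance over a region R can be computed by sampling; for simplicity the toy example uses a first-order model on a small factorial design rather than a constrained mixture region, so the function names, model and region are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def spv(x, design, model_terms):
    """Scaled prediction variance N * f(x)' (F'F)^{-1} f(x) (illustrative)."""
    F = np.array([model_terms(row) for row in design])
    f = model_terms(x)
    return F.shape[0] * f @ np.linalg.inv(F.T @ F) @ f

def quantile_plot_values(design, model_terms, sample_region, n_samples=5000, seed=0):
    """Empirical quantiles of the scaled prediction variance over points drawn
    uniformly from the region of interest R (a sketch only)."""
    rng = np.random.default_rng(seed)
    pts = sample_region(rng, n_samples)
    v = np.array([spv(p, design, model_terms) for p in pts])
    qs = np.linspace(0.0, 1.0, 21)
    return qs, np.quantile(v, qs)

# Toy usage: first-order model in two factors on a 2^2 factorial with centre point.
terms = lambda x: np.array([1.0, x[0], x[1]])
design = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1], [0, 0]], float)
region = lambda rng, n: rng.uniform(-1, 1, size=(n, 2))
qs, vq = quantile_plot_values(design, terms, region)
print(vq[[0, 10, 20]])   # min, median and max of the sampled SPV values
```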

Journal ArticleDOI
TL;DR: In this article, the authors investigate the effect of human immunodeficiency virus status on the course of neurological impairment, conducted by the HIV Center at Columbia University, followed a cohort of HIV positive and negative gay men for 5 years and assessed the presence or absence of neurological impairments every 6 months.
Abstract: Summary. A study to investigate the effect of human immunodeficiency virus (HIV) status on the course of neurological impairment, conducted by the HIV Center at Columbia University, followed a cohort of HIV positive and negative gay men for 5 years and assessed the presence or absence of neurological impairment every 6 months. Almost half of the subjects dropped out before the end of the study for reasons that might have been related to the missing neurological data. We propose likelihood-based methods for analysing such binary longitudinal data under informative and noninformative drop-out. A transition model is assumed for the binary response, and several models for the drop-out processes are considered which are functions of the response variable (neurological impairment). The likelihood ratio test is used to compare models with informative and noninformative drop-out mechanisms. Using simulations, we investigate the percentage bias and mean-squared error (MSE) of the parameter estimates in the transition model under various assumptions for the drop-out. We find evidence for informative drop-out in the study, and we illustrate that the bias and MSE for the parameters of the transition model are not directly related to the observed drop-out or missing data rates. The effect of HIV status on the neurological impairment is found to be statistically significant under each of the models considered for the drop-out, although the regression coefficient may be biased in certain cases. The presence and relative magnitude of the bias depend on factors such as the probability of drop-out conditional on the presence of neurological impairment and the prevalence of neurological impairment in the population under study.
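Illustrative forms for the two model components described above, a transition model for the response and a drop-out model; the particular linear predictors are assumptions for exposition, not the authors' exact specification.

```latex
% Transition model for the binary response Y_it (neurological impairment):
\[
  \operatorname{logit} P(Y_{it} = 1 \mid Y_{i,t-1}, \mathrm{HIV}_i)
     \;=\; \alpha_0 + \alpha_1 Y_{i,t-1} + \gamma\, \mathrm{HIV}_i .
\]
% Drop-out model that may depend on the possibly unobserved current response:
\[
  \operatorname{logit} P(D_{it} = 1 \mid Y_{i,t-1}, Y_{it})
     \;=\; \psi_0 + \psi_1 Y_{i,t-1} + \psi_2 Y_{it},
\]
% with psi_2 = 0 giving a non-informative (random) drop-out mechanism and
% psi_2 != 0 an informative one; the two are compared by a likelihood ratio test.
```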

Journal ArticleDOI
TL;DR: In this paper, the authors present models and algorithms for the stratigraphic analysis of earth core samples collected at archaeological sites to separate the occupation of the site into distinct periods, by dividing the earth core into well-defined blocks of uniform magnetic susceptibility.
Abstract: Summary. Models and algorithms are presented for the stratigraphic analysis of earth core samples collected at archaeological sites. The aim is to separate the occupation of the site into distinct periods, by dividing the earth core into well-defined blocks of uniform magnetic susceptibility. The models describe the response of detector equipment by using both a spread function and an error process, and they incorporate prior beliefs regarding the nature of the true susceptibility values. The prior parameters are estimated by using pseudolikelihood and the susceptibilities by maximum a posteriori methods via the one-step-late algorithm. These procedures are illustrated with data from synthetic and real core specimens. The new procedures prove to be far superior to other approaches, producing reconstructions which clearly show distinct periods of uniform magnetic susceptibility separated by sharp discontinuities.
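A hedged sketch of the measurement model implied by "a spread function and an error process" (notation mine, not the authors'):

```latex
\[
  y_i \;=\; \sum_j k(d_i - d_j)\, s_j \;+\; \varepsilon_i ,
\]
% where s_j is the true magnetic susceptibility in depth cell j, k(.) is the
% detector spread function and epsilon_i the error process; the prior favours
% s being piecewise constant (blocks separated by sharp discontinuities), and
% s is estimated by maximum a posteriori methods via the one-step-late
% algorithm, with the prior parameters estimated by pseudolikelihood.
```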

Journal ArticleDOI
TL;DR: Changes in survival rates during 1940–1992 for patients with Hodgkin's disease are studied by using population‐based data to identify when the breakthrough in clinical trials of chemotherapy treatments started to increase population survival rates, and how long it took for the increase to level off, indicating that the full population effect of the breakthrough had been realized.
Abstract: Summary. Changes in survival rates during 1940-1992 for patients with Hodgkin's disease are studied by using population-based data. The aim of the analysis is to identify when the breakthrough in clinical trials of chemotherapy treatments started to increase population survival rates, and to find how long it took for the increase to level off, indicating that the full population effect of the breakthrough had been realized. A Weibull relative survival model is used because the model parameters are easily interpretable when assessing the effect of advances in clinical trials. However, the methods apply to any relative survival model that falls within the generalized linear models framework. The model is fitted by using modifications of existing software (SAS, GLIM) and profile likelihood methods. The results are similar to those from a cause-specific analysis of the data by Feuer and co-workers. Survival started to improve around the time that a major chemotherapy breakthrough (nitrogen mustard, Oncovin, prednisone and procarbazine) was publicized in the mid-1960s but did not level off for 11 years. For the analysis of data where the cause of death is obtained from death certificates, the relative survival approach has the advantage of providing the necessary adjustment for expected mortality from causes other than the disease without requiring information on the causes of death.
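In illustrative notation, a relative survival model of the kind described separates expected mortality from the excess, disease-related component; the Weibull form below for the relative survival function is a sketch consistent with the abstract, not necessarily the authors' exact parameterization.

```latex
\[
  S(t \mid x) \;=\; S^{*}(t)\; r(t \mid x),
  \qquad
  r(t \mid x) \;=\; \exp\!\big\{ -(t/\rho_x)^{\kappa} \big\},
\]
% where S*(t) is the expected survival of a comparable disease-free population,
% r(t | x) is the relative survival given a Weibull form, and covariates such
% as calendar period of diagnosis enter through the Weibull parameters within
% a generalized linear models framework.
```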

Journal ArticleDOI
TL;DR: Stochastic models based on Markov birth processes are constructed to describe the process of invasion of a fly larva by entomopathogenic nematodes; the precise form of the non-linear birth rates leads to different patterns of invasion being identified for the three populations of nematodes considered.
Abstract: Stochastic models based on Markov birth processes are constructed to describe the process of invasion of a fly larva by entomopathogenic nematodes. Various forms for the birth (invasion) rates are proposed. These models are then fitted to data sets describing the observed numbers of nematodes that have invaded a fly larva after a fixed period of time. Non-linear birth rates are required to achieve good fits to these data, with their precise form leading to different patterns of invasion being identified for the three populations of nematodes considered. One of these (Nemasys) showed the greatest propensity for invasion. This form of modelling may be useful more generally for analysing data that show variation which is different from that expected from a binomial distribution.
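A minimal simulation sketch of a Markov birth (invasion) process with a non-linear rate; the particular rate function and constants are hypothetical, chosen only to illustrate how such a process generates non-binomial variation in the number of invaders by a fixed time.

```python
import numpy as np

def simulate_invasion(rate_fn, n_available, t_max, rng):
    """Simulate one larva's invasion as a Markov birth process (sketch).

    rate_fn(n) gives the total invasion (birth) rate when n nematodes have
    already invaded; n_available is the number of nematodes applied.
    Returns the number that have invaded by time t_max.
    """
    t, n = 0.0, 0
    while n < n_available:
        rate = rate_fn(n)
        if rate <= 0.0:
            break
        t += rng.exponential(1.0 / rate)     # exponential waiting time
        if t > t_max:
            break
        n += 1
    return n

# Hypothetical non-linear rate: invasion slows as more nematodes are inside.
rate = lambda n: 0.05 * (30 - n) / (1.0 + 0.5 * n)

rng = np.random.default_rng(3)
counts = [simulate_invasion(rate, n_available=30, t_max=24.0, rng=rng)
          for _ in range(1000)]
print(np.mean(counts), np.var(counts))   # compare dispersion with a binomial model
```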