
Showing papers in "Journal of The Royal Statistical Society Series A-statistics in Society in 1999"


Journal ArticleDOI
TL;DR: The authors show how to fit fractional polynomials in several continuous covariates simultaneously, propose ways of ensuring that the resulting models are parsimonious and consistent with basic medical knowledge, and compare their new approach with conventional modelling methods which apply stepwise variable selection to categorized covariates.
Abstract: To be useful to clinicians, prognostic and diagnostic indices must be derived from accurate models developed by using appropriate data sets. We show that fractional polynomials, which extend ordinary polynomials by including non-positive and fractional powers, may be used as the basis of such models. We describe how to fit fractional polynomials in several continuous covariates simultaneously, and we propose ways of ensuring that the resulting models are parsimonious and consistent with basic medical knowledge. The methods are applied to two breast cancer data sets, one from a prognostic factors study in patients with positive lymph nodes and the other from a study to diagnose malignant or benign tumours by using colour Doppler blood flow mapping. We investigate the problems of biased parameter estimates in the final model and overfitting using cross-validation calibration to estimate shrinkage factors. We adopt bootstrap resampling to assess model stability. We compare our new approach with conventional modelling methods which apply stepwise variable selection to categorized covariates. We conclude that fractional polynomial methodology can be very successful in generating simple and appropriate models.
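
The fractional polynomial idea, choosing powers from a small conventional set such as {-2, -1, -0.5, 0, 0.5, 1, 2, 3} (with 0 read as log), lends itself to a simple grid search. The sketch below is a minimal first-degree (FP1) illustration on made-up data with statsmodels; it is not the authors' multivariable procedure, which selects powers for several covariates simultaneously and adds shrinkage and bootstrap checks.

```python
# Minimal FP1 sketch: pick the power p from the conventional fractional
# polynomial set that gives the smallest deviance in a logistic model.
# Hypothetical data; x must be positive (shift it first if it is not).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 5.0, size=300)
y = rng.binomial(1, 1 / (1 + np.exp(-(np.log(x) - 0.5))))   # toy outcome

POWERS = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]                     # 0 means log(x)

def fp1_term(x, p):
    return np.log(x) if p == 0 else x ** p

best = None
for p in POWERS:
    X = sm.add_constant(fp1_term(x, p))
    fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
    if best is None or fit.deviance < best[1]:
        best = (p, fit.deviance)

print("best FP1 power:", best[0], "deviance:", round(best[1], 2))
```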

469 citations


Journal ArticleDOI
TL;DR: In this paper, Markov chain Monte Carlo methods are used to make inferences about the missing data as well as the unknown parameters of interest in a Bayesian framework, applied to real life data from disease outbreaks.
Abstract: The analysis of infectious disease data is usually complicated by the fact that real life epidemics are only partially observed. In particular, data concerning the process of infection are seldom available. Consequently, standard statistical techniques can become too complicated to implement effectively. In this paper Markov chain Monte Carlo methods are used to make inferences about the missing data as well as the unknown parameters of interest in a Bayesian framework. The methods are applied to real life data from disease outbreaks.
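
The central idea, treating the unobserved parts of the epidemic as extra unknowns and sampling them alongside the parameters, can be illustrated with a deliberately tiny Gibbs sampler. The toy model below (Bernoulli infection indicators with a Beta prior, some indicators unobserved) sketches only the data-augmentation principle, not the paper's epidemic models.

```python
# Toy Gibbs sampler: infer an infection probability p when some individual
# infection indicators are missing, by alternately imputing the missing
# indicators and updating p (Beta-Bernoulli conjugacy).
import numpy as np

rng = np.random.default_rng(1)
n, p_true = 200, 0.3
z = rng.binomial(1, p_true, n)          # true indicators
observed = rng.random(n) < 0.6          # only 60% are observed
z_obs = np.where(observed, z, -1)       # -1 marks missing

a0, b0 = 1.0, 1.0                       # Beta prior
z_cur = np.where(observed, z_obs, 0)    # initialise missing values at 0
p_draws = []
for _ in range(2000):
    # update p | z  (Beta posterior)
    p = rng.beta(a0 + z_cur.sum(), b0 + n - z_cur.sum())
    # update missing z | p  (here the conditional is simply Bernoulli(p))
    miss = ~observed
    z_cur[miss] = rng.binomial(1, p, miss.sum())
    p_draws.append(p)

print("posterior mean of p:", np.mean(p_draws[500:]))
```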

355 citations


Journal ArticleDOI
TL;DR: In this article, a zero-inflated Poisson (ZIP) model is proposed for frequency counts in a dental epidemiological study in Belo Horizonte, Brazil, which evaluated various programmes for reducing caries.
Abstract: For frequency counts, the situation of extra zeros often arises in biomedical applications. This is demonstrated with count data from a dental epidemiological study in Belo Horizonte (the Belo Horizonte caries prevention study) which evaluated various programmes for reducing caries. Extra zeros, however, violate the variance–mean relationship of the Poisson error structure. This extra-Poisson variation can easily be explained by a special mixture model, the zero-inflated Poisson (ZIP) model. On the basis of the ZIP model, a graphical device is presented which not only summarizes the mixing distribution but also provides visual information about the overall mean. This device can be exploited to evaluate and compare various groups. Ways are discussed to include covariates and to develop an extension of the conventional Poisson regression. Finally, a method to evaluate intervention effects on the basis of the ZIP regression model is described and applied to the data of the Belo Horizonte caries prevention study.
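
statsmodels ships a zero-inflated Poisson class, so the basic ZIP fit described here can be sketched directly. The covariate and counts below are invented, and the paper's graphical device and intervention-effect method are not reproduced.

```python
# Zero-inflated Poisson sketch on simulated caries-like counts:
# a point mass at zero mixed with a Poisson component.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(2)
n = 500
group = rng.binomial(1, 0.5, n)                      # hypothetical programme group
lam = np.exp(0.8 - 0.4 * group)                      # Poisson mean by group
structural_zero = rng.random(n) < 0.35               # source of the extra zeros
y = np.where(structural_zero, 0, rng.poisson(lam))

X = sm.add_constant(group)
zip_fit = ZeroInflatedPoisson(y, X, exog_infl=np.ones((n, 1)),
                              inflation='logit').fit(disp=0, maxiter=200)
print(zip_fit.summary())
```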

349 citations


Journal ArticleDOI
TL;DR: The model is found to fit not only modern data but also some widely spaced historical data for the 19th and 17th centuries, and even some estimates for the early mediaeval period; the results are relevant to the current debate about whether there is a fixed upper limit to the length of human life.
Abstract: "Recent new data on old age mortality point to a particular model for the way in which the probability of dying increases with age. The model is found to fit not only modern data but also some widely spaced historical data for the 19th and 17th centuries, and even some estimates for the early mediaeval period. The results show a pattern which calls for explanation. The model can also be used to predict a probability distribution for the highest age which will be attained in given circumstances. The results are relevant to the current debate about whether there is a fixed upper limit to the length of human life." A discussion of the paper by several researchers and a reply by the author are included.

243 citations


Journal ArticleDOI
TL;DR: In this article, the authors reanalyse one of the published meta-analyses in the corrections literature and argue the importance of specifically modelling heterogeneity and selection bias, suggesting lower average effects and substantially increased measures of uncertainty.
Abstract: Summary. What works seeks to identify rehabilitative treatments which are successful in reducing the likelihood that offenders will reoffend. A large number of small case-control studies have been reported in the literature, but with conflicting results. Meta-analysis has been used to reconcile these findings, but again with conflicting results. We reanalyse one of the published meta-analyses in the corrections literature and argue the importance of specifically modelling heterogeneity and selection bias. A sensitivity approach is advocated, suggesting lower average effects and substantially increased measures of uncertainty. The method is tested on a medical example where independent confirmation from a large controlled trial is also available.
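
One standard way to model heterogeneity explicitly in a meta-analysis is a random-effects model with a between-study variance component; the DerSimonian-Laird moment estimator below is a common textbook version, shown on made-up effect sizes. It illustrates only the heterogeneity part, not the authors' sensitivity analysis for selection bias.

```python
# DerSimonian-Laird random-effects meta-analysis on hypothetical
# study effect sizes (e.g. log odds ratios) and their variances.
import numpy as np

effects = np.array([0.10, 0.35, -0.05, 0.22, 0.50])   # made-up study effects
variances = np.array([0.04, 0.02, 0.06, 0.03, 0.05])  # made-up within-study variances

w_fixed = 1 / variances
theta_fixed = np.sum(w_fixed * effects) / np.sum(w_fixed)
Q = np.sum(w_fixed * (effects - theta_fixed) ** 2)     # heterogeneity statistic
k = len(effects)
tau2 = max(0.0, (Q - (k - 1)) /
           (np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)))

w_rand = 1 / (variances + tau2)                        # random-effects weights
theta_rand = np.sum(w_rand * effects) / np.sum(w_rand)
se_rand = np.sqrt(1 / np.sum(w_rand))
print(f"tau^2 = {tau2:.3f}, pooled effect = {theta_rand:.3f} (SE {se_rand:.3f})")
```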

155 citations


Journal ArticleDOI
TL;DR: This framework encompasses both the traditional log-linear approach and various elements from the full Rasch model, and explores extensions allowing for interactions between the Rasch and log-linear portions of the models in both the classical and the Bayesian contexts.
Abstract: One of the major objections to the standard multiple-recapture approach to population estimation is the assumption of homogeneity of individual “capture” probabilities. Modeling individual capture heterogeneity is complicated by the fact that it shows up as a restricted form of interaction between lists in the contingency table cross-classifying list memberships for all individuals. Traditional log-linear modeling approaches to capture-recapture problems are well suited to modeling interactions among lists, but ignore the special dependence structure that individual heterogeneity induces. A random-effects approach, based on the Rasch (1960) model from educational testing and introduced in this context by Darroch et al. (1993) and Agresti (1994), provides one way to introduce the dependence resulting from heterogeneity into the log-linear model; however, previous efforts to combine the Rasch-like heterogeneity terms additively with the usual log-linear interaction terms suggest that a more flexible approach is required. In this paper we consider both classical multi-level approaches and fully Bayesian hierarchical approaches to modeling individual heterogeneity and list interactions. Our framework encompasses both the traditional log-linear approach and various elements from the full Rasch model. We compare these approaches on two examples, the first arising out of an epidemiological study of a population of diabetics in Italy, and the second a study intended to assess the “size” of the World Wide Web. We also explore extensions allowing for interactions between the Rasch and log-linear portions of the models in both the classical and Bayesian contexts.
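
The "traditional log-linear approach" that the paper builds on can be sketched in a few lines: fit a Poisson log-linear model to the incomplete contingency table of list memberships and project the fitted model onto the unobservable cell in which no list captures an individual. The three-list counts below are invented, only main effects are fitted, and the Rasch-type heterogeneity terms that are the paper's real subject are omitted.

```python
# Log-linear capture-recapture sketch with three lists (A, B, C).
# Fit a main-effects Poisson model to the 7 observed cells and predict
# the count in the unobserved (0,0,0) cell.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Invented counts for each observed capture pattern.
cells = pd.DataFrame({
    "A": [1, 1, 1, 1, 0, 0, 0],
    "B": [1, 1, 0, 0, 1, 1, 0],
    "C": [1, 0, 1, 0, 1, 0, 1],
    "n": [15, 40, 32, 110, 25, 90, 85],
})

fit = smf.glm("n ~ A + B + C", data=cells,
              family=sm.families.Poisson()).fit()

missing_cell = pd.DataFrame({"A": [0], "B": [0], "C": [0]})
n_missed = float(fit.predict(missing_cell)[0])
print("estimated uncaptured:", round(n_missed, 1),
      "  estimated total:", round(cells["n"].sum() + n_missed, 1))
```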

141 citations


Journal ArticleDOI
TL;DR: This paper uses multilevel statistical modelling of cross-classified data to explore interviewers' influence on survey non-response, and finds that the variability in whole household refusal and non-contact rates is due more to the influence of interviewers than to the influence of areas.
Abstract: Summary. This paper illustrates the use of multilevel statistical modelling of cross-classified data to explore interviewers' influence on survey non-response. The results suggest that the variability in whole household refusal and non-contact rates is due more to the influence of interviewers than to the influence of areas. The results from separate logistic regression models are compared with the results from multinomial models using a polytomous dependent variable (refusals, non-contacts and responses). Using the cross-classified multilevel approach allows us to estimate correlations between refusals and non-contacts, suggesting that interviewers who are good at reducing whole household refusals are also good at reducing whole household non-contacts.

139 citations


Journal ArticleDOI
TL;DR: Among those in employment there is some evidence that movement out of their class of origin is in the direction predicted by the idea of health-related social mobility, but this evidence seems strongest for causes of death which are least likely to have been preceded by prolonged incapacity.
Abstract: "The effect of social mobility on the socioeconomic differential in mortality is examined with data from the Office for National Statistics Longitudinal Study. The analyses involve 46,980 men aged 45-64 years in 1981. The mortality risk of the socially mobile is compared with the mortality risk of the socially stable after adjustment for their class of origin (their social class in 1971) and class of destination (their social class in 1981) separately. Among those in employment there is some evidence that movement out of their class of origin is in the direction predicted by the idea of health-related social mobility. This evidence, however, seems strongest for causes of death which are least likely to have been preceded by prolonged incapacity. Movement into the class of destination, however, shows the opposite relationship with mortality."

103 citations


Journal ArticleDOI
TL;DR: It is suggested that a synthesis between these approaches is appropriate, but the authors follow others in warning of the inevitable extra-statistical difficulties that will arise.
Abstract: Summary. There is a long history of interest in examining and comparing surgical outcomes. The 'epidemiological' approach was initiated by Florence Nightingale in her suggestion for uniform surgical statistics, and she clearly predicted the problems that are associated with collecting, analysing and interpreting such data. Unfortunately those responsible for implementing and reporting her scheme appeared not to have shared her insight. The contrasting 'clinical' approach was championed by Ernest Codman in his search for full and honest appraisals of surgical errors. Once again, despite initial enthusiasm, others had great difficulty in following his example, although we discuss a recent instance of a reflective analysis of an individual surgeon's performance. We conclude by suggesting that a synthesis between these approaches is appropriate, but we follow others in warning of the inevitable extra-statistical difficulties that will arise.

94 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a new approach to the treatment of item non-response in attitude scales, which combines the ideas of latent variable identification with the issues of nonresponse adjustment in sample surveys.
Abstract: Summary. This paper proposes a new approach to the treatment of item non-response in attitude scales. It combines the ideas of latent variable identification with the issues of non-response adjustment in sample surveys. The latent variable approach allows missing values to be included in the analysis and, equally importantly, allows information about attitude to be inferred from nonresponse. We present a symmetric pattern methodology for handling item non-response in attitude scales. The methodology is symmetric in that all the variables are given equivalent status in the analysis (none is designated a ‘dependent’ variable) and is pattern based in that the pattern of responses and non-responses across individuals is a key element in the analysis. Our approach to the problem is through a latent variable model with two latent dimensions: one to summarize response propensity and the other to summarize attitude, ability or belief. The methodology presented here can handle binary, metric and mixed (binary and metric) manifest items with missing values. Examples using both artificial data sets and two real data sets are used to illustrate the mechanism and the advantages of the methodology proposed.

87 citations


Journal ArticleDOI
TL;DR: In this paper, the authors define a multivariate shrinkage estimator which combines information also across sub-populations and outcome variables, and demonstrate the superiority of the multivariate estimator over univariate shrinkage, and of univariate shrinkage over the unbiased (sample) means.
Abstract: The familiar (univariate) shrinkage estimator of a small area mean or proportion combines information from the small area and a national survey. We define a multivariate shrinkage estimator which combines information also across subpopulations and outcome variables. The superiority of the multivariate shrinkage over univariate shrinkage, and of the univariate shrinkage over the unbiased (sample) means, is illustrated on examples of estimating the local area rates of economic activity in the subpopulations defined by ethnicity, age and sex. The examples use the sample of anonymized records of individuals from the 1991 UK census. The method requires no distributional assumptions but relies on the appropriateness of the quadratic loss function. The implementation of the method involves minimum outlay of computing. Multivariate shrinkage is particularly effective when the area level means are highly correlated and the sample means of one or a few components have small sampling and between-area variances. Estimations for subpopulations based on small samples can be greatly improved by incorporating information from subpopulations with larger sample sizes.
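
The univariate shrinkage estimator that the multivariate version generalises has a simple form: a precision-weighted compromise between the noisy small-area sample mean and the national mean. The rates and variances below are invented; the paper's multivariate estimator additionally borrows strength across outcomes and subpopulations.

```python
# Univariate shrinkage sketch: combine each small-area sample mean with
# the national mean, weighting by between-area variance vs. sampling variance.
import numpy as np

area_means = np.array([0.62, 0.55, 0.71, 0.48, 0.66])      # invented local rates
sampling_var = np.array([0.010, 0.030, 0.008, 0.050, 0.020])
national_mean = 0.60                                        # invented national rate
between_area_var = 0.004                                    # invented variance of true area rates

# Shrink more when the local sample is noisy relative to the real between-area spread.
weight_on_area = between_area_var / (between_area_var + sampling_var)
shrunk = weight_on_area * area_means + (1 - weight_on_area) * national_mean
print(np.round(shrunk, 3))
```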

Journal ArticleDOI
TL;DR: An overview of the ONC project is presented, focusing on the design and analysis methodology for the CCS, together with results that allow the reader to evaluate the robustness of this methodology.
Abstract: As a result of lessons learnt from the 1991 census, a research programme was set up to seek improvements in census methodology. Underenumeration has been placed top of the agenda in this programme, and every effort is being made to achieve as high a coverage as possible in the 2001 census. In recognition, however, that 100% coverage will never be achieved, the one-number census (ONC) project was established to measure the degree of underenumeration in the 2001 census and, if possible, to adjust fully the outputs from the census for that undercount. A key component of this adjustment process is a census coverage survey (CCS). This paper presents an overview of the ONC project, focusing on the design and analysis methodology for the CCS. It also presents results that allow the reader to evaluate the robustness of this methodology.

Journal ArticleDOI
TL;DR: In this article, the authors compare the random effects approach with the generalized estimating equation (GEE) approach and conclude that the GEE approach is inappropriate for assessing the treatment effects for these data.
Abstract: The generalized estimating equation (GEE) approach to the analysis of longitudinal data has many attractive robustness properties and can provide a 'population average' characterization of interest, for example, to clinicians who have to treat patients on the basis of their observed characteristics. However, these methods have limitations which restrict their usefulness in both the social and the medical sciences. This conclusion is based on the premise that the main motivations for longitudinal analysis are insight into microlevel dynamics and improved control for omitted or unmeasured variables. We claim that to address these issues a properly formulated random-effects model is required. In addition to a theoretical assessment of some of the issues, we illustrate this by reanalysing data on polyp counts. In this example, the covariates include a base-line outcome, and the effectiveness of the treatment seems to vary by base-line. We compare the random-effects approach with the GEE approach and conclude that the GEE approach is inappropriate for assessing the treatment effects for these data.
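
A 'population average' Poisson GEE of the kind the abstract contrasts with random-effects models can be sketched with statsmodels on invented longitudinal counts; the variable names and data are hypothetical stand-ins for the polyp-count example, and the subject-specific random-effects model that the authors favour is not fitted here.

```python
# Population-average Poisson GEE sketch on hypothetical longitudinal counts
# with a treatment indicator and a baseline count. The paper argues that a
# properly formulated random-effects (subject-specific) model is needed when
# interest lies in micro-level dynamics; that comparator is omitted here.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_subj, n_visits = 60, 4
subj = np.repeat(np.arange(n_subj), n_visits)
treat = np.repeat(rng.binomial(1, 0.5, n_subj), n_visits)
baseline = np.repeat(rng.poisson(10, n_subj), n_visits)
frailty = np.repeat(rng.normal(0, 0.5, n_subj), n_visits)    # unobserved heterogeneity
y = rng.poisson(np.exp(0.2 + 0.05 * baseline - 0.4 * treat + frailty))

df = pd.DataFrame(dict(y=y, treat=treat, baseline=baseline, subj=subj))
gee = smf.gee("y ~ treat + baseline", groups="subj", data=df,
              family=sm.families.Poisson(),
              cov_struct=sm.cov_struct.Exchangeable()).fit()
print(gee.summary())
```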

Journal ArticleDOI
TL;DR: In this paper, multilevel Poisson models are used to identify factors influencing variation in census ward level teenage conception rates in urban and rural areas. Demographic and socioeconomic characteristics are accounted for, as well as access to family planning services.
Abstract: Multilevel Poisson models are used to identify factors influencing variation in census ward level teenage conception rates. Multilevel logistic models are also employed to examine the outcome of these conceptions. Demographic and socioeconomic characteristics are accounted for as well as access to family planning services. The paper emphasizes the importance of customized deprivation indices that are specific to the health outcome in urban and rural areas.

Journal ArticleDOI
TL;DR: A data augmentation approach to these computational difficulties is described, in which an overlapping series of submodels is repeatedly fitted, incorporating the missing terms in each submodel as ‘offsets’.
Abstract: Estimation in mixed linear models is, in general, computationally demanding, since applied problems may involve extensive data sets and large numbers of random effects. Existing computer algorithms are slow and/or require large amounts of memory. These problems are compounded in generalized linear mixed models for categorical data, since even approximate methods involve fitting of a linear mixed model within steps of an iteratively reweighted least squares algorithm. Only in models in which the random effects are hierarchically nested can the computations for fitting these models to large data sets be carried out rapidly. We describe a data augmentation approach to these computational difficulties in which we repeatedly fit an overlapping series of submodels, incorporating the missing terms in each submodel as ‘offsets’. The submodels are chosen so that they have a nested random-effect structure, thus allowing maximum exploitation of the computational efficiency which is available in this case. Examples of the use of the algorithm for both metric and discrete responses are discussed, all calculations being carried out using macros within the MLwiN program.
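
The 'offset' device in the abstract, fitting one submodel while holding the other set of effects fixed as a known offset and then swapping, can be illustrated outside MLwiN with an ordinary Poisson GLM, whose offset argument adds a fixed term to the linear predictor. The crossed grouping and data below are invented, and the effects are fitted here as fixed rather than random, so this is only a sketch of the alternation, not the authors' algorithm.

```python
# Backfitting-with-offsets sketch for a crossed structure (e.g. area x interviewer):
# alternately fit one factor's effects while the current fitted contribution of
# the other factor enters the Poisson GLM as an offset.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 400
df = pd.DataFrame({"row": rng.integers(0, 10, n),
                   "col": rng.integers(0, 8, n)})
df["y"] = rng.poisson(np.exp(0.3 + 0.05 * df["row"] - 0.04 * df["col"]))

col_part = np.zeros(n)                       # current contribution of 'col'
for _ in range(5):                           # a few backfitting sweeps
    fit_row = smf.glm("y ~ C(row)", data=df, family=sm.families.Poisson(),
                      offset=col_part).fit()
    row_part = np.log(fit_row.fittedvalues) - col_part   # row contribution only
    fit_col = smf.glm("y ~ C(col) - 1", data=df, family=sm.families.Poisson(),
                      offset=row_part).fit()
    col_part = np.log(fit_col.fittedvalues) - row_part

print(fit_row.params.round(3).head())
```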

Journal ArticleDOI
TL;DR: In this paper, the authors present a methodological comparison of generalized random-effects models and scoring approaches for the analysis of primary school educational progress from reception to England and Wales national curriculum key stage 1 mathematics.
Abstract: Much statistical modelling of random effects on ordered responses, particularly of grades in educational research, continues to use linear models and to treat the responses through arbitrary scores. Methodological and software developments now facilitate the proper treatment of such situations through more realistic generalized random-effects models. This paper reviews some methodological comparisons of these approaches. It highlights the flexibility offered by the macro facilities of the multilevel random-effects software MLwiN. It considers applications to an analysis of primary school educational progress from reception to England and Wales national curriculum key stage 1 mathematics. By contrasting the results from generalized modelling and scoring approaches it draws some conclusions about the theoretical, methodological and practical options that are available. It also suggests that the results of generalized random-effects model estimation may be more intelligible to users of analytical results.
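
The contrast the paper draws, a proportional-odds style model for the ordered grades versus ordinary regression on arbitrary numeric scores, can be sketched with statsmodels' OrderedModel on invented data; the multilevel (random-effects) part of the paper's models is not reproduced here.

```python
# Ordered logit vs. linear regression on arbitrary scores, on invented
# ordinal 'grade' data with one predictor (prior attainment).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(5)
n = 400
prior = rng.normal(0, 1, n)
latent = 1.2 * prior + rng.logistic(0, 1, n)
grade = np.digitize(latent, [-1.0, 0.5, 2.0])         # 4 ordered categories: 0..3
grade_cat = pd.Series(pd.Categorical(grade, categories=[0, 1, 2, 3], ordered=True))

# Proper ordinal treatment: proportional-odds (ordered logit) model.
ordinal_fit = OrderedModel(grade_cat, prior.reshape(-1, 1),
                           distr="logit").fit(method="bfgs", disp=False)
print(ordinal_fit.params)

# Conventional alternative: treat the grades as arbitrary scores and use OLS.
ols_fit = sm.OLS(grade, sm.add_constant(prior)).fit()
print(ols_fit.params)
```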

Journal ArticleDOI
TL;DR: An evaluation is described of two UK Government programmes for the long-term unemployed in Great Britain, Employment Training and Employment Action, using discrete time hazard modelling of event histories; the effect of unobserved heterogeneity is investigated by using standard random effect model formulations.
Abstract: An evaluation is described of two UK Government programmes for the long-term unemployed in Great Britain, Employment Training and Employment Action, using discrete time hazard modelling of event histories. The study design employed a closely matched comparison group and carefully chosen control variables to minimize the effect of selection bias on conclusions. The effect of unobserved heterogeneity is investigated by using some standard random effect model formulations.
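
Discrete time hazard modelling of event histories is usually implemented by expanding each spell into person-period records and fitting a binary regression to the event indicator; the toy expansion below, on invented unemployment spells, shows that mechanic. The matched comparison group design and the random-effect formulations used in the paper are not reproduced.

```python
# Discrete-time hazard sketch: expand invented unemployment spells into
# person-month records, then fit a logistic model for exit in each month.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 300
programme = rng.binomial(1, 0.5, n)                 # hypothetical programme indicator
# invented spell lengths in months, censored at 24
duration = np.minimum(rng.geometric(0.08 + 0.04 * programme), 24)
exited = (duration < 24).astype(int)

rows = []
for i in range(n):
    for t in range(1, duration[i] + 1):
        rows.append({"person": i, "month": t, "programme": programme[i],
                     "event": int(exited[i] and t == duration[i])})
pp = pd.DataFrame(rows)                             # person-period data set

hazard_fit = smf.logit("event ~ month + programme", data=pp).fit(disp=0)
print(hazard_fit.params)
```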

Journal ArticleDOI
TL;DR: This paper proposes several approaches for modelling population counts and investigates the sensitivity of inference to the sizes of errors in health and population data, and illustrates the methods proposed using data for breast cancer in the Thames region of the UK.
Abstract: Disease mapping studies summarize spatial and spatiotemporal variations in disease risk. This information may be used for simple descriptive purposes, to assess whether health targets are being met or whether new policies are successful, to provide the context for further studies (by providing information on the form and size of the spatial variability in risk) or, by comparing the estimated risk map with an exposure map, to obtain clues to aetiology. There are well-known problems with mapping raw risks and relative risks for rare diseases and/or small areas since sampling variability tends to dominate the subsequent maps. To alleviate these difficulties a multilevel modelling approach may be followed in which estimates based on small numbers are ‘shrunk’ towards a common value. In this paper we extend these models to investigate the effects of inaccuracies in the health and population data. In terms of the health data we consider the effects of errors that occur due to the imperfect collection procedures that are used by disease registers. For cancers in particular, this is a major problem, with case underascertainment (i.e. undercount) being the common type of error. The populations that are used for estimating disease risks have traditionally been treated as known quantities. In practice, however, these counts are often based on sources of data such as the census which are subject to error (in particular underenumeration) and are only available for census years. Intercensal population counts must consider not only the usual demographic changes (e.g. births and deaths) but migration also. We propose several approaches for modelling population counts and investigate the sensitivity of inference to the sizes of these errors. We illustrate the methods proposed using data for breast cancer in the Thames region of the UK and we compare our results with those obtained from more conventional approaches.
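
The 'shrinking' of unstable small-area risks towards a common value can be illustrated with the simplest empirical Bayes device: a Poisson-gamma model for standardised ratios, with the gamma parameters estimated crudely by moments. The counts below are invented; the paper's multilevel spatial models and its treatment of errors in the health and population data go well beyond this.

```python
# Poisson-gamma empirical Bayes smoothing of small-area relative risks:
# observed counts O_i ~ Poisson(theta_i * E_i), theta_i ~ Gamma(a, b),
# with (a, b) estimated crudely by matching moments of the raw SMRs.
import numpy as np

O = np.array([2, 0, 7, 1, 12, 3, 5, 0, 9, 4])                     # invented observed cases
E = np.array([3.1, 1.2, 5.0, 2.4, 9.8, 4.0, 3.5, 0.9, 7.2, 5.1])  # invented expected cases

smr = O / E
m = np.average(smr, weights=E)                     # weighted mean of raw SMRs
v = np.average((smr - m) ** 2, weights=E)          # weighted variance of raw SMRs
between_var = max(v - m / E.mean(), 1e-6)          # crude moment estimate
a, b = m ** 2 / between_var, m / between_var       # Gamma(a, b) prior with mean a/b

# Posterior mean of theta_i is (a + O_i) / (b + E_i): raw SMRs shrink towards m.
eb_smr = (a + O) / (b + E)
print(np.round(smr, 2))
print(np.round(eb_smr, 2))
```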

Journal ArticleDOI
TL;DR: In this article, an account is given of methods used to predict the outcome of the 1997 general election from early declared results, for use by the British Broadcasting Corporation (BBC) in its election night television and radio coverage.
Abstract: An account is given of methods used to predict the outcome of the 1997 general election from early declared results, for use by the British Broadcasting Corporation (BBC) in its election night television and radio coverage. Particular features of the 1997 election include extensive changes to constituency boundaries, simultaneous local elections in many districts and strong tactical voting. A new technique is developed, designed to eliminate systematic sources of bias such as differential refusal, for incorporating prior information from the BBC's exit poll. The sequence of forecasts generated on election night is displayed, with commentary.

Journal ArticleDOI
TL;DR: In this article, the effect of daylight level and hour changes on the incidence of road casualties is reviewed and refined, by analysis of official databases for Great Britain (1969-1973 and 1985-1994) and the USA (1991-1995).
Abstract: Summary. Previous studies of the apparent influence of daylight level and hour changes on the incidence of road casualties are reviewed and refined, by analysis of official databases for Great Britain (1969-1973 and 1985-1994) and the USA (1991-1995). New statistical methods, based on precisely computed altitudes of the sun for each accident location, are used to model casualty frequencies aggregated by week and hour of day, and locally evaluated associations between individual casualty incidence and solar altitude. Estimates of the altitude factor are interpreted causally to give counterfactual estimates of the effect of different clock time schedules on countrywide casualty numbers.
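
The key covariate in this paper is the altitude of the sun at a given place and time. A rough approximation (solar declination from day of year, hour angle from an approximate solar time) is sketched below purely for illustration; the paper uses precisely computed altitudes, and the approximation here ignores the equation of time and other refinements.

```python
# Rough solar altitude approximation for a given latitude, longitude and UTC time.
# Ignores the equation of time and atmospheric refraction; illustration only.
import numpy as np

def solar_altitude_deg(lat_deg, lon_deg, day_of_year, utc_hour):
    decl = -23.44 * np.cos(np.radians(360.0 / 365.0 * (day_of_year + 10)))
    solar_time = utc_hour + lon_deg / 15.0          # crude local solar time
    hour_angle = 15.0 * (solar_time - 12.0)         # degrees from solar noon
    lat, dec, ha = map(np.radians, (lat_deg, decl, hour_angle))
    sin_alt = np.sin(lat) * np.sin(dec) + np.cos(lat) * np.cos(dec) * np.cos(ha)
    return np.degrees(np.arcsin(sin_alt))

# Example: central London (51.5 N, 0.1 W), day 300 of the year, 17:00 UTC.
print(round(solar_altitude_deg(51.5, -0.1, 300, 17.0), 1))
```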

Journal ArticleDOI
TL;DR: Two analyses of QOL data from a prospective study of breast cancer patients evaluate how physical performance is related to factors such as age, menopausal status and type of adjuvant treatment; the multilevel analysis highlights treatment-related factors affecting physical performance that could not be considered within the summary statistic analysis.
Abstract: Summary. Longitudinal health-related quality-of-life (QOL) data are often collected as part of clinical studies. Here two analyses of QOL data from a prospective study of breast cancer patients evaluate how physical performance is related to factors such as age, menopausal status and type of adjuvant treatment. The first analysis uses summary statistic methods. The same questions are then addressed using a multilevel model. Because of the structure of the physical performance response, regression models for the analysis of ordinal data are used. The analyses of base-line and follow-up QOL data at four time points over two years from 257 women show that reported base-line physical performance was consistently associated with later performance and that women who had received chemotherapy in the month before the QOL assessment had a greater physical performance burden. There is a slight power gain of the multilevel model over the summary statistic analysis. The multilevel model also allows relationships with time-dependent covariates to be included, highlighting treatment-related factors affecting physical performance that could not be considered within the summary statistic analysis. Checking of the multilevel model assumptions is exemplified.

Journal ArticleDOI
TL;DR: Although telephone surveys of business populations have long been established in the UK, the use of telephone interviewing in surveys of the general public has been slow to achieve widespread acceptance, and the residual level of non-coverage, given its concentration among the disadvantaged in society, prevents the full adoption of the telephone approach for many social or official surveys.

Journal ArticleDOI
TL;DR: An analysis of an experiment in the interaction between general practitioners and their patients is presented, in which the issue of missing data is addressed by a sensitivity analysis using multiple imputation.
Abstract: Summary. Missing data and, more generally, imperfections in implementing a study design are an endemic problem in large scale studies involving human subjects. We present an analysis of an experiment in the interaction between general practitioners and their patients, in which the issue of missing data is addressed by a sensitivity analysis using multiple imputation. Instead of specifying a model for missingness we explore certain extreme ways of departing from the assumption of data missing at random and establish the largest extent of such departures which would still fail to supplant the evidence about the studied effect. An important advantage of the approach is that the algorithm intended for the complete data, to fit generalized linear models with random effects, is used without any alteration. 1. Background Improving the communication between the patient and the general practitioner (doctor) is an important issue in the continual drive for a more efficient provision of primary health care by the UK National Health Service. It has led to the suggestion that if, before the consultation, the patient wrote down the reasons for their appointment, the direct interaction with the doctor would be much more effective. The patient would inform the doctor more coherently and would receive more focused attention and, possibly, more appropriate treatment. An experiment to test this hypothesis was set up by the staff of the Department of General Practice, University of Leicester (Middleton, 1997). 46 general practitioners in Leicestershire, England, were recruited for a study with the following design: the doctors were randomly assigned to two groups: 'control' (15 doctors) and 'study' (31 doctors). A day was selected for each doctor (the days were within as short a span of time as was practicable) when a sequence of 16 consecutive patients were randomly allocated to 'no-list' or 'list' treatment, eight patients to each condition. The patients assigned to the no-list condition were treated as usual; the list patients were requested to write down the reasons for the visit (problems, thoughts, questions, requests for action and the like), with as much detail as they found appropriate. In the subsequent consultation, the doctor would have a look at the patient's note first. This part of the study is referred to as phase 1. Of particular interest are the outcomes of the consultations, such as how long the consultation took and how many patients' problems were identified. The doctor also recorded whether a problem was identified that was originally not on the patient's agenda. The occurrence of such a problem is referred to as the by-the-way syndrome.
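
The sensitivity idea in the abstract, imputing the missing outcomes and then pushing the imputations away from the missing-at-random assumption until the conclusion would change, can be sketched with a simple delta-shift on the imputed values. The data below are invented and the imputation model is deliberately naive; the paper's analysis uses multiple imputation within a random-effects model for the actual trial.

```python
# Delta-shift sensitivity sketch: impute missing binary outcomes under MAR,
# then shift the imputation probability on the log-odds scale by delta and
# see how the estimated treatment effect changes.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 400
treat = rng.binomial(1, 0.5, n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.3 + 0.6 * treat))))
missing = rng.random(n) < 0.25                      # 25% of outcomes unobserved

def treatment_logodds(delta, n_imp=20):
    p_mar = y[~missing].mean()                      # naive MAR imputation model
    ests = []
    for _ in range(n_imp):
        p_shift = 1 / (1 + np.exp(-(np.log(p_mar / (1 - p_mar)) + delta)))
        y_imp = y.copy()
        y_imp[missing] = rng.binomial(1, p_shift, missing.sum())
        fit = sm.Logit(y_imp, sm.add_constant(treat)).fit(disp=0)
        ests.append(fit.params[1])
    return np.mean(ests)                            # point estimates averaged across imputations

for delta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(delta, round(treatment_logodds(delta), 3))
```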

Journal ArticleDOI
TL;DR: A binary response multilevel model for diary data on the use of one product, shampoo, is developed, building on similar models for consumer purchase data; the model allows for a dependence on the number of days since last use.
Abstract: Patterns of consumers' use of products are of interest to manufacturers. This paper is concerned with modelling diary data on the use of one product, shampoo, recorded to the nearest hour by over 500 men during 1 week. A binary response multilevel model is developed, building on similar models for consumer purchase data. The model allows for a dependence on the number of days since last use. The results of fitting various versions of this model are discussed. A problem is that the number of days since last use is missing for all times up to the first use. An approximate EM approach is considered to deal with this problem.