Author

Patrick Royston

Other affiliations: Imperial College London, Analysis Group, University of London
Bio: Patrick Royston is an academic researcher from University College London. The author has contributed to research in topics: Covariate & Regression analysis. The author has an h-index of 90 and has co-authored 294 publications receiving 51,856 citations. Previous affiliations of Patrick Royston include Imperial College London & Analysis Group.


Papers
Journal ArticleDOI
TL;DR: The principles of multiple imputation by chained equations are described, along with how to impute categorical and quantitative variables, including skewed variables. The practical analysis of multiply imputed data, including model building and model checking, is also described.
Abstract: Multiple imputation by chained equations is a flexible and practical approach to handling missing data. We describe the principles of the method and show how to impute categorical and quantitative variables, including skewed variables. We give guidance on how to specify the imputation model and how many imputations are needed. We describe the practical analysis of multiply imputed data, including model building and model checking. We stress the limitations of the method and discuss the possible pitfalls. We illustrate the ideas using a data set in mental health, giving Stata code fragments. Copyright © 2010 John Wiley & Sons, Ltd.

6,349 citations

Journal ArticleDOI
29 Jun 2009-BMJ
TL;DR: The appropriate use and reporting of the multiple imputation approach to dealing with missing data is described by Jonathan Sterne and colleagues.
Abstract: Most studies have some missing data. Jonathan Sterne and colleagues describe the appropriate use and reporting of the multiple imputation approach to dealing with them.

5,293 citations

Journal ArticleDOI
27 Jan 1990-BMJ
TL;DR: Use of summary measures to analyse serial measurements, though not new, is potentially a useful and simple tool in medical research.
Abstract: In medical research data are often collected serially on subjects. The statistical analysis of such data is often inadequate in two ways: it may fail to settle clinically relevant questions and it may be statistically invalid. A commonly used method which compares groups at a series of time points, possibly with t tests, is flawed on both counts. There may, however, be a remedy, which takes the form of a two stage method that uses summary measures. In the first stage a suitable summary of the response in an individual, such as a rate of change or an area under a curve, is identified and calculated for each subject. In the second stage these summary measures are analysed by simple statistical techniques as though they were raw data. The method is statistically valid and likely to be more relevant to the study questions. If this method is borne in mind when the experiment is being planned it should promote studies with enough subjects and sufficient observations at critical times to enable useful conclusions to be drawn. Use of summary measures to analyse serial measurements, though not new, is potentially a useful and simple tool in medical research.
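The two-stage method in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the area under the curve (by the trapezoidal rule) as the summary measure and a Welch two-sample t statistic as the "simple statistical technique". The function names and the data are hypothetical.

```python
import math
from statistics import mean, variance


def auc_trapezoid(times, values):
    """Stage 1: summarise one subject's serial measurements as the area under the curve."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    area = 0.0
    for i in range(1, len(times)):
        # Trapezoid between consecutive time points.
        area += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2
    return area


def welch_t(a, b):
    """Stage 2: compare the per-subject summaries between two groups (Welch t statistic)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical data: each subject measured at the same four time points.
times = [0, 1, 2, 4]
treated = [[5, 7, 8, 6], [4, 6, 9, 7], [5, 8, 9, 8]]
control = [[5, 5, 6, 5], [4, 5, 5, 4], [6, 6, 7, 6]]

treated_auc = [auc_trapezoid(times, y) for y in treated]
control_auc = [auc_trapezoid(times, y) for y in control]
t_statistic = welch_t(treated_auc, control_auc)
```

Each subject contributes a single number to the second stage, so the comparison treats subjects, not individual time points, as the unit of analysis.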

2,875 citations

Journal ArticleDOI
TL;DR: This article describes an implementation for Stata of the MICE method of multiple multivariate imputation, described by van Buuren, Boshuizen, and Knook (1999). It describes five ado-files, which create multiple multivariate imputations, fit regression models to multiply imputed data, and interconvert datasets created by mvis and by the miset program from John Carlin and colleagues.
Abstract: Following the seminal publications of Rubin about thirty years ago, statisticians have become increasingly aware of the inadequacy of "complete-case" analysis of datasets with missing observations. In medicine, for example, observations may be missing in a sporadic way for different covariates, and a complete-case analysis may omit as many as half of the available cases. Hotdeck imputation was implemented in Stata in 1999 by Mander and Clayton. However, this technique may perform poorly when many rows of data have at least one missing value. This article describes an implementation for Stata of the MICE method of multiple multivariate imputation described by van Buuren, Boshuizen, and Knook (1999). MICE stands for multivariate imputation by chained equations. The basic idea of data analysis with multiple imputation is to create a small number (e.g., 5-10) of copies of the data, each of which has the missing values suitably imputed, and analyze each complete dataset independently. Estimates of parameters of interest are averaged across the copies to give a single estimate. Standard errors are computed according to the "Rubin rules", devised to allow for the between- and within-imputation components of variation in the parameter estimates. This article describes five ado-files. mvis creates multiple multivariate imputations. uvis imputes missing values for a single variable as a function of several covariates, each with complete data. micombine fits a wide variety of regression models to a multiply imputed dataset, combining the estimates using Rubin's rules, and supports survival analysis models (stcox and streg), categorical data models, generalized linear models, and more. Finally, misplit and mijoin are utilities to interconvert datasets created by mvis and by the miset program from John Carlin and colleagues. The use of the routines is illustrated with an example of prognostic modeling in breast cancer.
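The pooling step mentioned in this abstract (averaging estimates and combining within- and between-imputation variance) can be sketched for a single scalar parameter. This is a minimal sketch of the standard form of Rubin's rules, not the micombine implementation. The function name `pool_rubin` and the input values are hypothetical, and the degrees-of-freedom adjustment is omitted.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter's estimates from m imputed datasets using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and standard errors from m >= 2 imputations")
    pooled = mean(estimates)                    # single estimate: average across copies
    within = mean([se ** 2 for se in std_errors])  # within-imputation variance
    between = variance(estimates)               # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between      # total variance of the pooled estimate
    return pooled, math.sqrt(total)


# Hypothetical coefficient estimates and standard errors from m = 5 imputed datasets.
pooled_estimate, pooled_se = pool_rubin(
    [0.42, 0.47, 0.39, 0.45, 0.44],
    [0.10, 0.11, 0.10, 0.12, 0.10],
)
```

The pooled standard error is never smaller than the root of the average within-imputation variance, because the between-imputation term reflects the extra uncertainty due to the missing data.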

2,132 citations

Journal ArticleDOI
TL;DR: It is argued that the simplicity achieved is gained at a cost; dichotomization may create rather than avoid problems, notably a considerable loss of power and residual confounding.
Abstract: In medical research, continuous variables are often converted into categorical variables by grouping values into two or more categories. We consider in detail issues pertaining to creating just two groups, a common approach in clinical research. We argue that the simplicity achieved is gained at a cost; dichotomization may create rather than avoid problems, notably a considerable loss of power and residual confounding. In addition, the use of a data-derived 'optimal' cutpoint leads to serious bias. We illustrate the impact of dichotomization of continuous predictor variables using as a detailed case study a randomized trial in primary biliary cirrhosis. Dichotomization of continuous data is unnecessary for statistical analysis and in particular should not be applied to explanatory variables in regression models.

1,853 citations


Cited by
Journal ArticleDOI
TL;DR: Mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs.
Abstract: The R package mice imputes incomplete multivariate data by chained equations. The software mice 1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. mice 1.0 introduced predictor selection, passive imputation and automatic pooling. This article documents mice, which extends the functionality of mice 1.0 in several ways. In mice, the analysis of imputed data is made completely general, whereas the range of models under which pooling works is substantially extended. mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs. Imputation of categorical data is improved in order to bypass problems caused by perfect prediction. Special attention is paid to transformations, sum scores, indices and interactions using passive imputation, and to the proper setup of the predictor matrix. mice can be downloaded from the Comprehensive R Archive Network. This article provides a hands-on, stepwise approach to solve applied incomplete data problems.

10,234 citations

Journal ArticleDOI
TL;DR: It is important that the medical profession play a significant role in critically evaluating the use of diagnostic procedures and therapies as they are introduced in the detection and management of diseases.
Abstract: It is important that the medical profession play a significant role in critically evaluating the use of diagnostic procedures and therapies as they are introduced in the detection, management,

8,362 citations

Journal ArticleDOI
12 Oct 2016-BMJ
TL;DR: ROBINS-I (Risk Of Bias In Non-randomised Studies - of Interventions) is a new tool for evaluating risk of bias in estimates of the comparative effectiveness of interventions from studies that did not use randomisation to allocate units or clusters of individuals to comparison groups.
Abstract: Non-randomised studies of the effects of interventions are critical to many areas of healthcare evaluation, but their results may be biased. It is therefore important to understand and appraise their strengths and weaknesses. We developed ROBINS-I (“Risk Of Bias In Non-randomised Studies - of Interventions”), a new tool for evaluating risk of bias in estimates of the comparative effectiveness (harm or benefit) of interventions from studies that did not use randomisation to allocate units (individuals or clusters of individuals) to comparison groups. The tool will be particularly useful to those undertaking systematic reviews that include non-randomised studies.

8,028 citations

Journal ArticleDOI
TL;DR: The principles of multiple imputation by chained equations are described, along with how to impute categorical and quantitative variables, including skewed variables. The practical analysis of multiply imputed data, including model building and model checking, is also described.
Abstract: Multiple imputation by chained equations is a flexible and practical approach to handling missing data. We describe the principles of the method and show how to impute categorical and quantitative variables, including skewed variables. We give guidance on how to specify the imputation model and how many imputations are needed. We describe the practical analysis of multiply imputed data, including model building and model checking. We stress the limitations of the method and discuss the possible pitfalls. We illustrate the ideas using a data set in mental health, giving Stata code fragments. Copyright © 2010 John Wiley & Sons, Ltd.

6,349 citations

Journal ArticleDOI
29 Jun 2009-BMJ
TL;DR: The appropriate use and reporting of the multiple imputation approach to dealing with missing data is described by Jonathan Sterne and colleagues.
Abstract: Most studies have some missing data. Jonathan Sterne and colleagues describe the appropriate use and reporting of the multiple imputation approach to dealing with them.

5,293 citations