
Improper analysis of trials randomised using stratified blocks or minimisation.

Brennan C Kahan, +1 more
- 20 Feb 2012 - 
- Vol. 31, Iss: 4, pp 328-340
TLDR
It is shown that balancing treatment groups using stratification leads to correlation between the treatment groups, and if this correlation is ignored and an unadjusted analysis is performed, standard errors for the treatment effect will be biased upwards, resulting in 95% confidence intervals that are too wide, type I error rates that are too low and a reduction in power.

Research Article
Statistics
in Medicine
Received XXXX
(www.interscience.wiley.com) DOI: 10.1002/sim.0000
Improper analysis of trials randomised using
stratified blocks or minimisation
Brennan C. Kahan^a and Tim P. Morris^a
Many clinical trials restrict randomisation using stratified blocks or minimisation to balance prognostic factors
across treatment groups. It is widely acknowledged in the statistical literature that the subsequent analysis should
reflect the design of the study, and any stratification or minimisation variables should be adjusted for in the analysis.
However, a review of recent general medical literature showed only 14 of 41 eligible studies reported adjusting
their primary analysis for stratification or minimisation variables. We show that balancing treatment groups using
stratification leads to correlation between the treatment groups. If this correlation is ignored and an unadjusted
analysis is performed, standard errors for the treatment effect will be biased upwards, resulting in 95% confidence
intervals that are too wide, type I error rates that are too low, and a reduction in power. Conversely, an adjusted
analysis will give valid inference. The extent of this issue is explored using simulation for continuous, binary,
and time-to-event outcomes where treatment is allocated using stratified block randomisation or minimisation.
Copyright © 0000 John Wiley & Sons, Ltd.
Keywords: Clinical trials; Covariates; Adjustment; Minimisation; Stratification; Unadjusted analysis
1. Introduction
Well conducted randomised controlled trials are considered the gold standard for unbiased comparison of treatments as they
ensure there are no systematic differences between treatment groups. The most basic method of allocating patients to a
treatment is simple randomisation[1], where the probability of being assigned to either treatment is the same for all patients.
However, this can lead to chance imbalances in important prognostic factors between treatment groups as well as the overall
proportion receiving each treatment. A number of randomisation procedures have been developed that promote balance
in key prognostic factors between treatment groups[2]. These include stratified block randomisation[3], stratified biased
coin randomisation[4], stratified urn randomisation[5], optimum biased coin[6], and minimisation[7]. A recent article by
Pond et al[8] showed that 85% of cancer clinical trials used at least one baseline variable in their randomisation, and Scott
et al[9] showed that in 2001, 45% of randomised trials in the Lancet or New England Journal of Medicine used either
stratification or minimisation. These are the two most popular methods of balancing covariates in clinical trials, despite
concerns surrounding both[10, 11], and will be the focus of this paper.
Stratified block randomisation involves grouping patients into strata defined by baseline characteristics, and performing
block randomisation within each stratum. This ensures that the stratifying variables are balanced on completion of each
block. For a small number of variables this method will work well, but tends to break down as the number grows larger.
Four variables with two levels each lead to $2^4 = 16$ strata. If certain combinations of balancing variables are rare, some
strata may not have even one completed block. Minimisation is a method that overcomes this problem. For each new
patient entering a trial the baseline characteristics of patients in the two treatment groups are summarised and the patient is
allocated to the treatment which would provide the best marginal balance in terms of prognostic factors. In practice the
patient is usually assigned to the preferred treatment with probability $\pi$, where $\pi > 0.5$. When neither treatment offers an
advantage in terms of marginal balance, $\pi = 0.5$ as with simple randomisation.
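As a rough illustration of stratified block randomisation, the following Python sketch shuffles a balanced block within each stratum whenever the previous block has been used up. The function name, the seeds and the simulated patients are illustrative assumptions and are not taken from any trial discussed here.

```python
import random
from collections import defaultdict


def stratified_block_randomise(strata, block_size=8, seed=0):
    """Allocate each patient to arm 0 or 1 using permuted blocks within strata.

    `strata` holds one hashable stratum label per patient, in order of
    enrolment. Each stratum keeps its own list of unused allocations; a new
    balanced block is shuffled whenever that list is empty.
    """
    if block_size % 2:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    pending = defaultdict(list)  # unused allocations of the current block
    allocation = []
    for stratum in strata:
        if not pending[stratum]:
            block = [0, 1] * (block_size // 2)
            rng.shuffle(block)
            pending[stratum] = block
        allocation.append(pending[stratum].pop())
    return allocation


# Four binary balancing variables give up to 2**4 = 16 strata.
patient_rng = random.Random(1)
patients = [tuple(patient_rng.randint(0, 1) for _ in range(4)) for _ in range(200)]
arms = stratified_block_randomise(patients, block_size=8, seed=2)
```

With 200 patients spread over 16 strata, several strata end on an incomplete block, so balance within a stratum is only guaranteed to within half a block.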
^a MRC Clinical Trials Unit, Aviation House, 125 Kingsway, London, WC2B 6NH, U.K.
Correspondence to: MRC Clinical Trials Unit, Aviation House, 125 Kingsway, London, WC2B 6NH, U.K. Email: brk@ctu.mrc.ac.uk
Statist. Med. 0000, 00 1–13 Copyright © 0000 John Wiley & Sons, Ltd.
Prepared using simauth.cls [Version: 2010/03/10 v3.00]

Although stratified randomisation and minimisation are common in clinical trials, there remains some confusion over
whether it is necessary to adjust for the stratification or minimisation variables in the analysis of the trial. A majority
of the statistical literature suggests any baseline variable involved in randomisation should also be included in the
analysis[9, 12, 13, 14, 15, 16, 17, 18], though some argue this is not always sensible[19]. Many textbooks and articles that
describe stratification and minimisation do not mention any implications for the analysis[20, 21, 22, 23, 24, 25, 26]. It is
important that this issue is clarified: an incorrect analysis could potentially result in a beneficial treatment being denied to
patients or a treatment that is not beneficial being adopted.
A number of simulation studies have shown that ignoring the stratification or minimisation variables in the analysis
may lead to invalid tests of significance[27, 28, 29, 30, 31]. Type I error rates are shown to be too low for continuous and
time-to-event outcomes. Binary outcomes have not, to our knowledge, been studied using simulation.
Throughout, we will refer to variables which are used in the allocation scheme as ‘balancing’ variables, stratified block
randomisation as ‘stratification’, and treatment allocation as ‘randomisation’ (since we only consider minimisation with
some random element, as is usual in practice).
This article investigates the effects of ignoring stratification or minimisation variables in the analysis. In Section 2 we
show that an unadjusted analysis after stratification leads to standard errors for the treatment effect that are biased upwards.
In Section 3 we use simulation to show that stratification leads to correlation between the treatment groups, and investigate
what impact this has on coverage rates if ignored in the analysis. We then present a series of simulations based on real trial
data to determine to what extent these issues are likely to arise in practice. Section 4 reviews a cross section of recently
reported randomised controlled trials to see how often balancing variables are used in randomisation, and how these are
dealt with in the analysis. Section 5 is a discussion, and makes recommendations for trials randomised using stratified
blocks or minimisation.
2. Correlated outcomes
We initially consider the simple case of matched pairs to investigate how ignoring the matching will affect the analysis,
before considering the issue under general stratification.
Consider a study with a continuous outcome in which each patient is matched with another patient on some baseline
variables. In each matched pair one patient is assigned to the treatment group, the other to the control group. Given $n$
matched pairs, this can be thought of as a stratified trial with $n$ strata, 2 patients per stratum and a total of $2n$ patients. This
is similar to a 2 × 2 crossover trial where each patient receives both treatments.
Let $Y_{1j}$ and $Y_{2j}$ denote patient outcomes from the $j$th matched pair, assigned to treatments 1 and 2 respectively. Assume
that $\mathrm{Var}(Y_{1j}) = \mathrm{Var}(Y_{2j}) = \sigma^2$, the correlation between patients in the $j$th pair is $\rho$, and the correlation between patients
with different $j$ is 0. It follows that $\mathrm{Cov}(Y_{1j}, Y_{2j}) = \rho\sigma^2$.
Let $\bar{Y}_1$ and $\bar{Y}_2$ denote the mean outcomes in treatment groups 1 and 2 respectively. The variance of the treatment difference,
$\bar{Y}_1 - \bar{Y}_2$, can be found using the following formula:
$$\mathrm{Var}(\bar{Y}_1 - \bar{Y}_2) = \mathrm{Var}(\bar{Y}_1) + \mathrm{Var}(\bar{Y}_2) - 2\,\mathrm{Cov}(\bar{Y}_1, \bar{Y}_2) \tag{1}$$
This is simplified to
$$\mathrm{Var}(\bar{Y}_1 - \bar{Y}_2) = \mathrm{Var}(\bar{Y}_1) + \mathrm{Var}(\bar{Y}_2)$$
for a two sample t test, where $\bar{Y}_1$ and $\bar{Y}_2$ are assumed to be independent. It is easy to show that when patients are matched in a
pair, as in the above example, this assumption is not true:
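$$\mathrm{Cov}(\bar{Y}_1, \bar{Y}_2) = \mathrm{Cov}\left(\frac{1}{n}\sum_{j=1}^{n} Y_{1j},\ \frac{1}{n}\sum_{j=1}^{n} Y_{2j}\right) \tag{2}$$
$$= \frac{1}{n^2}\sum_{j=1}^{n} \mathrm{Cov}(Y_{1j}, Y_{2j}) \tag{3}$$
$$= \frac{\rho\sigma^2}{n} \tag{4}$$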
The correlation between $\bar{Y}_1$ and $\bar{Y}_2$ is then $\rho$, which is the same as the correlation between matched pairs.
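The step from the covariance to the correlation can be spelled out as follows. This is a short worked step under the assumptions already stated (equal variances $\sigma^2$ and uncorrelated pairs), which give $\mathrm{Var}(\bar{Y}_1) = \mathrm{Var}(\bar{Y}_2) = \sigma^2/n$:

$$\mathrm{Corr}(\bar{Y}_1, \bar{Y}_2) = \frac{\mathrm{Cov}(\bar{Y}_1, \bar{Y}_2)}{\sqrt{\mathrm{Var}(\bar{Y}_1)\,\mathrm{Var}(\bar{Y}_2)}} = \frac{\rho\sigma^2/n}{\sigma^2/n} = \rho.$$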
The variance of the treatment difference not accounting for this correlation between $\bar{Y}_1$ and $\bar{Y}_2$ is
$$\frac{2\sigma^2}{n}$$
However, given that $\bar{Y}_1$ and $\bar{Y}_2$ are correlated by design, the true variance, continuing from (1), is
$$\frac{2\sigma^2}{n} - \frac{2\rho\sigma^2}{n} = \frac{2\sigma^2}{n}(1 - \rho).$$
We then see that by not accounting for the correlation between $\bar{Y}_1$ and $\bar{Y}_2$, the variance of the treatment difference will
be biased upwards by a factor of $(1 - \rho)^{-1}$. Parzen et al[32] showed similar results for trials that are balanced on centre.
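The inflation factor can be checked numerically. The sketch below is a minimal Python simulation of the matched-pairs setting, assuming normally distributed outcomes with a shared within-pair component; the normality assumption, the function name and the chosen values of $n$ and $\rho$ are illustrative and not taken from the text.

```python
import math
import random
import statistics


def simulate_pair_difference(n_pairs, rho, sigma=1.0, reps=4000, seed=0):
    """Empirical variance of Ybar1 - Ybar2 over `reps` simulated matched-pair trials.

    Within pair j both outcomes share a component u_j, so that
    Var(Y) = sigma**2 and Corr(Y_1j, Y_2j) = rho (requires 0 <= rho <= 1).
    """
    rng = random.Random(seed)
    shared_sd = sigma * math.sqrt(rho)
    own_sd = sigma * math.sqrt(1.0 - rho)
    differences = []
    for _ in range(reps):
        total = 0.0
        for _ in range(n_pairs):
            u = rng.gauss(0.0, shared_sd)
            y1 = u + rng.gauss(0.0, own_sd)
            y2 = u + rng.gauss(0.0, own_sd)
            total += y1 - y2
        differences.append(total / n_pairs)
    return statistics.variance(differences)


n_pairs, rho = 50, 0.5
empirical = simulate_pair_difference(n_pairs, rho)
naive = 2.0 / n_pairs                # two-sample formula, ignores the pairing
true = 2.0 / n_pairs * (1.0 - rho)   # variance allowing for the covariance term
print(f"empirical {empirical:.4f}  true {true:.4f}  naive {naive:.4f}")
```

Under these assumptions the empirical variance should sit close to the true value, while the naive value is larger by the factor $(1 - \rho)^{-1}$, here 2.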
This issue arises because of the covariance between $\bar{Y}_1$ and $\bar{Y}_2$. This is similar under general stratification. By assuming
a correlation of $\rho$ for all patients within a stratum, $\mathrm{Cov}(\bar{Y}_1, \bar{Y}_2)$ cannot be equal to 0, because the covariance of any two
patients in the same stratum is non-zero, and always positive. This means that under stratification in general, standard errors
of treatment effect will be biased upwards (though not necessarily to the same extent as in the case of matched-pairs),
leading to confidence intervals that are too wide and p-values that are too large.
It is well known that an unadjusted analysis is inappropriate for a matched-pairs study (where pair would be included in
the analysis) and for crossover trials (where subject would be included in the analysis). Although the correlation within
strata of a parallel group trial may be smaller than the within-subject correlation for a crossover trial, the considerations
are the same. The extent of the problem depends largely on the within-stratum correlation (i.e. the intraclass correlation).
If this is non-negligible then there will be bias in the estimate of $\mathrm{Var}(\bar{Y}_1 - \bar{Y}_2)$. The within-stratum correlation depends
on the strength of the relationship between the stratification variables and the outcome. The stronger the relationship, the
larger the within-stratum correlation. Additionally, as more balancing variables are used in the randomisation process, the
within-stratum correlation will increase (assuming the balancing variables are associated with the outcome). In practice,
this means we should always expect non-negligible within-stratum correlation in stratified trials, as variables should only
be used in balancing if they are expected to be related to outcome.
Stratification will also lead to correlated treatment groups in the cases of binary or survival outcomes. However, this is
complicated by the fact that the standard error of the treatment effect is expected to increase when covariates are fitted[33].
For the binary case, Robinson and Jewell[34] show that adjusting for a balanced covariate that is associated with the
outcome will cause a proportionally larger increase in the treatment effect estimate than in its standard error, meaning
$$\frac{\hat{\beta}_{adj}}{\mathrm{SE}(\hat{\beta}_{adj})} > \frac{\hat{\beta}_{unadj}}{\mathrm{SE}(\hat{\beta}_{unadj})},$$
where $\hat{\beta}$ is the estimate of the log odds ratio and the subscript denotes whether this is adjusted or unadjusted. This implies
an increase in power for adjusted analyses. It is important to investigate what effect an unadjusted analysis will have in
practice, given that the standard error is expected to increase after adjustment.
3. Simulation studies
3.1. General considerations
Simulation is used to investigate the impact of an unadjusted analysis after stratification or minimisation. We initially show
that the correlation between treatment groups is introduced by stratified randomisation, as shown in (2). Secondly, we
examine coverage rates of 95% confidence intervals for the treatment difference as the effect of the stratification variable
on outcome increases. Finally, we present simulations based on data from real trials to explore what impact unadjusted
analyses might have on coverage and power in practice. All simulations were done using Stata 11.1.
3.2. Correlation between treatment groups
This simulation study investigates the correlation between treatment groups after both simple randomisation and stratification.
Two hundred patients were simulated for each replication. Eight thousand replications were used to estimate the correlation
between treatment groups. A random sample of 200 replications is shown in Figure 1. Two hundred replications were
chosen to maintain clarity of the graph. A continuous response $Y_i$ for the $i$th patient was simulated as:
$$Y_i = \alpha + \beta X_{treat} + \gamma X_{strat} + \varepsilon_i, \tag{5}$$
where $X_{treat}$ is a binary indicator equal to one if the patient receives the treatment, $\beta$ is the additive effect of treatment on
outcome, $X_{strat}$ are the stratification variables (here only one binary stratification variable is used) and $\gamma$ are the regression
coefficients corresponding to $X_{strat}$. $\gamma$ was assigned values of 3 and then 6 and $\varepsilon_i \sim N(0, 1)$. These effects are extreme and
unlikely to occur in practice but are used to illustrate the point that the correlation between treatment groups increases as $\gamma$
increases. It should be noted that the correlation between treatment groups is independent of the treatment effect, and will
be the same for any value of β (here, we used β = 0).
Figure 1 shows the strong correlation between $\bar{Y}_1$ and $\bar{Y}_2$ (where the subscripts denote the treatment group) under stratified
randomisation. Under simple randomisation, however, $\bar{Y}_1$ and $\bar{Y}_2$ are uncorrelated. This demonstrates that the correlation is
introduced by the process of stratification. Under simple randomisation treatment groups will be uncorrelated, and so an
unadjusted analysis will give valid results. The correlation between $\bar{Y}_1$ and $\bar{Y}_2$ increases as the strength of the relationship
between the stratification variable and outcome increases.
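The pattern described for Figure 1 can be reproduced in outline with a few lines of Python. This is a minimal sketch of model (5) with $\alpha = \beta = 0$ and one binary stratification variable; the block size of 8, the number of replications and the function names are assumptions made for the illustration (the original simulations were run in Stata with 8000 replications).

```python
import math
import random
import statistics


def group_means(n, gamma, stratified, rng, block_size=8):
    """Simulate one trial from model (5) with alpha = beta = 0; return (Ybar1, Ybar2)."""
    pending = {0: [], 1: []}          # unused block allocations per stratum
    sums, counts = [0.0, 0.0], [0, 0]
    for _ in range(n):
        x = rng.randint(0, 1)         # binary stratification variable, P(x = 1) = 0.5
        if stratified:
            if not pending[x]:
                pending[x] = [0, 1] * (block_size // 2)
                rng.shuffle(pending[x])
            arm = pending[x].pop()
        else:
            arm = rng.randint(0, 1)   # simple randomisation
        sums[arm] += gamma * x + rng.gauss(0.0, 1.0)
        counts[arm] += 1
    return sums[0] / counts[0], sums[1] / counts[1]


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_of_means(gamma, stratified, n=200, reps=1000, seed=0):
    """Correlation between the two group means across simulated trials."""
    rng = random.Random(seed)
    pairs = [group_means(n, gamma, stratified, rng) for _ in range(reps)]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])


r_simple = correlation_of_means(3.0, stratified=False)
r_strat = correlation_of_means(3.0, stratified=True)
print(f"simple: {r_simple:.2f}  stratified: {r_strat:.2f}")
```

Rerunning `correlation_of_means` with a larger `gamma` (for example 6) gives a rough check of the claim that the correlation grows with the effect of the stratification variable.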
3.3. Coverage
This simulation study investigates the coverage of 95% confidence intervals for the treatment effect as the effect of the
stratifying variable increases. The coverage rate was defined as the proportion of replications in which the 95% confidence
interval contained $\beta$. Eight thousand replications were used so that, if the true coverage was 95%, the simulated coverage
would be estimated to within ±0.5%.
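The ±0.5% figure follows from the binomial standard error of a proportion. The short Python check below assumes a normal approximation with a critical value of 1.96; the function name is illustrative.

```python
import math


def coverage_half_width(true_coverage, n_reps, z=1.96):
    """Half-width of a 95% confidence interval for a simulated coverage proportion."""
    standard_error = math.sqrt(true_coverage * (1.0 - true_coverage) / n_reps)
    return z * standard_error


half_width = coverage_half_width(0.95, 8000)
print(f"+/- {100 * half_width:.2f} percentage points")
```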
Data were again generated using model (5). Here, $\varepsilon_i \sim N(0, 1)$ for all simulations. As in Section 3.2 the choice of $\beta$ is
unimportant as the coverage rates are independent of treatment effect in this scenario, and so the results will be the same
for any value of $\beta$ (again $\beta$ was set to 0). $X_{strat}$ was a binary stratification variable, and patients were assigned to each level
of $X_{strat}$ with probability 0.5. $\gamma$ was set to values of 0 to 3 in increments of 0.2. Each simulated patient was randomised
twice: once using simple randomisation and once using stratified randomisation with a block size of eight. A block size of
eight was chosen so that it would be small enough to promote balance, but not so small as to be unrealistic. Both adjusted
and unadjusted analyses were performed. Four scenarios were therefore considered for each simulated dataset:
1. Simple randomisation, unadjusted analysis
2. Simple randomisation, adjusted analysis
3. Stratified randomisation, unadjusted analysis
4. Stratified randomisation, adjusted analysis
For each replication a 95% confidence interval for the treatment effect was calculated. These confidence intervals were then
used to calculate the coverage for each scenario.
Figure 2 shows results for $n = 250$ (similar results, not shown, were observed for $n = 100$, 500 and 1000). This shows
that ignoring balancing variables in the analysis will lead to confidence intervals that do not have nominal coverage. At the
extreme, the coverage levels approach 100%. Even when $\gamma$ is small relative to the residual standard deviation the coverage
is affected. In contrast, adjusting for balancing factors in the analysis or using simple randomisation always gave nominal
coverage.
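The four scenarios can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original Stata code: it uses a normal critical value of 1.96 instead of the t distribution, simulates a fresh dataset for each scenario rather than randomising the same patients twice, uses far fewer replications, and the function names are invented for the example. Adjustment is done by centring within levels of the stratification variable, which is equivalent to fitting treatment and stratum in a linear model.

```python
import math
import random


def treatment_ci(y, arm, stratum, adjust, z=1.96):
    """Least-squares treatment estimate with a 95% CI, optionally adjusted for stratum."""
    n = len(y)
    labels = stratum if adjust else [0] * n
    levels = sorted(set(labels))
    y_res, t_res = [0.0] * n, [0.0] * n
    for level in levels:
        idx = [i for i in range(n) if labels[i] == level]
        y_mean = sum(y[i] for i in idx) / len(idx)
        t_mean = sum(arm[i] for i in idx) / len(idx)
        for i in idx:
            y_res[i], t_res[i] = y[i] - y_mean, arm[i] - t_mean
    stt = sum(t * t for t in t_res)
    estimate = sum(t * v for t, v in zip(t_res, y_res)) / stt
    rss = sum((v - estimate * t) ** 2 for t, v in zip(t_res, y_res))
    # Residual degrees of freedom: one mean per level plus the treatment effect.
    se = math.sqrt(rss / (n - len(levels) - 1) / stt)
    return estimate - z * se, estimate + z * se


def coverage(gamma, stratified, adjust, n=250, reps=500, beta=0.0, block_size=8, seed=0):
    """Proportion of simulated trials whose 95% CI contains the true beta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        pending = {0: [], 1: []}
        y, arm, stratum = [], [], []
        for _ in range(n):
            x = rng.randint(0, 1)
            if stratified:
                if not pending[x]:
                    pending[x] = [0, 1] * (block_size // 2)
                    rng.shuffle(pending[x])
                a = pending[x].pop()
            else:
                a = rng.randint(0, 1)
            stratum.append(x)
            arm.append(a)
            y.append(beta * a + gamma * x + rng.gauss(0.0, 1.0))
        low, high = treatment_ci(y, arm, stratum, adjust)
        hits += low <= beta <= high
    return hits / reps


results = {
    (stratified, adjust): coverage(3.0, stratified, adjust)
    for stratified in (False, True)
    for adjust in (False, True)
}
for (stratified, adjust), value in results.items():
    print(f"stratified={stratified!s:5} adjusted={adjust!s:5} coverage={value:.3f}")
```

With $\gamma = 3$, the largest value considered in this section, the stratified unadjusted scenario is the one expected to show coverage well above 95%.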
3.4. Simulations based on real trial data
Thus far we have only considered the impact of an unadjusted analysis in the case of stratified randomisation with a
continuous outcome. In this section we examine the impact of an unadjusted analysis for continuous, binary, and
time-to-event outcomes after both stratification and minimisation have been used. Simulations were carried out using parameters
from real trial data. Five trials were used: FASTER[35], MIST2[36], RE01, GBSG, and PBC. Information on RE01, GBSG
and PBC can be found in Royston and Sauerbrei[37]. Of these five trials, only FASTER and MIST2 used balancing variables
in their randomisation. For the other trials, several prognostic factors were chosen as candidate balancing variables. Sections
3.4.1 to 3.4.5 give more information on each individual trial. As above, 8000 replications were used for each scenario.
Balancing variables were generated from a multivariate normal distribution. Variances and covariances were based on
the original dataset so that the proportion of patients in each stratum was similar to the original study. Binary and ordinal
data were then categorised based on cut-points specified to give the desired proportions. Continuous observations that fell
outside the range of the original dataset were replaced with the minimum or maximum from the original dataset.
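A minimal sketch of this data-generating step, for one binary and one continuous balancing variable, might look as follows in Python. The proportion 0.48 and the mean and standard deviation of 43 and 22 echo values quoted for MIST2 later in this section, but the correlation of 0.3, the clamping range of 0 to 100 and the function name are hypothetical choices made only for the illustration.

```python
import math
import random
from statistics import NormalDist


def simulate_balancing_variables(n, corr, p_binary, mean, sd, lower, upper, seed=0):
    """Draw (binary, continuous) balancing variables from a bivariate normal.

    A standard normal latent variable is dichotomised so that P(binary = 1)
    equals p_binary; the continuous variable is rescaled and then clamped to
    the range [lower, upper].
    """
    rng = random.Random(seed)
    cut = NormalDist().inv_cdf(1.0 - p_binary)  # latent values above `cut` are coded 1
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Construct z2 so that Corr(z1, z2) = corr.
        z2 = corr * z1 + math.sqrt(1.0 - corr ** 2) * rng.gauss(0.0, 1.0)
        binary = 1 if z1 > cut else 0
        continuous = min(max(mean + sd * z2, lower), upper)
        rows.append((binary, continuous))
    return rows


rows = simulate_balancing_variables(
    n=20000, corr=0.3, p_binary=0.48, mean=43.0, sd=22.0, lower=0.0, upper=100.0
)
proportion = sum(b for b, _ in rows) / len(rows)
print(f"proportion coded 1: {proportion:.3f}")
```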
Sample sizes of 100, 200, 500, and 1000 were used, as well as the original sample size from each study. The treatment
effect was chosen to give approximately 80% power when using the original sample size.
Stratification was used with random permuted blocks of size 8. Minimisation allocated treatments in turn and then
calculated the marginal covariate imbalance for each. The preferred treatment was then chosen by a biased coin with $\pi = 0.8$.
(This reflects practice since, with the exception of permuted blocks, deterministic methods are generally discouraged[12, 11].)
For stratification and minimisation, continuous variables were dichotomised at their observed mean from the
original dataset, but were included in the analysis as continuous variables. The GBSG and MIST2 trials were exceptions, as
progesterone receptor status from GBSG was dichotomised at its observed median due to skewness, and baseline pleural
effusion in MIST2 was dichotomised at the same value as used in the original trial.
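The minimisation step described above can be sketched in Python as below. The imbalance measure used here (the sum, over the new patient's factor levels, of the absolute difference in arm counts) is one common choice and is an assumption, since the exact measure is not stated in this section; the function name and the simulated patients are also illustrative.

```python
import random


def minimise(patients, pi=0.8, seed=0):
    """Allocate patients (tuples of categorical factor levels) to arm 0 or 1.

    For each new patient, each arm is tried in turn and the marginal imbalance
    that would result is summed over the patient's factor levels. The arm with
    the smaller total is chosen with probability `pi`; ties are broken with
    probability 0.5.
    """
    rng = random.Random(seed)
    counts = {}  # (factor index, level, arm) -> number of patients so far
    allocation = []
    for patient in patients:
        imbalance = []
        for candidate in (0, 1):
            total = 0
            for f, level in enumerate(patient):
                n0 = counts.get((f, level, 0), 0) + (candidate == 0)
                n1 = counts.get((f, level, 1), 0) + (candidate == 1)
                total += abs(n0 - n1)
            imbalance.append(total)
        if imbalance[0] == imbalance[1]:
            arm = rng.randint(0, 1)
        else:
            preferred = 0 if imbalance[0] < imbalance[1] else 1
            arm = preferred if rng.random() < pi else 1 - preferred
        for f, level in enumerate(patient):
            counts[(f, level, arm)] = counts.get((f, level, arm), 0) + 1
        allocation.append(arm)
    return allocation


patient_rng = random.Random(1)
patients = [
    (patient_rng.randint(0, 1), patient_rng.randint(0, 1), patient_rng.randint(0, 2))
    for _ in range(200)
]
arms = minimise(patients, pi=0.8, seed=2)
print("arm sizes:", arms.count(0), arms.count(1))
```

Setting `pi=1.0` would make the allocation deterministic whenever one arm is preferred, which is the kind of scheme the text notes is generally discouraged.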
Coverage, power, and standard errors were compared between adjusted and unadjusted analyses. For each analysis we
defined a model-based standard error and an empirical standard error as follows:
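$$\mathrm{SE}_{model} = \frac{1}{r}\sum_{j=1}^{r} \mathrm{SE}_j \quad \text{and} \quad \mathrm{SE}_{empirical} = \sqrt{\frac{1}{r - 1}\sum_{j=1}^{r} (\hat{\beta}_j - \beta)^2},$$
where $r$ denotes the number of simulations, and $\mathrm{SE}_j$ denotes the standard error of the treatment effect for the $j$th simulation.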
We define the % bias in model-based standard errors as
$$100 \times \frac{\mathrm{SE}_{model} - \mathrm{SE}_{empirical}}{\mathrm{SE}_{empirical}}.$$
Values > 0 indicate model-based standard errors are biased upwards, while values < 0 indicate they are biased downwards.
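These three summaries are simple to compute from the per-replication results. The Python sketch below follows the definitions above, taking the centring value `beta` as an argument; the function name and the five example values are hypothetical and serve only to show the arithmetic.

```python
import math


def standard_error_summary(estimates, standard_errors, beta):
    """Return (SE_model, SE_empirical, % bias in model-based standard errors)."""
    r = len(estimates)
    se_model = sum(standard_errors) / r
    se_empirical = math.sqrt(sum((b - beta) ** 2 for b in estimates) / (r - 1))
    percent_bias = 100.0 * (se_model - se_empirical) / se_empirical
    return se_model, se_empirical, percent_bias


# Hypothetical results from five simulated trials with beta = 1.0.
se_model, se_empirical, percent_bias = standard_error_summary(
    estimates=[0.9, 1.1, 1.0, 1.2, 0.8],
    standard_errors=[0.2] * 5,
    beta=1.0,
)
print(f"model {se_model:.3f}  empirical {se_empirical:.3f}  bias {percent_bias:.1f}%")
```

In this toy example the model-based standard error exceeds the empirical one, so the % bias is positive, which is the direction expected for an unadjusted analysis after stratification.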
Continuous outcomes were again generated from model (5), as in sections 3.2 and 3.3. Binary outcomes were generated
using a latent response model. Latent variables $Y^*_i$ were generated and the observed responses $Y_i$ were classified as 1 if
$Y^*_i > 0$ and 0 otherwise. The latent response was simulated from a model similar to (5) but where $\varepsilon_i$ follows a logistic
distribution with mean 0 and variance $\pi^2/3$. Here $\beta$ and $\gamma$ represent log-odds ratios. Because the expected treatment effect
would be different for adjusted and unadjusted models (even when $X_{strat}$ are balanced[33]), coverage of the treatment effect
was compared when $\beta$ was set to 0 so that the expectation was the same for adjusted and unadjusted models.
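A latent response of this kind can be simulated with the inverse logistic distribution function. The Python sketch below is a minimal illustration; the function name, the intercept `alpha` and the parameter values in the usage lines are assumptions chosen for the example.

```python
import math
import random


def simulate_binary(x_treat, x_strat, alpha, beta, gamma, rng):
    """Binary outcome from a latent response with standard logistic error."""
    u = rng.random()
    while u == 0.0:                    # avoid log(0); random() lies in [0, 1)
        u = rng.random()
    epsilon = math.log(u / (1.0 - u))  # logistic: mean 0, variance pi**2 / 3
    latent = alpha + beta * x_treat + gamma * x_strat + epsilon
    return 1 if latent > 0 else 0


rng = random.Random(0)
# Linear predictor -0.5 + 0.5 + 1.0 = 1.0, so P(Y = 1) = 1 / (1 + exp(-1)).
outcomes = [simulate_binary(1, 1, -0.5, 0.5, 1.0, rng) for _ in range(20000)]
observed = sum(outcomes) / len(outcomes)
expected = 1.0 / (1.0 + math.exp(-1.0))
print(f"observed {observed:.3f}  expected {expected:.3f}")
```

Because the error is standard logistic, the probability of a response of 1 is the inverse logit of the linear predictor, which is why $\beta$ and $\gamma$ can be read as log-odds ratios.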
Survival outcomes were generated from a Weibull distribution with baseline hazard functions based on the observed data.
Survival times were generated as suggested by Bender et al[38] by
$$T = H_0^{-1}\left[-\ln(U)\exp\left(-(\beta X_{treat} + \gamma X_{strat})\right)\right],$$
where $H_0$ is the cumulative baseline hazard function and $U \sim \mathrm{Uniform}(0, 1)$. This model for survival times implies a
proportional hazards model
$$h(t \mid x) = h_0(t)\exp(\beta X_{treat} + \gamma X_{strat}).$$
As with binary outcomes, coverage of the treatment effect was compared with $\beta$ set to 0.
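For a Weibull baseline the inverse cumulative hazard has a closed form, so the survival times can be generated directly. The Python sketch below assumes the parameterisation $H_0(t) = \lambda t^{k}$; the `scale` and `shape` values in the usage lines are hypothetical, since the baseline hazards estimated from the trial data are not given in this section.

```python
import math
import random


def weibull_survival_time(x_treat, x_strat, beta, gamma, scale, shape, rng):
    """Survival time under proportional hazards with a Weibull baseline.

    The baseline cumulative hazard is H0(t) = scale * t**shape, so its
    inverse is H0^{-1}(h) = (h / scale) ** (1 / shape).
    """
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is finite
    linear_predictor = beta * x_treat + gamma * x_strat
    cumulative_hazard = -math.log(u) * math.exp(-linear_predictor)
    return (cumulative_hazard / scale) ** (1.0 / shape)


rng = random.Random(0)
# With shape = 1 the baseline is exponential with rate `scale`, so the control
# arm has mean 1 / 0.5 = 2 and a hazard ratio of 2 halves the mean survival time.
control = [weibull_survival_time(0, 0, math.log(2.0), 0.0, 0.5, 1.0, rng) for _ in range(20000)]
treated = [weibull_survival_time(1, 0, math.log(2.0), 0.0, 0.5, 1.0, rng) for _ in range(20000)]
mean_control = sum(control) / len(control)
mean_treated = sum(treated) / len(treated)
print(f"mean control {mean_control:.2f}  mean treated {mean_treated:.2f}")
```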
3.4.1. FASTER
The FASTER trial (Function After Spinal Treatment: Exercise and Rehabilitation) was a 2 × 2 factorial
trial which tested the effects of rehabilitation and an educational booklet on Oswestry disability index (ODI), a continuous
outcome, in 316 participants following back surgery (either a discectomy for herniated disc or decompression for spinal
stenosis)[35]. The type of surgery and the operating surgeon were used as balancing variables.
Our simulations simplify the study to a single factor parallel group trial. Two sets of simulations were performed: the
first set used the type of surgery and surgeon as balancing factors, as in the original trial; the second excluded surgeon and
instead used type of surgery and baseline ODI as balancing variables. Additionally, in order to investigate the effect of
block size on the results, the second set of simulations used block sizes of 2, 8, and 32.
For all simulations the residual standard deviation was set to 18, 46% of participants were assumed to have a discectomy,
and the regression coefficient for discectomy was 13. For the simulations involving surgeon, there were 23 surgeons in the
study, 10 of whom operated on fewer than 10 patients each. This was simplified in our simulations by combining all surgeons
with fewer than 10 patients, resulting in 14 surgeons with the number of patients per surgeon ranging from 10 to 40. This
was done to avoid overstratification. The effect of surgeon was included by generating a random effect for each surgeon
based on a normal distribution. The variance was chosen so that the intraclass correlation coefficient for surgeon was 0.01,
reflecting the observed data. All parameters used for simulation were estimated prior to combining surgeons with fewer than
10 patients. For the second set of simulations, baseline ODI was generated as $N(46, 19^2)$ and its regression coefficient
was set to 0.5.
3.4.2. MIST2
The MIST2 trial was a 2 × 2 factorial trial testing whether tPA (tissue plasminogen activator) or DNase
(deoxyribonuclease) were effective in reducing the size of patients' pleural effusion (a continuous outcome)[36]. Patients
were randomised to one of four treatment groups using minimisation. The size of the baseline pleural effusion (greater
or less than 30% of the hemithorax), whether the patient was purulent, and whether the infection was acquired via the
community or in hospital were all used as minimisation factors. For simplicity, simulations were conducted assuming a
parallel group design with two treatment groups.
The residual standard deviation was set to 16, 48% of patients were assumed to be purulent, and 12% of patients were
assumed to have had a hospital acquired infection. The mean and standard deviation of the size of the baseline pleural
effusion were set to 43 and 22 respectively. The regression coefficients were 0.6 for the size of the baseline pleural
effusion, 0.9 for purulence, and 6.5 for a hospital acquired infection.