Use of historical control data for assessing treatment effects in clinical trials

Kert Viele^a,*, Scott Berry^a, Beat Neuenschwander^b, Billy Amzal^c, Fang Chen^d, Nathan Enas^e, Brian Hobbs^f, Joseph G. Ibrahim^g, Nelson Kinnersley^h, Stacy Lindborg^i, Sandrine Micallef^j, Satrajit Roychoudhury^k, and Laura Thompson^l

^a Berry Consultants, Austin, TX, USA; ^b Novartis Pharma, CIS, Basel, Switzerland; ^c LA-SER Analytica, London, UK; ^d SAS, Cary, NC, USA; ^e Eli Lilly & Company, Indianapolis, IN, USA; ^f MD Anderson, Houston, TX, USA; ^g University of North Carolina, Chapel Hill, NC, USA; ^h F. Hoffman La Roche, Welwyn Garden City, Hertfordshire, UK; ^i Biogen IDEC, Cambridge, MA, USA; ^j Sanofi-Aventis R&D, Paris, France; ^k Novartis, East Hanover, NJ, USA; ^l US Food and Drug Administration, Rockville, MD, USA
Abstract
Clinical trials rarely, if ever, occur in a vacuum. Generally, large amounts of clinical data are
available prior to the start of a study, particularly on the current study’s control arm. There is
obvious appeal in using (i.e., ‘borrowing’) this information. With historical data providing
information on the control arm, more trial resources can be devoted to the novel treatment while
retaining accurate estimates of the current control arm parameters. This can result in more accurate
point estimates, increased power, and reduced type I error in clinical trials, provided the historical
information is sufficiently similar to the current control data. If this assumption of similarity is not
satisfied, however, one can acquire increased mean square error of point estimates due to bias and
either reduced power or increased type I error depending on the direction of the bias. In this
manuscript, we review several methods for historical borrowing, illustrating how key parameters
in each method affect borrowing behavior, and then, we compare these methods on the basis of
mean square error, power and type I error. We emphasize two main themes. First, we discuss the
idea of ‘dynamic’ (versus ‘static’) borrowing. Second, we emphasize the decision process
involved in determining whether or not to include historical borrowing in terms of the perceived
likelihood that the current control arm is sufficiently similar to the historical data. Our goal is to
provide a clear review of the key issues involved in historical borrowing and provide a comparison
of several methods useful for practitioners.
Keywords
priors; borrowing; historical data; Bayesian
1. INTRODUCTION
A large proportion of clinical trials involves the comparison of a novel treatment to an
existing control arm, either a placebo or a standard of care. While often the control arm
stands on its own within a trial, with parameter estimates for the control group depending
Copyright © 2013 John Wiley & Sons, Ltd.
*Correspondence to: Kert Viele, Berry Consultants, Austin, TX, USA. kert@berryconsultants.net.
NIH Public Access
Author Manuscript
Pharm Stat. Author manuscript; available in PMC 2014 March 13.
Published in final edited form as:
Pharm Stat. 2014 ; 13(1): 41–54. doi:10.1002/pst.1589.

only on the data within the current trial, interest has been growing over the past few decades
in leveraging historical clinical trial data on the control arm [1–6]. Often, one or more
clinical trials have been conducted involving the control arm (perhaps the current control
arm was the novel treatment in the historical trial). In theory, bringing this existing
information into the current trial holds the promise of more efficient trial design. Such trials
may be smaller, and/or unequal randomization may be used to place proportionately more
subjects on the experimental treatment arm in a study, potentially increasing the relative
amount of information both on the efficacy and safety of the current novel treatment, as well
as on secondary endpoints. In clinical practice, expected results are based on the current set
of historical studies, and it makes statistical sense to capitalize on this historical data
whenever possible.
In practice, methods for borrowing historical information are less well understood in terms of their benefits, effects, and regulatory ramifications.
Potentially, the incorporation of quality external information allows for reduced mean
square error (MSE), increased power, and reduced type I error within the current trial. In
contrast, should the historical data be inconsistent with current trial control arm data, there is
a potential for bias and inflated type I error. The relative weights of these risks depend on
the phase of development. For example, smaller sample sizes in early phase studies
combined with less rigorous control of type I error make the possibility of reduced MSE and
increased power very appealing, while in a phase III trial, any possibility of inflated type I
error may be controversial. In early-phase development, point and interval estimates may
carry more weight, but power and type I error remain important, as decisions must constantly
be made about whether or not to continue a development program. Thus, it is important to
understand type I error and power in terms of ‘how many phase II trials would result in
correct go/no-go decisions for phase III’.
Authors of this article are members of the DIA Bayesian Scientific Working Group
(BSWG), which was formed in 2011 and includes representatives from industry, regulatory
agencies, and academia, with the vision to ensure that Bayesian methods are well
understood, accepted more broadly, and appropriately utilized to improve decision making
and enhance patient outcomes. Our goal in this article is to illustrate and compare several
methods (a test-then-pool approach, power priors, single arm trials, and hierarchical
modeling) in a concrete example, showing the amount of weight each method places on the
historical data, and the potential MSE, power, and type I error implications.
We specifically emphasize the idea of ‘dynamic borrowing’ in the approaches considered. It
is important that any method for historical borrowing recognizes when the current data
appear to be inconsistent with the historical data. We expect variation in the actual
parameters from study to study. These may be due to slightly differing patient populations,
site locations, improvements in secondary aspects of treatment in the time between the
historical and control data, and so forth. A method that incorporates dynamic borrowing
borrows most when the current data are consistent with historical data and borrows least
when the current data are inconsistent.
To see these issues, we begin with an extreme analogy. Suppose your friend is watching a
basketball game and wants to estimate the current (today) free throw shooting percentage of
his favorite player (for those unfamiliar with basketball, the key point here is that the player
takes a series of ‘shots’, and each one is either successful or not). Suppose we know that
going into the game this season the player has made 130 of 200 free throws (65%). Typically,
professional players are fairly consistent over the course of a season, so you argue that this
historical data indicates his current true free throw percentage (the parameter) is probably
around 65%. There might be some discrepancy today (sampling variability in the historical

data, issues today with the particular arena the game is played in, etc.), but you argue you
would be surprised if his true shooting percentage is much different than 65%. If you see
him shoot five times and hit all five, for example, you might believe his current true
shooting percentage is slightly higher than 65%, but you are unlikely to believe he is
suddenly a near perfect free throw shooter. Your friend argues ‘No! you are going to take
the observed results from today and then estimate today’s shooting rate as somewhere
between the observed data and 65%. That is biased! Suppose my favorite player has
corrected his form and now has a true shooting rate of 90%. You will likely reduce the
observed rate closer to 65% and thus underestimate my favorite player’s true shooting rate
for today’. This argument is correct: point estimates constructed in this way are biased if you
use the historical data. Your counterargument here is that the data collected in the past has
value and that it is quite unlikely for a player to correct their form to this degree, particularly
midseason. So do we use what seems like very valuable historical information, or should we
be concerned about the possible biases that will result from using it?
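The borrowing your friend objects to can be made concrete with a small Bayesian sketch (our own illustration, not from the paper): treat the season record as a Beta prior on the true shooting rate and update it with today's shots.

```python
# Sketch (not from the paper): treating the season record as a Beta prior
# and updating with today's shots shows how borrowing shrinks the estimate.

# Historical (season) data: 130 makes out of 200 attempts
prior_a, prior_b = 130, 70          # Beta prior built from season makes/misses

# Today's data: 5 makes out of 5 attempts
makes, misses = 5, 0

post_a, post_b = prior_a + makes, prior_b + misses
post_mean = post_a / (post_a + post_b)

print(round(makes / (makes + misses), 3))  # today's data alone: 1.0
print(round(post_mean, 3))                 # borrowed estimate: 0.659
```

The posterior mean barely moves from 65%, exactly the behavior the friend calls biased and the borrower calls sensible.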
While the basketball analogy is not serious, there are several parallels in clinical trials.
Typically, an agent is explored in many clinical trials over the course of several years, in
situations analogous to the study we want to undertake. We expect there to be some
variation in the response rates for our drug across these studies. In the basketball analogy,
issues like where today’s game is played, and others, may be similar to differing inclusion/
exclusion rules and so forth in the clinical trial. We want to estimate the parameter for our
drug for the current study (‘today’ in the basketball analogy) and need to know how much to
incorporate the available historical data. Statistically, incorporating the historical study will
produce biases in the presence of ‘drift’ (if the current study parameters differ from the
observed historical rate, we will see biases). For later phase trials involving hypothesis tests,
these biases result in inflated type I error depending on the direction and magnitude of the
drift. However, if the historical data is on point, we can acquire dramatically better estimates
incorporating the historical data, in terms of MSE (we see a variance reduction that more
than compensates for the bias) and simultaneous improvements in type I error and power.
Thus, fundamentally the historical data can either help or hurt depending on the relationship
between the past data and the current parameter. Our goal in this manuscript is to illustrate
these trade-offs in a practical simple analysis. Some methods are more robust to drift than
others, and we try to illustrate which methods are the most robust. After assessing the
possible benefits and risks, the user must assess whether the benefits exceed the risks, an
assessment that should include the likelihood of their occurrence. Returning to the basketball
analogy, it may be clear that if the player has corrected their form and now shoots 90%, then
borrowing the historical information is detrimental. However, this assumes a change to 90%
that may not be plausible. If such changes are unlikely, borrowing from historical data may
produce substantial gains over utilizing the limited amount of information in the current day
(basketball) or current study (clinical trials).
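The trade-off just described can be sketched numerically. The following is an illustrative calculation (our own, not from the paper) of the MSE of the separate versus pooled control estimates under drift, treating the historical contribution as fixed at its observed 0.65 rate:

```python
# Sketch of the bias-variance trade-off under drift (illustrative numbers):
# compare MSE of the separate estimate (current 200 controls only) with the
# pooled estimate (adding 100 historical subjects at a fixed 0.65 rate).

def mse_separate(p0, n=200):
    return p0 * (1 - p0) / n                      # unbiased: MSE = variance

def mse_pooled(p0, ph=0.65, n=200, nh=100):
    w = n / (n + nh)                              # weight on current data
    bias = (1 - w) * (ph - p0)                    # pulled toward historical rate
    var = w**2 * p0 * (1 - p0) / n                # historical part treated as fixed
    return bias**2 + var

for p0 in (0.65, 0.70, 0.75):                     # increasing drift
    print(p0, round(mse_separate(p0), 5), round(mse_pooled(p0), 5))
```

With no drift (p0 = 0.65) pooling roughly halves the MSE; with enough drift (p0 = 0.75) the squared bias overwhelms the variance reduction and pooling is worse.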
We describe our example trial in Section 2 as well as the methods we consider for historical
borrowing. For each method, we identify parameters the user may control and show how
they affect the borrowing behavior, MSE, type I error, and power. While our intent is
illustrative rather than a comprehensive review article, we do provide a minimal amount of
detail particular to the example and references for more technical details behind the
methods. In Section 3, we compare the methods in terms of their borrowing behavior as well
as operating characteristics such as MSE, type I error, and power. In Section 4, we provide a
‘where to go from here’ review of extensions from the current literature to complement the
simpler structure of the example, and finally in Section 5, we provide a discussion.

2. METHODS
Suppose we are about to conduct a trial with a dichotomous endpoint where higher rates are
preferred. We will enroll 400 subjects. Generally, we will consider designs with equal
randomization (200 to control and 200 to treatment).
Looking at the available research on the control arm (this deserves a paper of its own,
generally one must be careful in any literature review to identify studies that are ‘on point’
with similar patient populations, dosing, and so forth to the currently envisioned control
arm), we have a historical study that observed 65 responses in 100 subjects on the current
control arm. Our goal is to incorporate this information into the current trial. See Section 4
for a description of more complicated scenarios (multiple historical studies, covariates, etc.).
Here, our primary goal will be to maintain our current sample size, using the historical
information to increase the power of the trial. Alternatively, we could consider using the
historical information and changing to unbalanced randomization (e.g., 2:1 randomization
preferential to the treatment arm). In the extreme, single arm trials might be conducted using
the historical 0.65 rate as a performance criterion, where the primary analysis indicates that
one must beat 0.65 to achieve a trial success. Our goal here is, as much as possible, to
perform an ‘apples to apples’ comparison of the methods, particularly with respect to a trial
that does not borrow any information.
2.1. Methods of borrowing
We consider six methods for incorporating the historical data, the first two acting as
‘fenceposts’ for understanding our three main historical borrowing methods. We also
consider single arm trials, as these are also a form of historical borrowing in that typically
the threshold for success (e.g., a null hypothesis response probability) is determined after
looking at historical data.
In all examples except for single arm trials, our primary analysis is a hypothesis test of H0: p0 = pT against H1: p0 < pT, where p0 is the true rate for the current control arm and pT is the true rate for the treatment arm. The six methods are as follows:
1.
Separate—we ignore the historical data. This would be viewed as a ‘standard
analysis’. Here, we would continue with equal randomization on the current
treatment and control, with no incorporation of the historical information. We
perform a Fisher exact test.
2.
Pooling—suppose we perform equal randomization in the current trial (n = 200 in
each arm), but we pool the historical subjects with the current control subjects
(thus, if we observe 140/200 = 0.70 in the control arm of the current study, with our
65/100 historical dataset, our actual control estimate would be (140 + 65)/(200 +
100) = 0.683). One could combine pooling with unequal randomization, but we are
attempting to maintain an equal number of treatment subjects for all methods. We
perform a Fisher exact test but here pool the historical information as if they had
been control observations in the current trial.
3.
Single arm trial—while somewhat unusual for these sample sizes, many single arm
trials are conducted that look at historical data (often with sample sizes less than
our 100 historical subjects) to create a performance criterion that must be beaten in
the current study. This performance criterion may be either a point estimate or
some upper quantile of a CI based on historical data. Single arm trials may be used
in situations where accrual is particularly difficult (thus the goal is to obtain
reasonable power from smaller sample sizes) or where it is viewed as unethical to

include a control arm. In our example, suppose we eliminated the control arm and
placed 200 subjects on the treatment arm, with a primary analysis testing H0: p =
0.65 against H1: p > 0.65, where the 0.65 is acquired from the observed historical
rate. We perform an exact binomial test.
4.
Test-then-pool—pooling presents an obvious difficulty in that a priori we may not
be sure our historical data are sufficiently similar to our current control arm (our
efforts in reviewing the literature notwithstanding). We would like a way to avoid
pooling in situations where the current control arm appears to be different from the
historical data. In ‘test-then-pool’, we make a choice between the ‘separate’ and
‘pooling’ options by first performing a test of H0: p0 = pH against H1: p0 ≠ pH,
where p0 is the current control response rate and pH is the historical control
response rate. If the null hypothesis of equality is not rejected, one uses the pooling
response rate. If the null hypothesis of equality is not rejected, one uses the pooling
approach. If the hypothesis of equality is rejected, then one completely ignores the
historical data and performs the separate analysis. This is a basic form of dynamic
borrowing, as the amount of weight assigned to the historical data depends on the
data in the current trial.
5.
Power priors—the power prior ([4], described in more detail in Section 2.4) assigns
a ‘weight’ to the historical data somewhere in between the pooling (weight = 1) and
separate analyses (weight = 0). Thus, the historical data are incorporated to a degree
into the current analysis.
6.
Hierarchical modeling—in a hierarchical model ([1, 2, 5, 6], described in more
detail in Section 2.5), we assume a distribution across studies (here the current and
historical controls) with an explicit parameter τ measuring the variation across
studies. A prior distribution is placed on τ that is then updated using the current
data. A discrepancy between the historical and current data would put more weight
toward larger τ values in the posterior distribution than would an agreement
between the current and historical data. As with power priors, the borrowing
depends on the parameter τ and incorporates its uncertainty, producing dynamic
borrowing.
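For a dichotomous endpoint, the fixed-weight power prior has a simple conjugate form. The sketch below is our own illustration, assuming a Beta(1,1) baseline prior; the 140/200 current-control count is hypothetical:

```python
# Sketch of a fixed-weight power prior for a binomial control rate,
# assuming a Beta(1,1) baseline prior; the 140/200 current-control count
# is a hypothetical illustration, not a result from the paper.

def power_prior_posterior(x0, n0, xh, nh, a0, a=1.0, b=1.0):
    """Beta posterior for the control rate with historical weight a0.

    a0 = 0 recovers the 'separate' analysis, a0 = 1 recovers pooling.
    """
    post_a = a + x0 + a0 * xh
    post_b = b + (n0 - x0) + a0 * (nh - xh)
    return post_a, post_b

# Historical: 65/100; hypothetical current control: 140/200
for a0 in (0.0, 0.5, 1.0):
    pa, pb = power_prior_posterior(140, 200, 65, 100, a0)
    print(a0, round(pa / (pa + pb), 4))   # posterior mean of the control rate
```

As the weight a0 increases, the posterior mean is pulled from the current-trial estimate of 0.70 toward the pooled estimate.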
Generally, these methods move from the simplest to implement to more complicated.
Separate, pooling, or single arm trials can be quickly implemented from scratch or have
standard implementation in statistical software packages. Test-then-pool requires some basic
coding to connect the two hypothesis tests (one for whether to pool, the other to perform the
final analysis). Power priors, depending on the likelihood, may be performed in a statistical
package or may require MCMC, while hierarchical modeling almost always requires some
MCMC implementation, although some commercially available clinical trial simulation
software will perform these calculations automatically. In general, none of these methods
require excessive computation that would be an obstacle to implementation.
2.2. Comparison of pooling, separate, and single arm trials
We tend to think of the separate and ‘pooled’ analyses as fenceposts in that they represent
the extremes of borrowing. Intriguingly, a single arm trial represents a further extreme of
borrowing in that we typically use the historical data to construct a performance criterion.
Thus, given that our historical study has an observed rate of 0.65, we might choose a single
arm trial where we place 200 subjects on treatment (no control arm) and use a primary
analysis of H0: pT = 0.65 against H1: pT > 0.65. In effect, in the single arm trial, we choose
not to observe the control data (typically this is performed with smaller sample sizes, but the
principles described here remain).
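The exact binomial test for the single arm analysis can be computed directly from the binomial tail probability; in the sketch below, the 150 observed responses are a hypothetical illustration:

```python
# Sketch of the single arm analysis: an exact binomial test of
# H0: pT = 0.65 against H1: pT > 0.65 with n = 200 (the 150 observed
# responses are a hypothetical illustration).
from math import comb

def binom_sf(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p) -- the exact one-sided p-value."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

# e.g. 150/200 = 0.75 observed on the treatment arm
p_value = binom_sf(150, 200, 0.65)
print(round(p_value, 4))
```

Note that the entire burden of the 'control' comparison rests on the 0.65 threshold taken from the historical data, which is why we treat the single arm trial as the most extreme form of borrowing.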
