Journal Article

Sensitivity Analyses for Robust Causal Inference from Mendelian Randomization Analyses with Multiple Genetic Variants

Q: What are the contributions in this paper?

In this article, the authors discuss a range of sensitivity analyses that will either support or question the validity of causal inference from a Mendelian randomization analysis with multiple genetic variants. They focus on sensitivity analyses of greatest practical relevance for ensuring robust causal inferences, and those that can be undertaken using summarized data. A diagram corresponding to the instrumental variable assumptions is presented in Figure 1. The authors further assume that all valid instrumental variables identify the same causal parameter; they return to this assumption in the discussion. They also assume that the effects of (i) the instrumental variables on the risk factor, (ii) the instrumental variables on the outcome, and (iii) the risk factor on the outcome are linear without effect modification; and (iv) that the association of the genetic variant with the risk factor is homogeneous in the population. This is an open access article distributed under the Creative Commons Attribution License 4.0 (CC BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The authors report no conflicts of interest. Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com). Editor's Note: A commentary on this article appears on p. 43.

Q: What is the practical difficulty of determining which variants to include in a Mendelian randomization analysis?

A practical difficulty of determining which variants to include in a Mendelian randomization analysis using measured covariates, aside from that of distinguishing between pleiotropy and mediation, is that of multiple testing.

Q: What is the role of anakinra in reducing interleukin-1 levels?

For instance, inhibition of interleukin-1 by the drug anakinra has been observed to lead to decreased levels of C-reactive protein and interleukin-6 in clinical trials.

Q: What are the main approaches to assess the association of genetic variants with the risk factor?

If there are covariates that by biological considerations should be downstream consequences of the risk factor, then the associations of genetic variants with these covariates can be assessed as positive controls to give confidence that the function of the genetic variants matches the known consequences of the risk factor.

Q: What methods allow more general departures from the instrumental variable assumptions for the invalid instruments?

The penalization and median-based methods allow more general departures from the instrumental variable assumptions for the invalid instruments.

Q: What are the main approaches to assess the association of genetic variants with a measured covariate?

For instance, if increasing body mass index leads to increased blood pressure, then genetic variants that are instrumental variables for body mass index should also be associated with blood pressure.

Q: What is the pleiotropic effect of the Egger regression method?

Under an assumption that is weaker than standard instrumental variable assumptions, the slope coefficient from the Egger regression method provides an estimate of the causal effect that is consistent asymptotically even if all the genetic variants have pleiotropic effects on the outcome.

Q: What is the L1-penalization method for CAD?

This approach has been applied for investigating the causal effect of lipid fractions on CAD risk.50 More formal penalization methods have been proposed using L1-penalization to downweight the contribution of outlying variants to the analysis in a continuous way.

Stephen Burgess¹, Jack Bowden², Tove Fall³, Erik Ingelsson, Simon G. Thompson

University of Cambridge¹, University of Bristol², Uppsala University³

01 Jan 2017-Epidemiology (Lippincott Williams and Wilkins)-Vol. 28, Iss: 1, pp 30-42

TL;DR: The article discusses a range of sensitivity analyses that will either support or question the validity of causal inference from a Mendelian randomization analysis with multiple genetic variants, focusing on those that can be undertaken using summarized data.


Abstract: Mendelian randomization investigations are becoming more powerful and simpler to perform, due to the increasing size and coverage of genome-wide association studies and the increasing availability of summarized data on genetic associations with risk factors and disease outcomes. However, when using multiple genetic variants from different gene regions in a Mendelian randomization analysis, it is highly implausible that all the genetic variants satisfy the instrumental variable assumptions. This means that a simple instrumental variable analysis alone should not be relied on to give a causal conclusion. In this article, we discuss a range of sensitivity analyses that will either support or question the validity of causal inference from a Mendelian randomization analysis with multiple genetic variants. We focus on sensitivity analyses of greatest practical relevance for ensuring robust causal inferences, and those that can be undertaken using summarized data. Aside from cases in which the justification of the instrumental variable assumptions is supported by strong biological understanding, a Mendelian randomization analysis in which no assessment of the robustness of the findings to violations of the instrumental variable assumptions has been made should be viewed as speculative and incomplete. In particular, Mendelian randomization investigations with large numbers of genetic variants without such sensitivity analyses should be treated with skepticism.


Summary (5 min read)


FIGURE 1.

Diagram of instrumental variable assumptions for Mendelian randomization.
The three assumptions (i, ii, iii) are illustrated by the presence of an arrow, indicating the effect of one variable on the other (assumption i), or by a dashed line with a cross, indicating that there is no direct effect of one variable on the other (assumptions ii and iii).

ASSESSING THE INSTRUMENTAL VARIABLE ASSUMPTIONS

The first set of approaches the authors consider are those to assess whether the instrumental variable assumptions are likely to be satisfied or not for a set of genetic variants.
The authors consider in turn the assessment of the association with measured confounders, the exploitation of a natural experiment in the form of a gene-environment interaction, examination of a scatter plot combined with a heterogeneity test, and of a funnel plot combined with a test for directional pleiotropy.

Use of Measured Covariates

The assumption that an instrumental variable is not associated with confounders of the risk factor-outcome association is not fully testable, as not all confounders will be known or measured.
Associations are no stronger than would be expected by chance alone.
Inhibition of interleukin-1 by the drug anakinra has been observed to lead to decreased levels of C-reactive protein and interleukin-6 in clinical trials.
In some cases, valid causal inference may still be possible even if a genetic variant has a pleiotropic association with a measured covariate; for instance, by adjusting for the covariate in the analysis model.
An alternative approach with summarized data is a multivariable Mendelian randomization analysis, in which genetic associations with the outcome are regressed on the genetic associations with the risk factor and covariates in a multivariable weighted regression model.
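The multivariable regression described above can be sketched as follows. This is a minimal illustration only, assuming one risk factor and one measured covariate, no intercept term, and weights equal to the inverse variances of the genetic associations with the outcome. The function name and all numbers are hypothetical and are not taken from the article.

```python
def multivariable_mr(beta_x, beta_c, beta_y, se_y):
    """Weighted regression (no intercept) of gene-outcome associations on
    gene-risk factor and gene-covariate associations.

    Returns (coefficient for the risk factor, coefficient for the covariate).
    Illustrative sketch; the weighting scheme is an assumption.
    """
    w = [1.0 / s ** 2 for s in se_y]  # assumed inverse-variance weights
    sxx = sum(wi * x * x for wi, x in zip(w, beta_x))
    scc = sum(wi * c * c for wi, c in zip(w, beta_c))
    sxc = sum(wi * x * c for wi, x, c in zip(w, beta_x, beta_c))
    sxy = sum(wi * x * y for wi, x, y in zip(w, beta_x, beta_y))
    scy = sum(wi * c * y for wi, c, y in zip(w, beta_c, beta_y))
    det = sxx * scc - sxc ** 2  # zero when the two predictors are collinear
    if det == 0:
        raise ValueError("risk factor and covariate associations are collinear")
    # Solve the 2x2 weighted normal equations by Cramer's rule.
    theta_x = (sxy * scc - scy * sxc) / det
    theta_c = (scy * sxx - sxy * sxc) / det
    return theta_x, theta_c


# Hypothetical per-allele associations for four variants.
beta_x = [0.10, 0.20, 0.30, 0.40]   # with the risk factor
beta_c = [0.05, -0.02, 0.10, 0.00]  # with the measured covariate
beta_y = [0.065, 0.094, 0.180, 0.200]  # with the outcome
se_y = [0.02, 0.03, 0.02, 0.04]
print(multivariable_mr(beta_x, beta_c, beta_y, se_y))
```

The coefficient for the risk factor is the quantity of interest; the coefficient for the covariate absorbs the part of the gene-outcome association that tracks the covariate.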

Gene-Environment Interaction

For some applications of Mendelian randomization, a further natural experiment may be available if the postulated causal effect is present in one stratum of the population, but absent in another.
For example, the association of alcohol-related genetic variants with esophageal cancer risk is present in those who drink alcohol, but absent in abstainers.
One potential complication of such an analysis is the possibility of collider bias;34 by stratifying on the risk factor, associations between the genetic variants and the outcome may be distorted in the strata (in the examples above, in alcohol consumers/abstainers).
FIGURE 2. Associations (estimates in standard deviation units and 95% confidence intervals) of four genetic variants in the CRP gene region with a range of covariates per C-reactive protein increasing allele.

Scatter Plot and Test for Heterogeneity

Even if the instrumental variable assumptions are in doubt for some or all of the variants, if several independent genetic variants in different gene regions are concordantly associated with the outcome, then a causal conclusion would seem reasonable.
Any point that substantially deviates from this line should be investigated for potential pleiotropy.
A statistical test for heterogeneity can be performed using Cochran's Q test on the causal estimates from each genetic variant, with each estimate weighted by its inverse-variance weight.
This statistic can be calculated using only summarized data.
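As a rough illustration of how the heterogeneity statistic can be obtained from summarized data alone, the sketch below computes Cochran's Q from per-variant ratio estimates. It assumes the commonly used first-order inverse-variance weights (squared association with the risk factor divided by the squared standard error of the association with the outcome); the function names and data are hypothetical, and this is not the code from the article's eAppendix.

```python
def ivw_components(beta_x, beta_y, se_y):
    """Per-variant ratio estimates, their weights, and the pooled estimate."""
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    # Assumed first-order inverse-variance weights for the ratio estimates.
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y)]
    pooled = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return ratios, weights, pooled


def cochran_q(beta_x, beta_y, se_y):
    """Cochran's Q statistic and its degrees of freedom (variants minus 1)."""
    ratios, weights, pooled = ivw_components(beta_x, beta_y, se_y)
    q = sum(w * (r - pooled) ** 2 for w, r in zip(weights, ratios))
    return q, len(ratios) - 1


# Hypothetical summarized data: the third variant is an outlier.
beta_x = [0.1, 0.2, 0.4]
beta_y = [0.05, 0.10, 0.40]
se_y = [0.01, 0.02, 0.02]
q, df = cochran_q(beta_x, beta_y, se_y)
print(f"Q = {q:.2f} on {df} degrees of freedom")
```

Under the hypothesis that all variants estimate the same causal effect, Q would be compared with a chi-squared distribution on the stated degrees of freedom.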

FIGURE 3.

Diagram to illustrate the difference between pleiotropy (left, the association of the genetic variant with the covariate is independent of the risk factor) and mediation (right, the association of the genetic variant with the covariate is mediated entirely via the risk factor).
Egger regression is a method for detecting small study bias (often interpreted as publication bias) in a meta-analysis of separate studies.
The method can also be used for detecting directional pleiotropy from separate genetic variants.
The genetic associations should be orientated so that the associations with the risk factor all have the same sign.
If there is no intercept term in this regression, the slope parameter is the inverse-variance weighted causal estimate.
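The relationship between the regression with and without an intercept can be sketched as below. This is a minimal illustration, assuming a weighted linear regression of the gene-outcome associations on the gene-risk factor associations with weights equal to the inverse variances of the gene-outcome associations; the function names and data are hypothetical.

```python
def orient(beta_x, beta_y):
    """Flip signs so that every association with the risk factor is positive."""
    ox, oy = [], []
    for bx, by in zip(beta_x, beta_y):
        sign = -1.0 if bx < 0 else 1.0
        ox.append(sign * bx)
        oy.append(sign * by)
    return ox, oy


def egger_regression(beta_x, beta_y, se_y):
    """Weighted regression with an intercept; returns (intercept, slope)."""
    bx, by = orient(beta_x, beta_y)
    w = [1.0 / s ** 2 for s in se_y]  # assumed inverse-variance weights
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, bx)) / sw
    my = sum(wi * y for wi, y in zip(w, by)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, bx))
    if sxx == 0:
        raise ValueError("associations with the risk factor do not vary")
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, bx, by))
    slope = sxy / sxx
    return my - slope * mx, slope


def ivw_slope(beta_x, beta_y, se_y):
    """Same weighted regression forced through the origin (no intercept)."""
    w = [1.0 / s ** 2 for s in se_y]
    num = sum(wi * x * y for wi, x, y in zip(w, beta_x, beta_y))
    den = sum(wi * x * x for wi, x in zip(w, beta_x))
    return num / den


# Hypothetical data: every variant carries the same extra effect on the outcome.
beta_x = [0.1, -0.2, 0.3, 0.4]
beta_y = [0.07, -0.12, 0.17, 0.22]
se_y = [0.02, 0.02, 0.02, 0.02]
print(egger_regression(beta_x, beta_y, se_y))
print(ivw_slope(beta_x, beta_y, se_y))
```

An intercept that differs from zero suggests directional pleiotropy; in that situation the slope from the regression through the origin is pulled away from the slope of the regression with an intercept.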

ROBUST ANALYSIS METHODS

The second category of sensitivity analyses is that of robust analysis methods.
Robust analysis methods allow different (and when the main purpose is to test the causal null hypothesis, weaker) assumptions than standard instrumental variable methods.
In turn, the authors consider penalization methods, median-based methods, and Egger regression.

Penalization Methods

The authors first consider methods in which the contribution of some genetic variants (e.g., heterogeneous or outlying variants) to the analysis is downweighted (or penalized).
The simplest way of performing a penalization method is to omit some of the variants from the analysis.
With a small number of genetic variants, the causal estimates omitting one variant at a time could be considered.
This sensitivity analysis has been undertaken for the effect of LDL-C on aortic stenosis.
More formal L1-penalization methods require individual-level data and a one-sample setting (genetic variants, risk factor, and outcome measurements are available for the same individuals).
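The simplest version of this idea, re-estimating the causal effect with one variant omitted at a time, can be sketched with summarized data as follows. The inverse-variance weighted estimator is used here as an assumption for illustration; function names and data are hypothetical.

```python
def ivw(beta_x, beta_y, se_y):
    """Inverse-variance weighted estimate from summarized data."""
    num = sum(bx * by / sy ** 2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y))
    return num / den


def leave_one_out(beta_x, beta_y, se_y):
    """Estimate with each variant omitted in turn; entry i omits variant i."""
    estimates = []
    n = len(beta_x)
    for omit in range(n):
        keep = [j for j in range(n) if j != omit]
        estimates.append(ivw([beta_x[j] for j in keep],
                             [beta_y[j] for j in keep],
                             [se_y[j] for j in keep]))
    return estimates


# Hypothetical data in which the last variant is an outlier.
beta_x = [0.1, 0.2, 0.3, 0.4]
beta_y = [0.05, 0.10, 0.15, 0.40]
se_y = [0.02, 0.02, 0.02, 0.02]
for i, est in enumerate(leave_one_out(beta_x, beta_y, se_y)):
    print(f"omitting variant {i}: estimate = {est:.3f}")
```

A single omission that changes the estimate markedly points to a variant worth investigating before any causal claim is made.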

Median-based Methods

Median-based methods are an alternative family of methods that give consistent estimates when up to half the genetic variants are not valid instrumental variables, and that can be performed using summarized data rather than individual-level data.
The weighted median estimate is consistent under the assumption that genetic variants representing over 50% of the weight in the analysis are valid instruments.
This is for Mendelian randomization analysis of C-reactive protein on coronary artery disease risk using genetic variants throughout the genome that have been demonstrated as associated with C-reactive protein at a genome-wide level of significance.
Horizontal lines represent 95% confidence intervals for the instrumental variable estimates.
Confidence intervals for the median and weighted estimates can be estimated using bootstrapping.
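A simple and a weighted median of the per-variant ratio estimates can be sketched as below. The weighted version assumes the interpolation scheme commonly used for this estimator (each ordered estimate is placed at the midpoint of its share of the cumulative weight, and the value at the 50% point is interpolated) and first-order inverse-variance weights. Function names and data are hypothetical, and the bootstrap step for confidence intervals is omitted.

```python
import statistics


def weighted_median(values, weights):
    """Interpolated weighted median of values with positive weights."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    v = [values[i] for i in order]
    w = [weights[i] for i in order]
    total = sum(w)
    cumulative = 0.0
    position = []  # midpoint of each estimate's share of the total weight
    for wi in w:
        cumulative += wi
        position.append((cumulative - 0.5 * wi) / total)
    if position[0] >= 0.5:
        return v[0]
    for k in range(1, len(v)):
        if position[k] >= 0.5:
            frac = (0.5 - position[k - 1]) / (position[k] - position[k - 1])
            return v[k - 1] + frac * (v[k] - v[k - 1])
    return v[-1]


def median_estimates(beta_x, beta_y, se_y):
    """Return (simple median, weighted median) of the ratio estimates."""
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y)]  # assumed
    return statistics.median(ratios), weighted_median(ratios, weights)


# Hypothetical data: three variants agree on 0.5, two invalid ones give 2.0.
beta_x = [0.1, 0.2, 0.3, 0.1, 0.2]
beta_y = [0.05, 0.10, 0.15, 0.20, 0.40]
se_y = [0.02, 0.02, 0.02, 0.02, 0.02]
print(median_estimates(beta_x, beta_y, se_y))
```

In this made-up example the valid variants carry more than half of the weight, so both medians stay close to their common value even though two of the five variants are far from it.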

Egger Regression

The Egger regression method was introduced above as a test for directional pleiotropy; this test does not make any assumption about the genetic variants.
Under an assumption that is weaker than standard instrumental variable assumptions, the slope coefficient from the Egger regression method provides an estimate of the causal effect that is consistent asymptotically even if all the genetic variants have pleiotropic effects on the outcome.
There is some evidence for the general plausibility of the InSIDE assumption, as associations of genetic variants with different phenotypic variables have been shown to be largely uncorrelated in an empirical study.
The penalization and median-based methods allow more general departures from the instrumental variable assumptions for the invalid instruments.
Using genetic variants chosen solely on the basis of their association with the risk factor, a broad range of methods affirmed that LDL-C was a causal risk factor for CAD.

Example: C-reactive Protein and Coronary Artery Disease Risk

The inverse-variance weighted method was originally proposed as a fixed-effect meta-analysis of the causal estimates from each of the genetic variants.
The authors consider fixed-effect and multiplicative random-effects models for both the inverse-variance weighted and Egger regression methods.
Also, the authors consider simple (i.e., unweighted) median and weighted median estimates.
The corresponding random-effects analyses imply that there is no convincing evidence for a causal effect.
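The difference between the fixed-effect and multiplicative random-effects versions of the inverse-variance weighted analysis can be sketched as follows. The sketch assumes the usual formulation in which the point estimate is unchanged and the fixed-effect standard error is multiplied by the square root of Cochran's Q divided by its degrees of freedom, never shrinking it below the fixed-effect value. Function names and data are hypothetical.

```python
import math


def ivw_fixed_and_random(beta_x, beta_y, se_y):
    """Return (estimate, fixed-effect SE, multiplicative random-effects SE)."""
    n = len(beta_x)
    if n < 2:
        raise ValueError("at least two variants are needed")
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y)]  # assumed
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se_fixed = 1.0 / math.sqrt(sum(weights))
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    overdispersion = math.sqrt(q / (n - 1))
    # Only ever widen the interval: the scale factor is floored at 1.
    se_random = se_fixed * max(1.0, overdispersion)
    return estimate, se_fixed, se_random


# Hypothetical heterogeneous data.
beta_x = [0.1, 0.2, 0.3, 0.4]
beta_y = [0.05, 0.10, 0.15, 0.40]
se_y = [0.02, 0.02, 0.02, 0.02]
est, se_f, se_r = ivw_fixed_and_random(beta_x, beta_y, se_y)
print(f"estimate {est:.3f}, fixed SE {se_f:.3f}, random-effects SE {se_r:.3f}")
```

When the variants disagree more than chance would explain, the random-effects interval is wider, which is how a finding that looks clear under a fixed-effect model can lose support.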

DISCUSSION

When multiple genetic variants from different gene regions are used in a Mendelian randomization analysis, it is highly implausible that all the genetic variants satisfy the instrumental variable assumptions.
This does not preclude a causal conclusion; however, it means that a simple instrumental variable analysis alone should not be relied on to give a causal conclusion.
Inappropriate and naive application of standard Mendelian randomization methods may lead to exactly the same problems of unmeasured confounding that the technique was designed to avoid.
The authors have discussed a range of sensitivity analyses that can be used to question the plausibility of a Mendelian randomization analysis using multiple variants, focusing on those analyses that are judged to be most useful to an applied analyst and those that can be performed using summarized data.
Not every sensitivity analysis may be appropriate for each case, but some effort should be made to investigate whether a causal finding is robust to violations of the instrumental variable assumptions.

Comparison with Previous Literature

From its initial popularization, proponents of Mendelian randomization have been candid about the stringent and untestable assumptions required in Mendelian randomization.
However, applied investigations have not always reflected this need for caution.
In comparison with previous attempts to offer robust approaches for causal inference in Mendelian randomization, the authors have here repeated some of the guidance of Glymour et al.,32 specifically relating to the search for gene-environment interactions and to testing for heterogeneity between the estimates from different variants.
Substantial attenuation of the association on adjustment for the risk factor is expected if the genetic variant is a valid instrumental variable; however, such attenuation may not occur in practice, for example, due to measurement error in the exposure;58 conversely, some attenuation may occur for an invalid instrumental variable.
Violations of the assumptions of homogeneity and/or linearity of the causal effect would also lead to difficulties in interpreting the causal estimate, although they are unlikely to lead to inappropriate causal inferences or inflated type 1 error rates under the null.

Summarized Data and Two-sample Mendelian Randomization

All of the sensitivity analyses discussed in this article can equally be performed b.
Odds ratio for coronary artery disease per 1-SD (1.05 unit) increase in log-transformed C-reactive protein concentration (equivalent to a 2.86-fold increase in C-reactive protein concentration).
A further concern with summarized data is the use of two-sample analyses, in which data on the gene-risk factor and gene-outcome associations are taken from nonoverlapping datasets.
This is not to discourage the use of summarized data or two-sample Mendelian randomization analyses, but to acknowledge that the bar for evidential quality is even higher in this case.

Genetic Variants with Different Functional Effects

The authors have assumed that there is a single causal effect of the risk factor on the outcome, and interpreted deviation from this (i.e., heterogeneity of causal effect estimates) as evidence that the instrumental variable assumptions are violated for some of the genetic variants.
In reality, if genetic variants have different functional effects on the risk factor, then different magnitudes of causal effect may be expected.
Genetic variants associated with body mass index may have different biological mechanisms giving rise to the association, and may affect the outcome to different extents.
Heterogeneity between causal estimates based on sets of genetic variants grouped according to their biological function may help reveal which mechanisms are causal.
The causal estimates presented in this article still provide a valid test of the causal null hypothesis, but do not have an interpretation as estimates of a causal parameter.

Pleiotropy and Other Violations of the Instrumental Variable Assumptions

The authors have discussed violations of the instrumental variable assumptions primarily using the language of pleiotropy.
In particular, violations of the exclusion restriction assumption (i.e., no effect of the genetic variant on the outcome except for that via the risk factor) can be expressed as pleiotropic effects.
While this adjustment has proved successful in some cases, it is not guaranteed to eliminate population stratification.
Classical (nondifferential, zero mean) measurement error in the risk factor does not lead to bias in instrumental variable estimates.
If there are multiple versions of the risk factor, then this would lead to difficulties in interpreting the causal findings.

CONCLUSIONS

The increasing size and coverage of genome-wide association studies and the increasing availability of summarized data on genetic associations are making the application of Mendelian randomization simpler.
The methods for sensitivity analysis described in this article will help to judge whether a causal conclusion from a Mendelian randomization analysis is reasonable or not.
Aside from cases in which the selection of the genetic variants and their justification as instrumental variables is motivated by strong biological understanding, a Mendelian randomization analysis in which no assessment of the robustness of the findings has been made should be viewed as speculative.


Figures (8)

FIGURE 3. Diagram to illustrate the difference between pleiotropy (left, the association of the genetic variant with the covariate is independent of the risk factor) and mediation (right, the association of the genetic variant with the covariate is mediated entirely via the risk factor).

TABLE 2. Summary of Sensitivity Analyses Considered in this Article, and Limitations of Each of the Proposed Analyses

FIGURE 4. Scatter plots of genetic associations with the outcome against genetic associations with the risk factor (lines represent 95% confidence intervals) for Mendelian randomization analysis of CRP on coronary artery disease risk using genetic variants in the CRP gene region (left) and genetic variants throughout the genome (right) that have been demonstrated as associated with C-reactive protein at a genome-wide level of significance.

TABLE 1. Estimates of Causal Effect of C-reactive Protein on Coronary Artery Disease Risk Based on 17 Genome-wide Significant Variants

FIGURE 5. Funnel plot of instrument precision.

FIGURE 1. Diagram of instrumental variable assumptions for Mendelian randomization. The three assumptions (i, ii, iii) are illustrated by the presence of an arrow, indicating the effect of one variable on the other (assumption i), or by a dashed line with a cross, indicating that there is no direct effect of one variable on the other (assumptions ii and iii).

FIGURE 6. Estimates (ordered by magnitude) of causal effect of CRP on CAD risk from inverse-variance weighted method using 17 genome-wide significant genetic variants omitting variants systematically two at a time.

FIGURE 2. Associations (estimates in standard deviation units and 95% confidence intervals) of four genetic variants in the CRP gene region with a range of covariates per C-reactive protein increasing allele. Adapted from CRP CHD Genetics Collaboration.16


Burgess, S., Bowden, J., Fall, T., Ingelsson, E., & Thompson, S. G. (2017). Sensitivity analyses for robust causal inference from Mendelian randomization analyses with multiple genetic variants. Epidemiology, 28(1), 30-42. https://doi.org/10.1097/EDE.0000000000000559

Publisher's PDF, also known as Version of record
License (if available): CC BY
Link to published version (if available): 10.1097/EDE.0000000000000559
This is the final published version of the article (version of record). It first appeared online via Wolters Kluwer at http://journals.lww.com/epidem/fulltext/2017/01000/Sensitivity_Analyses_for_Robust_Causal_Inference.6.aspx.

30 | www.epidem.com Epidemiology • Volume 28, Number 1, January 2017

REVIEW ARTICLE


(Epidemiology 2017;28: 30–42)

An instrumental variable in an observational study behaves similarly to random treatment assignment in an experimental setting. It provides a natural experiment, whereby individuals with different levels of the instrumental variable differ on average with respect to the putative risk factor, but not with respect to any confounders of the risk factor–outcome association. Mendelian randomization is the use of a genetic variant as a proxy for a modifiable risk factor.3,4 If a genetic variant satisfies the assumptions of an instrumental variable for the risk factor, then whether there is an association between the genetic variant and the outcome is a test of whether the risk factor is a cause of the outcome.

The instrumental variable assumptions are satisfied for a genetic variant if
(i) the genetic variant is associated with the risk factor;
(ii) the genetic variant is not associated with confounders of the risk factor–outcome relationship; and
(iii) the genetic variant is not associated with the outcome conditional on the risk factor and confounders of the risk factor–outcome relationship.

These assumptions imply that the only causal pathway from the genetic variant to the outcome is via the risk factor, and there is no other causal pathway either directly to the outcome or via a confounder. A diagram corresponding to these assumptions is presented in Figure 1.

We further assume that all valid instrumental variables identify the same causal parameter; we return to this assumption in the discussion. For this interpretation to hold, it is necessary for certain parametric assumptions to hold. In this article, we assume that the effects of (i) the instrumental variables on the risk factor, (ii) the instrumental variables on the outcome, (iii) the risk factor on the outcome are linear without effect modification; and (iv) the association of the genetic variant with the risk factor is homogeneous in the population. These assumptions are not necessary for the identification of a causal effect, but they ensure that the estimate from each instrumental variable targets the same average causal effect. Weaker assumptions can identify a local average causal effect; however, the local average causal effect is likely to differ for each instrumental variable. Although these assumptions are strict, the causal estimate from an instrumental variable analysis is a valid test statistic for the causal null hypothesis without requiring the assumptions of linearity, homogeneity, or monotonicity. In any case, the causal effect of intervention on a risk factor is likely to depend on several aspects of the intervention (e.g., its magnitude, duration, and pathway), and therefore will not precisely correspond to the estimate from a Mendelian randomization analysis. Hence, we would urge practitioners to view the assessment of causality as the primary result of a Mendelian randomization, and not to interpret any causal estimate too literally.

This is an open access article distributed under the Creative Commons Attribution License 4.0 (CCBY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ISSN: 1044-3983/16/2801-0030
DOI: 10.1097/EDE.0000000000000559
Submitted 9 October 2015; accepted 13 September 2016.
From the Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom; Medical Research Council Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom; and Department of Medical Sciences, Molecular Epidemiology, Uppsala University, Uppsala, Sweden.
Stephen Burgess is funded by a fellowship from the Wellcome Trust (100114). Jack Bowden is supported by a Methodology Research Fellowship from the UK Medical Research Council (Grant Number MR/N501906/1). Simon G. Thompson is supported by the British Heart Foundation (Grant Number CH/12/2/29428).
The authors report no conflicts of interest.
Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com).
Editor’s Note: A Commentary on this article appears on p. 43.
Correspondence: Stephen Burgess, Department of Public Health & Primary Care, Strangeways Research Laboratory, 2 Worts Causeway, Cambridge, CB1 8RN, United Kingdom. E-mail: sb452@medschl.cam.ac.uk.

We also assume that the genetic variants are mutually independent in their distributions, although extensions are available for most of the analysis methods in the case of correlated variants, provided that the correlation structure is known.

Genetic variants are particularly suitable candidate instrumental variables, as they are fixed at conception, and hence cannot be affected by environmental factors that could otherwise lead to confounding or reverse causation. However, there are many well-documented ways in which the instrumental variable assumptions may be violated for any particular genetic variant, such as pleiotropy, linkage disequilibrium, and population stratification.3,15

For risk factors that are soluble protein biomarkers, there is often a gene region that encodes the protein (for example, the CRP gene region for C-reactive protein), or a regulator or inhibitor of the protein (e.g., the IL6R gene region for interleukin-6). Using one or more variants from such a gene region as instrumental variables would be ideal for a Mendelian randomization analysis, as these genetic variants would be the most likely to satisfy the instrumental variable assumptions, and the most informative proxies for intervention on the risk factor. However, such genetic variants do not exist for many risk factors.

The approach of using multiple genetic variants in different gene regions is particularly suitable for complex risk factors that are multifactorial and polygenic, such as body mass index, height, or blood pressure. Summarized data (in particular, beta-coefficients and standard errors) on genetic associations with the risk factor can be combined with summarized data on genetic associations with the outcome (that are often publicly available for download) to provide causal effect estimates, under the assumption that the genetic variants are all instrumental variables.22,23 Using multiple genetic variants increases the power of a Mendelian randomization investigation compared with an analysis based on a single variant. However, even if only one of the genetic variants is not a valid instrumental variable, the causal estimate based on all the variants from a conventional Mendelian randomization analysis will be biased and type 1 (false positive) error rates will be inflated.25,26

In this article, we describe a range of sensitivity analyses that either support or question the validity of causal inference from a Mendelian randomization analysis with multiple genetic variants. These sensitivity analyses will be useful for judging whether a causal conclusion from such an analysis is plausible or not. We focus on those sensitivity analyses that can be implemented using summarized data only. We consider approaches under two broad categories: methods for assessing the instrumental variable assumptions, and robust analysis methods that rely on a less stringent set of assumptions than a conventional Mendelian randomization analysis.

We illustrate the approaches using the example of esti-

mating the causal effect of C-reactive protein (CRP) on coro-

nary artery disease (CAD) risk using four genetic variants in

the CRP gene region,

and using 17 genetic variants (eTable

A1; http://links.lww.com/EDE/B114) that have been shown to

be associated with CRP at a genome-wide level of significance

in a large meta-analysis (see eFigure in Ref. 27); beta-coefficients

represent per allele associations with log-transformed

CRP concentrations. Genetic associations with CAD risk

were taken from the CARDIoGRAM consortium;

beta-coefficients represent per allele log odds ratios for CAD risk.

Ethical approval for the analyses using four genetic variants in

the CRP gene region was granted by the Cambridgeshire eth-

ics review committee; for the analyses using 17 genetic vari-

ants associated with CRP concentrations and with CAD risk,

ethical approval was granted to the constituent studies by local

institutional review boards.

For reference, the causal estimate based on the genetic

variants in the CRP gene region is null (odds ratio: 1.00, 95%

confidence interval: 0.90, 1.13 per 1-SD increase in CRP concentrations

[equal to a 1.05-unit increase in log-transformed

CRP or a 2.86-fold increase]), whereas the “causal” estimate

using an inverse-variance weighted method based on the

genome-wide significant variants (a less reliable approach)

is negative (odds ratio: 0.87, 95% confidence interval: 0.79,

0.96 per 1-SD increase). Software code for performing the

proposed sensitivity analyses is provided in eAppendix A.1

and A.2 (http://links.lww.com/EDE/B114).
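The scaling used in these reference estimates can be checked with a few lines of arithmetic. The sketch below assumes natural logarithms (consistent with a 1.05-unit increase corresponding to a 2.86-fold increase); the function name and the 1.96 multiplier for a 95% interval are choices made for this illustration, not taken from the supplementary code.

```python
import math

SD_LOG_CRP = 1.05  # from the text: a 1-SD increase equals 1.05 units of log-transformed CRP

def odds_ratio_per_sd(log_or_per_unit, se_per_unit, sd=SD_LOG_CRP):
    """Rescale a log odds ratio per unit of log CRP to an odds ratio per 1-SD.

    Returns the point estimate and an approximate 95% confidence interval.
    """
    z = 1.96  # normal quantile for a two-sided 95% interval
    point = math.exp(log_or_per_unit * sd)
    lower = math.exp((log_or_per_unit - z * se_per_unit) * sd)
    upper = math.exp((log_or_per_unit + z * se_per_unit) * sd)
    return point, lower, upper

# Multiplicative change in CRP concentration for a 1-SD increase
fold_increase = math.exp(SD_LOG_CRP)

# Hypothetical null estimate with a made-up standard error
point, lower, upper = odds_ratio_per_sd(0.0, 0.05)
```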

[Figure 1: diagram linking genetic variant, risk factor, confounders, and outcome.]

FIGURE 1. Diagram of instrumental variable assumptions for Mendelian randomization. The three assumptions (i, ii, iii) are illustrated by the presence of an arrow, indicating the effect of one variable on the other (assumption i), or by a dashed line with a cross, indicating that there is no direct effect of one variable on the other (assumptions ii and iii).

Burgess et al. Epidemiology • Volume 28, Number 1, January 2017

ASSESSING THE INSTRUMENTAL VARIABLE ASSUMPTIONS

The first set of approaches we consider are those to

assess whether the instrumental variable assumptions are

likely to be satisfied or not for a set of genetic variants. We

consider in turn the assessment of the association with mea-

sured confounders, the exploitation of a natural experiment in

the form of a gene–environment interaction, examination of a

scatter plot combined with a heterogeneity test, and of a fun-

nel plot combined with a test for directional pleiotropy.

Use of Measured Covariates

The assumption that an instrumental variable is not

associated with confounders of the risk factor–outcome

association is not fully testable, as not all confounders will

be known or measured. However, the associations of genetic

variants with measured covariates can be assessed. Lack

of association of the instrumental variable with measured

covariates does not imply lack of association with all con-

founders; however, an association with a measured covariate

should be investigated carefully for a potential pleiotropic

effect of the genetic variant. Figure 2, adapted from Wens-

ley et al.,

shows the associations of the four variants in

the CRP gene region with a range of potential confound-

ers. Associations are no stronger than would be expected by

chance alone.

If there are covariates that by biological considerations

should be downstream consequences of the risk factor, then

the associations of genetic variants with these covariates can

be assessed as positive controls to give confidence that the

function of the genetic variants matches the known conse-

quences of the risk factor. For instance, inhibition of inter-

leukin-1 by the drug anakinra has been observed to lead to

decreased levels of C-reactive protein and interleukin-6 in

clinical trials. If genetic variants associated with interleukin-1

are also associated with both these covariates, this makes it

more plausible that the variants are good proxies of interven-

tion on interleukin-1 levels.

A benefit of the use of multiple genetic variants is the

possibility to differentiate between pleiotropy and mediation,

two mechanisms by which a genetic variant may be associated

with a measured covariate (Figure 3). If a genetic variant is

associated with a covariate independently of the risk factor

(pleiotropy, or “horizontal pleiotropy”), then the instrumental

variable assumptions are likely to be violated and the genetic

variant should be excluded from an instrumental variable

analysis, as the association with the covariate is likely to open

a causal pathway from the variant to the outcome not via the

risk factor. However, if the genetic variant is associated with a

covariate due to its association with the risk factor of interest

(mediation or “vertical pleiotropy”), and there is no alterna-

tive causal pathway from the variant to the outcome except

for that via the risk factor, then the genetic variant is a valid

instrumental variable.

For instance, if increasing body mass index leads to

increased blood pressure, then genetic variants that are instru-

mental variables for body mass index should also be associ-

ated with blood pressure. If multiple genetic variants that are

candidate instrumental variables for body mass index are all

concordantly associated with blood pressure, then it is plau-

sible that the associations are due to mediation, not pleiot-

ropy. In contrast, if only one or two variants are associated

with blood pressure, then this is likely to be a manifestation of

pleiotropy. Pleiotropy and mediation are not mutually exclu-

sive (both could occur for the same covariate); however, this

approach may give an insight into whether the association

relates to a single genetic variant or to variants associated with

the risk factor more widely.

In some cases, valid causal inference may still be pos-

sible even if a genetic variant has a pleiotropic association

with a measured covariate; for instance, by adjusting for the

covariate in the analysis model. However, if the Mendelian

randomization investigation is performed using summarized

data, then the investigator is unlikely to be able to adjust for

covariates. An alternative approach with summarized data is

a multivariable Mendelian randomization analysis, in which

genetic associations with the outcome are regressed on the

genetic associations with the risk factor and covariates in a

multivariable weighted regression model.
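A minimal sketch of such a multivariable weighted regression, for the simplest case of one risk factor and one covariate, is given below. It is an illustration under stated assumptions rather than the authors' implementation: the regression has no intercept, weights are the inverse variances of the genetic associations with the outcome, variants are uncorrelated, and the function name and input values are hypothetical.

```python
def multivariable_mr_two_traits(beta_risk, beta_cov, beta_out, se_out):
    """Weighted regression of outcome associations on risk factor and covariate
    associations (no intercept), solved through the 2x2 normal equations.

    Returns (coefficient for the risk factor, coefficient for the covariate).
    """
    weights = [1.0 / s**2 for s in se_out]  # inverse-variance weights (assumed)
    rows = list(zip(weights, beta_risk, beta_cov, beta_out))
    s_rr = sum(w * r * r for w, r, c, y in rows)
    s_rc = sum(w * r * c for w, r, c, y in rows)
    s_cc = sum(w * c * c for w, r, c, y in rows)
    s_ry = sum(w * r * y for w, r, c, y in rows)
    s_cy = sum(w * c * y for w, r, c, y in rows)
    det = s_rr * s_cc - s_rc**2
    if det == 0:
        raise ValueError("risk factor and covariate associations are collinear")
    theta_risk = (s_cc * s_ry - s_rc * s_cy) / det
    theta_cov = (s_rr * s_cy - s_rc * s_ry) / det
    return theta_risk, theta_cov

# Hypothetical associations generated from known effects 0.3 and -0.1
beta_risk = [0.10, 0.20, 0.15, 0.05]
beta_cov = [0.05, -0.02, 0.10, 0.00]
beta_out = [0.3 * r - 0.1 * c for r, c in zip(beta_risk, beta_cov)]
se_out = [0.01, 0.02, 0.015, 0.01]
theta_risk, theta_cov = multivariable_mr_two_traits(beta_risk, beta_cov, beta_out, se_out)
```

The coefficient for the risk factor is then interpreted as its effect on the outcome with the covariate held fixed, which is why the variants' associations with the covariate need not be zero.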

A practical difficulty of determining which variants to

include in a Mendelian randomization analysis using mea-

sured covariates, aside from that of distinguishing between

pleiotropy and mediation, is that of multiple testing. If there

are large numbers of genetic variants and several measured

covariates, then it is difficult to set a statistical significance

threshold for rejecting a genetic variant as pleiotropic to

balance between the desire to exclude invalid instrumental

variables and the need to acknowledge the multiple tests. A

sensible compromise is to consider multiple thresholds, for

example, a conservative threshold to maximize robustness (a

fixed threshold such as P < 0.01), and a liberal threshold to

maximize power (such as a Bonferroni-corrected threshold

taking into account the number of comparisons made).

A similar approach was previously taken to assess the causal role

of lipid fractions on CAD risk.

If no causal effect is detected

even in a liberal analysis, then the plausibility of a null causal

finding increases.
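The two-threshold strategy can be sketched as follows. The variant names and P values are hypothetical, and the family-wise level of 0.05 used for the Bonferroni correction is an assumption of this example; the text specifies only the fixed threshold of P < 0.01.

```python
def flag_pleiotropic(p_values, fixed_threshold=0.01, family_alpha=0.05):
    """Flag variants for exclusion under a fixed and a Bonferroni-corrected threshold.

    p_values: dict mapping variant name -> list of P values, one per covariate.
    Returns (variants flagged at the fixed threshold,
             variants flagged at the Bonferroni threshold,
             the Bonferroni threshold itself).
    """
    n_tests = sum(len(ps) for ps in p_values.values())
    bonferroni = family_alpha / n_tests
    flagged_fixed = {v for v, ps in p_values.items() if min(ps) < fixed_threshold}
    flagged_bonf = {v for v, ps in p_values.items() if min(ps) < bonferroni}
    return flagged_fixed, flagged_bonf, bonferroni

# Hypothetical P values: 4 variants x 3 covariates = 12 comparisons
p_values = {
    "variant_a": [0.50, 0.30, 0.80],
    "variant_b": [0.005, 0.40, 0.60],   # below 0.01 but above 0.05 / 12
    "variant_c": [0.0001, 0.20, 0.90],  # below both thresholds
    "variant_d": [0.70, 0.15, 0.45],
}
flagged_fixed, flagged_bonf, bonferroni = flag_pleiotropic(p_values)
```

The analysis that excludes the larger set (fixed threshold) is the conservative one, and the analysis that excludes only the smaller set (Bonferroni threshold) retains more variants and hence more power.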

Gene–Environment Interaction

For some applications of Mendelian randomization, a

further natural experiment may be available if the postulated

causal effect is present in one stratum of the population, but

absent in another.

For example, the association of alcohol-

related genetic variants with esophageal cancer risk is present

in those who drink alcohol, but absent in abstainers.

A gene–

environment interaction provides evidence that a genetic asso-

ciation with the outcome in the population is a result of the

risk factor; if it were a result of pleiotropy, then it would be

likely to be present in both strata of the population. Gene–environment

interactions may be difficult to find, but can provide

convincing evidence of a causal effect.
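With summarized data, one simple way to examine such an interaction is to compare the genetic association with the outcome between the two strata. The sketch below uses a standard z-test for the difference between two estimates; it assumes the strata are independent samples, does not account for any distortion introduced by the stratification itself, and its function name and input values are hypothetical.

```python
import math

def stratum_difference_test(beta_1, se_1, beta_0, se_0):
    """Compare a genetic association with the outcome between two strata.

    beta_1, se_1: estimate and standard error in the exposed stratum
    beta_0, se_0: estimate and standard error in the unexposed stratum
    Returns (difference, z statistic, two-sided P value), treating the
    strata as independent (an assumption of this sketch).
    """
    difference = beta_1 - beta_0
    se_difference = math.sqrt(se_1**2 + se_0**2)
    z = difference / se_difference
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return difference, z, p_value

# Hypothetical per allele log odds ratios in drinkers and abstainers
difference, z, p_value = stratum_difference_test(0.30, 0.03, 0.00, 0.04)
```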

One potential complication of such an analysis is the

possibility of collider bias;

by stratifying on the risk fac-

tor, associations between the genetic variants and the out-

come may be distorted in the strata (in the examples above,

in alcohol consumers/abstainers). To our knowledge, no sys-

tematic investigation has been conducted as to the degree

that collider bias may lead to inappropriate causal infer-

ences in a Mendelian randomization setting, although sen-

sitivity analyses to assess the potential bias in the context of

instrumental variable analysis with a single instrument are

available.

35,36

[Figure 2: forest plots, one panel per genetic variant (rs1130864, rs3093077, rs1205, rs1800947); horizontal axis: per allele effect. Covariates shown: log C-reactive protein (mg/l), age at survey (yrs), body mass index (kg/m²), systolic BP (mmHg), diastolic BP (mmHg), total cholesterol (mmol/l), non-HDL-C (mmol/l), HDL-C (mmol/l), log triglycerides (mmol/l), LDL-C (mmol/l), Apo A1 (g/l), Apo B (g/l), albumin (g/l), lipoprotein(a) (mg/dl), log interleukin-6 (mg/l), fibrinogen (µmol/l), log leukocyte count (× 10^9/l), glucose (mmol/l), smoking amount (pack yrs), weight (kg), height (cm), waist/hip ratio.]

FIGURE 2. Associations (estimates in standard deviation units and 95% confidence intervals) of four genetic variants in the CRP gene region with a range of covariates per C-reactive protein increasing allele. Adapted from CRP CHD Genetics Collaboration.
