Sensitivity Analyses for Robust Causal Inference from Mendelian Randomization Analyses with Multiple Genetic Variants
Summary (5 min read)
FIGURE 1.
- Diagram of instrumental variable assumptions for Mendelian randomization.
- The three assumptions (i, ii, iii) are illustrated by the presence of an arrow, indicating the effect of one variable on the other (assumption i), or by a dashed line with a cross, indicating that there is no direct effect of one variable on the other (assumptions ii and iii).
ASSESSING THE INSTRUMENTAL VARIABLE ASSUMPTIONS
- The first set of approaches the authors consider are those to assess whether the instrumental variable assumptions are likely to be satisfied or not for a set of genetic variants.
- The authors consider in turn the assessment of the association with measured confounders, the exploitation of a natural experiment in the form of a gene-environment interaction, examination of a scatter plot combined with a heterogeneity test, and of a funnel plot combined with a test for directional pleiotropy.
Use of Measured Covariates
- The assumption that an instrumental variable is not associated with confounders of the risk factor-outcome association is not fully testable, as not all confounders will be known or measured.
- Associations are no stronger than would be expected by chance alone.
- Inhibition of interleukin-1 by the drug anakinra has been observed to lead to decreased levels of C-reactive protein and interleukin-6 in clinical trials.
- In some cases, valid causal inference may still be possible even if a genetic variant has a pleiotropic association with a measured covariate; for instance, by adjusting for the covariate in the analysis model.
- An alternative approach with summarized data is a multivariable Mendelian randomization analysis, in which genetic associations with the outcome are regressed on the genetic associations with the risk factor and covariates in a multivariable weighted regression model.
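The multivariable weighted regression described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `multivariable_mr`, the simulated data, and the choices of inverse-variance weights (from the standard errors of the outcome associations) and of no intercept term are all assumptions made for the example.

```python
import numpy as np

def multivariable_mr(beta_exposures, beta_outcome, se_outcome):
    """Weighted regression (no intercept, an assumption of this sketch) of
    variant-outcome associations on variant associations with the risk
    factor and covariates. Returns one coefficient per column."""
    bx = np.asarray(beta_exposures, dtype=float)    # shape (J, K)
    by = np.asarray(beta_outcome, dtype=float)      # shape (J,)
    sw = 1.0 / np.asarray(se_outcome, dtype=float)  # square roots of inverse-variance weights
    # Weighted least squares: scale each row by the square root of its weight.
    coef, *_ = np.linalg.lstsq(bx * sw[:, None], by * sw, rcond=None)
    return coef

# Hypothetical, noise-free summarized data for 30 variants:
# column 0 = risk factor, column 1 = a measured covariate.
rng = np.random.default_rng(0)
beta_exposures = rng.normal(0.1, 0.05, size=(30, 2))
true_effects = np.array([0.5, -0.2])
beta_outcome = beta_exposures @ true_effects
se_outcome = np.full(30, 0.01)

mvmr_estimates = multivariable_mr(beta_exposures, beta_outcome, se_outcome)
print(mvmr_estimates)
```

Because the simulated outcome associations are an exact linear combination of the two columns, the regression recovers the two assumed effects; with real data the coefficients would carry sampling error.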
Gene-Environment Interaction
- For some applications of Mendelian randomization, a further natural experiment may be available if the postulated causal effect is present in one stratum of the population, but absent in another.
- For example, the association of alcohol-related genetic variants with esophageal cancer risk is present in those who drink alcohol, but absent in abstainers.
- One potential complication of such an analysis is the possibility of collider bias: by stratifying on the risk factor, associations between the genetic variants and the outcome may be distorted in the strata (in the example above, in alcohol consumers/abstainers).
- Associations (estimates in standard deviation units and 95% confidence intervals) of four genetic variants in the CRP gene region with a range of covariates, per C-reactive protein-increasing allele.
- Copyright © 2016 Wolters Kluwer Health, Inc. Unauthorized reproduction of this article is prohibited.
Scatter Plot and Test for Heterogeneity
- Even if the instrumental variable assumptions are in doubt for some or all of the variants, if several independent genetic variants in different gene regions are concordantly associated with the outcome, then a causal conclusion would seem reasonable.
- Any point that substantially deviates from this line should be investigated for potential pleiotropy.
- A statistical test for heterogeneity can be performed using Cochran's Q test on the causal estimates from each genetic variant, with each estimate weighted by its inverse-variance weight.
- This statistic can be calculated using only summarized data.
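As an illustration of how the heterogeneity statistic can be computed from summarized data alone, the sketch below forms a ratio estimate for each variant and applies the standard Cochran's Q formula. The first-order weights (squared risk factor association divided by the squared standard error of the outcome association) and all numbers are assumptions made for the example, not values from the article.

```python
import numpy as np

def cochran_q(beta_x, beta_y, se_y):
    """Cochran's Q for per-variant causal (ratio) estimates.

    beta_x: variant associations with the risk factor
    beta_y, se_y: variant associations with the outcome and their standard errors
    Returns (Q, degrees of freedom)."""
    bx, by, se = (np.asarray(a, dtype=float) for a in (beta_x, beta_y, se_y))
    theta = by / bx                            # ratio estimate for each variant
    w = (bx / se) ** 2                         # first-order inverse-variance weights
    theta_ivw = np.sum(w * theta) / np.sum(w)  # inverse-variance weighted estimate
    q = np.sum(w * (theta - theta_ivw) ** 2)
    return q, len(theta) - 1

# Hypothetical data: four variants, the last one pleiotropic in the second set.
beta_x = [0.10, 0.20, 0.30, 0.25]
se_y = [0.01, 0.01, 0.01, 0.01]
homogeneous = [0.05, 0.10, 0.15, 0.125]
pleiotropic = [0.05, 0.10, 0.15, 0.250]

q_null, df = cochran_q(beta_x, homogeneous, se_y)
q_outlier, _ = cochran_q(beta_x, pleiotropic, se_y)
print(q_null, q_outlier, df)
```

Under homogeneity, Q is compared with a chi-squared distribution on J - 1 degrees of freedom (3 here, with a 5% critical value of about 7.81); the single outlying variant pushes Q far beyond that threshold.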
FIGURE 3.
- Diagram to illustrate the difference between pleiotropy (left, the association of the genetic variant with the covariate is independent of the risk factor) and mediation (right, the association of the genetic variant with the covariate is mediated entirely via the risk factor).
- Egger regression is a method for detecting small study bias (often interpreted as publication bias) in a meta-analysis of separate studies.
- The method can also be used for detecting directional pleiotropy from separate genetic variants.
- The genetic associations should be oriented so that the associations with the risk factor all have the same sign.
- If there is no intercept term in this regression, the slope parameter is the inverse-variance weighted causal estimate.
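The relationship between the two regressions above can be sketched in a few lines: orient the associations, fit a weighted regression with an intercept (Egger regression), and fit the same regression through the origin (the inverse-variance weighted estimate). The function name, the weights (inverse variances of the outcome associations), and the data are assumptions for illustration only.

```python
import numpy as np

def ivw_and_egger(beta_x, beta_y, se_y):
    """Return (IVW slope, Egger intercept, Egger slope) from summarized data."""
    bx, by, se = (np.asarray(a, dtype=float) for a in (beta_x, beta_y, se_y))
    # Orient each variant so its association with the risk factor is positive.
    sign = np.where(bx < 0, -1.0, 1.0)
    bx, by = bx * sign, by * sign
    w = 1.0 / se ** 2
    # No intercept: the weighted regression slope is the IVW estimate.
    ivw = np.sum(w * bx * by) / np.sum(w * bx ** 2)
    # With intercept: Egger regression, fitted by weighted least squares.
    sw = np.sqrt(w)
    design = np.column_stack([np.ones_like(bx), bx])
    (intercept, slope), *_ = np.linalg.lstsq(design * sw[:, None], by * sw, rcond=None)
    return ivw, intercept, slope

# Hypothetical data with directional pleiotropy: after orientation,
# outcome association = 0.02 + 0.4 * risk factor association.
beta_x = [0.05, -0.10, 0.15, 0.20]
beta_y = [0.04, -0.06, 0.08, 0.10]
se_y = [0.01, 0.01, 0.01, 0.01]

ivw, egger_intercept, egger_slope = ivw_and_egger(beta_x, beta_y, se_y)
print(ivw, egger_intercept, egger_slope)
```

In this constructed example the intercept picks up the average pleiotropic effect (0.02), and the no-intercept slope is pulled upward relative to the Egger slope; a formal test of the intercept would additionally need its standard error, which is omitted here.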
ROBUST ANALYSIS METHODS
- The second category of sensitivity analyses is that of robust analysis methods.
- Robust analysis methods allow different (and when the main purpose is to test the causal null hypothesis, weaker) assumptions than standard instrumental variable methods.
- In turn, the authors consider penalization methods, median-based methods, and Egger regression.
Penalization Methods
- The authors first consider methods in which the contribution of some genetic variants (e.g., heterogeneous or outlying variants) to the analysis is downweighted (or penalized).
- The simplest way of performing a penalization method is to omit some of the variants from the analysis.
- With a small number of genetic variants, the causal estimates omitting one variant at a time could be considered.
- This sensitivity analysis has been undertaken for the effect of LDL-c on aortic stenosis.
- They require individual-level data and a one-sample setting (genetic variants, risk factor, and outcome measurements are available for the same individuals).
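The simplest penalization idea mentioned above, re-estimating the causal effect while omitting one variant at a time, needs only summarized data. The sketch below uses an inverse-variance weighted estimate for each subset; the helper names and data are hypothetical.

```python
import numpy as np

def ivw(bx, by, se):
    """Inverse-variance weighted estimate (weighted regression through the origin)."""
    w = 1.0 / se ** 2
    return np.sum(w * bx * by) / np.sum(w * bx ** 2)

def leave_one_out(beta_x, beta_y, se_y):
    """IVW estimate with each variant omitted in turn."""
    bx, by, se = (np.asarray(a, dtype=float) for a in (beta_x, beta_y, se_y))
    idx = np.arange(len(bx))
    return np.array([ivw(bx[idx != j], by[idx != j], se[idx != j]) for j in idx])

# Hypothetical data: the first three variants imply an effect of 0.5,
# while the fourth has an outlying ratio estimate of 1.0.
beta_x = np.array([0.10, 0.20, 0.30, 0.25])
beta_y = np.array([0.05, 0.10, 0.15, 0.25])
se_y = np.full(4, 0.01)

full_estimate = ivw(beta_x, beta_y, se_y)
loo_estimates = leave_one_out(beta_x, beta_y, se_y)
print(full_estimate, loo_estimates)
```

An estimate that changes markedly when a single variant is dropped (here, the fourth) suggests that the overall result depends heavily on that variant.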
Median-based Methods
- An alternative family of methods that gives consistent estimates when up to half the genetic variants are not valid instrumental variables, but that can be performed using summarized data rather than individual-level data, is that of median-based methods.
- The weighted median estimate is consistent under the assumption that genetic variants representing over 50% of the weight in the analysis are valid instruments.
- This is for Mendelian randomization analysis of C-reactive protein on coronary artery disease risk using genetic variants throughout the genome that have been demonstrated as associated with C-reactive protein at a genome-wide level of significance.
- Horizontal lines represent 95% confidence intervals for the instrumental variable estimates.
- Confidence intervals for the median and weighted estimates can be estimated using bootstrapping.
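A minimal sketch of a weighted median estimate with a bootstrap standard error is given below. It assumes an interpolated weighted median (the 50% point of the weight distribution across the ordered ratio estimates), first-order inverse-variance weights, and a parametric bootstrap that resamples the summarized associations from normal distributions; these are illustrative choices, and the data are hypothetical.

```python
import numpy as np

def weighted_median(theta, weights):
    """Interpolated weighted median of the estimates in theta."""
    theta = np.asarray(theta, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(theta)
    theta, weights = theta[order], weights[order]
    # Cumulative weight up to the midpoint of each estimate, scaled to (0, 1).
    cum = (np.cumsum(weights) - 0.5 * weights) / np.sum(weights)
    return np.interp(0.5, cum, theta)

def bootstrap_se(bx, by, se_x, se_y, n_boot=500, seed=0):
    """Parametric bootstrap standard error of the weighted median estimate."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        bx_b = rng.normal(bx, se_x)  # resampled risk factor associations
        by_b = rng.normal(by, se_y)  # resampled outcome associations
        draws[b] = weighted_median(by_b / bx_b, (bx_b / se_y) ** 2)
    return draws.std(ddof=1)

# Hypothetical data: three variants near an effect of 0.5, two invalid variants.
bx = np.array([0.10, 0.20, 0.30, 0.25, 0.15])
by = np.array([0.049, 0.100, 0.153, 0.250, 0.300])
se_x = np.full(5, 0.005)
se_y = np.full(5, 0.01)

w = (bx / se_y) ** 2
ivw_estimate = np.sum(w * by / bx) / np.sum(w)
wm_estimate = weighted_median(by / bx, w)
wm_se = bootstrap_se(bx, by, se_x, se_y)
print(ivw_estimate, wm_estimate, wm_se)
```

With equal weights the same function reduces to the simple median. In this example the weighted median sits closer to the estimates from the majority-weight variants than the inverse-variance weighted estimate does.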
Egger Regression
- The Egger regression method was introduced above as a test for directional pleiotropy; this test does not make any assumption about the genetic variants.
- Under an assumption that is weaker than standard instrumental variable assumptions, the slope coefficient from the Egger regression method provides an estimate of the causal effect that is consistent asymptotically even if all the genetic variants have pleiotropic effects on the outcome.
- There is some evidence for the general plausibility of the InSIDE assumption, as associations of genetic variants with different phenotypic variables have been shown to be largely uncorrelated in an empirical study.
- The penalization and median-based methods allow more general departures from the instrumental variable assumptions for the invalid instruments.
- Using genetic variants chosen solely on the basis of their association with the risk factor, a broad range of methods affirmed that LDL-c was a causal risk factor for CAD.
Example: C-reactive Protein and Coronary Artery Disease Risk
- The inverse-variance weighted method was originally proposed as a fixed-effect meta-analysis of the causal estimates from each of the genetic variants.
- The authors consider fixed-effect and multiplicative random-effects models for both the inverse-variance weighted and Egger regression methods.
- Also, the authors consider simple (i.e., unweighted) median and weighted median estimates.
- The corresponding random-effects analyses imply that there is no convincing evidence for a causal effect.
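To illustrate why a random-effects analysis can weaken the evidence for a causal effect, the sketch below contrasts fixed-effect and multiplicative random-effects standard errors for the inverse-variance weighted estimate. It assumes the common formulation in which the point estimate is unchanged and the fixed-effect standard error is multiplied by the residual standard error of the weighted regression when that exceeds 1; the function name and data are hypothetical.

```python
import numpy as np

def ivw_fixed_and_random(beta_x, beta_y, se_y):
    """Return (estimate, fixed-effect SE, multiplicative random-effects SE)."""
    bx, by, se = (np.asarray(a, dtype=float) for a in (beta_x, beta_y, se_y))
    w = 1.0 / se ** 2
    estimate = np.sum(w * bx * by) / np.sum(w * bx ** 2)
    se_fixed = 1.0 / np.sqrt(np.sum(w * bx ** 2))
    # Residual standard error of the weighted regression (overdispersion).
    resid = by - estimate * bx
    phi = np.sqrt(np.sum(w * resid ** 2) / (len(bx) - 1))
    # Assumption of this sketch: never shrink the SE below the fixed-effect value.
    se_random = se_fixed * max(1.0, phi)
    return estimate, se_fixed, se_random

# Hypothetical heterogeneous data: the fourth variant is an outlier.
beta_x = [0.10, 0.20, 0.30, 0.25]
beta_y = [0.05, 0.10, 0.15, 0.25]
se_y = [0.01, 0.01, 0.01, 0.01]

estimate, se_fixed, se_random = ivw_fixed_and_random(beta_x, beta_y, se_y)
print(estimate, se_fixed, se_random)
```

The point estimate is identical under both models, but heterogeneity between the variant-specific estimates inflates the random-effects standard error, so confidence intervals widen.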
DISCUSSION
- When multiple genetic variants from different gene regions are used in a Mendelian randomization analysis, it is highly implausible that all the genetic variants satisfy the instrumental variable assumptions.
- This does not preclude a causal conclusion; however, it means that a simple instrumental variable analysis alone should not be relied on to give a causal conclusion.
- Inappropriate and naive application of standard Mendelian randomization methods may lead to exactly the same problems of unmeasured confounding that the technique was designed to avoid.
- The authors have discussed a range of sensitivity analyses that can be used to question the plausibility of a Mendelian randomization analysis using multiple variants, focusing on those analyses that are judged to be most useful to an applied analyst and those that can be performed using summarized data.
- Not every sensitivity analysis may be appropriate for each case, but some effort should be made to investigate whether a causal finding is robust to violations of the instrumental variable assumptions.
Comparison with Previous Literature
- From its initial popularization, proponents of Mendelian randomization have been candid about the stringent and untestable assumptions required in Mendelian randomization.
- However, applied investigations have not always reflected this need for caution.
- In comparison with previous attempts to offer robust approaches for causal inference in Mendelian randomization, the authors have here repeated some of the guidance of Glymour et al., 32 specifically relating to the search for gene-environment interactions and to testing for heterogeneity between the estimates from different variants.
- Substantial attenuation of the association on adjustment for the risk factor is expected if the genetic variant is a valid instrumental variable; however, such attenuation may not occur in practice, for example, due to measurement error in the exposure. Conversely, some attenuation may occur for an invalid instrumental variable.
- Violations of the assumptions of homogeneity and/or linearity of the causal effect would also lead to difficulties in interpreting the causal estimate, although they are unlikely to lead to inappropriate causal inferences or inflated type 1 error rates under the null.
Summarized Data and Two-sample Mendelian Randomization
- All of the sensitivity analyses discussed in this article can equally be performed b.
- Odds ratio for coronary artery disease per 1-SD (1.05 unit) increase in log-transformed C-reactive protein concentration (equivalent to a 2.86-fold increase in C-reactive protein concentration).
- A further concern with summarized data is the use of two-sample analyses, in which data on the gene-risk factor and gene-outcome associations are taken from nonoverlapping datasets.
- This is not to discourage the use of summarized data or two-sample Mendelian randomization analyses, but to acknowledge that the bar for evidential quality is even higher in this case.
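The equivalence quoted above between a 1.05-unit increase in log-transformed C-reactive protein and a 2.86-fold increase in concentration can be checked directly, assuming the transformation is the natural logarithm. The odds ratio used in the second step is a hypothetical value chosen only to show how an estimate per standard deviation could be rescaled to an estimate per doubling.

```python
import math

sd_log_crp = 1.05  # 1 SD of log-transformed CRP, from the text

# A 1.05-unit increase on the natural-log scale multiplies CRP by exp(1.05).
fold_increase = math.exp(sd_log_crp)
print(round(fold_increase, 2))

# Rescaling a (hypothetical) odds ratio per 1-SD increase to one per doubling:
# a doubling is log(2) units, i.e. log(2) / 1.05 standard deviations.
or_per_sd = 1.10  # hypothetical value, not taken from the article
or_per_doubling = or_per_sd ** (math.log(2) / sd_log_crp)
print(round(or_per_doubling, 3))
```

Because a doubling is less than one standard deviation here, the odds ratio per doubling is closer to 1 than the odds ratio per standard deviation.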
Genetic Variants with Different Functional Effects
- The authors have assumed that there is a single causal effect of the risk factor on the outcome, and interpreted deviation from this (i.e., heterogeneity of causal effect estimates) as evidence that the instrumental variable assumptions are violated for some of the genetic variants.
- In reality, if genetic variants have different functional effects on the risk factor, then different magnitudes of causal effect may be expected.
- Genetic variants associated with body mass index may have different biological mechanisms giving rise to the association, and may affect the outcome to different extents.
- Heterogeneity between causal estimates based on sets of genetic variants grouped according to their biological function may help reveal which mechanisms are causal.
- The causal estimates presented in this article still provide a valid test of the causal null hypothesis, but do not have an interpretation as estimates of a causal parameter.
Pleiotropy and Other Violations of the Instrumental Variable Assumptions
- The authors have discussed violations of the instrumental variable assumptions primarily using the language of pleiotropy.
- In particular, violations of the exclusion restriction assumption (i.e., no effect of the genetic variant on the outcome except for that via the risk factor) can be expressed as pleiotropic effects.
- While this adjustment has proved successful in some cases, it is not guaranteed to eliminate population stratification.
- Classical (nondifferential, zero mean) measurement error in the risk factor does not lead to bias in instrumental variable estimates.
- If there are multiple versions of the risk factor, then this would lead to difficulties in interpreting the causal findings.
CONCLUSIONS
- The increasing size and coverage of genome-wide association studies and the increasing availability of summarized data on genetic associations are making the application of Mendelian randomization simpler.
- The methods for sensitivity analysis described in this article will help to judge whether a causal conclusion from a Mendelian randomization analysis is reasonable or not.
- Aside from cases in which the selection of the genetic variants and their justification as instrumental variables is motivated by strong biological understanding, a Mendelian randomization analysis in which no assessment of the robustness of the findings has been made should be viewed as speculative.
Frequently Asked Questions (8)
Q2. What is a practical difficulty of determining which variants to include in a Mendelian randomization analysis?
A practical difficulty of determining which variants to include in a Mendelian randomization analysis using measured covariates, aside from that of distinguishing between pleiotropy and mediation, is that of multiple testing.
Q3. What has inhibition of interleukin-1 by the drug anakinra been observed to do?
For instance, inhibition of interleukin-1 by the drug anakinra has been observed to lead to decreased levels of C-reactive protein and interleukin-6 in clinical trials.
Q4. How can covariates that are downstream consequences of the risk factor be used to assess the genetic variants?
If there are covariates that by biological considerations should be downstream consequences of the risk factor, then the associations of genetic variants with these covariates can be assessed as positive controls to give confidence that the function of the genetic variants matches the known consequences of the risk factor.
Q5. What methods allow more general departures from the instrumental variable assumptions for the invalid instruments?
The penalization and median-based methods allow more general departures from the instrumental variable assumptions for the invalid instruments.
Q6. What is an example of an expected association between genetic variants and a measured covariate?
For instance, if increasing body mass index leads to increased blood pressure, then genetic variants that are instrumental variables for body mass index should also be associated with blood pressure.
Q7. What does the slope coefficient from the Egger regression method estimate when genetic variants have pleiotropic effects?
Under an assumption that is weaker than standard instrumental variable assumptions, the slope coefficient from the Egger regression method provides an estimate of the causal effect that is consistent asymptotically even if all the genetic variants have pleiotropic effects on the outcome.
Q8. What is the L1-penalization method?
This approach has been applied for investigating the causal effect of lipid fractions on CAD risk. More formal penalization methods have been proposed using L1-penalization to downweight the contribution of outlying variants to the analysis in a continuous way.