Journal ArticleDOI

Efficient Design for Mendelian Randomization Studies: Subsample and 2-Sample Instrumental Variable Estimators

01 Oct 2013-American Journal of Epidemiology (Oxford University Press)-Vol. 178, Iss: 7, pp 1177-1184
TL;DR: It is shown that obtaining exposure data for a subset of participants is a cost-efficient strategy, often having negligible effects on power in comparison with a traditional complete-data analysis, and maximum power is approximately equal to the power of traditional IV estimators.
Abstract: Mendelian randomization (MR) is a method for estimating the causal relationship between an exposure and an outcome using a genetic factor as an instrumental variable (IV) for the exposure. In the traditional MR setting, data on the IV, exposure, and outcome are available for all participants. However, obtaining complete exposure data may be difficult in some settings, due to high measurement costs or lack of appropriate biospecimens. We used simulated data sets to assess statistical power and bias for MR when exposure data are available for a subset (or an independent set) of participants. We show that obtaining exposure data for a subset of participants is a cost-efficient strategy, often having negligible effects on power in comparison with a traditional complete-data analysis. The size of the subset needed to achieve maximum power depends on IV strength, and maximum power is approximately equal to the power of traditional IV estimators. Weak IVs are shown to lead to bias towards the null when the subsample is small and towards the confounded association when the subset is relatively large. Various approaches for confidence interval calculation are considered. These results have important implications for reducing the costs and increasing the feasibility of MR studies.
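The subsample design described in the abstract can be illustrated with a ratio (Wald) estimator: the IV-outcome slope is estimated from all participants, and the IV-exposure slope only from those with exposure data. The following is a minimal sketch under stated assumptions, not the authors' implementation; the function name `subsample_wald`, the simulated effect sizes, and the 25% subsample fraction are all hypothetical choices made for illustration.

```python
import numpy as np

def subsample_wald(g, y, x, has_x):
    """Ratio (Wald) estimate of the causal effect of x on y.

    The IV-outcome slope uses every participant; the IV-exposure slope
    uses only participants flagged in has_x (exposure measured).
    """
    def slope(a, b):
        # Ordinary least-squares slope of b on a (intercept included).
        a_c = a - a.mean()
        return float(a_c @ (b - b.mean()) / (a_c @ a_c))

    beta_gy = slope(g, y)                  # reduced form, full sample
    beta_gx = slope(g[has_x], x[has_x])    # first stage, subsample only
    return beta_gy / beta_gx

# Simulated illustration; every parameter value here is an assumption.
rng = np.random.default_rng(0)
n = 100_000
g = rng.binomial(2, 0.3, size=n).astype(float)  # genotype coded 0/1/2
u = rng.normal(size=n)                          # unmeasured confounder
x = 0.5 * g + u + rng.normal(size=n)            # exposure
y = 0.2 * x + u + rng.normal(size=n)            # outcome, true effect 0.2
has_x = rng.random(n) < 0.25                    # exposure measured in about 25%
estimate = subsample_wald(g, y, x, has_x)
```

With a reasonably strong instrument, `estimate` should land close to the simulated effect of 0.2 even though three quarters of the exposure values are never used.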
Citations
Journal ArticleDOI
TL;DR: An adaption of Egger regression can detect some violations of the standard instrumental variable assumptions, and provide an effect estimate which is not subject to these violations, and provides a sensitivity analysis for the robustness of the findings from a Mendelian randomization investigation.
Abstract: Background: The number of Mendelian randomization analyses including large numbers of genetic variants is rapidly increasing. This is due to the proliferation of genome-wide association studies, and the desire to obtain more precise estimates of causal effects. However, some genetic variants may not be valid instrumental variables, in particular due to them having more than one proximal phenotypic correlate (pleiotropy). Methods: We view Mendelian randomization with multiple instruments as a meta-analysis, and show that bias caused by pleiotropy can be regarded as analogous to small study bias. Causal estimates using each instrument can be displayed visually by a funnel plot to assess potential asymmetry. Egger regression, a tool to detect small study bias in meta-analysis, can be adapted to test for bias from pleiotropy, and the slope coefficient from Egger regression provides an estimate of the causal effect. Under the assumption that the association of each genetic variant with the exposure is independent of the pleiotropic effect of the variant (not via the exposure), Egger’s test gives a valid test of the null causal hypothesis and a consistent causal effect estimate even when all the genetic variants are invalid instrumental variables. Results: We illustrate the use of this approach by re-analysing two published Mendelian randomization studies of the causal effect of height on lung function, and the causal effect of blood pressure on coronary artery disease risk. The conservative nature of this approach is illustrated with these examples. Conclusions: An adaption of Egger regression (which we call MR-Egger) can detect some violations of the standard instrumental variable assumptions, and provide an effect estimate which is not subject to these violations. The approach provides a sensitivity analysis for the robustness of the findings from a Mendelian randomization investigation.
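As a rough illustration of the regression described in this abstract, the sketch below fits a weighted linear regression of the variant-outcome associations on the variant-exposure associations, with an intercept. The slope plays the role of the causal estimate and the intercept the role of the average pleiotropic effect. The function name `mr_egger`, the inverse-variance weights, and the omission of standard errors are simplifying assumptions made here, not details taken from the paper.

```python
import numpy as np

def mr_egger(beta_x, beta_y, se_y):
    """Return (intercept, slope) from an Egger-style weighted regression."""
    beta_x, beta_y, se_y = (np.asarray(v, dtype=float)
                            for v in (beta_x, beta_y, se_y))
    # Orient each variant so its exposure association is positive.
    sign = np.where(beta_x < 0, -1.0, 1.0)
    bx, by = sign * beta_x, sign * beta_y
    w = 1.0 / se_y**2                            # inverse-variance weights
    X = np.column_stack([np.ones_like(bx), bx])  # intercept and slope columns
    XtW = X.T * w
    # Weighted least squares: solve (X' W X) b = X' W y.
    intercept, slope = np.linalg.solve(XtW @ X, XtW @ by)
    return float(intercept), float(slope)
```

An intercept far from zero would suggest directional pleiotropy; forcing the intercept to zero would reduce this regression to the inverse-variance weighted estimate.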

3,392 citations


Cites background from "Efficient Design for Mendelian Rand..."

  • ...This is in line with bias from weak instruments, which in a two-sample setting acts towards the null.(39) As before, the IVW estimate is considerably more precise, and consequently has greater power to reject the causal null hypothesis....

    [...]

  • ...In a two-sample setting, weak instrument bias is in the direction of the null, and hence is a less serious problem, as it will not lead to false-positive findings.(37,39) One solution proposed for weak instrument bias is the use of allele scores, whereby the number of exposure-increasing alleles across multiple genetic variants is summed across individuals....

    [...]

Journal ArticleDOI
TL;DR: A novel weighted median estimator for combining data on multiple genetic variants into a single causal estimate is presented, which is consistent even when up to 50% of the information comes from invalid instrumental variables.
Abstract: Developments in genome-wide association studies and the increasing availability of summary genetic association data have made application of Mendelian randomization relatively straightforward. However, obtaining reliable results from a Mendelian randomization investigation remains problematic, as the conventional inverse-variance weighted method only gives consistent estimates if all of the genetic variants in the analysis are valid instrumental variables. We present a novel weighted median estimator for combining data on multiple genetic variants into a single causal estimate. This estimator is consistent even when up to 50% of the information comes from invalid instrumental variables. In a simulation analysis, it is shown to have better finite-sample Type 1 error rates than the inverse-variance weighted method, and is complementary to the recently proposed MR-Egger (Mendelian randomization-Egger) regression method. In analyses of the causal effects of low-density lipoprotein cholesterol and high-density lipoprotein cholesterol on coronary artery disease risk, the inverse-variance weighted method suggests a causal effect of both lipid fractions, whereas the weighted median and MR-Egger regression methods suggest a null effect of high-density lipoprotein cholesterol that corresponds with the experimental evidence. Both median-based and MR-Egger regression methods should be considered as sensitivity analyses for Mendelian randomization investigations with multiple genetic variants.
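The weighted median idea can be sketched as follows: compute a ratio estimate for each variant, weight it by its approximate precision, and take the 50th percentile of the weighted empirical distribution by linear interpolation. This is a minimal sketch under assumptions; the function name `weighted_median` and the first-order weights `(beta_x / se_y)**2` are choices made here for illustration, and no standard error is computed.

```python
import numpy as np

def weighted_median(beta_x, beta_y, se_y):
    """Weighted median of per-variant ratio estimates beta_y / beta_x."""
    beta_x, beta_y, se_y = (np.asarray(v, dtype=float)
                            for v in (beta_x, beta_y, se_y))
    theta = beta_y / beta_x              # ratio estimate for each variant
    w = (beta_x / se_y) ** 2             # approximate inverse-variance weights
    order = np.argsort(theta)
    theta, w = theta[order], w[order]
    w = w / w.sum()
    # Position of each estimate in the weighted distribution (midpoint rule).
    p = np.cumsum(w) - 0.5 * w
    return float(np.interp(0.5, p, theta))
```

With five equally weighted ratio estimates 0.1, 0.2, 0.3, 0.4 and 5.0, the function returns approximately 0.3, so the single outlying variant has no influence on the result.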

2,959 citations


Cites background from "Efficient Design for Mendelian Rand..."

  • ...…were obtained, then the resulting analysis suffers from bias and inflated type 1 error rates when the included variants are “weak” (i.e., they do not explain a substantial proportion of variation in the exposure in the dataset under analysis) (Burgess et al., 2011; Pierce and Burgess, 2013)....

    [...]

  • ...If exposure and outcome data are collected on different sets of individuals (known as two-sample Mendelian randomization (Pierce and Burgess, 2013)), then these error terms are independent....

    [...]

Journal ArticleDOI
30 May 2018-eLife
TL;DR: MR-Base is a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR, and includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions.
Abstract: Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as 2-sample Mendelian randomization (2SMR) and bypassing the need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, undermining efficient implementation of the approach. We therefore developed MR-Base ( http://www.mrbase.org ): a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be efficiently evaluated in phenome-wide association studies.

2,520 citations


Cites background or methods from "Efficient Design for Mendelian Rand..."

  • ...These steps are supported by the database of GWAS results and R packages (‘TwoSampleMR’ and ‘MRInstruments’) curated by MR-Base and the following R packages curated by other researchers: ‘MendelianRandomization’ (Yavorska and Burgess, 2017), ‘RadialMR’ (Bowden et al., 2017b), ‘MR-PRESSO’ (Verbanck et al., 2018) and ‘mr.raps’ (Zhao et al., 2018)....

    [...]

  • ...Crucially, MR can be performed using results from GWAS, in a strategy known as 2-sample MR (2SMR) (Pierce and Burgess, 2013)....

    [...]

  • ...F statistic much greater than 10 for the instrument-exposure association) (Pierce and Burgess, 2013)....

    [...]

  • ...See Pierce and Burgess (Pierce and Burgess, 2013) for further details on the relationship between instrument strength and bias from sample overlap....

    [...]

  • ...The second is the development of statistical methods for causal inference that exploit the principles of Mendelian randomization (MR) using GWAS summary data (Davey Smith and Ebrahim, 2003; Davey Smith and Hemani, 2014; Zhu et al., 2016; Pierce and Burgess, 2013)....

    [...]

Journal ArticleDOI
TL;DR: It is concluded that Mendelian randomization investigations using summarized data from uncorrelated variants are similarly efficient to those using individual‐level data, although the necessary assumptions cannot be so fully assessed.
Abstract: Genome-wide association studies, which typically report regression coefficients summarizing the associations of many genetic variants with various traits, are potentially a powerful source of data for Mendelian randomization investigations. We demonstrate how such coefficients from multiple variants can be combined in a Mendelian randomization analysis to estimate the causal effect of a risk factor on an outcome. The bias and efficiency of estimates based on summarized data are compared to those based on individual-level data in simulation studies. We investigate the impact of gene–gene interactions, linkage disequilibrium, and ‘weak instruments’ on these estimates. Both an inverse-variance weighted average of variant-specific associations and a likelihood-based approach for summarized data give similar estimates and precision to the two-stage least squares method for individual-level data, even when there are gene–gene interactions. However, these summarized data methods overstate precision when variants are in linkage disequilibrium. If the P-value in a linear regression of the risk factor for each variant is less than , then weak instrument bias will be small. We use these methods to estimate the causal association of low-density lipoprotein cholesterol (LDL-C) on coronary artery disease using published data on five genetic variants. A 30% reduction in LDL-C is estimated to reduce coronary artery disease risk by 67% (95% CI: 54% to 76%). We conclude that Mendelian randomization investigations using summarized data from uncorrelated variants are similarly efficient to those using individual-level data, although the necessary assumptions cannot be so fully assessed.
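The inverse-variance weighted combination mentioned in this abstract can be written in a few lines. The sketch below assumes uncorrelated variants and ignores uncertainty in the variant-exposure associations; the function name `ivw` is a hypothetical label used only here.

```python
import numpy as np

def ivw(beta_x, beta_y, se_y):
    """Inverse-variance weighted average of per-variant ratio estimates.

    Returns (estimate, standard_error), treating the variant-exposure
    associations beta_x as known without error.
    """
    beta_x, beta_y, se_y = (np.asarray(v, dtype=float)
                            for v in (beta_x, beta_y, se_y))
    w = beta_x**2 / se_y**2                       # precision of each ratio
    estimate = np.sum(w * (beta_y / beta_x)) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return float(estimate), float(se)
```

Because the weights assume independent variants, applying this to variants in linkage disequilibrium would overstate precision, in line with the caveat in the abstract.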

2,003 citations


Cites methods from "Efficient Design for Mendelian Rand..."

  • ...This is known as a two-sample IV analysis [Pierce and Burgess, 2013]....

    [...]

Journal ArticleDOI
TL;DR: Developments of MR, including two-sample MR, bidirectional MR, network MR, two-step MR, factorial MR and multiphenotype MR, are outlined in this review.
Abstract: Observational epidemiological studies are prone to confounding, reverse causation and various biases and have generated findings that have proved to be unreliable indicators of the causal effects of modifiable exposures on disease outcomes. Mendelian randomization (MR) is a method that utilizes genetic variants that are robustly associated with such modifiable exposures to generate more reliable evidence regarding which interventions should produce health benefits. The approach is being widely applied, and various ways to strengthen inference given the known potential limitations of MR are now available. Developments of MR, including two-sample MR, bidirectional MR, network MR, two-step MR, factorial MR and multiphenotype MR, are outlined in this review. The integration of genetic information into population-based epidemiological studies presents translational opportunities, which capitalize on the investment in genomic discovery research.

1,686 citations


Cites methods from "Efficient Design for Mendelian Rand..."

  • ...Pierce, B.L. and Burgess, S. (2013) Efficient design for Mendelian randomization studies: subsample and 2-sample instrumental variable estimators....

    [...]

  • ...Few software examples exist for the specific types of MR that have been described in this review, but STATA routines for performing subsample and two-sample IV estimation are provided by Pierce and Burgess (31).... ...significance thresholds are also included, the rationale being that these will include false-negatives owing to small effect size (56)....

    [...]

  • ...Wensley, F., Gao, P., Burgess, S., Kaptoge, S., Di Angelantonio, E., Shah, T., Engert, J.C., Clarke, R., Davey Smith, G., Nordestgaard, B.G. et al. (2011) Association between C reactive protein and coronary heart disease: mendelian randomisation analysis based on individual participant data....

    [...]

  • ...Burgess, S. and Thompson, S.G. (2013) Use of allele scores as instrumental variables for Mendelian randomization....

    [...]

References
01 Jan 2001

7,653 citations

ReportDOI
TL;DR: In this paper, the authors developed asymptotic distribution theory for instrumental variable regression when the partial correlation between the instruments and a single included endogenous variable is weak, here modeled as local to zero.
Abstract: This paper develops asymptotic distribution theory for instrumental variable regression when the partial correlation between the instruments and a single included endogenous variable is weak, here modeled as local to zero. Asymptotic representations are provided for various instrumental variable statistics, including the two-stage least squares (TSLS) and limited information maximum-likelihood (LIML) estimators and their t-statistics. The asymptotic distributions are found to provide good approximations to sampling distributions with just 20 observations per instrument. Even in large samples, TSLS can be badly biased, but LIML is, in many cases, approximately median unbiased. The theory suggests concrete quantitative guidelines for applied work. These guidelines help to interpret Angrist and Krueger's (1991) estimates of the returns to education: whereas TSLS estimates with many instruments approach the OLS estimate of 6%, the more reliable LIML and TSLS estimates with fewer instruments fall between 8% and 10%, with a typical confidence interval of (6%, 14%).

5,249 citations

Journal ArticleDOI
TL;DR: In this article, the authors discuss instrumental variables (IV) estimation in the broader context of the generalized method of moments (GMM), and describe an extended IV estimation routine that provides GMM estimates as well as additional diagnostic tests.
Abstract: We discuss instrumental variables (IV) estimation in the broader context of the generalized method of moments (GMM), and describe an extended IV estimation routine that provides GMM estimates as well as additional diagnostic tests. Stand-alone test procedures for heteroskedasticity, overidentification, and endogeneity in the IV context are also described.

2,444 citations


"Efficient Design for Mendelian Rand..." refers methods in this paper

  • ...We did not use the traditional 2-stage least-squares procedure (11), because this method discards persons with missing data on X, whereas the Wald method can include such persons in the reduced-form regression....

    [...]

Journal ArticleDOI
TL;DR: The use of germline genetic variants that proxy for environmentally modifiable exposures as instruments for these exposures is one form of IV analysis that can be implemented within observational epidemiological studies and can be considered as analogous to randomized controlled trials.
Abstract: Observational epidemiological studies suffer from many potential biases, from confounding and from reverse causation, and this limits their ability to robustly identify causal associations. Several high-profile situations exist in which randomized controlled trials of precisely the same intervention that has been examined in observational studies have produced markedly different findings. In other observational sciences, the use of instrumental variable (IV) approaches has been one approach to strengthening causal inferences in non-experimental situations. The use of germline genetic variants that proxy for environmentally modifiable exposures as instruments for these exposures is one form of IV analysis that can be implemented within observational epidemiological studies. The method has been referred to as 'Mendelian randomization', and can be considered as analogous to randomized controlled trials. This paper outlines Mendelian randomization, draws parallels with IV methods, provides examples of implementation of the approach and discusses limitations of the approach and some methods for dealing with these.

2,364 citations

Journal ArticleDOI
TL;DR: In this article, the fiducial distributions of the root of a simple equation (i.e., a ratio) and of the roots of a quadratic equation with variable coefficients are investigated with respect to the region of the $(\alpha, t^2)$ plane lying above the curve.
Abstract: The object of this paper is to propose for discussion the following topic: $b_1, b_2, \ldots$ are unbiased estimates of $\beta_1, \beta_2, \ldots$, distributed normally with variances and covariances jointly estimated, with $f$ degrees of freedom and independently of $b_1, b_2, \ldots$, as $v_{11}, v_{12}, v_{22}, \ldots$, and the functions $F_i(\alpha)$ do not involve the parameters $\beta_i$. What can we say about the roots of the equation $F(\beta, \alpha) = \beta_1 F_1(\alpha) + \beta_2 F_2(\alpha) + \ldots = 0$? Numerical examples are discussed in detail to illustrate the problems of determining the fiducial distributions of (i) the root of a simple equation (i.e., a ratio), (ii) the roots of a quadratic equation with variable coefficients. The solutions proposed are based on a consideration of the region of the $(\alpha, t^2)$ plane lying above the curve
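For the ratio case mentioned in this abstract, a Fieller-type interval can be sketched by solving a quadratic inequality in the ratio. The example below is a simplified illustration under assumptions made here: the numerator and denominator estimates are taken to be independent, a normal quantile `z` stands in for the t quantile with $f$ degrees of freedom, and the function name `fieller_interval` is hypothetical.

```python
import math

def fieller_interval(a, se_a, b, se_b, z=1.96):
    """Fieller-type confidence interval for the ratio a / b.

    Solves (a - t*b)**2 <= z**2 * (se_a**2 + t**2 * se_b**2) for t,
    assuming a and b are independent estimates. Returns None when the
    denominator is not clearly non-zero, in which case the confidence
    set is not a bounded interval.
    """
    A = b**2 - (z * se_b) ** 2     # leading coefficient of the quadratic in t
    if A <= 0:
        return None                # unbounded confidence set
    B = a * b
    C = a**2 - (z * se_a) ** 2
    half_width = math.sqrt(B * B - A * C)
    return ((B - half_width) / A, (B + half_width) / A)
```

In a Mendelian randomization setting, `a` and `b` would correspond to the IV-outcome and IV-exposure associations, and the `None` case corresponds to an instrument too weak to bound the causal estimate.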

949 citations