Transcriptome-wide association study in UK Biobank Europeans identifies associations with blood cell traits
Summary
Depression Genes and Networks (DGN).
- The DGN study was designed to collect samples of individuals with and without major depressive disorder, ages 21-60, from a survey research panel broadly representative of the United States population.
- Genotyping and RNA-sequencing procedures have been described previously.
- For 922 European ancestry participants from the DGN study, the authors obtained both genotype data imputed to the TOPMed Freeze 8 reference panel and RNA-seq data.
- For training gene expression prediction models, the authors included bi-allelic variants that are common and well-imputed (MAF > 0.05, Rsq > 0.8) in both DGN and the UK Biobank.
- As described previously, quantified gene expression values were normalized using the hidden covariates with prior (HCP) method [16], correcting for technical and biological factors.
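The variant filter described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the `VariantQC` record and both function names are hypothetical, and only the thresholds (bi-allelic, MAF > 0.05, Rsq > 0.8, passing in both cohorts) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantQC:
    """Per-cohort summary for one variant (hypothetical record layout)."""
    variant_id: str
    n_alleles: int   # 2 for a bi-allelic variant
    maf: float       # minor allele frequency in this cohort
    rsq: float       # imputation quality (Rsq) in this cohort


def passes_qc(v, min_maf=0.05, min_rsq=0.8):
    """True if the variant is bi-allelic, common, and well imputed."""
    return v.n_alleles == 2 and v.maf > min_maf and v.rsq > min_rsq


def shared_training_variants(dgn, ukb):
    """Return IDs of variants that pass the filter in BOTH cohorts."""
    ukb_pass = {v.variant_id for v in ukb if passes_qc(v)}
    return [v.variant_id for v in dgn
            if passes_qc(v) and v.variant_id in ukb_pass]
```

Requiring the filter to pass in both cohorts means that every variant a trained model relies on is also usable when the model is applied to the target biobank.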
Million Veteran Program (MVP) Europeans
- The MVP is an observational cohort study and mega-biobank in the Department of Veterans Affairs healthcare system, which began enrollment in 2011.
- After quality control largely following the guidelines established in Marees et al. (2018), 308,778 individuals of European ancestry remained.
- The authors trained gene expression prediction models using an elastic net pipeline following the well-established PrediXcan methodology.
- In order to assess which marginally significant TWAS genes provide novel findings above and beyond the discoveries in GWAS of blood cell traits in Europeans, the authors tested the association between predicted gene expression and phenotype while conditioning on reported blood cell trait GWAS variants.
- The authors conducted two replication analyses in MVP Europeans to follow up on their results from the UKB TWAS: one for the marginal TWAS results and a second restricted to only conditionally significant genes.
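In PrediXcan-style analyses, a trained model is a set of per-variant weights, and predicted expression for one individual is the weighted sum of that individual's genotype dosages. The sketch below illustrates only this prediction step, under the assumption that weights and dosages are held in plain dictionaries; the function name and data layout are hypothetical, and the elastic net training itself is not shown.

```python
def predict_expression(dosages, weights):
    """Predict one gene's expression for one individual.

    dosages: dict mapping variant ID -> effect-allele dosage (0.0 to 2.0)
    weights: dict mapping variant ID -> model weight for this gene
    """
    missing = [v for v in weights if v not in dosages]
    if missing:
        # A model can only be applied when all of its variants are available.
        raise KeyError(f"dosages missing for model variants: {missing}")
    return sum(w * dosages[v] for v, w in weights.items())
```

Raising on missing variants is a deliberately strict choice for this sketch; a real pipeline might instead drop or flag models with incomplete variant coverage.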
TWAS variant-to-gene assignments.
- The authors assigned the distinct GWAS variants from Vuckovic et al. to putative target genes using their TWAS results.
- These 10 traits were chosen based on data availability for eQTLs in relevant cell types including platelets, CD4+, CD8+, CD14+, CD15+, and CD19+ cells.
- The authors compare the TWAS and coloc variant-to-gene assignments to the sets of potentially causal genes identified by Open Targets.
- This preprint (which was not certified by peer review) is available under a CC-BY-NC-ND 4.0 International license; the author/funder has granted bioRxiv a license to display the preprint in perpetuity.
- The authors classified genes as cell type group-specific or shared via Shannon entropy across the five cell type groups.
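To make the entropy-based classification concrete, here is a minimal sketch. It assumes each gene has one non-negative score per cell type group (the summary does not say which quantity the authors used) and that a gene is called group-specific when its entropy falls below a cutoff. The 1.0-bit cutoff and the function names are illustrative assumptions, not values from the paper.

```python
import math


def shannon_entropy(scores):
    """Shannon entropy (in bits) of non-negative scores, after normalizing to sum to 1."""
    total = sum(scores)
    if total <= 0 or any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative with a positive sum")
    probs = [s / total for s in scores if s > 0]
    return -sum(p * math.log2(p) for p in probs)


def classify_gene(scores, threshold_bits=1.0):
    """Label a gene 'specific' (low entropy) or 'shared' (high entropy).

    threshold_bits is a hypothetical cutoff chosen for illustration only.
    """
    return "specific" if shannon_entropy(scores) < threshold_bits else "shared"
```

With five groups, entropy ranges from 0 bits (all signal in one group) to log2(5), about 2.32 bits (signal spread evenly), so any cutoff has to sit inside that range.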
Marginal TWAS Results.
- Using an elastic net-based pipeline, the authors trained gene expression prediction models using imputed genotypes and whole blood RNA-seq data from 922 European ancestry participants from the DGN cohort [10].
- The 11,759 associations were grouped into 4,835 trait-specific TWAS loci (see Methods) with the most significant gene at each TWAS locus assigned as the sentinel TWAS gene, resulting in 1,792 unique sentinel genes.
- For the replication analysis, 15 out of the 29 UKB analyzed blood cell traits were available in MVP.
- 9,492 out of the 10,004 (94.8%) gene expression prediction models consisted of variants that overlapped completely with variants available in MVP.
Conditional Analyses Adjusting for Nearby GWAS Variants.
- The authors then used conditional analysis to determine which of the 11,759 gene-phenotype associations in UKB represent novel findings beyond the recently published GWAS [3] (see Methods for details).
- These 557 associations represent 395 distinct genes in 463 trait-specific loci; 276 genes were conditionally significant for one trait, and 119 for multiple traits.
- Of the 557 conditionally significant associations discovered from UKB, 301 had both matching phenotypes and predicted gene expression in MVP, and thus were eligible for replication.
- These 36 associations reflect TWAS's increased power over single-variant GWAS analyses by aggregating multiple sub-genome-wide significant GWAS variants.
- The association between RFTN2 and white blood cell count replicated in MVP (p = 1.9×10⁻⁸) with the same direction of effect.
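The idea behind conditional analysis can be illustrated with ordinary least squares: test predicted expression against the phenotype once on its own, then again with the known GWAS variant dosages added as covariates. The sketch below is a simplified stand-in, not the authors' method. It uses plain OLS with a normal approximation for the p-value, ignores relatedness and other covariates, and all function names are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear covariates)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def expression_association(pred_expr, phenotype, conditioning=None):
    """Return (beta, p) for predicted expression in an OLS fit of the phenotype.

    conditioning: optional list of per-individual tuples of GWAS variant
    dosages to adjust for; None gives the marginal (unconditioned) test.
    The p-value uses a normal approximation, reasonable only for large n.
    """
    n = len(phenotype)
    if conditioning is None:
        conditioning = [()] * n
    # Design matrix columns: intercept, predicted expression, GWAS variants.
    X = [[1.0, e, *g] for e, g in zip(pred_expr, conditioning)]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, phenotype)) for i in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((y - sum(b * x for b, x in zip(beta, row))) ** 2
              for row, y in zip(X, phenotype))
    se = math.sqrt(rss / (n - k) * xtx_inv[1][1])
    z = beta[1] / se
    return beta[1], math.erfc(abs(z) / math.sqrt(2.0))
```

If a gene's marginal signal is driven entirely by a known GWAS variant, its coefficient shrinks toward zero once that variant is included. An association that stays significant after conditioning is evidence of signal beyond the reported variants.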
Novel TWAS Loci
- The authors discovered 10 conditionally significant gene-trait associations that have no previously identified distinct GWAS variants within ±1Mb of the locus for any blood cell trait.
- After conditioning on all distinct platelet-related variants on chromosome 6, including chr6:71326034_G_A, the marginal TWAS association for IRAK1BP1 and mean platelet volume (p = 9.47×10⁻¹²) was not attenuated (p = 5.33×10⁻¹⁴), demonstrating that the IRAK1BP1 TWAS signal is distinct from previously reported GWAS variants.
- Figure 4 shows that there is cell-type specific epigenetic evidence that supports their findings.
- Variants in the gene expression prediction model for IRAK1BP1 in high LD with chr6:79617522 overlapped with megakaryocyte ATAC-seq peaks from BLUEPRINT.
- Integration of their TWAS results with expression and chromatin conformation data in platelet-producing megakaryocyte cells reveals novel candidate genes at this genomic locus; it is possible that the variants in IRAK1BP1 aggregated by the TWAS prediction model impact the expression of LCA5 through spatial proximity to the promoter region of the gene.
TWAS Genes Implicated in Novel Phenotype Categories
- The authors' TWAS conditional analysis identified 92 conditionally significant associations grouped into 70 loci with no distinct GWAS sentinels for the corresponding phenotype category within 1Mb of the gene.
- These gene-trait associations represent novel TWAS findings; their results support that the previously reported association at the locus is extended to a new class of correlated phenotypes (for example, extension of loci already associated with red blood cell related traits to platelet or white blood cell indices).
- The association with lymphocyte count remained nominally significant (p = 3.03×10⁻⁴) and the white blood cell count association was attenuated (p = 0.16).
TWAS fine mapping via Conditional Analysis
- TWAS conditional analysis was also used to fine map TWAS loci in which multiple genes achieved the Bonferroni adjusted significance threshold (see Supplemental Table S2 ).
- The EPO gene was not included in the 95% FINEMAP credible set.
- Yet after the authors conditioned the TWAS predicted expression on the distinct red blood cell signals at this locus, EPO was the only conditionally significant gene at the locus (p = 2.19×10⁻⁶).
- The association between SLC12A9 and hemoglobin was completely attenuated after conditioning (p = 0.71).
TWAS-based assignment of variants to target genes
- To compare co-localization and TWAS approaches of assigning GWAS variants to potential causal genes, the authors considered 10,239 variant-trait associations across 10 hematological traits from Vuckovic et al. (see Methods).
- While both LIME1 and ZGPAT correlations pass the r² cutoff for the TWAS-based gene assignment (r² > 0.2), LIME1 predicted expression is much more correlated with rs6062304, and is the most likely target gene at this locus according to the TWAS-based approach.
- The authors found that the TWAS-based approach assigned GWAS variants to genes identified by external datasets at a slightly lower rate than the coloc assignments, but identified target genes for more than double the number of variants.
- In comparison, 88% of the coloc assigned genes are supported by OT Any genes and 78% as the OT Max gene.
- The authors then applied their TWAS-based variant-to-gene assignment to all 29 hematological traits considered in their UKB TWAS.
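The TWAS-based assignment rule described above can be sketched as follows: compute the squared Pearson correlation between a GWAS variant's dosages and each nearby gene's predicted expression, keep genes with r² > 0.2, and rank them so the most correlated gene comes first. Only the 0.2 cutoff comes from the text; the function names and the dictionary layout are assumptions made for illustration.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    return (sxy * sxy) / (sxx * syy)


def assign_genes(variant_dosages, predicted_expr_by_gene, min_r2=0.2):
    """Return [(gene, r2), ...] for genes passing the cutoff, highest r2 first.

    predicted_expr_by_gene: dict mapping gene name -> predicted expression
    values for the same individuals as variant_dosages.
    """
    scored = [(gene, pearson_r2(variant_dosages, expr))
              for gene, expr in predicted_expr_by_gene.items()]
    passing = [(gene, r2) for gene, r2 in scored if r2 > min_r2]
    return sorted(passing, key=lambda item: item[1], reverse=True)
```

Returning the full ranked list rather than a single gene allows more than one gene to be assigned to a variant, while still making the top candidate explicit.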
Discussion.
- The authors' TWAS of blood cell traits in UKB Europeans demonstrates the utility of TWAS to identify novel loci and to extend known loci to additional phenotype categories, even in well-studied hematological traits for which over 10,000 loci have been reported by previous GWAS [2, 3].
- Often, these conditionally significant TWAS genes are not the most marginally significant genes at their respective TWAS locus, suggesting that marginal TWAS results can be driven by previously discovered GWAS variants.
- Additionally, the TWAS variant-to-gene assignment approach would benefit from larger expression datasets to train cell/tissue-type-specific gene expression prediction models to assess the correlation between predicted expression and a GWAS variant of interest across several relevant models.
- The authors' careful use of conditional analysis, TWAS-based fine mapping, and TWAS-based variant-to-gene assignments in the context of blood cell traits will be broadly useful to the practice of TWAS for other complex traits.
Frequently Asked Questions (15)
Q2. What are the contributions mentioned in the paper "Transcriptome-wide association study in UK Biobank Europeans identifies associations with blood cell traits"?
The authors performed a transcriptome-wide association study (TWAS) of 29 hematological traits in 399,835 UK Biobank (UKB) participants of European ancestry using gene expression prediction models trained from whole blood RNA-seq data in 922 individuals.
Q3. What are the future works in "Transcriptome-wide association study in UK Biobank Europeans identifies associations with blood cell traits"?
In addition to previous research, future methodological research and software development should be done to address this challenge. By leveraging cell/tissue type specific gene expression datasets to train gene expression models in the future, the TWAS based approach could be extended to match the common practice of conducting eQTL colocalization or other target gene assignment analyses in trait-relevant cell/tissue types. There are still several future directions for the improvement of biobank-scale TWAS studies. First, increasing sample sizes in tissue specific expression datasets will allow future TWAS studies to train gene expression prediction models in cell/tissue types which are directly relevant to tissues of interest.
Q4. How many variants were used to fit the REGENIE null model?
The authors used an LD (linkage disequilibrium) pruned (plink --indep-pairwise 50 5 0.1) set of 174,957 variants with MAF > 0.01 in the genotype data available for UKB Europeans to fit the REGENIE null model accounting for cryptic relatedness.
Q5. How many of the novel loci are available for replication in MVP?
In addition, the authors replicate 3 out of 6 of the novel TWAS loci available for replication in MVP at a nominal significance threshold, and 1 at a Bonferroni adjusted significance threshold.
Q6. Why is LCA5 not present in the DGN reference panel?
LCA5 is not present in the DGN reference panel, and thus unavailable to fit a prediction model, likely because of low expression in whole blood (median TPM 0.018 in GTEx v8) [29]. Integration of their TWAS results with expression and chromatin conformation data in platelet-producing megakaryocyte cells reveals novel candidate genes at this genomic locus; it is possible that the variants in IRAK1BP1 aggregated by the TWAS prediction model impact the expression of LCA5 through spatial proximity to the promoter region of the gene.
Q7. What is the GWAS association of the CD79B locus?
The CD79B locus demonstrates a robust association with lymphocyte count despite conditioning on previously identified white blood cell, red blood cell and platelet associated variants at the locus.
Q8. How many of the 557 conditionally significant associations were found in MVP?
Of the 557 conditionally significant associations discovered from UKB, 301 had both matching phenotypes and predicted gene expression in MVP, and thus were eligible for replication.
Q9. What is the role of LCA5 in platelet trait variability?
While both IRAK1BP1 and LCA5 are expressed in megakaryocyte cells using expression data from BLUEPRINT, the expression level is higher in LCA5, suggesting a potential role for LCA5 in platelet trait variability, despite not being captured by TWAS (Figure 4d).
Q10. What is the LD of the variants with the largest effect size in the TWAS?
The variants with the largest effect sizes in the TWAS prediction model for LIME1 are in high LD with rs6062304, whereas those for ZGPAT are not.
Q11. When is direct conditional analysis possible?
Direct conditional analysis is only possible when using individual-level genotype data in the discovery cohort, which presents an advantage of using non-summary-statistics-based methods to perform TWAS.
Q12. How many variant-trait associations were assigned to the UKB TWAS?
The authors successfully assigned 4,261 variant-trait associations to 1,842 distinct potentially causal TWAS genes with an average of 1.45 (SD = 0.81) genes assigned per variant-trait association (see Supplemental Table S3).
Q13. What is the way to assign a GWAS variant to a target gene?
Colocalization based approaches assign the variant to a target gene based on evidence that the GWAS signal is not distinct from an eQTL signal for a target gene (green star, Gene B).
Q14. What is the significance of the association between CD79B and white blood cell count?
After conditioning on the set of 186 white blood cell count distinct variants identified by GWAS conditional analysis on chromosome 17, including 17:57929535_A_G and 17:65087308_G_C, CD79B continued to demonstrate evidence of association with lymphocyte count (p = 9.8×10⁻¹⁰) and white blood cell count (p = 8.5×10⁻⁹).
Q15. Where is the TOPMed Imputation Server available?
TOPMed Imputation Server: https://imputation.biodatacatalyst.nhlbi.nih.gov/#