Transcriptome-wide association study in UK Biobank Europeans identifies associations with blood cell traits

TL;DR: This paper performed a transcriptome-wide association study (TWAS) of 29 hematological traits in 399,835 UK Biobank (UKB) participants of European ancestry using gene expression prediction models trained from whole blood RNA-seq data in 922 individuals.
Abstract: Previous genome-wide association studies (GWAS) of hematological traits have identified over 10,000 distinct trait-specific risk loci, but the underlying causal mechanisms at these loci remain incompletely characterized. We performed a transcriptome-wide association study (TWAS) of 29 hematological traits in 399,835 UK Biobank (UKB) participants of European ancestry using gene expression prediction models trained from whole blood RNA-seq data in 922 individuals. We discovered 557 TWAS signals associated with hematological traits distinct from previously discovered GWAS variants, including 10 completely novel gene-trait pairs corresponding to 9 unique genes. Among the 557 associations, 301 were available for replication in a cohort of 141,286 participants of European ancestry from the Million Veteran Program (MVP). Of these 301 associations, 199 replicated at a nominal threshold (α = 0.05) and 108 replicated at a strict Bonferroni adjusted threshold (α = 0.05/301). Using our TWAS results, we systematically assigned 4,261 out of 16,900 previously identified hematological trait GWAS variants to putative target genes. Compared to coloc, our TWAS results show reduced specificity and increased sensitivity to assign variants to target genes.

Summary (3 min read)

Depression Genes and Networks (DGN).

  • The DGN study was designed to collect samples of individuals with and without major depressive disorder, ages 21-60, from a survey research panel broadly representative of the United States population [10].
  • Genotyping and RNA-sequencing procedures have been described previously [10].
  • For 922 European ancestry participants from the DGN study, the authors obtained both genotype data imputed to the TOPMed Freeze 8 reference panel and RNA-seq data [14, 15].
  • For training gene expression prediction models, the authors included bi-allelic variants that are common and well-imputed (MAF > 0.05, Rsq > 0.8) in both DGN and in the UK Biobank.
  • As described previously, quantified gene expression values were normalized using the hidden covariates with prior (HCP) method [16], correcting for technical and biological factors [10].
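
The variant filter described above (bi-allelic, MAF > 0.05, Rsq > 0.8, in both DGN and UK Biobank) can be illustrated with a minimal sketch. The `VariantQC` record, the function names, and the toy variants are hypothetical and are not taken from the authors' pipeline; only the thresholds come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VariantQC:
    """Hypothetical per-cohort summary for one imputed variant."""
    variant_id: str   # illustrative ID, e.g. "chr1:1000_A_G"
    n_alleles: int    # 2 for a bi-allelic variant
    maf: float        # minor allele frequency in this cohort
    rsq: float        # imputation quality (Rsq) in this cohort

MAF_MIN = 0.05  # threshold stated in the text
RSQ_MIN = 0.8   # threshold stated in the text

def passes_qc(v):
    """True if the variant is bi-allelic, common, and well imputed."""
    return v.n_alleles == 2 and v.maf > MAF_MIN and v.rsq > RSQ_MIN

def shared_qc_variants(dgn, ukb):
    """IDs of variants that pass the filter in BOTH cohorts, sorted."""
    dgn_pass = {v.variant_id for v in dgn if passes_qc(v)}
    ukb_pass = {v.variant_id for v in ukb if passes_qc(v)}
    return sorted(dgn_pass & ukb_pass)

# Toy data (invented for illustration only).
dgn = [
    VariantQC("chr1:1000_A_G", 2, 0.12, 0.95),
    VariantQC("chr1:2000_C_T", 2, 0.03, 0.99),  # too rare in DGN
    VariantQC("chr1:3000_G_A", 2, 0.20, 0.85),
]
ukb = [
    VariantQC("chr1:1000_A_G", 2, 0.11, 0.97),
    VariantQC("chr1:2000_C_T", 2, 0.06, 0.99),
    VariantQC("chr1:3000_G_A", 2, 0.22, 0.70),  # poorly imputed in UKB
]

qc_variants = shared_qc_variants(dgn, ukb)
print(qc_variants)
```

Requiring the filter to hold in both cohorts means that every variant used to train a prediction model is also usable when the model is applied in the discovery cohort.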

Million Veteran Program (MVP) Europeans

  • The MVP is an observational cohort study and mega-biobank in the Department of Veteran Affairs healthcare system which began enrollment in 2011.
  • After quality control largely following the guidelines established in Marees et al 2018, 308,778 individuals of European ancestry remained.
  • The authors trained gene expression prediction models using an elastic net pipeline following the well-established PrediXcan methodology.
  • In order to assess which marginally significant TWAS genes provide novel findings above and beyond the discoveries in GWAS of blood cell traits in Europeans, the authors tested the association between predicted gene expression and phenotype while conditioning on reported blood cell trait GWAS variants.
  • The authors conducted two replication analyses in MVP Europeans to follow up on their results from the UKB TWAS: one for the marginal TWAS results and a second restricted to only conditionally significant genes.
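
As a rough illustration of the PrediXcan-style workflow summarized above, the sketch below applies a set of per-variant weights to genotype dosages to obtain predicted expression, then tests its association with a phenotype. The weights, dosages, and phenotype values are invented, and the plain least-squares fit is a simplified stand-in; it is not the elastic net training step, and it is not the association software the authors used.

```python
from statistics import mean

def predict_expression(dosages, weights):
    """Genetically predicted expression: weighted sum of allele dosages."""
    return sum(w * dosages[variant] for variant, w in weights.items())

def simple_regression(x, y):
    """Slope and its standard error for y ~ intercept + x (least squares)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    intercept = my - beta * mx
    rss = sum((b - intercept - beta * a) ** 2 for a, b in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return beta, se

# Hypothetical model weights for one gene (variant ID -> weight).
weights = {"chr1:1000_A_G": 0.5, "chr1:3000_G_A": -0.2}

# Hypothetical dosages (0 to 2 copies of the effect allele) for six people.
cohort = [
    {"chr1:1000_A_G": 0, "chr1:3000_G_A": 1},
    {"chr1:1000_A_G": 1, "chr1:3000_G_A": 0},
    {"chr1:1000_A_G": 2, "chr1:3000_G_A": 1},
    {"chr1:1000_A_G": 1, "chr1:3000_G_A": 2},
    {"chr1:1000_A_G": 2, "chr1:3000_G_A": 0},
    {"chr1:1000_A_G": 0, "chr1:3000_G_A": 0},
]
phenotype = [4.1, 5.2, 5.9, 4.6, 6.3, 4.4]  # invented trait values

predicted = [predict_expression(d, weights) for d in cohort]
beta, se = simple_regression(predicted, phenotype)
print(f"beta = {beta:.3f}, se = {se:.3f}, z = {beta / se:.2f}")
```

In a real analysis the weights would come from models trained in the reference expression dataset, and the association step would also adjust for covariates and relatedness.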

TWAS variant-to-gene assignments.

  • The authors assigned the distinct GWAS variants from Vuckovic et al. to putative target genes using their TWAS results.
  • These 10 traits were chosen based on data availability for eQTLs in relevant cell types including platelets, CD4+, CD8+, CD14+, CD15+, and CD19+ cells.
  • The authors compare the TWAS and coloc variant-to-gene assignments to the sets of potentially causal genes identified by Open Targets.
  • The authors classified genes as cell type group-specific or shared via Shannon entropy across the five cell type groups.
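
The entropy-based classification mentioned above can be sketched as follows. The summary does not state which per-group quantity enters the entropy or which cutoff separates "specific" from "shared", so the non-negative scores and the 1.0-bit threshold here are assumptions made purely for illustration.

```python
import math

def shannon_entropy(scores):
    """Shannon entropy (in bits) of non-negative scores, normalized to sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    entropy = 0.0
    for s in scores:
        if s > 0:
            p = s / total
            entropy -= p * math.log2(p)
    return entropy

def classify_gene(scores, threshold_bits=1.0):
    """Label a gene by how evenly its signal is spread across groups.

    threshold_bits is a hypothetical cutoff, not a value from the paper.
    """
    return "specific" if shannon_entropy(scores) < threshold_bits else "shared"

# Hypothetical scores across five cell type groups.
concentrated = [9.0, 0.5, 0.2, 0.2, 0.1]  # signal mostly in one group
even = [1.0, 1.0, 1.0, 1.0, 1.0]          # signal spread evenly

print(classify_gene(concentrated), classify_gene(even))
```

With five groups the entropy ranges from 0 bits (all signal in one group) to log2(5), about 2.32 bits (perfectly even), so any cutoff has to sit inside that interval.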

Marginal TWAS Results.

  • Using an elastic net-based pipeline, the authors trained gene expression prediction models using imputed genotypes and whole blood RNA-seq data from 922 European ancestry participants from the DGN cohort [10].
  • The 11,759 associations were grouped into 4,835 trait-specific TWAS loci (see Methods) with the most significant gene at each TWAS locus assigned as the sentinel TWAS gene, resulting in 1,792 unique sentinel genes.
  • For the replication analysis, 15 out of the 29 UKB analyzed blood cell traits were available in MVP.
  • 9,492 out of the 10,004 (94.8%) gene expression prediction models were composed of variants that overlapped completely with variants available in MVP.

Conditional Analyses Adjusting for Nearby GWAS Variants.

  • The authors then used conditional analysis to determine which of the 11,759 gene-phenotype associations in UKB represent novel findings beyond the recently published GWAS [3] (see Methods for details).
  • These 557 associations represent 395 distinct genes in 463 trait-specific loci; 276 genes were conditionally significant for one trait, and 119 for multiple traits.
  • Of the 557 conditionally significant associations discovered from UKB, 301 had both matching phenotypes and predicted gene expression in MVP, and were thus eligible for replication.
  • These 36 associations reflect TWAS's increased power over single-variant GWAS analyses by aggregating multiple sub-genome-wide significant GWAS variants.
  • The association between RFTN2 and white blood cell count replicated in MVP (p = 1.9×10⁻⁸) with the same direction of effect.
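
The replication thresholds quoted in the abstract follow from simple arithmetic, worked through below. All counts and the RFTN2 p-value are taken from the text; the helper function name is illustrative.

```python
N_TESTED = 301        # associations available for replication in MVP
ALPHA = 0.05          # nominal significance level
N_NOMINAL = 199       # replicated at the nominal threshold
N_BONFERRONI = 108    # replicated at the Bonferroni threshold

# Bonferroni correction divides alpha by the number of tests performed.
bonferroni_alpha = ALPHA / N_TESTED

pct_nominal = round(100 * N_NOMINAL / N_TESTED, 1)
pct_bonferroni = round(100 * N_BONFERRONI / N_TESTED, 1)

def passes(p_value, threshold):
    """True if the replication p-value is below the given threshold."""
    return p_value < threshold

rftn2_p = 1.9e-8  # RFTN2 and white blood cell count in MVP, from the text

print(f"Bonferroni threshold: {bonferroni_alpha:.2e}")   # 1.66e-04
print(f"Nominal replication rate: {pct_nominal}%")       # 66.1%
print(f"Bonferroni replication rate: {pct_bonferroni}%") # 35.9%
print("RFTN2 passes Bonferroni:", passes(rftn2_p, bonferroni_alpha))
```

A p-value check alone is not the whole replication criterion: as the RFTN2 example notes, the direction of effect should also agree between cohorts.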

Novel TWAS Loci

  • The authors discovered 10 conditionally significant gene-trait associations that have no previously identified distinct GWAS variants within ±1Mb of the locus for any blood cell trait.
  • After conditioning on all distinct platelet-related variants on chromosome 6, including chr6:71326034_G_A, the marginal TWAS association for IRAK1BP1 and mean platelet volume (p = 9.47×10⁻¹²) was not attenuated (p = 5.33×10⁻¹⁴), demonstrating that the IRAK1BP1 TWAS signal is distinct from previously reported GWAS variants.
  • Figure 4 shows that there is cell-type specific epigenetic evidence that supports their findings.
  • Variants in the gene expression prediction model for IRAK1BP1 in high LD with chr6:79617522 overlapped with megakaryocyte ATAC-seq peaks from BLUEPRINT.
  • Integration of their TWAS results with expression and chromatin conformation data in platelet-producing megakaryocyte cells reveals novel candidate genes at this genomic locus; it is possible that the variants in IRAK1BP1 aggregated by the TWAS prediction model impact the expression of LCA5 through spatial proximity to the promoter region of the gene.

TWAS Genes Implicated in Novel Phenotype Categories

  • The authors' TWAS conditional analysis identified 92 conditionally significant associations, grouped into 70 loci, with no distinct GWAS sentinels for the corresponding phenotype category within 1 Mb of the gene.
  • These gene-trait associations represent novel TWAS findings; their results indicate that a previously reported association at the locus extends to a new class of correlated phenotypes (for example, extension of loci already associated with red blood cell related traits to platelet or white blood cell indices).
  • The association with lymphocyte count remained nominally significant (p = 3.03×10⁻⁴) and the white blood cell count association was attenuated (p = 0.16).

TWAS fine mapping via Conditional Analysis

  • TWAS conditional analysis was also used to fine map TWAS loci in which multiple genes achieved the Bonferroni adjusted significance threshold (see Supplemental Table S2).
  • The EPO gene was not included in the 95% FINEMAP credible set.
  • Yet after the authors conditioned the TWAS predicted expression on the distinct red blood cell signals at this locus, EPO was the only conditionally significant gene at the locus (p = 2.19×10⁻⁶).
  • The association between SLC12A9 and hemoglobin was completely attenuated after conditioning (p = 0.71).
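
The logic of conditioning on known GWAS variants can be sketched with a small least-squares example: the phenotype is regressed on predicted expression alone (marginal) and then on predicted expression plus the genotype of a known variant (conditional). The data are invented so that the trait is driven entirely by the variant, which mimics an attenuated signal such as the SLC12A9 example above. The solver is a generic textbook implementation, not the authors' software, and it reports coefficients only (no standard errors or p-values).

```python
def ols(columns, y):
    """Least-squares coefficients for y ~ 1 + columns, via normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         for i in range(k)]
    b = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        b[c], b[pivot] = b[pivot], b[c]
        for r in range(c + 1, k):
            factor = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= factor * A[c][j]
            b[r] -= factor * b[c]
    # Back substitution.
    coef = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(A[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (b[i] - tail) / A[i][i]
    return coef  # [intercept, coefficient for each column in order]

# Invented data: the known GWAS variant drives the trait (y = 2*g + 1),
# and predicted expression is merely correlated with that variant.
gwas_variant = [0, 1, 2, 0, 1, 2, 0, 1]
predicted_expr = [0.1, 0.4, 1.0, -0.1, 0.6, 1.0, 0.0, 0.5]
phenotype = [1, 3, 5, 1, 3, 5, 1, 3]

marginal_beta = ols([predicted_expr], phenotype)[1]
conditional_beta = ols([predicted_expr, gwas_variant], phenotype)[1]
print(f"marginal beta = {marginal_beta:.3f}")
print(f"conditional beta = {conditional_beta:.3f}")
```

Here the marginal coefficient is large while the conditional coefficient collapses to essentially zero, which is the pattern interpreted as a TWAS signal explained by previously reported variants. A gene whose coefficient survives conditioning corresponds to the "conditionally significant" case.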

TWAS-based assignment of variants to target genes

  • To compare co-localization and TWAS approaches of assigning GWAS variants to potential causal genes, the authors considered 10,239 variant-trait associations across 10 hematological traits from Vuckovic et al. (see Methods).
  • While both LIME1 and ZGPAT correlations pass the r² cutoff for the TWAS-based gene assignment (r² > 0.2), LIME1 predicted expression is much more correlated with rs6062304, making LIME1 the most likely target gene at this locus according to the TWAS-based approach.
  • The authors found that the TWAS-based approach assigned GWAS variants to genes identified by external datasets at a slightly lower rate than the coloc assignments, but identified target genes for more than double the number of variants.
  • In comparison, 88% of the coloc-assigned genes are supported as Open Targets (OT) Any genes and 78% as the OT Max gene.
  • The authors then applied their TWAS-based variant-to-gene assignment to all 29 hematological traits considered in their UKB TWAS.
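
The r² > 0.2 rule for TWAS-based variant-to-gene assignment can be sketched as below: for one GWAS variant, each TWAS-significant gene for the same trait is kept if the squared correlation between the variant genotype and the gene's predicted expression exceeds the cutoff, and candidates are ranked by r². Gene names, genotypes, and expression values are placeholders; only the cutoff comes from the text.

```python
from statistics import mean

R2_CUTOFF = 0.2  # cutoff stated in the text

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    return sxy * sxy / (sxx * syy)

def assign_genes(variant_genotypes, predicted_expression, twas_significant):
    """Return (gene, r2) pairs above the cutoff, most correlated first."""
    hits = {}
    for gene in twas_significant:
        r2 = r_squared(variant_genotypes, predicted_expression[gene])
        if r2 > R2_CUTOFF:
            hits[gene] = r2
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)

# Invented genotypes for one GWAS variant in eight individuals.
genotypes = [0, 1, 2, 0, 1, 2, 1, 0]

# Invented predicted expression for three placeholder genes.
predicted_expression = {
    "GENE_A": [0.1, 1.0, 1.9, 0.0, 1.1, 2.0, 0.9, 0.1],  # tracks the variant
    "GENE_B": [0.5, 0.2, 1.5, 1.0, 0.4, 1.2, 1.4, 0.1],  # moderately related
    "GENE_C": [1.0, 0.5, 1.0, 0.5, 1.0, 0.5, 0.5, 1.0],  # essentially unrelated
}
twas_significant = {"GENE_A", "GENE_B", "GENE_C"}

assignments = assign_genes(genotypes, predicted_expression, twas_significant)
for gene, r2 in assignments:
    print(f"{gene}: r2 = {r2:.2f}")
```

Ranking by r² mirrors the LIME1 versus ZGPAT reasoning above, where both genes pass the cutoff but the more strongly correlated gene is treated as the more likely target.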

Discussion.

  • The authors' TWAS of blood cell traits in UKB Europeans demonstrates the utility of TWAS to identify novel loci and to extend known loci to additional phenotype categories, even in well-studied hematological traits for which over 10,000 loci have been reported by previous GWAS [2, 3].
  • Often, these conditionally significant TWAS genes are not the most marginally significant genes at their respective TWAS locus, suggesting that marginal TWAS results can be driven by previously discovered GWAS variants.
  • Additionally, the TWAS variant-to-gene assignment approach would benefit from larger expression datasets to train cell/tissue type specific gene expression prediction models to assess the correlation between predicted expression and a GWAS variant of interest across several relevant models.
  • The authors' careful use of conditional analysis, TWAS-based fine mapping, and TWAS-based variant-to-gene assignments in the context of blood cell traits will be broadly useful to the practice of TWAS for other complex traits.

Transcriptome-wide association study in UK Biobank Europeans identifies associations with blood cell traits.

Authors: Bryce Rowland (1), Sanan Venkatesh (2), Manuel Tardaguila (3), Jia Wen (4), Jonathan D Rosen (1), Amanda L Tapia (1), Quan Sun (1), Mariaelisa Graff (5), Dragana Vuckovic (6), Guillaume Lettre (7), Vijay G. Sankaran (8,9,10), Alexander P. Reiner (11), Nicole Soranzo (3), Jennifer E. Huffman (12), Georgios Voloudakis (13), Panos Roussos (2,14,15), Laura Raffield (4), Yun Li (1,4,16)

(1) Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599
(2) Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY, 10029
(3) Department of Human Genetics, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
(4) Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599
(5) Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599
(6) Department of Epidemiology and Biostatistics, School of Public Health, Faculty of Medicine, Imperial College London, London, UK
(7) Montreal Heart Institute, Université de Montréal, Montreal, Quebec
(8) Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, 02115
(9) Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, 02115
(10) Broad Institute of MIT and Harvard, Cambridge, MA, 02142
(11) Department of Epidemiology, University of Washington, Seattle, WA
(12) Center for Population Genomics, Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA
(13) Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York City, NY, 10029
(14) Mental Illness Research, Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY, 10468
(15) Department of Genetics and Genomic Sciences, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York City, NY, 10029
(16) Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599

bioRxiv preprint doi: https://doi.org/10.1101/2021.08.03.453690; this version posted August 31, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Abstract: Previous genome-wide association studies (GWAS) of hematological traits have identified over 10,000 distinct trait-specific risk loci, but the underlying causal mechanisms at these loci remain incompletely characterized. We performed a transcriptome-wide association study (TWAS) of 29 hematological traits in 399,835 UK Biobank (UKB) participants of European ancestry using gene expression prediction models trained from whole blood RNA-seq data in 922 individuals. We discovered 557 TWAS signals associated with hematological traits distinct from previously discovered GWAS variants, including 10 completely novel gene-trait pairs corresponding to 9 unique genes. Among the 557 associations, 301 were available for replication in a cohort of 141,286 participants of European ancestry from the Million Veteran Program (MVP). Of these 301 associations, 199 replicated at a nominal threshold (α = 0.05) and 108 replicated at a strict Bonferroni adjusted threshold (α = 0.05/301). Using our TWAS results, we systematically assigned 4,261 out of 16,900 previously identified hematological trait GWAS variants to putative target genes. Compared to coloc, our TWAS results show reduced specificity and increased sensitivity to assign variants to target genes.

Introduction.
Blood cells facilitate key physiological processes in human health such as immunity, oxygen transport, and clotting. Blood cell traits have been associated with risk for complex diseases, including asthma, autoimmune conditions, and cardiovascular disease. Genome-wide association studies (GWAS) in both large European and trans-ethnic cohorts have identified thousands of loci associated with hematological traits including red blood cell, white blood cell, and platelet indices [1-3].

While variant-level analyses provide general insights into the genetic architecture of blood cell traits, functional mechanisms for these mostly non-coding signals remain elusive. Transcriptome-wide association studies (TWAS) have been successful in identifying new genetic loci and prioritizing potential causal genes at known loci for many complex traits [4-7]. TWAS associates phenotypes of interest with gene expression predicted from genotype-based prediction models built in a reference eQTL dataset. TWAS results can lead to an increased understanding of the functional mechanisms underlying previously observed variant-trait associations by positing relationships between genetic variants, effector gene(s), and phenotypes. Additionally, TWAS has increased statistical power compared to single variant association tests by aggregating multiple modest strength single variant signals into a combined test [8]. Here, we conducted a large TWAS of 29 hematological traits by studying 399,835 participants of European ancestry from the UK Biobank (Figure 1) [9]. First, we trained gene expression prediction models using a reference dataset of 922 participants of European ancestry from the Depression Genes and Networks (DGN) cohort with both genotype and RNA-seq data from whole blood [10]. Second, we applied the gene expression prediction models trained in DGN to our discovery UK Biobank participants (n=399,835) to obtain predicted gene expression levels and performed association testing between predicted gene expression values and blood cell phenotypes. Third, we attempted to replicate associations identified in UK Biobank in 141,286 European ancestry participants from the Million Veteran Program (MVP) study [11]. Finally, we performed follow-up analyses including conditional association tests on known GWAS variants, fine-mapping of TWAS loci, and TWAS-based gene assignment for GWAS variants. We demonstrate advantages of TWAS over single-variant analyses by comparing to a recent large GWAS of hematological traits in UK Biobank Europeans [3].

As previously mentioned, TWAS results can shed light on the functional mechanisms underlying variant-trait associations by linking variants to target genes. Designing appropriate functional experiments to interrogate biological mechanisms or to identify potential drug targets necessitates accurately assigning GWAS variants to target genes. Often, variants are linked to target genes using distance-based approaches, which can lead to inaccurate assignments (“Nearest Gene” in Figure 2) [12,13]. Colocalization-based methods (“eQTL Colocalization” in Figure 2) evaluate the evidence that a GWAS variant coincides with an eQTL signal for a gene in a relevant cell type and whether these signals are likely driven by the same biological process or the same set of variants. While useful, colocalization methods may be underpowered in situations where there are multiple variants which are associated both with a complex trait in GWAS and linked to the same target gene but with low or moderate effect size.

We leveraged our TWAS results to assign GWAS variants to target genes. For each GWAS variant-trait association, our TWAS-based approach identified a set of potential target genes associated with the same phenotype utilizing TWAS association results and gene expression prediction models (“TWAS” in Figure 2). We then used individual level genotypes in our testing cohort (UK Biobank) to identify variant-gene pairs where the variant genotype and the predicted gene expression values are correlated (r² > 0.2). By effectively aggregating multiple smaller effect eQTLs for a gene, we hypothesized that our TWAS-based approach would be better powered than coloc to identify target gene(s) for a GWAS variant. We systematically assigned the 16,900 conditionally distinct variant-trait associations identified by Vuckovic et al. to target genes and compared our TWAS-based assignments to those from coloc, a commonly used eQTL colocalization method.

Methods.
Included Cohorts.
Depression Genes and Networks (DGN). The DGN study was designed to collect samples of individuals with and without major depressive disorder, ages 21-60, from a survey research panel broadly representative of the United States population [10]. Genotyping and RNA-sequencing procedures have been described previously [10]. For 922 European ancestry participants from the DGN study, we obtained both genotype data imputed to the TOPMed Freeze 8 reference panel and RNA-seq data [14,15]. For training gene expression prediction models, we included bi-allelic variants that are common and well-imputed (MAF > 0.05, Rsq > 0.8) in both DGN and in the UK Biobank. In all, 5,652,397 variants were included, hereafter referred to as QC variants. DGN whole-blood RNA-seq data was obtained for 922 European ancestry participants [10]. As described previously, quantified gene expression values were normalized using the hidden covariates with prior (HCP) method [16], correcting for technical and biological factors [10].

References
Journal ArticleDOI
TL;DR: The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.
Abstract: The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.

4,658 citations

Journal ArticleDOI
11 Oct 2018-Nature
TL;DR: Deep phenotype and genome-wide genetic data from 500,000 individuals from the UK Biobank is described, describing population structure and relatedness in the cohort, and imputation to increase the number of testable variants to 96 million.
Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

4,489 citations

Journal ArticleDOI
TL;DR: Improvements to imputation machinery are described that reduce computational requirements by more than an order of magnitude with no loss of accuracy in comparison to standard imputation tools.
Abstract: Christian Fuchsberger, Goncalo Abecasis and colleagues describe a new web-based imputation service that enables rapid imputation of large numbers of samples and allows convenient access to large reference panels of sequenced individuals. Their state space reduction provides a computationally efficient solution for genotype imputation with no loss in imputation accuracy.

2,556 citations

Journal ArticleDOI
TL;DR: A method is proposed that integrates summary-level data from GWAS with data from expression quantitative trait locus (eQTL) studies to identify genes whose expression levels are associated with a complex trait because of pleiotropy, and prioritize 126 genes that provide important leads to design future functional studies.
Abstract: Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with human complex traits. However, the genes or functional DNA elements through which these variants exert their effects on the traits are often unknown. We propose a method (called SMR) that integrates summary-level data from GWAS with data from expression quantitative trait locus (eQTL) studies to identify genes whose expression levels are associated with a complex trait because of pleiotropy. We apply the method to five human complex traits using GWAS data on up to 339,224 individuals and eQTL data on 5,311 individuals, and we prioritize 126 genes (for example, TRAF1 and ANKRD55 for rheumatoid arthritis and SNX19 and NMRAL1 for schizophrenia), of which 25 genes are new candidates; 77 genes are not the nearest annotated gene to the top associated GWAS SNP. These genes provide important leads to design future functional studies to understand the mechanism whereby DNA variation leads to complex trait variation.

1,511 citations

Journal ArticleDOI
TL;DR: The results demonstrate that PrediXcan can detect known and new genes associated with disease traits and provide insights into the mechanism of these associations.
Abstract: Genome-wide association studies (GWAS) have identified thousands of variants robustly associated with complex traits. However, the biological mechanisms underlying these associations are, in general, not well understood. We propose a gene-based association method called PrediXcan that directly tests the molecular mechanisms through which genetic variation affects phenotype. The approach estimates the component of gene expression determined by an individual's genetic profile and correlates 'imputed' gene expression with the phenotype under investigation to identify genes involved in the etiology of the phenotype. Genetically regulated gene expression is estimated using whole-genome tissue-dependent prediction models trained with reference transcriptome data sets. PrediXcan enjoys the benefits of gene-based approaches such as reduced multiple-testing burden and a principled approach to the design of follow-up experiments. Our results demonstrate that PrediXcan can detect known and new genes associated with disease traits and provide insights into the mechanism of these associations.

1,372 citations

Frequently Asked Questions (15)
Q1. What is the role of IRAK1BP1 in platelet trait variability?

IRAK1BP1 is a component of the IRAK1-dependent TNFRSF1A signaling pathway, which can activate NF-kappa-B and regulate cellular apoptosis and inflammation. 

The authors performed a transcriptome-wide association study (TWAS) of 29 hematological traits in 399,835 UK Biobank (UKB) participants of European ancestry using gene expression prediction models trained from whole blood RNA-seq data in 922 individuals.

In addition to previous research, future methodological research and software development should be done to address this challenge. By leveraging cell/tissue type specific gene expression datasets to train gene expression models in the future, the TWAS based approach could be extended to match the common practice of conducting eQTL colocalization or other target gene assignment analyses in trait-relevant cell/tissue types. There are still several future directions for the improvement of biobank-scale TWAS studies. First, increasing sample sizes in tissue specific expression datasets will allow future TWAS studies to train gene expression prediction models in cell/tissue types which are directly relevant to tissues of interest. 

The authors used an LD (linkage disequilibrium) pruned (plink --indep-pairwise 50 5 0.1) set of 174,957 variants with MAF > 0.01 in the genotype data available for UKB Europeans to fit the REGENIE null model accounting for cryptic relatedness. 

In addition, the authors replicate 3 out of 6 of the novel TWAS loci available for replication in MVP at a nominal significance threshold, and 1 at a Bonferroni adjusted significance threshold. 

LCA5 is not present in the DGN reference panel, and thus unavailable to fit a prediction model, likely because of low expression in whole blood (median TPM 0.018 in GTEx v8) [29]. Integration of their TWAS results with expression and chromatin conformation data in platelet-producing megakaryocyte cells reveals novel candidate genes at this genomic locus; it is possible that the variants in IRAK1BP1 aggregated by the TWAS prediction model impact the expression of LCA5 through spatial proximity to the promoter region of the gene.

The CD79B locus demonstrates a robust association with lymphocyte count despite conditioning on previously identified white blood cell, red blood cell and platelet associated variants at the locus. 

Of the 557 conditionally significant associations discovered from UKB, 301 had both matching phenotypes and predicted gene expression in MVP, and were thus eligible for replication.

According to expression data from BLUEPRINT, both IRAK1BP1 and LCA5 are expressed in megakaryocyte cells, but the expression level is higher for LCA5, suggesting a potential role for LCA5 in platelet trait variability despite it not being captured by TWAS (Figure 4d).

The variants with the largest effect sizes in the TWAS prediction model for LIME1 are in high LD with rs6062304, whereas those for ZGPAT are not. 

Direct conditional analysis is only possible when using individual-level genotype data in the discovery cohort, which presents an advantage of using non-summary statistics based methods to perform TWAS.

The authors successfully assigned 4,261 variant-trait associations to 1,842 distinct potentially causal TWAS genes with an average of 1.45 (SD = 0.81) genes assigned per variant-trait association (see Supplemental Table S3).

Colocalization based approaches assign the variant to a target gene based on evidence that the GWAS signal is not distinct from an eQTL signal for a target gene (green star, Gene B). 

After conditioning on the set of 186 white blood cell count distinct variants identified by GWAS conditional analysis on chromosome 17, including 17:57929535_A_G and 17:65087308_G_C, CD79B continued to demonstrate evidence of association with lymphocyte count (p = 9.8×10⁻¹⁰) and white blood cell count (p = 8.5×10⁻⁹).

TOPMed Imputation Server: https://imputation.biodatacatalyst.nhlbi.nih.gov/#