A map of transcriptional heterogeneity and regulatory variation in human microglia

TL;DR: This study provides the first population-scale transcriptional map of a critically important cell for neurodegenerative disorders and fine-maps candidate causal variants at risk loci for Alzheimer’s disease.
Abstract: Microglia, the tissue resident macrophages of the CNS, are implicated in a broad range of neurological pathologies, from acute brain injury to dementia. Here, we profiled gene expression variation in primary human microglia isolated from 141 patients undergoing neurosurgery. Using single cell and bulk RNA sequencing, we defined distinct cellular populations of acutely in vivo-activated microglia, and characterised a dramatic switch in microglial population composition in patients suffering from acute brain injury. We mapped expression quantitative trait loci (eQTLs) in human microglia and show that many disease-associated eQTLs in microglia replicate well in a human induced pluripotent stem cell (hIPSC) derived macrophage model system. Using ATAC-seq from 95 individuals in this hIPSC model we fine-map candidate causal variants at risk loci for Alzheimer’s disease, the most prevalent neurodegenerative condition in acute brain injury patients. Our study provides the first population-scale transcriptional map of a critically important cell for neurodegenerative disorders.

Summary (3 min read)

Introduction

  • Microglia are tissue resident macrophages of the central nervous system and play critical roles in neurological immune defence, development and homeostasis (Schafer and Stevens 2015; Q. Li and Barres 2018; Salter and Stevens 2017).
  • Two populations (C and D) were common in patients with acute brain injury (25-76% of cells) but rare in other pathologies (<5% of cells).
  • This analysis retrieved colocalisations at other AD GWAS loci, such as CD33 and CASS4.
  • The authors also used the three-way model to evaluate the extent of sharing between the microglia eQTLs, IPSDM or monocyte eQTLs, and AD risk loci.
  • The authors identified multiple microglial subpopulations and showed how these populations are shaped by insult, injury and other life history factors.

Tissue sampling

  • Human brain tissue was obtained with informed consent under protocol REC 16/LO/2168 approved by the NHS Health Research Authority.
  • Adult brain tissue biopsies were taken from the site of neurosurgery resection for the original clinical indication.
  • Paired venous blood was sampled at the induction of anaesthesia.

Dissociation of brain tissue

  • The prepared mix was spun in HBSS+ (Life Technologies) at 300g for 5 mins and supernatant discarded.
  • The digested tissue was rigorously triturated at 4°C and filtered through a 70 µm nylon cell strainer to remove large cell debris and undigested tissue.
  • Supernatant was discarded and the pellet was re-suspended in ice cold supplemented HALF.

Fluorescence-activated cell sorting

  • For single cell smart sequencing, human microglia were isolated using fluorescence-activated cell sorting.
  • The isolated cell suspension was incubated with conjugated PE anti-human CD11b antibody for 20 mins at 4°C.
  • Cells were washed twice in ice cold supplemented HALF and stained with Helix NP viability marker.
  • Cell sorting was performed on BD AriaIII cell sorter (Becton, Dickinson and Company, Franklin Lakes, New Jersey, US) at the University of Cambridge Cell Phenotyping Hub at Cambridge University Hospital, Cambridge, UK.
  • Cells were sorted into 96 well plates, prepared by the Wellcome Trust Sanger Institute for the purposes of single cell sequencing.

Magnetic-activated cell sorting

  • To avoid sustained stress on microglia as a result of prolonged sorting times, magnetic-activated cell sorting was performed on cells intended for bulk sequencing.
  • The isolated cell suspension was incubated with anti-CD11b conjugated magnetic beads for 15 mins at 4°C.
  • Cells were washed twice with supplemented HALF and passed through an MS column.
  • Each sample was washed three times in the column and then extracted.

Blood preparation

  • DNA extraction was performed from the venous blood.
  • 10 ml of whole blood was washed with 1% phosphate buffered saline (PBS) and layered on pancoll human (PAN biotech) and spun at 500g for 25 mins.
  • The white cell component was extracted and transferred to a 1.5 ml Eppendorf and stored as a frozen pellet at -80°C prior to sequencing.

SNP genotyping

  • Genomic DNA was extracted from blood using the QIAamp DNA mini and blood mini kit (Qiagen, 51104).
  • IPS cell culture and macrophage differentiation was carried out as previously described (Alasoo et al. 2018) but with some minor modifications (see Supplementary Methods for details).
  • Tagmentation was quenched with 0.2% sodium dodecyl sulphate.
  • Low-input bulk RNA-seq and ATAC-seq library preparation for primary microglia and iPS-derived macrophages: For RNA-seq samples, between 0.3 ng and 10 ng of bulk total RNA from primary microglia cells or iPS-derived macrophage cells was used as input for a modified Smart-seq2 library preparation (Picelli et al. 2014) (see Supplementary Methods for detailed protocol).

Sequencing data preprocessing

  • All sequence data sets were aligned to human genome assembly GRCh38.
  • All other RNA-seq data were aligned in the same way as the authors' RNA-seq data, without adapter trimming.
  • The authors fit the latent factor linear mixed model in which the three different studies were treated as a random effect (see Supplementary Note Section 1 for details).
  • The authors processed the data using the provided R script and obtained the cell type annotation for PBMCs.
  • The count data from two studies were joined by gene IDs and converted into CPM (count per million) along with their primary microglia read count data.
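As a rough illustration of the joining and normalisation step described in the last bullet, the following Python sketch merges per-gene count tables on their shared gene IDs and rescales each sample to counts per million. The function names, sample names and toy counts are hypothetical, and this is not the authors' pipeline.

```python
def join_by_gene(*tables):
    """Keep only genes present in every table and merge their sample columns.

    Each table maps gene ID -> {sample name -> raw read count}.
    """
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    joined = {}
    for gene in sorted(shared):
        row = {}
        for table in tables:
            row.update(table[gene])
        joined[gene] = row
    return joined


def to_cpm(joined):
    """Scale each sample so that its counts sum to one million."""
    totals = {}
    for row in joined.values():
        for sample, count in row.items():
            totals[sample] = totals.get(sample, 0) + count
    return {
        gene: {s: 1e6 * c / totals[s] for s, c in row.items()}
        for gene, row in joined.items()
    }


# Toy, made-up counts for two hypothetical studies.
study_a = {"P2RY12": {"a1": 30}, "CX3CR1": {"a1": 70}, "GENE_X": {"a1": 5}}
study_b = {"P2RY12": {"b1": 10}, "CX3CR1": {"b1": 10}}
cpm = to_cpm(join_by_gene(study_a, study_b))
```

In this sketch the library size is computed after the join, so only genes shared by all studies contribute to each sample total. That is one possible choice rather than a detail stated in the text, and a sample whose shared-gene total is zero would need separate handling.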

Variance component analysis

  • A linear mixed model of log(TPM+1) values across genome-wide genes (whose TPM>0 for 10% of total cells) was used to estimate the transcriptional variation.
  • The 13 different factors (Patient, the number of expressed genes per cell, pathology, plate ID, ERCC percentage, the number of expressed genes in each cell, 96 well plate position, age of patient, mitochondria RNA percentage, brain region, brain hemisphere, ethnicity and sex) were fitted as random effects with independent variance parameters $\phi_k^2$.
  • The variance explained by the factor k was measured by the intraclass correlation $\phi_k^2/(1 + \phi_k^2)$, where the other 12 factors were held constant.
  • The standard error of the intraclass correlation was computed by the delta method with the standard error of the variance parameter estimator.
  • See Supplementary Note Section 1.1 for details.
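The intraclass correlation and its delta-method standard error described above can be sketched in a few lines of Python. Writing the variance parameter of factor k as `phi2`, the intraclass correlation is `phi2 / (1 + phi2)`, and its derivative with respect to `phi2` is `1 / (1 + phi2)**2`, so a first-order delta-method approximation multiplies the standard error of `phi2` by that derivative. The function names and the numerical values are illustrative assumptions, not estimates from the study.

```python
def intraclass_correlation(phi2):
    """Share of variance attributed to one factor, given its variance parameter."""
    return phi2 / (1.0 + phi2)


def icc_standard_error(phi2, se_phi2):
    """First-order delta method: SE(g(x)) is approximately |g'(x)| * SE(x).

    Here g(x) = x / (1 + x), so g'(x) = 1 / (1 + x)**2.
    """
    return se_phi2 / (1.0 + phi2) ** 2


# Illustrative values only: a variance parameter of 0.25 with standard error 0.05.
icc = intraclass_correlation(0.25)
se = icc_standard_error(0.25, 0.05)
```

With these toy inputs the intraclass correlation is 0.2 and its approximate standard error is 0.032.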

Detection of microglia subpopulations

  • The authors used the linear mixed model to estimate the latent factors with the 13 known confounding factors (see Supplementary Note Section 1.2 for details).
  • There are 2l-1 contrasts which were tested against the null model (removing the focal factor k in the model) to compute Bayes factors.
  • The fragment counts were GC corrected as described in (Kumasaka, Knights, and Gaffney 2016), normalised into TPM (transcripts per million) and then log transformed (log of TPM+1).
  • 25 principal components (PCs) were calculated and regressed out from the normalised expression levels.
  • The authors took the minimum BH Q-value for each gene to perform the multiple testing correction genome-wide.
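The gene-level summary mentioned in the last bullet can be sketched as follows: Benjamini-Hochberg (BH) Q-values are computed across the variants tested for a gene, and the smallest one is kept as that gene's summary. This is a minimal sketch assuming one P-value per tested variant. The function names and P-values are hypothetical, and the subsequent genome-wide correction step is not shown because its details are not given here.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone Q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def min_q_per_gene(pvalues_by_gene):
    """Smallest BH Q-value among the variants tested for each gene."""
    return {gene: min(bh_qvalues(ps)) for gene, ps in pvalues_by_gene.items()}


# Made-up P-values for two hypothetical genes.
gene_q = min_q_per_gene({
    "GENE_A": [0.01, 0.04, 0.03, 0.005],
    "GENE_B": [0.2, 0.9],
})
```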

Bayesian hierarchical model

  • The authors extended a standard Bayesian hierarchical model (Veyrieras et al. 2008) to jointly map eQTLs in three different cell types.
  • The model provides the posterior probability that a gene has an eQTL in each cell type.
  • See Supplementary Note Section 2 for more details.
  • Alzheimer’s disease GWAS summary statistics: GWAS of diagnosed AD (Kunkle et al. 2019) and a GWAS for family history of AD that the authors conducted in the UK Biobank (see Liu and Schwartzentruber 2019 for details) across 10,687,126 overlapping variants.
  • The authors lumped the true and proxy-cases together (53,042 unique affected individuals, 355,900 controls) and performed association tests using BOLT-LMM (Loh et al., 2015).

URL

  • Welch, Joshua D., Velina Kozareva, Ashley Ferreira, Charles Vanderburg, Carly Martin, and Evan Z. Macosko.
  • Zhang, Hanrui, Chenyi Xue, Rhia Shah, Kate Bermingham, Christine C. Hinkle, Wenjun Li, Amrith Rodrigues, et al. 2015.

A map of transcriptional heterogeneity and regulatory variation in human microglia
Adam MH Young (1-3)*, Natsuhiko Kumasaka (2)*, Fiona Calvert (2), Timothy R. Hammond (4,5), Andrew Knights (2), Nikolaos Panousis (2), Jeremy Schwartzentruber (6), Jimmy Liu (7), Kousik Kundu (2), Michael Segel (1), Natalia Murphy (1), Christopher E McMurran (1), Harry Bulstrode (3), Jason Correia (3), Karol P Budohoski (3), Alexis Joannides (3), Mathew R Guilfoyle (3), Rikin Trivedi (3), Ramez Kirollos (3), Robert Morris (3), Matthew R Garnett (3), Helen Fernandes (3), Ivan Timofeev (3), Ibrahim Jalloh (3), Katherine Holland (3), Richard Mannion (3), Richard Mair (3), Colin Watts (3,8), Stephen J Price (3), Peter J Kirkpatrick (3), Thomas Santarius (3), Nicole Soranzo (2), Beth Stevens (4,5), Peter J Hutchinson (3), Robin JM Franklin (1)$ & Daniel J Gaffney (2)$.
1. Wellcome Trust MRC Stem Cell Institute, University of Cambridge, Cambridge, UK, CB2 0QQ.
2. Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK, CB10 1SA.
3. Division of Neurosurgery, Department of Clinical Neurosciences, Cambridge University Hospitals, Cambridge, UK, CB2 0QQ.
4. FM Kirby Neurobiology Center, Boston Children's Hospital, Harvard University, Boston, USA.
5. Howard Hughes Medical Institute, Broad Institute of Harvard and MIT, Boston, USA.
6. EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD.
7. Biogen, Cambridge, MA, 02142, USA.
8. Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, Birmingham UK, B15 2TT.
$ Corresponding author
Abstract
Microglia, the tissue resident macrophages of the CNS, are implicated in a broad range of
neurological pathologies, from acute brain injury to dementia. Here, we profiled gene
expression variation in primary human microglia isolated from 141 patients undergoing
neurosurgery. Using single cell and bulk RNA sequencing, we defined distinct cellular
populations of acutely in vivo-activated microglia, and characterised a dramatic switch in
microglial population composition in patients suffering from acute brain injury. We mapped
expression quantitative trait loci (eQTLs) in human microglia and show that many disease-
associated eQTLs in microglia replicate well in a human induced pluripotent stem cell (hIPSC)
derived macrophage model system. Using ATAC-seq from 95 individuals in this hIPSC model
we fine-map candidate causal variants at risk loci for Alzheimer’s disease, the most prevalent
neurodegenerative condition in acute brain injury patients. Our study provides the first
population-scale transcriptional map of a critically important cell for neurodegenerative
disorders.
bioRxiv preprint doi: https://doi.org/10.1101/2019.12.20.874099; this version posted December 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Introduction
Microglia are tissue resident macrophages of the central nervous system and play critical roles
in neurological immune defence, development and homeostasis (Schafer and Stevens 2015;
Q. Li and Barres 2018; Salter and Stevens 2017). These highly dynamic cells are challenging
to study in the laboratory and are strongly influenced by different experimental environments
(Gosselin et al. 2017). Genetic studies also strongly implicate microglial dysfunction in
neurodegeneration (Guerreiro et al. 2013; Jonsson et al. 2013; Tansey, Cameron, and Hill
2018; Gjoneska et al. 2015) particularly in the context of the injured brain (Johnson and
Stewart 2015). Single cell transcriptomics has suggested that microglial function may vary
across age, sex and brain region (Olah et al. 2018; Keren-Shaul et al. 2017; Hammond et al.
2019; Masuda et al. 2019; Mrdjen et al. 2018; Mathys et al. 2017). Previous studies have used
frozen post-mortem tissue from existing brain banks or fresh surgical samples from
restricted patient groups, typically temporal lobe resections for epilepsy or peritumoral
sampling. However, variability in the post-mortem index produces substantial variation in
cellular expression (Welch et al. 2019). Because of this, studies of microglial activation in
humans have relied on ex vivo stimulation with no available data from acutely injured human
brains. The challenge of sampling also means that large scale genetic studies of microglia
have not been attempted to date. Population studies have demonstrated that individuals
subject to mild brain trauma are 5-fold more likely to develop Alzheimer’s Disease (Mackay et
al. 2019). Consequently, it is of particular importance to understand the activation of human
microglia in the context of acute brain injury together with the underpinning genetic contribution
to neurodegeneration.
Characterisation of microglial cell populations
Here, we describe the analysis of human microglia isolated from 141 patients undergoing a
range of neurosurgical procedures (Figure 1a). We recruited patients from a range of
pathologies, including 51 individuals with acute brain injury (haemorrhage and trauma), who
sustained substantial parenchymal injury, enabling us to observe in vivo microglial activation.
For each individual, we isolated CD11b-positive cells and performed both single cell
(SmartSeq2) (Picelli et al. 2014) and bulk RNA-seq. After QC, we retained
112 bulk RNA-seq samples, and 9,538 single cells from 129 patients (Figure 1b). All but three
of our bulk RNA-seq samples formed a single cluster with microglia from two previous studies
(Y. Zhang et al. 2016; Gosselin et al. 2017), and were distinct from both GTEx brain and
BLUEPRINT monocytes (Figure 1c). We then compared our single cell data to public datasets
of 68K PBMCs isolated from a healthy donor (Zheng et al. 2017) and 15K brain cells from 5
GTEx donors (Habib et al. 2017). A total of 8,662 cells formed a cluster with the microglia
population found in GTEx samples and distinct from PBMCs (Figure 1d) and expressed a
range of known microglial marker genes, including P2RY12, CX3CR1 and TMEM119, to a
high level (Extended Data Figure 1a). We defined this population of cells as microglia for the
remainder of our analysis. We found three less common populations of cells that closely
resembled other blood cell types, including NKT cells, monocytes or B-cells that comprised
8.4%, 0.5% and 0.3% of our single cell dataset, respectively. These cell types may reflect
either infiltration of immune cells as a result of blood-brain barrier breakdown or intravascular
contamination within the tissue. In support of the former hypothesis, we also found that the
abundance of infiltrating cells strongly correlated with patient pathology, with trauma patients
in particular enriched (OR=7.6, Fisher exact test P=1.2x10^-155) (Figure 1d). We also found a
significant effect of age on the abundance of infiltrating cells (3.4% increase per year, Wald
test P=0.014) after adjusting for all known confounding factors, which could reflect blood brain
barrier degeneration over the lifespan (Extended Data Figure 1b).
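The enrichment statistics quoted in this paragraph (an odds ratio with a Fisher exact test P-value) come from a 2x2 contingency table. The sketch below shows how such a table could be analysed using only the Python standard library. The layout of the table (for example, trauma versus other patients in rows and infiltrating versus microglial cells in columns), the function names and the counts are assumptions for illustration, not the study data.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the table [[a, b], [c, d]]; undefined if b * c == 0."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c

    def log_prob(x):
        return (log_comb(row1, x) + log_comb(row2, col1 - x)
                - log_comb(row1 + row2, col1))

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    log_observed = log_prob(a)
    total = sum(
        math.exp(log_prob(x))
        for x in range(lowest, highest + 1)
        if log_prob(x) <= log_observed + 1e-9
    )
    return min(1.0, total)


# Made-up counts, chosen small enough to check by hand.
table_or = odds_ratio(8, 2, 1, 5)
table_p = fisher_exact_two_sided(8, 2, 1, 5)
```

For this toy table the odds ratio is 20 and the two-sided P-value is 400/11440, roughly 0.035. P-values as small as those reported in the text would underflow ordinary floating point sums, so a production version would keep the whole calculation on the log scale.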
Within microglia, we found four subpopulations of cells (Figure 2a). Two populations (C and
D) were common in patients with acute brain injury (25-76% of cells) but rare in other
pathologies (<5% of cells). Population B was enriched in tumor patients (OR=4.9, P=7.6x10^-169)
while population A was most common in control and hydrocephalus patients (Figure 2b,
c). In populations A and B, we observed higher expression of microglial markers including
P2RY12 and CX3CR1. Cells from B, C and D also demonstrated an upregulation of general
immune response and cell activation (IL1B, CD83 & CCL3) (Figure 2d, e; Extended Data
Figure 2; Supplementary Table 1). Cells from C and D exhibited additional upregulation of
acute immune response pathways, including NF-kappa B, STAT3, RUNX1 as well as MHC-I
expression. Population C also showed differential expression of genes associated with
stress-induced senescence and DNA damage (HIST1H2BG), while population D expressed genes
associated with cell proliferation (FLT1) and chemotaxis (CCL4, CXCL8, CXCL16), the latter
of which is shared with population B. Population B additionally showed strong upregulation of
catabolic process and metabolism (GPX1) and phagocytosis (TREM2). Our cells partially
overlapped with the transcriptional signatures of disease-associated microglia established in
previous literature (Keren-Shaul et al. 2017; Xue et al. 2014) (Figure 2f, g). Taken together,
these results suggest that our data contain a mix of naive microglia (population A), with three
distinct states of activation that, in part, are driven by patient pathology.
Biological drivers of microglial expression
Our sampling design enabled us to explore the relative importance of a wide range of biological
factors in driving microglial gene expression while controlling for important technical
confounders, using variance components analysis. Of the biological factors we examined,
clinical pathology explained more variation than all other factors combined, although all factors
except sex, including age, brain region, dominant hemisphere and ethnicity, explained a
fraction of variation that was significantly different from zero (LR test FWER 0.05) (Figure 3a).
Patient explained the most variability of any single factor in the model. Although this factor
captured the contribution of genetic background, it is also likely to reflect unmeasured
technical effects, such as variability in cell dissociation and surgical sampling, which are
confounded with patient in the model. The cellular pathways that differed between patient
pathologies closely resembled the differences we observed between different subpopulations
of microglia (Extended Data Figure 3a, b). We also detected 260 genes that varied
significantly by patient age, showing upregulation of inflammation (CLEC7A, CIITA and TLR2)
and downregulation of cell identity (P2RY12, CX3CR1), motility and proliferation (CSF1R) with
increasing age (Figure 3b-e; Extended Data Figure 3c; Supplementary Table 2). Although
sex explained little variation globally, we found 97 genes that were differentially expressed
between males and females (Figure 3f). These included multiple genes in the complement
pathway and synaptic pruning mechanisms (C1QA, C1QC and C3) that were more highly
expressed in females than males (Figure 3g; Extended Data Figure 3d; Supplementary
Table 3). Anatomical region of sampling also had a subtle effect on transcriptional variation,
with cerebellar microglia, which are known to exhibit a distinct, less ramified morphology,
upregulating multiple recruitment chemokines (CCL4, CCL3, CCL4L2, CCL3L3) (Figure 3h;
Extended Data Figure 3e; Supplementary Table 4).
eQTL mapping in human microglia and neurodegeneration
We constructed a map of expression quantitative trait loci (eQTLs) in primary human microglia.
After excluding samples with low genotyping quality or substantial non-European ancestry, we
mapped eQTLs using our bulk RNA-seq data from 93 individuals, and detected 401 eQTLs,
summing over hierarchical model posteriors (585 eQTLs at FDR 5% using a linear model). The
low number of eQTLs reflected the high between-sample heterogeneity in microglia, compared
with other cell types (Extended Data Figure 4a). We tested for colocalization of risk loci from
18 genome wide association studies (GWAS) with microglia eQTLs (Figure 4a), including five
previous studies of Alzheimer’s disease (AD), and our own meta-analysis of these five studies
for comparison (Online Methods). Across all AD GWAS, we found up to 11 risk loci with a
posterior probability of colocalisation (PP4) greater than 0.5 (Table 1). These included well-
known AD loci, such as BIN1, and less well-studied AD associations, for example EPHA1-
AS1. We repeated the analysis using microglia eQTLs mapped by RASQUAL to support the
colocalisation result using the allele specific expression signature (Supplementary Table 5).
This analysis retrieved colocalisations at other AD GWAS loci, such as CD33 and CASS4.
However, the test statistics may be inflated due to the additional overdispersion in our data
set (Extended Data Figure 4a).
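To make the PP4 threshold above concrete, the sketch below computes colocalisation posteriors from per-variant log approximate Bayes factors. It assumes the widely used five-hypothesis "coloc" formulation with one causal variant per trait, in which H4 is the hypothesis of a single shared causal variant. The prior values `p1`, `p2` and `p12`, the function names and the toy Bayes factors are assumptions for illustration, and the passage does not state that the authors used exactly this implementation.

```python
import math


def logsumexp(values):
    """log(sum(exp(v))) computed stably."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def logdiffexp(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-variant log Bayes factors.

    labf1 and labf2 must list the same variants in the same order.
    """
    s1 = logsumexp(labf1)
    s2 = logsumexp(labf2)
    s12 = logsumexp([x + y for x, y in zip(labf1, labf2)])
    log_h = [
        0.0,                                                    # H0: no association
        math.log(p1) + s1,                                      # H1: trait 1 only
        math.log(p2) + s2,                                      # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),  # H3: distinct variants
        math.log(p12) + s12,                                    # H4: shared variant
    ]
    total = logsumexp(log_h)
    return [math.exp(v - total) for v in log_h]


# Toy log Bayes factors over three variants (made-up values).
shared = coloc_posteriors([0.0, 8.0, 0.0], [0.0, 9.0, 0.0])
distinct = coloc_posteriors([8.0, 0.0, 0.0], [0.0, 0.0, 9.0])
```

In the first toy call both traits peak at the same variant, so the last element (PP4) is close to 1. In the second call the peaks fall on different variants and PP4 is well below the 0.5 threshold used in the text.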
Next, we compared AD risk loci from our meta-analysis with eQTLs from the GTEx project
(v7), in circulating blood monocytes and in a novel dataset of IPSC-derived macrophages
(IPSDMs) from 133 healthy individuals (Online Methods) (Figure 4b). We found more
colocalised AD eQTLs in microglia than in any GTEx brain region. We also observed many
AD risk loci that colocalised with eQTLs in blood monocytes and IPSDMs. To explore the level
of cell-type specificity, we mapped eQTLs jointly analysing data from microglia, monocytes
and IPSDMs using a three-way Bayesian hierarchical model (Extended Data Figure 4a, b;
Online Methods). Using this approach, we discovered 855 eQTLs, of which 108 were
microglia-specific, 449 were found in all three cell types, and 192 were shared with IPSDMs
but not monocytes. We also used the three-way model to evaluate the extent of sharing
between the microglia eQTLs, IPSDM or monocyte eQTLs, and AD risk loci. Many colocalised
AD loci, including BIN1, are found in both microglia and IPSDMs, but absent in monocytes
(Figure 4c, d). There were also multiple AD loci where an eQTL was only detectable in
circulating monocytes (e.g., CASS4 locus), although this is likely to reflect primarily the
differences in power between the monocyte (n=193) and microglia data sets.
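One way to picture the output of a three-way model of this kind is as a posterior distribution over sharing configurations for each gene, where a configuration records in which of the three cell types the eQTL is active. The sketch below shows how such configuration posteriors could be turned into a per-cell-type posterior and into expected counts by summing posteriors across genes. The data layout, function names and numbers are hypothetical. The actual model is described in the Online Methods and may differ.

```python
from itertools import product

CELL_TYPES = ("microglia", "monocyte", "IPSDM")
# All 2**3 = 8 on/off patterns, e.g. (1, 0, 1) = microglia and IPSDM only.
CONFIGS = list(product((0, 1), repeat=len(CELL_TYPES)))


def marginal_per_cell_type(config_posterior):
    """Posterior that the gene has an eQTL in each cell type.

    config_posterior maps a configuration tuple to its posterior probability.
    """
    return {
        cell_type: sum(p for cfg, p in config_posterior.items() if cfg[i])
        for i, cell_type in enumerate(CELL_TYPES)
    }


def expected_count(gene_posteriors, config):
    """Expected number of genes in one configuration, summing posteriors."""
    return sum(post.get(config, 0.0) for post in gene_posteriors)


# Made-up posteriors for two hypothetical genes.
gene_1 = {(0, 0, 0): 0.1, (1, 0, 0): 0.2, (1, 0, 1): 0.3, (1, 1, 1): 0.4}
gene_2 = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
marginals = marginal_per_cell_type(gene_1)
n_all_three = expected_count([gene_1, gene_2], (1, 1, 1))
```

As a check on the figures quoted above, the three listed categories (108 microglia-specific, 449 in all three cell types and 192 shared with IPSDMs but not monocytes) sum to 749, so the remaining 106 of the 855 eQTLs fall into configurations that this passage does not break down.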
IPS models of AD risk loci are an invaluable resource for the development of future
therapeutics. We next identified three AD association signals (BIN1, the EPHA1 locus and
PTK2B) that colocalised with both microglia and IPSDM eQTLs (Figure 4d). The association
for EPHA1-AS1 was shared across many cell types (Extended Data Figure 5a, b), while the
direction of effect at the PTK2B locus was inconsistent (Extended Data Figure 5c). To fine-
map causal variants we generated ATAC-seq data from 5 primary microglia and 89 IPSDMs.
Colocalisation analysis revealed that the AD association signal at BIN1 was highly cell type
specific (Figure 5a). The lead SNP of this association signal, rs6733839:C>T, was located in
a region of open chromatin in both microglia and IPSDMs.
rs6733839:C>T was also associated with a significant change in chromatin accessibility
(Figure 5b, P<6.1x10^-10), and the association signal for chromatin also colocalised
(PP4=0.996) with the AD association signal (Figure 5c-f). The AD risk allele at
rs6733839:C>T created a predicted high-affinity binding site for the MEF2A transcription factor
(Extended Data Figure 5d). We found that, although BIN1 and MEF2A are broadly expressed
in many tissues, co-expression of both genes was found only in primary microglia and IPSDMs
(Extended Data Figure 5e).
Discussion
Here we present a population-level study of human primary microglia. By sampling cells from
living donors, we defined transcriptional signatures of in vivo microglial activation avoiding
artefacts from post mortem index and cell culture. We identified multiple microglial
subpopulations and showed how these populations are shaped by insult, injury and other life
history factors. We also created the first map of eQTLs in microglia, identified high confidence
causal genes and variants underlying risk loci for Alzheimer’s disease, and identified a subset
that replicated in a scalable IPS model system.
Our results underscore the variability between microglia cells from different individuals. One
implication of the variation we observed between different patient pathologies is that the full
spectrum of microglial function cannot be captured by small studies of a single
patient population. The most obvious example of this is the populations of activated microglia
we identified that account for less than 5% of cells in non-trauma patients. Our results also
provide a picture of the function of microglia following severe trauma, which produces cell
populations that exhibit a mixture of proinflammatory and chemotactic phenotypes. Notably,
although animal models of acute brain injury suggest rapid expansion of microglia following
trauma (Vela et al. 2002), only one of the populations we identified had a proliferative
phenotype, and both showed downregulation of CSF1R. Also in contrast to previous reports
(Olah et al. 2018), we found relatively subtle effects of age on microglial transcription. The
modest changes we did detect were consistent with increased inflammatory senescence in
microglia over the lifespan. Likewise, differences in microglia expression between males and
females were relatively small, although we did observe increased complement activity in
females, perhaps suggesting a role for complement pathways in the higher incidence of AD in
women.
Our eQTL analysis revealed a number of candidate risk genes for AD that function in microglia.
This included well-known genes, such as BIN1, and a number of less well understood loci.
One example we discovered was the EPHA1-AS1 locus, where AD risk appeared to be driven
by a change in the expression of a long noncoding RNA, rather than the neighbouring protein-
coding gene EPHA1 (Extended Data Figure 5a, b). We did not detect some well-known AD
risk loci, such as CD33, with suspected function in myeloid cells. In the case of CD33, analysis
of splicing patterns did indeed reveal a splice QTL at exon 2 (Extended Data Figure 6a-c),
consistent with previous studies (Raj et al. 2014). In other cases, we found strong
colocalisation between AD risk loci and monocyte, but not microglia, eQTLs. While it is
tempting to conclude that this reflects a monocyte-specific function, we believe it is more
plausible that this reflects lower power in our microglia dataset and that, with an increased
sample size, many of these eQTLs would be found to be shared between the two cell types.
One example of this is the CASS4 locus, where the minor allele frequency of the GWAS lead
variant (rs6014724:A>G) was >10% in the monocyte data, but <5% in our microglia data set
(Extended Data Figure 6d). Other examples of apparently spurious monocyte-specific

Citations
Journal ArticleDOI
TL;DR: In this paper, the authors performed an updated genome-wide AD meta-analysis, which identified 37 risk loci, including new associations near CCDC6, TSPAN14, NCK2 and SPRED2.
Abstract: Genome-wide association studies have discovered numerous genomic loci associated with Alzheimer's disease (AD); yet the causal genes and variants are incompletely identified. We performed an updated genome-wide AD meta-analysis, which identified 37 risk loci, including new associations near CCDC6, TSPAN14, NCK2 and SPRED2. Using three SNP-level fine-mapping methods, we identified 21 SNPs with >50% probability each of being causally involved in AD risk and others strongly suggested by functional annotation. We followed this with colocalization analyses across 109 gene expression quantitative trait loci datasets and prioritization of genes by using protein interaction networks and tissue-specific expression. Combining this information into a quantitative score, we found that evidence converged on likely causal genes, including the above four genes, and those at previously discovered AD loci, including BIN1, APH1B, PTK2B, PILRA and CASS4.

189 citations

Journal ArticleDOI
TL;DR: In this article, the authors integrated Alzheimer's disease (AD) GWAS data with myeloid cell genomics, and reported that myeloid active enhancers are most burdened by AD risk alleles.
Abstract: Genome-wide association studies (GWAS) have identified more than 40 loci associated with Alzheimer’s disease (AD), but the causal variants, regulatory elements, genes and pathways remain largely unknown, impeding a mechanistic understanding of AD pathogenesis. Previously, we showed that AD risk alleles are enriched in myeloid-specific epigenomic annotations. Here, we show that they are specifically enriched in active enhancers of monocytes, macrophages and microglia. We integrated AD GWAS with myeloid epigenomic and transcriptomic datasets using analytical approaches to link myeloid enhancer activity to target gene expression regulation and AD risk modification. We identify AD risk enhancers and nominate candidate causal genes among their likely targets (including AP4E1, AP4M1, APBB3, BIN1, MS4A4A, MS4A6A, PILRA, RABEP1, SPI1, TP53INP1, and ZYX) in twenty loci. Fine-mapping of these enhancers nominates candidate functional variants that likely modify AD risk by regulating gene expression in myeloid cells. In the MS4A locus we identified a single candidate functional variant and validated it in human induced pluripotent stem cell (hiPSC)-derived microglia and brain. Taken together, this study integrates AD GWAS with multiple myeloid genomic datasets to investigate the mechanisms of AD risk alleles and nominates candidate functional variants, regulatory elements and genes that likely modulate disease susceptibility. This study integrates Alzheimer’s disease (AD) GWAS data with myeloid cell genomics, and reports that myeloid active enhancers are most burdened by AD risk alleles. The authors also nominate candidate causal regulatory elements, variants and genes that likely modulate the risk for AD.

95 citations

Journal ArticleDOI
TL;DR: In this paper, the authors describe the transcriptome analysis of 255 primary human microglial samples isolated at autopsy from multiple brain regions of 100 individuals and performed systematic analyses to investigate various aspects of microglial heterogeneity, including brain region and aging.
Abstract: Microglia have emerged as important players in brain aging and pathology. To understand how genetic risk for neurological and psychiatric disorders is related to microglial function, large transcriptome studies are essential. Here we describe the transcriptome analysis of 255 primary human microglial samples isolated at autopsy from multiple brain regions of 100 individuals. We performed systematic analyses to investigate various aspects of microglial heterogeneities, including brain region and aging. We mapped expression and splicing quantitative trait loci and showed that many neurological disease susceptibility loci are mediated through gene expression or splicing in microglia. Fine-mapping of these loci nominated candidate causal variants that are within microglia-specific enhancers, finding associations with microglial expression of USP6NL for Alzheimer’s disease and P2RY12 for Parkinson’s disease. We have built the most comprehensive catalog to date of genetic effects on the microglial transcriptome and propose candidate functional variants in neurological and psychiatric disorders. Transcriptomic analyses of 255 primary human microglial samples from 100 individuals highlight brain region, age, sex and disease states as sources of microglial heterogeneity. Molecular quantitative trait locus analyses implicate variants involved in neurological diseases through effects on gene expression and splicing.

66 citations

Posted ContentDOI
29 Jan 2020-bioRxiv
TL;DR: The eQTL Catalogue is presented, a resource which contains quality controlled, uniformly recomputed QTLs from 21 eQTL studies, and it is found that for matching cell types and tissues, the eQTL effect sizes are highly reproducible between studies, enabling the integrative analysis of these data.
Abstract: An increasing number of gene expression quantitative trait locus (QTL) studies have made summary statistics publicly available, which can be used to gain insight into human complex traits by downstream analyses such as fine-mapping and colocalisation. However, differences between these datasets in their variants tested, allele codings, and in the transcriptional features quantified are a barrier to their widespread use. Here, we present the eQTL Catalogue, a resource which contains quality controlled, uniformly re-computed QTLs from 19 eQTL publications. In addition to gene expression QTLs, we have also identified QTLs at the level of exon expression, transcript usage, and promoter, splice junction and 3ʹ end usage. Our summary statistics can be downloaded by FTP or accessed via a REST API and are also accessible via the Open Targets Genetics Portal. We demonstrate how the eQTL Catalogue and GWAS Catalog APIs can be used to perform colocalisation analysis between GWAS and QTL results without downloading and reformatting summary statistics. New datasets will continuously be added to the eQTL Catalogue, enabling systematic interpretation of human GWAS associations across a large number of cell types and tissues. The eQTL Catalogue is available at https://www.ebi.ac.uk/eqtl/.

62 citations

Journal ArticleDOI
TL;DR: In this article, the authors describe the transcriptome analysis of 255 primary human microglial samples isolated at autopsy from multiple brain regions of 100 individuals and perform systematic analyses to investigate various aspects of microglial heterogeneity, including brain region and aging.
Abstract: Microglia have emerged as important players in brain aging and pathology. To understand how genetic risk for neurological and psychiatric disorders is related to microglial function, large transcriptome studies are essential. Here we describe the transcriptome analysis of 255 primary human microglial samples isolated at autopsy from multiple brain regions of 100 individuals. We performed systematic analyses to investigate various aspects of microglial heterogeneities, including brain region and aging. We mapped expression and splicing quantitative trait loci and showed that many neurological disease susceptibility loci are mediated through gene expression or splicing in microglia. Fine-mapping of these loci nominated candidate causal variants that are within microglia-specific enhancers, finding associations with microglial expression of USP6NL for Alzheimer’s disease and P2RY12 for Parkinson’s disease. We have built the most comprehensive catalog to date of genetic effects on the microglial transcriptome and propose candidate functional variants in neurological and psychiatric disorders. Transcriptomic analyses of 255 primary human microglial samples from 100 individuals highlight brain region, age, sex and disease states as sources of microglial heterogeneity. Molecular quantitative trait locus analyses implicate variants involved in neurological diseases through effects on gene expression and splicing.

60 citations

References
Journal ArticleDOI
TL;DR: Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: [email protected]

43,862 citations


"A map of transcriptional heterogene..." refers methods in this paper

  • ...The ATAC-seq data were aligned using bwa (H. Li and Durbin 2009) (version 0.7.4)....

    [...]

Journal ArticleDOI
TL;DR: The Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure outperforms other aligners by a factor of >50 in mapping speed.
Abstract: Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. Availability and implementation: STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

30,684 citations


"A map of transcriptional heterogene..." refers methods in this paper

  • ...Both Smart-seq2 and bulk RNA-seq data were aligned using STAR (Dobin et al. 2013) (version 2.5.3a; see URLs) using ENSEMBL human gene assembly 90 as the reference transcriptome....

    [...]

  • ...GTEx: https://www.gtexportal.org/home/datasets PBMC 68k cell data: https://github.com/10XGenomics/single-cell-3prime-paper/tree/master/pbmc68k_analysis RASQUAL (https://github.com/natsuhiko/rasqual) 1000 Genomes Phase III integrated variant set (http://www.internationalgenome.org/data) Beagle 4.0 (https://faculty.washington.edu/browning/beagle/b4_0.html) bwa 0.7.4 (https://sourceforge.net/projects/bio-bwa/files/) skewer 0.1.127 (https://github.com/relipmoc/skewer) STAR (https://github.com/alexdobin/STAR/releases) featureCounts (http://subread.sourceforge.net/)...

    [...]

Journal ArticleDOI
TL;DR: featureCounts is a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments; it implements highly efficient chromosome hashing and feature blocking techniques.
Abstract: MOTIVATION: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. RESULTS: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. AVAILABILITY AND IMPLEMENTATION: featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.

14,103 citations

Journal ArticleDOI
TL;DR: A droplet-based system that enables 3′ mRNA counting of tens of thousands of single cells per sample is described and sequence variation in the transcriptome data is used to determine host and donor chimerism at single-cell resolution from bone marrow mononuclear cells isolated from transplant patients.
Abstract: Characterizing the transcriptome of individual cells is fundamental to understanding complex biological systems. We describe a droplet-based system that enables 3′ mRNA counting of tens of thousands of single cells per sample. Cell encapsulation, of up to 8 samples at a time, takes place in ∼6 min, with ∼50% cell capture efficiency. To demonstrate the system’s technical performance, we collected transcriptome data from ∼250k single cells across 29 samples. We validated the sensitivity of the system and its ability to detect rare populations using cell lines and synthetic RNAs. We profiled 68k peripheral blood mononuclear cells to demonstrate the system’s ability to characterize large immune populations. Finally, we used sequence variation in the transcriptome data to determine host and donor chimerism at single-cell resolution from bone marrow mononuclear cells isolated from transplant patients. Single-cell gene expression analysis is challenging. This work describes a new droplet-based single cell RNA-seq platform capable of processing tens of thousands of cells across 8 independent samples in minutes, and demonstrates cellular subtypes and host–donor chimerism in transplant patients.

4,219 citations


"A map of transcriptional heterogene..." refers methods or result in this paper

  • ...We then compared our single cell data to public datasets of 68K PBMCs isolated from a healthy donor (Zheng et al. 2017) and 15K brain cells from 5 GTEx donors (Habib et al. 2017)....

    [...]

  • ...Pink dots show our samples. d. UMAP of single-cell RNA-seq data combined with 68K PBMC scRNA-seq (Zheng et al. 2017) and whole brain DroNc-seq (Habib et al. 2017)....

    [...]

  • ...We then performed cell type clustering with other primary single cell RNA-seq of 68k PBMCs and GTEx brain tissues (Zheng et al. 2017; Habib et al. 2017) (see below)....

    [...]

Journal ArticleDOI
TL;DR: In addition to the APOE locus (encoding apolipoprotein E), 19 loci reached genome-wide significance (P < 5 × 10⁻⁸) in the combined stage 1 and stage 2 analysis, of which 11 are newly associated with Alzheimer's disease.
Abstract: Eleven susceptibility loci for late-onset Alzheimer's disease (LOAD) were identified by previous studies; however, a large portion of the genetic risk for this disease remains unexplained. We conducted a large, two-stage meta-analysis of genome-wide association studies (GWAS) in individuals of European ancestry. In stage 1, we used genotyped and imputed data (7,055,881 SNPs) to perform meta-analysis on 4 previously published GWAS data sets consisting of 17,008 Alzheimer's disease cases and 37,154 controls. In stage 2, 11,632 SNPs were genotyped and tested for association in an independent set of 8,572 Alzheimer's disease cases and 11,312 controls. In addition to the APOE locus (encoding apolipoprotein E), 19 loci reached genome-wide significance (P < 5 × 10⁻⁸) in the combined stage 1 and stage 2 analysis, of which 11 are newly associated with Alzheimer's disease.

3,726 citations

Frequently Asked Questions (8)
Q1. What are the contributions in "A map of transcriptional heterogeneity and regulatory variation in human microglia"?

The authors mapped expression quantitative trait loci (eQTLs) in human microglia and show that many disease-associated eQTLs in microglia replicate well in a human induced pluripotent stem cell (hIPSC) derived macrophage model system. Using ATAC-seq from 95 individuals in this hIPSC model, the authors fine-map candidate causal variants at risk loci for Alzheimer’s disease, the most prevalent neurodegenerative condition in acute brain injury patients.

Samples were added to a 1.5 ml Eppendorf tube, to which 350 µl of RNAlater (Qiagen) was added; samples were stored at -80°C prior to sequencing. DNA extraction was performed from the venous blood.

200 ng of gDNA was used for input for the SNP array (Infinium Omni2.5-8 v1.4 Kit) and genotyping was performed according to the manufacturer's instructions. 

Across 36 associated loci, the authors used GCTA to identify independently associated SNPs with a threshold of p < 10⁻⁵, based on LD from 10,000 randomly-sampled UK Biobank individuals.

The authors used the --no-posterior-update option to keep the posterior genotype dosage identical to the prior genotype dosage, which stabilised the convergence of model fitting.

The white cell component was extracted and transferred to a 1.5 ml Eppendorf tube and stored as a frozen pellet at -80°C prior to sequencing.

In total, the authors sequenced 26,496 cells, of which 9,538 passed the quality control criteria: a minimum number of sequenced fragments (>10,000 autosomal fragments), a minimum number of expressed genes (>500 autosomal genes), a maximum mitochondrial fragment percentage (<20%) and a library complexity limit (percentage of autosomal fragment counts from the top 100 highly expressed genes <30%).
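The four quality control thresholds above can be read as a single per-cell filter. The following is a minimal Python sketch of that filter, not the authors' code: the `CellQC` record, its field names and the `passes_qc` function are hypothetical names introduced here for illustration, and only the threshold values come from the text.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell summary metrics (hypothetical field names)."""
    autosomal_fragments: int          # sequenced fragments on autosomes
    expressed_autosomal_genes: int    # autosomal genes with detected expression
    mito_fragment_pct: float          # % of fragments from mitochondrial genes
    top100_gene_pct: float            # % of autosomal counts in the top 100 genes


def passes_qc(cell: CellQC) -> bool:
    """Apply the four thresholds quoted in the text (all strict inequalities)."""
    return (
        cell.autosomal_fragments > 10_000
        and cell.expressed_autosomal_genes > 500
        and cell.mito_fragment_pct < 20.0
        and cell.top100_gene_pct < 30.0
    )


def filter_cells(cells):
    """Keep only the cells that satisfy every criterion."""
    return [c for c in cells if passes_qc(c)]
```

A cell must satisfy all four conditions at once, so a cell with deep coverage but a high mitochondrial fraction would still be dropped.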

For the BIN1 and PTK2B loci, the authors used GCTA --cojo-cond to determine summary statistics for each of the two independent signals at each locus, with a window of +/- 500 kb around each lead SNP.
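To make the ±500 kb window concrete, the sketch below selects the SNPs that fall within that distance of a lead SNP on the same chromosome. It only illustrates the windowing step and does not perform the conditional analysis that GCTA carries out; the function name, the tuple layout and all coordinates in the usage example are hypothetical.

```python
def snps_in_window(snps, lead_chrom, lead_pos, half_width=500_000):
    """Return SNPs on lead_chrom within +/- half_width bp of lead_pos.

    snps is an iterable of (chrom, pos, snp_id) tuples (hypothetical layout).
    """
    return [
        (chrom, pos, snp_id)
        for chrom, pos, snp_id in snps
        if chrom == lead_chrom and abs(pos - lead_pos) <= half_width
    ]


# Usage with made-up coordinates (not real variant positions).
example_snps = [
    ("2", 1_000_000, "snpA"),
    ("2", 1_400_000, "snpB"),
    ("2", 1_600_001, "snpC"),
    ("8", 1_000_000, "snpD"),
]
window = snps_in_window(example_snps, lead_chrom="2", lead_pos=1_100_000)
```

In this example `snpA` and `snpB` are retained, `snpC` lies just outside the window, and `snpD` is on a different chromosome.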