Open access | Posted Content | DOI: 10.1101/2021.03.01.433439

Brain expression quantitative trait locus and network analysis reveals downstream effects and putative drivers for brain-related diseases

02 Mar 2021-bioRxiv (Cold Spring Harbor Laboratory)
Abstract: Gaining insight into the downstream consequences of non-coding variants is an essential step towards the identification of therapeutic targets from genome-wide association study (GWAS) findings. Here we have harmonized and integrated 8,727 RNA-seq samples with accompanying genotype data from multiple brain regions from 14 datasets. This sample size enabled us to perform both cis- and trans-expression quantitative trait locus (eQTL) mapping. Upon comparing the brain cortex cis-eQTLs (for 12,307 unique genes at FDR We inferred the brain cell type for 1,515 cis-eQTLs by using cell type proportion information. We conducted Mendelian randomization on 31 brain-related traits using cis-eQTLs as instruments and found 159 significant findings that also passed colocalization. Furthermore, two multiple sclerosis (MS) findings had cell-type-specific signals: a neuron-specific cis-eQTL for CYP24A1 and a macrophage-specific cis-eQTL for CLECL1. To further interpret GWAS hits, we performed trans-eQTL analysis. We identified 2,589 trans-eQTLs (at FDR We also generated a brain-specific gene-coregulation network that we used to predict which genes have brain-specific functions, and to perform a novel network analysis of Alzheimer’s disease (AD), amyotrophic lateral sclerosis (ALS), multiple sclerosis (MS) and Parkinson’s disease (PD) GWAS data. This resulted in the identification of distinct sets of genes that show significantly enriched co-regulation with genes inside the associated GWAS loci, and which might reflect drivers of these diseases.


9 results found

Open access | Posted Content | DOI: 10.21203/RS.3.RS-322430/V1
18 Mar 2021-medRxiv
Abstract: Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease with a life-time risk of 1 in 350 people and an unmet need for disease-modifying therapies. We conducted a cross-ancestry GWAS in ALS including 29,612 ALS patients and 122,656 controls which identified 15 risk loci in ALS. When combined with 8,953 whole-genome sequenced individuals (6,538 ALS patients, 2,415 controls) and the largest cortex-derived eQTL dataset (MetaBrain), analyses revealed locus-specific genetic architectures in which we prioritized genes either through rare variants, repeat expansions or regulatory effects. ALS associated risk loci were shared with multiple traits within the neurodegenerative spectrum, but with distinct enrichment patterns across brain regions and cell-types. Across environmental and life-style risk factors obtained from literature, Mendelian randomization analyses indicated a causal role for high cholesterol levels. All ALS associated signals combined reveal a role for perturbations in vesicle mediated transport and autophagy, and provide evidence for cell-autonomous disease initiation in glutamatergic neurons.

9 Citations

Open access | Journal Article | DOI: 10.1016/J.AJHG.2021.07.011
Nil Aygün, Angela L. Elwell, Dan Liang, Michael J. Lafferty, and 9 more authors (1 institution)
Abstract: Interpretation of the function of non-coding risk loci for neuropsychiatric disorders and brain-relevant traits via gene expression and alternative splicing quantitative trait locus (e/sQTL) analyses is generally performed in bulk post-mortem adult tissue. However, genetic risk loci are enriched in regulatory elements active during neocortical differentiation, and regulatory effects of risk variants may be masked by heterogeneity in bulk tissue. Here, we map e/sQTLs, and allele-specific expression in cultured cells representing two major developmental stages, primary human neural progenitors (n = 85) and their sorted neuronal progeny (n = 74), identifying numerous loci not detected in either bulk developing cortical wall or adult cortex. Using colocalization and genetic imputation via transcriptome-wide association, we uncover cell-type-specific regulatory mechanisms underlying risk for brain-relevant traits that are active during neocortical differentiation. Specifically, we identified a progenitor-specific eQTL for CENPW co-localized with common variant associations for cortical surface area and educational attainment.

2 Citations

Open access | Posted Content | DOI: 10.1101/2021.07.27.21261187
Bingxin Zhao, Tengfei Li, Stephen M. Smith, Di Xiong, and 11 more authors (4 institutions)
30 Jul 2021-medRxiv
Abstract: The human cerebral cortex plays a crucial role in brain functions. However, genetic influences on the human cortical functional organizations are not well understood. Using a parcellation-based approach with resting-state and task-evoked functional magnetic resonance imaging (fMRI) from 40,253 individuals, we identified 47 loci associated with functional areas and networks at rest, 15 of which also affected the functional connectivity during task performance. Heritability and locus-specific genetic effects patterns were observed across different brain functional areas and networks. Specific functional areas and networks were identified to share genetic influences with cognition, mental health, and major brain disorders (such as Alzheimer’s disease and schizophrenia). For example, in both resting and task fMRI, the APOE e4 locus strongly associated with Alzheimer’s disease was particularly associated with the visual cortex in the secondary visual and default mode networks. In summary, by analyzing biobank-scale fMRI data in high-resolution brain parcellation, this study substantially advances our understanding of the genetic determinants of cerebral cortex functions, and the genetic links between brain functions and complex brain traits and disorders.

Topics: Functional magnetic resonance imaging (60%), Default mode network (58%), Visual cortex (56%)

1 Citation

Open access | Posted Content | DOI: 10.1101/2021.10.09.21264604
Julien Bryois, Daniela Calini, W. Macnair, L. Foo, and 13 more authors (6 institutions)
14 Oct 2021-medRxiv
Abstract: Most expression quantitative trait loci (eQTL) studies to date have been performed in heterogeneous brain tissues as opposed to specific cell types. To investigate the genetics of gene expression in adult human cell types from the central nervous system (CNS), we performed an eQTL analysis using single nuclei RNA-seq from 196 individuals in eight CNS cell types. We identified 6108 eGenes, a substantial fraction (43%, 2620 out of 6108) of which show cell-type specific effects, with strongest effects in microglia. Integration of CNS cell-type eQTLs with GWAS revealed novel relationships between expression and disease risk for neuropsychiatric and neurodegenerative diseases. For most GWAS loci, a single gene colocalized in a single cell type providing new clues into disease etiology. Our findings demonstrate substantial contrast in genetic regulation of gene expression among CNS cell types and reveal genetic mechanisms by which disease risk genes influence neurological disorders.

Open access | Posted Content | DOI: 10.1101/2021.10.21.21265342
26 Oct 2021-medRxiv
Abstract: Genetic variants identified through genome-wide association studies (GWAS) are typically non-coding and exert small regulatory effects on downstream genes, but which downstream genes are ultimately impacted and how they confer risk remains mostly unclear. Conversely, variants that cause rare Mendelian diseases are often coding and have a more direct impact on disease development. We demonstrate that common and rare genetic diseases can be linked by studying the gene regulatory networks impacted by common disease-associated variants. We implemented this in the ‘Downstreamer’ method and applied it to 44 GWAS traits and find that predicted downstream “key genes” are enriched with Mendelian disease genes, e.g. key genes for height are enriched for genes that cause skeletal abnormalities and Ehlers-Danlos syndromes. We find that 82% of these key genes are located outside of GWAS loci, suggesting that they result from complex trans regulation rather than being impacted by disease-associated variants in cis. Finally, we discuss the challenges in reconstructing gene regulatory networks and provide a roadmap to improve identification of these highly connected genes for common traits and diseases.

Topics: Genome-wide association study (54%), Gene (51%)


111 results found

Open access | Journal Article | DOI: 10.1093/BIOINFORMATICS/BTP616
01 Jan 2010-Bioinformatics
Abstract: Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. Availability: The package is freely available under the LGPL licence from the Bioconductor web site (

Topics: Bioconductor (64%)

21,575 Citations

Open access | Journal Article | DOI: 10.1093/BIOINFORMATICS/BTS635
01 Jan 2013-Bioinformatics
Abstract: Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. Availability and implementation: STAR is implemented as standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from

Topics: MRNA Sequencing (57%)

20,172 Citations

Open access | Journal Article | DOI: 10.1093/BIOINFORMATICS/BTU638
15 Jan 2015-Bioinformatics
Abstract: Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and available from or from the Python Package Index at Contact:

11,833 Citations

Open access | Book
01 Jun 1974
Abstract: Since the lm function provides a lot of features it is rather complicated. So we are going to instead use the function lsfit as a model. It computes only the coefficient estimates and the residuals. Now would be a good time to read the help file for lsfit. Note that lsfit supports the fitting of multiple least squares models and weighted least squares. Our function will not, hence we can omit the arguments wt, weights and yname. Also, changing tolerances is a little advanced so we will trust the default values and omit the argument tolerance as well.

6,633 Citations

Open access | Journal Article | DOI: 10.1038/S41592-019-0686-2
Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, and 33 more authors (15 institutions)
03 Feb 2020-Nature Methods
Abstract: SciPy is an open-source scientific computing library for the Python programming language. Since its initial release in 2001, SciPy has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year. In this work, we provide an overview of the capabilities and development practices of SciPy 1.0 and highlight some recent technical developments.

6,244 Citations