Open access | Journal Article | DOI: 10.1016/J.AJHG.2021.01.012

Genetic control of the human brain proteome

04 Mar 2021 - American Journal of Human Genetics (Cell Press) - Vol. 108, Iss. 3, pp. 400-410
Abstract: We generated an online brain pQTL resource for 7,376 proteins through the analysis of genetic and proteomic data derived from post-mortem samples of the dorsolateral prefrontal cortex of 330 older adults. The identified pQTLs tend to be non-synonymous variants, are over-represented among variants associated with brain diseases, and replicate well (77%) in an independent brain dataset. Comparison to a large study of brain eQTLs revealed that about 75% of pQTLs are also eQTLs. In contrast, about 40% of eQTLs were identified as pQTLs. These results are consistent with lower pQTL mapping power and greater evolutionary constraint on protein abundance. The latter is additionally supported by the observation that pQTLs with large effects tend to be rare, deleterious, and associated with proteins that have evidence for fewer protein-protein interactions. Mediation analyses using matched transcriptomic and proteomic data provided additional evidence that pQTL effects are often, but not always, mediated by mRNA. Specifically, we identified roughly 1.6 times more mRNA-mediated pQTLs than mRNA-independent pQTLs (550 versus 341). Our pQTL resource provides insight into the functional consequences of genetic variation in the human brain and a basis for novel investigations of genetics and disease.


Topics: Human brain (50%)


Open access | Journal Article | DOI: 10.1186/S13024-020-00405-4
Abstract: Tau neurofibrillary tangle pathology characterizes Alzheimer’s disease and other neurodegenerative tauopathies. Brain gene expression profiles can reveal mechanisms; however, few studies have systematically examined both the transcriptome and proteome or differentiated Tau- versus age-dependent changes. Paired, longitudinal RNA-sequencing and mass-spectrometry were performed in a Drosophila model of tauopathy, based on pan-neuronal expression of human wildtype Tau (TauWT) or a mutant form causing frontotemporal dementia (TauR406W). Tau-induced, differentially expressed transcripts and proteins were examined cross-sectionally or using linear regression and adjusting for age. Hierarchical clustering was performed to highlight network perturbations, and we examined overlaps with human brain gene expression profiles in tauopathy. TauWT induced 1514 and 213 differentially expressed transcripts and proteins, respectively. TauR406W had a substantially greater impact, causing changes in 5494 transcripts and 697 proteins. There was a ~70% overlap between age- and Tau-induced changes, and our analyses reveal pervasive bi-directional interactions. Strikingly, 42% of Tau-induced transcripts were discordant in the proteome, showing opposite direction of change. Tau-responsive gene expression networks strongly implicate innate immune activation. Cross-species analyses pinpoint human brain gene perturbations specifically triggered by Tau pathology and/or aging, and further differentiate between disease amplifying and protective changes. Our results comprise a powerful, cross-species functional genomics resource for tauopathy, revealing Tau-mediated disruption of gene expression, including dynamic, age-dependent interactions between the brain transcriptome and proteome.


Topics: Tauopathy (59%), Transcriptome (56%), Neurofibrillary tangle (55%)

16 Citations

Open access | Posted Content | DOI: 10.1101/2021.04.05.438450
Erik C. B. Johnson, Carter Ek, Eric B. Dammer, Duc M. Duong, +19 more | Institutions (5)
06 Apr 2021 - bioRxiv
Abstract: The biological processes that are disrupted in the Alzheimer’s disease (AD) brain remain incompletely understood. We recently performed a proteomic analysis of >2000 brains to better understand these changes, which highlighted alterations in astrocytes and microglia as likely key drivers of disease. Here, we extend this analysis by analyzing >1000 brain tissues using a tandem mass tag mass spectrometry (TMT-MS) pipeline, which allowed us to nearly triple the number of quantified proteins across cases. A consensus protein co-expression network analysis of this deeper dataset revealed new co-expression modules that were highly preserved across cohorts and brain regions, and strongly altered in AD. Nearly half of the protein co-expression modules, including modules significantly altered in AD, were not observed in RNA networks from the same cohorts and brain regions, highlighting the proteopathic nature of AD. Two such AD-associated modules unique to the proteomic network included a module related to MAPK signaling and metabolism, and a module related to the matrisome. Analysis of paired genomic and proteomic data within subjects showed that expression level of the matrisome module was influenced by the APOE e4 genotype, but was not related to the rate of cognitive decline after adjustment for neuropathology. In contrast, the MAPK/metabolism module was strongly associated with the rate of cognitive decline. Disease-associated modules unique to the proteome are sources of promising therapeutic targets and biomarkers for AD.


Topics: Cognitive decline (54%)

3 Citations

Open access | Journal Article | DOI: 10.1016/J.ISCI.2021.102925
20 Aug 2021 - iScience
Abstract: Health is often qualitatively defined as a status free from disease and its quantitative definition requires finding the boundary separating health from pathological conditions. Since many complex diseases have a strong genetic component, substantial efforts have been made to sequence large-scale personal genomes; however, we are not yet able to effectively quantify health status from personal genomes. Since mutational impacts are ultimately manifested at the protein level, we envision that introducing a panoramic proteomic view of complex diseases will allow us to mechanistically understand the molecular etiologies of human diseases. In this perspective article, we will highlight key proteomic approaches to identify pathogenic mutations and map their convergent pathways underlying disease pathogenesis and the integration of omics data at multiple levels to define the borderline between health and disease.


1 Citation

Open access | Journal Article | DOI: 10.3390/GENES12060815
Selina M. Vattathil, Yue Liu, Nadia V. Harerimana, Adriana Lori, +11 more | Institutions (5)
26 May 2021 - Genes
Abstract: Cerebral atherosclerosis is a leading cause of stroke and an important contributor to dementia. Yet little is known about its genetic basis. To examine the association of common single nucleotide polymorphisms with cerebral atherosclerosis severity, we conducted a genomewide association study (GWAS) using data collected as part of two community-based cohort studies in the United States, the Religious Orders Study (ROS) and Rush Memory and Aging Project (MAP). Both studies enroll older individuals and exclude participants with signs of dementia at baseline. From our analysis of 1325 participants of European ancestry who had genotype and neuropathologically assessed cerebral atherosclerosis measures available, we found a novel locus for cerebral atherosclerosis in NTNG1. The locus comprises eight SNPs, including two independent significant SNPs: rs6664221 (β = -0.27, 95% CI = (-0.35, -0.19), p = 1.29 × 10⁻¹⁰) and rs10881463 (β = -0.20, 95% CI = (-0.27, -0.13), p = 3.40 × 10⁻⁸). We further found that the SNPs may influence cerebral atherosclerosis by regulating brain protein expression of CNOT3. CNOT3 is a subunit of CCR4-NOT, which has been shown to be a master regulator of mRNA stability and translation and an important complex for cholesterol homeostasis. In summary, we identify a novel genetic locus for cerebral atherosclerosis and a potential mechanism linking this variation to cerebral atherosclerosis progression. These findings offer insights into the genetic effects on cerebral atherosclerosis.


Open access | Journal Article | DOI: 10.1016/J.TIG.2021.09.013
02 Nov 2021 - Trends in Genetics
Abstract: There has been a rapid increase in human genome sequencing in the past two decades, resulting in the identification of millions of previously unknown genetic variants. However, African populations are under-represented in sequencing efforts. Additional sequencing from diverse African populations and the construction of African-specific reference genomes is needed to better characterize the full spectrum of variation in humans. However, sequencing alone is insufficient to address the molecular and cellular mechanisms underlying variable phenotypes and disease risks. Determining functional consequences of genetic variation using multi-omics approaches is a fundamental post-genomic challenge. We discuss approaches to close the knowledge gaps about African genomic diversity and review advances in African integrative genomic studies and their implications for precision medicine.


Topics: Genomics (59%)


Open access | Journal Article | DOI: 10.1093/BIOINFORMATICS/BTP324
Heng Li, Richard Durbin | Institutions (1)
01 Jul 2009 - Bioinformatics
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package.


Topics: Hybrid genome assembly (54%), Sequence assembly (53%), 2 base encoding (52%)

35,234 Citations

Open access | Journal Article | DOI: 10.1086/519795
Shaun Purcell, Benjamin M. Neale, +14 more | Institutions (4)
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.


22,115 Citations

Open access | Journal Article | DOI: 10.1093/BIOINFORMATICS/BTS635
01 Jan 2013 - Bioinformatics
Abstract: Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. Availability and implementation: STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from


Topics: MRNA Sequencing (57%)

20,172 Citations

Open access | Journal Article | DOI: 10.1101/GR.107524.110
Aaron McKenna, Matthew Hanna, Eric Banks, Andrey Sivachenko, +7 more | Institutions (1)
01 Sep 2010 - Genome Research
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.


Topics: Variant Call Format (52%), Software framework (50%)

16,404 Citations

Open access | Journal Article | DOI: 10.1038/NG.806
Mark A. DePristo, Eric Banks, Ryan Poplin, Kiran V. Garimella, +19 more | Institutions (3)
01 May 2011 - Nature Genetics
Abstract: Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets.


Topics: DNA sequencing theory (61%), Variant Call Format (58%), 1000 Genomes Project (55%)

8,715 Citations
