scispace - formally typeset

Showing papers by Manolis Kellis published in 2021


Journal Article
03 Feb 2021-Nature
TL;DR: EpiMap is a compendium of 10,000 epigenomic maps across more than 800 biosamples for annotating genome-wide association study (GWAS) circuitry; the authors use it to define chromatin states, high-resolution enhancers, enhancer modules, upstream regulators and downstream target genes.
Abstract: Annotating the molecular basis of human disease remains an unsolved challenge, as 93% of disease loci are non-coding and gene-regulatory annotations are highly incomplete [1–3]. Here we present EpiMap, a compendium comprising 10,000 epigenomic maps across 800 samples, which we used to define chromatin states, high-resolution enhancers, enhancer modules, upstream regulators and downstream target genes. We used this resource to annotate 30,000 genetic loci that were associated with 540 traits [4], predicting trait-relevant tissues, putative causal nucleotide variants in enriched tissue enhancers and candidate tissue-specific target genes for each. We partitioned multifactorial traits into tissue-specific contributing factors with distinct functional enrichments and disease comorbidity patterns, and revealed both single-factor monotropic and multifactor pleiotropic loci. Top-scoring loci frequently had multiple predicted driver variants, converging through multiple enhancers with a common target gene, multiple genes in common tissues, or multiple genes and multiple tissues, indicating extensive pleiotropy. Our results demonstrate the importance of dense, rich, high-resolution epigenomic annotations for the investigation of complex traits. The authors present EpiMap, a compendium that comprises 10,000 epigenomic maps across more than 800 biosamples for the annotation of genome-wide association study circuitry.

160 citations


Journal Article
TL;DR: Comparative genomics is used to provide a high-confidence protein-coding gene set, characterize protein-level and nucleotide-level evolutionary constraint, and prioritize functional mutations from the ongoing COVID-19 pandemic.
Abstract: Despite its clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. We use comparative genomics to provide a high-confidence protein-coding gene set, characterize evolutionary constraint, and prioritize functional mutations. We select 44 Sarbecovirus genomes at ideally suited evolutionary distances, and quantify protein-coding evolutionary signatures and overlapping constraint. We find strong protein-coding signatures for ORFs 3a, 6, 7a, 7b, 8, 9b, and a novel alternate-frame gene, ORF3c, whereas ORFs 2b, 3d/3d-2, 3b, 9c, and 10 lack protein-coding signatures or convincing experimental evidence of protein-coding function. Furthermore, we show that no other conserved protein-coding genes remain to be discovered. Mutation analysis suggests ORF8 contributes to within-individual fitness but not person-to-person transmission. Cross-strain and within-strain evolutionary pressures agree, except for fewer-than-expected within-strain mutations in nsp3 and S1, and more-than-expected in nucleocapsid, which shows a cluster of mutations in a predicted B-cell epitope, suggesting immune-avoidance selection. Evolutionary histories of residues disrupted by spike-protein substitutions D614G, N501Y, E484K, and K417N/T provide clues about their biology, and we catalog likely functional co-inherited mutations. Previously reported RNA-modification sites show no enrichment for conservation. Here we report a high-confidence gene set and evolutionary-history annotations providing valuable resources and insights on SARS-CoV-2 biology, mutations, and evolution.

110 citations


Journal Article
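TL;DR: APOE4, but not APOE3, disrupts the lipidomes of human induced pluripotent stem cell (iPSC)-derived astrocytes and of yeast expressing human APOE isoforms, increasing fatty-acid unsaturation and intracellular lipid-droplet accumulation; choline supplementation restores the lipidome to its basal state.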
Abstract: The E4 allele of the apolipoprotein E gene (APOE) has been established as a genetic risk factor for many diseases including cardiovascular diseases and Alzheimer's disease (AD), yet its mechanism of action remains poorly understood. APOE is a lipid transport protein, and the dysregulation of lipids has recently emerged as a key feature of several neurodegenerative diseases including AD. However, it is unclear how APOE4 perturbs the intracellular lipid state. Here, we report that APOE4, but not APOE3, disrupted the cellular lipidomes of human induced pluripotent stem cell (iPSC)-derived astrocytes generated from fibroblasts of APOE4 or APOE3 carriers, and of yeast expressing human APOE isoforms. We combined lipidomics and unbiased genome-wide screens in yeast with functional and genetic characterization to demonstrate that human APOE4 induced altered lipid homeostasis. These changes resulted in increased unsaturation of fatty acids and accumulation of intracellular lipid droplets both in yeast and in APOE4-expressing human iPSC-derived astrocytes. We then identified genetic and chemical modulators of this lipid disruption. We showed that supplementation of the culture medium with choline (a soluble phospholipid precursor) restored the cellular lipidome to its basal state in APOE4-expressing human iPSC-derived astrocytes and in yeast expressing human APOE4. Our study illuminates key molecular disruptions in lipid metabolism that may contribute to the disease risk linked to the APOE4 genotype. Our study suggests that manipulating lipid metabolism could be a therapeutic approach to help alleviate the consequences of carrying the APOE4 allele.

87 citations


Journal Article
TL;DR: The authors constructed a high-confidence database of translated nuORFs across tissues (nuORFdb) and used it to detect 3,555 translated nuORFs in MHC-I immunopeptidome mass spectrometry data.
Abstract: Tumor-associated epitopes presented on MHC-I that can activate the immune system against cancer cells are typically identified from annotated protein-coding regions of the genome, but whether peptides originating from novel or unannotated open reading frames (nuORFs) can contribute to antitumor immune responses remains unclear. Here we show that peptides originating from nuORFs detected by ribosome profiling of malignant and healthy samples can be displayed on MHC-I of cancer cells, acting as additional sources of cancer antigens. We constructed a high-confidence database of translated nuORFs across tissues (nuORFdb) and used it to detect 3,555 translated nuORFs from MHC-I immunopeptidome mass spectrometry analysis, including peptides that result from somatic mutations in nuORFs of cancer samples as well as tumor-specific nuORFs translated in melanoma, chronic lymphocytic leukemia and glioblastoma. NuORFs are an unexplored pool of MHC-I-presented, tumor-specific peptides with potential as immunotherapy targets. New tumor epitopes are discovered by ribosome profiling and immunopeptidome mass spectrometry.

73 citations


Journal Article
TL;DR: This paper found that IFN-γ+ γδ T cells were almost exclusively dependent on glycolysis, whereas IL-17+ γδ T cells strongly engaged oxidative metabolism, with increased mitochondrial mass and activity.
Abstract: Metabolic programming controls immune cell lineages and functions, but little is known about γδ T cell metabolism. Here, we found that γδ T cell subsets making either interferon-γ (IFN-γ) or interleukin (IL)-17 have intrinsically distinct metabolic requirements. Whereas IFN-γ+ γδ T cells were almost exclusively dependent on glycolysis, IL-17+ γδ T cells strongly engaged oxidative metabolism, with increased mitochondrial mass and activity. These distinct metabolic signatures were surprisingly imprinted early during thymic development and were stably maintained in the periphery and within tumors. Moreover, pro-tumoral IL-17+ γδ T cells selectively showed high lipid uptake and intracellular lipid storage and were expanded in obesity and in tumors of obese mice. Conversely, glucose supplementation enhanced the antitumor functions of IFN-γ+ γδ T cells and reduced tumor growth upon adoptive transfer. These findings have important implications for the differentiation of effector γδ T cells and their manipulation in cancer immunotherapy.

69 citations


Journal Article
TL;DR: Through longitudinal analysis of a patient with melanoma with an unusual clinical course, this study establishes a high-resolution map of the evolutionary dynamics of resistance to immune checkpoint blockade (ICB), characterizes a de-differentiated neural-crest tumor population in melanoma immunotherapy resistance, and describes site-specific differences in tumor-immune interactions.
Abstract: Despite initial responses [1–3], most melanoma patients develop resistance [4] to immune checkpoint blockade (ICB). To understand the evolution of resistance, we studied 37 tumor samples over 9 years from a patient with metastatic melanoma with complete clinical response to ICB followed by delayed recurrence and death. Phylogenetic analysis revealed co-evolution of seven lineages with multiple convergent but independent resistance-associated alterations. All recurrent tumors emerged from a lineage characterized by loss of chromosome 15q, with post-treatment clones acquiring additional genomic driver events. Deconvolution of bulk RNA sequencing and highly multiplexed immunofluorescence (t-CyCIF) revealed differences in immune composition among different lineages. Imaging revealed a vasculogenic mimicry phenotype in NGFR-high tumor cells with high PD-L1 expression in close proximity to immune cells. Rapid autopsy demonstrated two distinct NGFR spatial patterns with high polarity and proximity to immune cells in subcutaneous tumors versus a diffuse spatial pattern in lung tumors, suggesting different roles of this neural-crest-like program in different tumor microenvironments. Broadly, this study establishes a high-resolution map of the evolutionary dynamics of resistance to ICB, characterizes a de-differentiated neural-crest tumor population in melanoma immunotherapy resistance and describes site-specific differences in tumor-immune interactions via longitudinal analysis of a patient with melanoma with an unusual clinical course.

45 citations


Journal Article
01 Feb 2021-Nature
TL;DR: A new analysis traces the story of the draft genome's impact on genomics since 2001, linking its effects on publications, drug approvals and understanding of disease.
Abstract: A new analysis traces the story of the draft genome’s impact on genomics since 2001, linking its effects on publications, drug approvals and understanding of disease.

44 citations


Journal Article
TL;DR: This study reveals how bumblebee genes and genomes have evolved across the Bombus phylogeny and identifies variations potentially linked to key ecological and behavioral traits of these important pollinators.
Abstract: Bumblebees are a diverse group of globally important pollinators in natural ecosystems and for agricultural food production. With both eusocial and solitary life-cycle phases, and some social parasite species, they are especially interesting models to understand social evolution, behavior, and ecology. Reports of many species in decline point to pathogen transmission, habitat loss, pesticide usage, and global climate change as interconnected causes. These threats to bumblebee diversity make our reliance on a handful of well-studied species for agricultural pollination particularly precarious. To broadly sample bumblebee genomic and phenotypic diversity, we de novo sequenced and assembled the genomes of 17 species, representing all 15 subgenera, producing the first genus-wide quantification of genetic and genomic variation potentially underlying key ecological and behavioral traits. The species phylogeny resolves subgenera relationships, whereas incomplete lineage sorting likely drives high levels of gene tree discordance. Five chromosome-level assemblies show a stable 18-chromosome karyotype, with major rearrangements creating 25 chromosomes in social parasites. Differential transposable element activity drives changes in genome sizes, with putative domestications of repetitive sequences influencing gene coding and regulatory potential. Dynamically evolving gene families and signatures of positive selection point to genus-wide variation in processes linked to foraging, diet and metabolism, immunity and detoxification, as well as adaptations for life at high altitudes. Our study reveals how bumblebee genes and genomes have evolved across the Bombus phylogeny and identifies variations potentially linked to key ecological and behavioral traits of these important pollinators.

41 citations


Journal Article
TL;DR: N6-methyladenosine (m6A), the most prevalent post-transcriptional mRNA modification, plays diverse RNA-regulatory roles, but its genetic control in human tissues has remained uncharted; this paper maps m6A quantitative trait loci across four human tissues.
Abstract: The most prevalent post-transcriptional mRNA modification, N6-methyladenosine (m6A), plays diverse RNA-regulatory roles, but its genetic control in human tissues remains uncharted. Here we report 129 transcriptome-wide m6A profiles, covering 91 individuals and 4 tissues (brain, lung, muscle and heart) from GTEx/eGTEx. We integrate these with interindividual genetic and expression variation, revealing 8,843 tissue-specific and 469 tissue-shared m6A quantitative trait loci (QTLs), which are modestly enriched in, but mostly orthogonal to, expression QTLs. We integrate m6A QTLs with disease genetics, identifying 184 GWAS-colocalized m6A QTLs, including brain m6A QTLs underlying neuroticism, depression, schizophrenia and anxiety; lung m6A QTLs underlying expiratory flow and asthma; and muscle/heart m6A QTLs underlying coronary artery disease. Last, we predict novel m6A regulators that show preferential binding in m6A QTLs, protein interactions with known m6A regulators and expression correlation with the m6A levels of their targets. Our results provide important insights and resources for understanding both cis and trans regulation of epitranscriptomic modifications, their interindividual variation and their roles in human disease. Analysis of 129 N6-methyladenosine (m6A) profiles across 4 tissues (brain, lung, muscle and heart) identifies 8,843 tissue-specific and 469 tissue-shared m6A quantitative trait loci (QTLs). Of these, 184 m6A QTLs colocalize with GWAS signals.

40 citations



Journal Article
26 May 2021
TL;DR: In this paper, the authors proposed an efficient NEgative Binomial mixed model using a large-sample approximation (NEBULA), which analytically solves high-dimensional integrals instead of using the Laplace approximation.
Abstract: The increasing availability of single-cell data is revolutionizing our understanding of biological mechanisms at cellular resolution. For differential expression analysis in multi-subject single-cell data, negative binomial mixed models account for both subject-level and cell-level overdispersions, but are computationally demanding. Here, we propose an efficient NEgative Binomial mixed model Using a Large-sample Approximation (NEBULA). The speed gain is achieved by analytically solving high-dimensional integrals instead of using the Laplace approximation. We demonstrate that NEBULA is orders of magnitude faster than existing tools and controls false-positive errors in marker gene identification and co-expression analysis. Using NEBULA in Alzheimer’s disease cohort data sets, we found that the cell-level expression of APOE correlated with that of other genetic risk factors (including CLU, CST3, TREM2, C1q, and ITM2B) in a cell-type-specific pattern and an isoform-dependent manner in microglia. NEBULA opens up a new avenue for the broad application of mixed models to large-scale multi-subject single-cell data. The application of negative binomial mixed models (NBMMs) to single-cell data is computationally demanding. To address this issue, Liang He et al. have developed NEBULA, an efficient algorithm that can analyze differential gene expression or co-expression networks in multi-subject single-cell data sets, and validate it on snRNA-seq and scRNA-seq data sets comprising ~200k cells from cohorts of Alzheimer’s disease and multiple sclerosis patients.
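For orientation, a generic negative binomial mixed model for one gene in multi-subject single-cell data can be written as below. This is a textbook sketch, assuming a log link, a per-cell offset o_ij (for example library size) and a normally distributed subject random effect; it is not a transcription of NEBULA's exact parameterization.

```latex
\begin{aligned}
y_{ij} \mid b_i &\sim \mathrm{NB}(\mu_{ij}, \phi), \qquad
\operatorname{Var}(y_{ij} \mid b_i) = \mu_{ij} + \frac{\mu_{ij}^{2}}{\phi},\\
\log \mu_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \log o_{ij} + b_i, \qquad
b_i \sim \mathcal{N}(0, \sigma^{2}).
\end{aligned}
```

Here y_ij is the count in cell j of subject i. In this form, sigma^2 carries the subject-level overdispersion and 1/phi the cell-level overdispersion. Fitting involves integrating each b_i out of the likelihood; the abstract attributes NEBULA's speed gain to how it handles these integrals.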

Journal Article
TL;DR: In this article, a high-throughput strategy was developed to design, screen, and optimize 5′ UTRs that enhance protein expression from a strong human cytomegalovirus (CMV) promoter.
Abstract: Despite significant clinical progress in cell and gene therapies, maximizing protein expression in order to enhance potency remains a major technical challenge. Here, we develop a high-throughput strategy to design, screen, and optimize 5′ UTRs that enhance protein expression from a strong human cytomegalovirus (CMV) promoter. We first identify naturally occurring 5′ UTRs with high translation efficiencies and use this information with in silico genetic algorithms to generate synthetic 5′ UTRs. A total of ~12,000 5′ UTRs are then screened using a recombinase-mediated integration strategy that greatly enhances the sensitivity of high-throughput screens by eliminating copy number and position effects that limit lentiviral approaches. Using this approach, we identify three synthetic 5′ UTRs that outperform commonly used non-viral gene therapy plasmids in expressing protein payloads. In summary, we demonstrate that high-throughput screening of 5′ UTR libraries with recombinase-mediated integration can identify genetic elements that enhance protein expression, which should have numerous applications for engineered cell and gene therapies. The engineering of 5′ UTRs that modulate protein expression remains a great challenge. Here we leverage synthetic biology and computational design to develop a high-throughput strategy to design, screen, and optimize 5′ UTRs that enhance protein expression for non-viral gene therapies.

Posted Content
07 Jul 2021-bioRxiv
TL;DR: In this paper, the authors report a single-cell atlas of the human primary motor cortex (MCX) and its transcriptional alterations in ALS and FTLD across ~380,000 nuclei from 64 individuals, including 17 control samples and 47 sporadic and C9orf72-associated ALS/FTLD patient samples.
Abstract: Amyotrophic lateral sclerosis (ALS) and frontotemporal lobar degeneration (FTLD) are two devastating and fatal neurodegenerative conditions. While distinct, they share many clinical, genetic, and pathological characteristics [1], and both show selective vulnerability of layer 5b extratelencephalic-projecting cortical populations, including Betz cells in ALS [2,3] and von Economo neurons (VENs) in FTLD [4,5]. Here, we report the first high-resolution single-cell atlas of the human primary motor cortex (MCX) and its transcriptional alterations in ALS and FTLD across ~380,000 nuclei from 64 individuals, including 17 control samples and 47 sporadic and C9orf72-associated ALS and FTLD patient samples. We identify 46 transcriptionally distinct cellular subtypes including two Betz-cell subtypes, and we observe a previously unappreciated molecular similarity between Betz cells and VENs of the prefrontal cortex (PFC) and frontal insula. Many of the dysregulated genes and pathways are shared across excitatory neurons, including stress response, ribosome function, oxidative phosphorylation, synaptic vesicle cycle, endoplasmic reticulum protein processing, and autophagy. Betz cells and SCN4B+ long-range projecting L3/L5 cells are the most transcriptionally affected in both ALS and FTLD. Lastly, we find that the VEN/Betz cell-enriched transcription factor, POU3F1, has altered subcellular localization, co-localizes with TDP-43 aggregates, and may represent a cell type-specific vulnerability factor in the Betz cells of ALS and FTLD patient tissues.

Journal Article
TL;DR: Recent increases in human longevity have been accompanied by a rise in the incidence of dementia, highlighting the need to preserve cognitive function in an aging population.
Abstract: Recent increases in human longevity have been accompanied by a rise in the incidence of dementia, highlighting the need to preserve cognitive function in an aging population. A small percentage of ...

Posted Content
27 Apr 2021-bioRxiv
TL;DR: This paper performed single-cell characterization of the human cerebrovasculature using both ex vivo fresh-tissue experimental enrichment and post mortem in silico sorting of human cortical tissue samples.
Abstract: Despite the importance of the blood-brain barrier (BBB) in maintaining normal brain physiology and in understanding neurodegeneration and CNS drug delivery, human cerebrovascular cells remain poorly characterized due to their sparsity and dispersion. Here, we perform the first single-cell characterization of the human cerebrovasculature using both ex vivo fresh-tissue experimental enrichment and post mortem in silico sorting of human cortical tissue samples. We capture 31,812 cerebrovascular cells across 17 subtypes, including three distinct subtypes of perivascular fibroblasts as well as vasculature-coupled neurons and glia. We uncover human-specific expression patterns along the arteriovenous axis and determine previously uncharacterized cell type-specific markers. We use our newly discovered human-specific signatures to study changes in 3,945 cerebrovascular cells of Huntington’s disease patients, which reveal an activation of innate immune signaling in vascular and vasculature-coupled cell types and a concomitant reduction in proteins critical for maintaining BBB integrity. Finally, our study provides a comprehensive molecular atlas of the human cerebrovasculature as a resource to guide future biological and therapeutic studies.

Posted Content
21 Jan 2021-bioRxiv
TL;DR: The SPLITR framework integrates single-nucleus and bulk RNA-seq data, enabling phenotype-aware deconvolution and correcting for systematic discrepancies between bulk and single-cell data.
Abstract: Thousands of genetic variants acting in multiple cell types underlie complex disorders, yet most gene expression studies profile only bulk tissues, making it hard to resolve where genetic and non-genetic contributors act. This is particularly important for psychiatric and neurodegenerative disorders that impact multiple brain cell types with highly distinct gene expression patterns and proportions. To address this challenge, we develop a new framework, SPLITR, that integrates single-nucleus and bulk RNA-seq data, enabling phenotype-aware deconvolution and correcting for systematic discrepancies between bulk and single-cell data. We deconvolved 3,387 post-mortem brain samples across 1,127 individuals and in multiple brain regions. We find that cell proportions vary across brain regions, individuals, disease status, and genotype, including genetic variants in TMEM106B that impact inhibitory neuron fraction, and we identify 4,757 cell-type-specific eQTLs. Our results demonstrate the power of jointly analyzing bulk and single-cell RNA-seq to provide insights into cell-type-specific mechanisms for complex brain disorders.

Posted Content
01 Jul 2021-bioRxiv
TL;DR: In this paper, the authors report a comprehensive single-cell transcriptomic dissection of the human hippocampus and entorhinal cortex across 489,558 cells from 65 individuals with varying stages of Alzheimer's disease pathology.
Abstract: The human hippocampal formation plays a central role in Alzheimer’s disease (AD) progression, cognitive traits, and the onset of dementia; yet its molecular states in AD remain uncharacterized. Here, we report a comprehensive single-cell transcriptomic dissection of the human hippocampus and entorhinal cortex across 489,558 cells from 65 individuals with varying stages of AD pathology. We transcriptionally characterize major brain cell types and neuronal classes, including 17 glutamatergic and 8 GABAergic neuron subpopulations. Combining evidence from human and mouse tissue-microdissection, neuronal cell isolation and spatial transcriptomics, we show that single-cell expression patterns capture fine-resolution neuronal anatomical topography. By stratifying subjects into early and late pathology groups, we uncover stage-dependent and cell-type specific transcriptional modules altered during AD progression. These include early-stage cell-type specific dysregulation of cellular and cholesterol metabolism, late-stage neuron-glia alterations in neurotransmission, and late-stage signatures of cellular stress, apoptosis, and DNA damage broadly shared across cell types. Late-stage signatures show signs of convergence in hippocampal and cortical cells, while early changes diverge, highlighting the relevance of characterizing molecular pathology across brain regions and AD progression. Finally, we characterize neuron subregion-specific responses to AD pathology and show that CA1 pyramidal neurons are the most transcriptionally altered, while CA3 and dentate gyrus granule neurons are the least. Our study provides a valuable resource to extend cell type-specific studies of AD to clinically relevant brain regions affected early by pathology in disease progression.

Journal Article
TL;DR: Zhang et al. carried out an exome-wide association analysis of age-of-onset of AD in ~20,000 subjects, with particular emphasis on APOE e4 non-carriers.
Abstract: Despite recent discoveries in genome-wide association studies (GWAS) of genomic variants associated with Alzheimer's disease (AD), its underlying biological mechanisms are still elusive. The discovery of novel AD-associated genetic variants, particularly in coding regions and from APOE e4 non-carriers, is critical for understanding the pathology of AD. In this study, we carried out an exome-wide association analysis of age-of-onset of AD with ~20,000 subjects and placed more emphasis on APOE e4 non-carriers. Using Cox mixed-effects models, we find that age-of-onset shows a stronger genetic signal than AD case-control status, capturing many known variants with stronger significance, and also revealing new variants. We identified two novel variants, rs56201815, a rare synonymous variant in ERN1, and rs12373123, a common missense variant in SPPL2C in the MAPT region in APOE e4 non-carriers. In addition, a rare missense variant in TACR3, rs144292455, showed a consistent direction of effect across all studies at a suggestive significance level. In an attempt to unravel their regulatory and biological functions, we found that the minor allele of rs56201815 was associated with lower average FDG uptake across five brain regions in ADNI. Our eQTL analyses based on 6,198 gene expression samples from ROSMAP and GTEx revealed that the minor allele of rs56201815 was potentially associated with elevated expression of ERN1, a key gene triggering the unfolded protein response (UPR), in multiple brain regions, including the posterior cingulate cortex and nucleus accumbens. Our cell-type-specific eQTL analysis using ~80,000 single nuclei in the prefrontal cortex revealed that the protective minor allele of rs12373123 significantly increased the expression of GRN in microglia, and was associated with MAPT expression in astrocytes. These findings provide novel evidence supporting the hypothesis that the UPR to endoplasmic reticulum (ER) stress is involved in the pathological pathway of AD, and give more insight into the regulatory mechanisms behind the pleiotropic effects of rs12373123 in multiple degenerative diseases, including AD and Parkinson's disease.
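The Cox mixed-effects model used for age-of-onset can be sketched in a generic form as a proportional-hazards model with a random effect for relatedness or population structure. The symbols here, including the relatedness matrix K and the single-variant genotype term, are illustrative assumptions, not the study's exact specification.

```latex
\begin{aligned}
h_i(t) &= h_0(t)\,\exp\!\left(\mathbf{x}_i^{\top}\boldsymbol{\beta} + g_i\,\gamma + b_i\right),\\
\mathbf{b} &= (b_1, \dots, b_n)^{\top} \sim \mathcal{N}(\mathbf{0}, \tau K).
\end{aligned}
```

Here h_i(t) is the hazard of AD onset at age t for subject i, h_0(t) is an unspecified baseline hazard, x_i are covariates, g_i is the genotype dosage at the tested variant and tau K is the random-effect covariance. Testing gamma = 0 at each variant gives its association with age-of-onset. Unlike a case-control test, this uses onset timing and censoring, consistent with the stronger signal the abstract reports.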

Posted Content
21 May 2021-bioRxiv
TL;DR: In this article, the authors identify disease-critical fetal and adult brain cell types by integrating GWAS summary statistics from 28 brain-related diseases and traits (average N = 298K) with 3.2 million scATAC-seq and scRNA-seq profiles from 83 cell types.
Abstract: Prioritizing disease-critical cell types by integrating genome-wide association studies (GWAS) with functional data is a fundamental goal. Single-cell chromatin accessibility (scATAC-seq) and gene expression (scRNA-seq) have characterized cell types at high resolution, and early work on integrating GWAS with scRNA-seq has shown promise, but work on integrating GWAS with scATAC-seq has been limited. Here, we identify disease-critical fetal and adult brain cell types by integrating GWAS summary statistics from 28 brain-related diseases and traits (average N = 298K) with 3.2 million scATAC-seq and scRNA-seq profiles from 83 cell types. We identified disease-critical fetal (resp. adult) brain cell types for 22 (resp. 23) of 28 traits using scATAC-seq data, and for 8 (resp. 17) of 28 traits using scRNA-seq data. Notable findings using scATAC-seq data included highly significant enrichments of fetal photoreceptor cells for major depressive disorder, fetal ganglion cells for BMI, fetal astrocytes for ADHD, and adult VGLUT2 excitatory neurons for schizophrenia. Our findings improve our understanding of brain-related diseases and traits, and inform future analyses of other diseases/traits.

Posted Content
27 Jun 2021-bioRxiv
TL;DR: A general principle by which transcription uses environmental fluctuations for genome function is uncovered, and how resource conservation optimizes transcriptional self-organization through robust feedback integrators is demonstrated, highlighting obesity as an inhibitor of genome plasticity relevant for many diseases.
Abstract: Metabolism plays a central role in evolution, as resource conservation is a selective pressure for fitness and survival. Resource-driven adaptations offer a good model to study evolutionary innovation more broadly. It remains unknown how resource-driven optimization of genome function integrates chromatin architecture with transcriptional phase transitions. Here we show that tuning of genome architecture and heterotypic transcriptional condensates mediate resilience to nutrient limitation. Network genomic integration of phenotypic, structural, and functional relationships reveals that fat tissue promotes organismal adaptations through metabolic acceleration chromatin domains and heterotypic PGC1A condensates. We find evolutionary innovation in several dimensions: low conservation of amino acid residues within protein disorder regions, nonrandom chromatin location of metabolic acceleration domains, condensate-chromatin stability through cis-regulatory anchoring and encoding of genome plasticity in radial chromatin organization. We show that environmental tuning of these adaptations leads to fasting endurance, through efficient nuclear compartmentalization of lipid metabolic regions, and, locally, human-specific burst kinetics of lipid cycling genes. This process reduces oxidative stress, and fatty-acid mediated cellular acidification, enabling endurance of condensate chromatin conformations. Comparative genomics of genetic and diet perturbations reveals mammalian convergence of phenotype and structural relationships, along with loss of transcriptional control by diet-induced obesity. Further, we find that radial transcriptional organization is encoded in functional divergence of metabolic disease variant-hubs, heterotypic condensate composition, and evolutionary tuned protein residues sensing metabolic variation.
During fuel restriction, these features license the formation of large heterotypic condensates that buffer proton excess, and shift viscoelasticity for condensate endurance. This mechanism maintains physiological pH, reduces pH-resilient inflammatory gene programs, and enables genome plasticity through transcriptionally driven cell-specific chromatin contacts. In vivo manipulation of this circuit promotes fasting-like adaptations with heterotypic nuclear compartments, metabolic and cell-specific homeostasis. In sum, we uncover here a general principle by which transcription uses environmental fluctuations for genome function, and demonstrate how resource conservation optimizes transcriptional self-organization through robust feedback integrators, highlighting obesity as an inhibitor of genome plasticity relevant for many diseases.

Journal ArticleDOI
TL;DR: CoCoA-diff identifies 215 differentially regulated causal genes in various cell types, including highly relevant genes with a proper cell type context, by adjusting for confounders without prior knowledge of control variables in single-cell RNA-seq data.
Abstract: Finding a causal gene is a fundamental problem in genomic medicine. We present a causal inference framework, CoCoA-diff, that prioritizes disease genes by adjusting for confounders without prior knowledge of control variables in single-cell RNA-seq data. We demonstrate that our method substantially improves statistical power in simulations and in real-world data analysis of 70k brain cells collected for dissecting Alzheimer's disease. We identify 215 differentially regulated causal genes in various cell types, including highly relevant genes with a proper cell type context. Genes found in different cell types are enriched in distinct pathways, highlighting the importance of cell types in understanding multifaceted disease mechanisms.

Posted ContentDOI
24 Jun 2021-bioRxiv
TL;DR: The authors provide a detailed single-cell dissection of the cell types and disease-associated gene expression changes in the living human heart, using cardiac biopsies collected during open-heart surgery from control, ischemic and non-ischemic heart failure patients.
Abstract: Ischemic heart disease is the single most common cause of death worldwide, causing more than 9 million deaths each year. Genome-wide association studies have uncovered over 200 genetic loci underlying the disease, providing a deeper understanding of the causal mechanisms leading to it. However, to understand ischemic heart disease at the cellular and molecular level, it is necessary to identify the cell-type-specific circuits that enable dissection of driver variants, genes, and signaling pathways in normal and diseased tissues. Here, we provide the first detailed single-cell dissection of the cell types and disease-associated gene expression changes in the living human heart, using cardiac biopsies collected during open-heart surgery from control, ischemic heart disease, and ischemic and non-ischemic heart failure patients. We identify 84 cell types/states, grouped in 12 major cell types. We define markers for each cell type, providing the first extensive reference set for the live human heart. These major cell types include cardiovascular cells (cardiomyocytes, endothelial cells, fibroblasts), rarer cell types (B lymphocytes, neurons, Schwann cells), and rich populations of previously understudied layer-specific epicardial and endocardial cells. In addition, we reveal substantial differences in disease-associated gene expression at the cell subtype level, identifying arterial pericytes as having a central role in the pathogenesis of ischemic heart disease and heart failure. Our results demonstrate the importance of high-resolution cellular subtype mapping in gaining mechanistic insight into human cardiovascular disease.

Posted ContentDOI
27 Mar 2021-bioRxiv
TL;DR: The authors use large-scale single-molecule long-read sequencing of 320 Tibetan and Han samples to show that structural variants are key drivers of selection under high-altitude adaptation.
Abstract: Structural variants (SVs) can be important drivers of human adaptation with strong effects, but previous studies have focused primarily on common variants with weak effects. Here, we used large-scale single-molecule long-read sequencing of 320 Tibetan and Han samples to show that SVs are key drivers of selection under high-altitude adaptation. We expand the landscape of global SVs, apply robust models of selection and population differentiation combining SVs, SNPs and InDels, and use epigenomic analyses to predict driver enhancers, target genes, upstream regulators, and biological functions, which we validate using enhancer reporter and DNA pull-down assays. We reveal diverse Tibetan-specific SVs affecting the cis- and trans-regulatory circuitry of many biological functions, including hypoxia response, energy metabolism and lung function. Our study greatly expands the global SV landscape, reveals the central role of gene-regulatory circuitry rewiring in human adaptation, and illustrates the diverse functional roles that SVs can play in human biology.

Posted ContentDOI
23 Nov 2021-bioRxiv
TL;DR: The authors study the opposing effects of exercise training and high-fat diet in mice at single-cell, deconvolution and tissue-level resolutions across three metabolic tissues.
Abstract: Regular physical exercise has long been known to reverse the effects of diet-induced obesity, but the molecular mechanisms mediating these multi-tissue beneficial effects remain uncharacterized. Here, we address this challenge by studying the opposing effects of exercise training and high-fat diet at single-cell, deconvolution and tissue-level resolutions across 3 metabolic tissues. We profile scRNA-seq in 204,883 cells, grouped into 53 distinct cell subtypes/states in 22 major cell types, from subcutaneous and visceral white adipose tissue (WAT), and skeletal muscle (SkM) in mice with diet and exercise training interventions. Having profiled a large number of mesenchymal stem cells (MSCs), we compared depot-specific adipose stem cell (ASC) states, and defined 7 distinct fibro-adipogenic progenitor (FAP) states in SkM, including discovering and validating a novel CD140+/CD34+/SCA1- FAP population. Exercise- and obesity-regulated proportion, transcriptional and cell-cell interaction changes were most strongly pronounced in and centered around ASCs, FAPs, macrophages and T-cells. These changes reflected thermogenesis-vs-lipogenesis and hyperplasia-vs-hypertrophy shifts, clustered in pathways including extracellular matrix remodeling and circadian rhythm, and implicated complex single- and multi-tissue communication, including a training-associated shift of a cytokine from binding its decoy receptor on ASCs to binding its true receptor on M2 macrophages in vWAT. Overall, our work provides new insights into the metabolic protective effects of exercise training, uncovers a previously underappreciated role of MSCs in mediating tissue-specific and multi-tissue effects, and serves as a model for multi-tissue single-cell analyses in physiologically complex and multifactorial traits exemplified by obesity and exercise training.

Posted ContentDOI
26 Jun 2021-bioRxiv
TL;DR: The authors show that the concepts of non-equilibrium tuning and compartmentalization are sufficient to model manifestations of cellular intelligence, such as specialization, division, fusion and communication, using the language of operads.
Abstract: Intelligence is usually associated with the ability to perceive, retain and use information to adapt to changes in one's environment. In this context, systems of living cells can be thought of as intelligent entities. Here, we show that the concepts of non-equilibrium tuning and compartmentalization are sufficient to model manifestations of cellular intelligence such as specialization, division, fusion and communication using the language of operads. We implement our framework as an unsupervised learning algorithm, IntCyt, which we show is able to memorize, organize and abstract reference machine-learning datasets through generative and self-supervised tasks. Overall, our learning framework captures emergent properties programmed in living systems, and provides a powerful new approach for data mining.

Posted ContentDOI
23 May 2021-bioRxiv
TL;DR: The authors assemble a panel of 28 human and mouse tissue pairs with RNA-Seq gene expression data to find expression patterns that explain differences between the two species and suggest target genes for therapeutic applications.
Abstract: We assembled a panel of 28 tissue pairs of human and mouse with RNA-Seq data on gene expression. We focused on genes with no 1-to-1 homology, because they pose special challenges. In this way, we identified expression patterns that explain differences between the two species and suggest target genes for therapeutic applications. Here we mention three examples. One pattern is observed by defining the aggregate expression of immunoglobulin genes (which have no homology) as a measure of the level of an immune response. In Lung, we used this statistic to find genes that have significantly higher expression in low/moderate response, and thus may be therapy targets: increasing their expression or mimicking their function with medications may help in recovery from inflammation in the lungs. Some of the observed associations are common to human and mouse; other associations involve genes that function in cell-to-cell signaling or in regeneration but were not previously known to be important in Lung. A second pattern is that in the Small Intestine, mouse expresses antimicrobial defensins at much lower levels, while it has much higher expression of enzymes that have been found to improve the adaptive immune response. Such enzymes could be tested for whether they improve probiotic supplements that help with gut inflammation and other diseases. A third pattern involves a many-to-many homology group of defensins that did not have a described function. In human tissues, expression of its genes was found only in a study of a disease of hair-covered skin, but several of its genes are highly expressed in two tissues of our panel: mouse Skin and, to a lesser degree, mouse Vagina. This suggests that those genes or their homologs in other species may provide non-antibiotic medications for hair-covered skin and other tissues with a microbiome that includes fungi.

Posted Content
TL;DR: The authors propose NOTEARS-optimized Mixtures of Archetypal DAGs (NOTMAD) to infer context-specific Bayesian networks, encoding acyclicity as a smooth regularization loss that is back-propagated to the mixing function.
Abstract: Context-specific Bayesian networks (i.e. directed acyclic graphs, DAGs) identify context-dependent relationships between variables, but the non-convexity induced by the acyclicity requirement makes it difficult to share information between context-specific estimators (e.g. with graph generator functions). For this reason, existing methods for inferring context-specific Bayesian networks have favored breaking datasets into subsamples, limiting statistical power and resolution, and preventing the use of multidimensional and latent contexts. To overcome this challenge, we propose NOTEARS-optimized Mixtures of Archetypal DAGs (NOTMAD). NOTMAD models context-specific Bayesian networks as the output of a function which learns to mix archetypal networks according to sample context. The archetypal networks are estimated jointly with the context-specific networks and do not require any prior knowledge. We encode the acyclicity constraint as a smooth regularization loss which is back-propagated to the mixing function; in this way, NOTMAD shares information between context-specific acyclic graphs, enabling the estimation of Bayesian network structures and parameters at even single-sample resolution. We demonstrate the utility of NOTMAD and sample-specific network inference through analysis and experiments, including patient-specific gene expression networks which correspond to morphological variation in cancer.
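The abstract does not spell out the smooth acyclicity loss. As the name suggests, NOTMAD builds on NOTEARS, whose published acyclicity function for a weighted adjacency matrix W with d nodes is h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when W has no directed cycles. The sketch below assumes that standard NOTEARS function and replaces NOTMAD's learned, context-dependent mixing function with a fixed weight vector; `mixed_network` and the weights are hypothetical simplifications for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import expm


def notears_acyclicity(W: np.ndarray) -> float:
    """Standard NOTEARS penalty h(W) = tr(exp(W * W)) - d; zero iff W is a DAG."""
    d = W.shape[0]
    # Elementwise square makes entries non-negative, so tr(exp(.)) sums weighted
    # closed walks; only the d length-0 walks remain when there are no cycles.
    return float(np.trace(expm(W * W)) - d)


def mixed_network(archetypes: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for NOTMAD's mixing function: a weighted sum of
    archetypal adjacency matrices (shape (k, d, d)) using fixed weights (shape (k,))."""
    return np.tensordot(weights, archetypes, axes=1)


# Two archetypes over 2 variables: one with edge 0 -> 1, one with edge 1 -> 0.
archetypes = np.zeros((2, 2, 2))
archetypes[0, 0, 1] = 1.0
archetypes[1, 1, 0] = 1.0

acyclic = mixed_network(archetypes, np.array([1.0, 0.0]))  # only 0 -> 1
cyclic = mixed_network(archetypes, np.array([0.5, 0.5]))   # 0 -> 1 and 1 -> 0
print(notears_acyclicity(acyclic))  # approximately 0
print(notears_acyclicity(cyclic))   # strictly positive
```

Because h is differentiable in W, and W is a differentiable function of the mixing weights, a penalty like this can be added to the training loss and its gradient passed back to whatever function produces the weights. Mixing two individually acyclic archetypes can still create a cycle, which is why the penalty is applied to the mixed, context-specific network.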

Posted ContentDOI
26 Jan 2021-medRxiv
TL;DR: The authors present a causal inference framework that effectively adjusts for confounding effects without requiring prior knowledge of control genes or cells, and demonstrate that their causal inference algorithm substantially improves statistical power in simulations and in real-world data analysis of 70k brain cells collected for dissecting Alzheimer's disease mechanisms.
Abstract: Finding a causal gene from case-control studies is a classic and fundamental problem in genomics. It remains an open question which genes are differentially regulated by a disease in a cell-type-specific way, as measured by single-cell sequencing data. Here, we present a causal inference framework that effectively adjusts for confounding effects without requiring prior knowledge of control genes or cells. We demonstrate that our causal inference algorithm substantially improves statistical power in simulations and in real-world data analysis of 70k brain cells collected for dissecting Alzheimer’s disease (AD) mechanisms. We identified 377 causal genes that are differentially regulated by the disease in various brain cell types, including highly relevant AD genes with a proper cell type annotation, such as DGKD in neurons, SNCA in microglia, PIAS in oligodendrocyte progenitor cells, and FGFR2 in astrocytes. Causal genes in different cell types also enrich distinct pathways, highlighting multiple components of disease progression.