
Showing papers by "Broad Institute" published in 2021


Journal Article
TL;DR: In this paper, the authors analyzed data from 4,182 incident cases of COVID-19 in which individuals self-reported their symptoms prospectively in the COVID Symptom Study app.
Abstract: Reports of long-lasting coronavirus disease 2019 (COVID-19) symptoms, the so-called 'long COVID', are rising but little is known about prevalence, risk factors or whether it is possible to predict a protracted course early in the disease. We analyzed data from 4,182 incident cases of COVID-19 in which individuals self-reported their symptoms prospectively in the COVID Symptom Study app¹. A total of 558 (13.3%) participants reported symptoms lasting ≥28 days, 189 (4.5%) for ≥8 weeks and 95 (2.3%) for ≥12 weeks. Long COVID was characterized by symptoms of fatigue, headache, dyspnea and anosmia and was more likely with increasing age and body mass index and female sex. Experiencing more than five symptoms during the first week of illness was associated with long COVID (odds ratio = 3.53 (2.76-4.50)). A simple model to distinguish between short COVID and long COVID at 7 days (total sample size, n = 2,149) showed an area under the receiver operating characteristic curve of 76%, with replication in an independent sample of 2,472 individuals who were positive for severe acute respiratory syndrome coronavirus 2. This model could be used to identify individuals at risk of long COVID for trials of prevention or treatment and to plan education and rehabilitation services.
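The percentages quoted in the abstract above follow directly from the reported counts. The short sketch below only re-derives them; the variable and function names are illustrative and are not taken from the paper.

```python
# Counts as reported in the abstract; names below are illustrative only.
total_cases = 4182
persisting = {">=28 days": 558, ">=8 weeks": 189, ">=12 weeks": 95}


def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: percent(n, total_cases) for label, n in persisting.items()}
for label, share in shares.items():
    print(f"{label}: {share}%")
```

The rounded values match the 13.3%, 4.5% and 2.3% given in the abstract.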

1,222 citations


Journal Article
Daniel Taliun, Daniel N. Harris, Michael D. Kessler, Jedidiah Carlson and 202 more authors (61 institutions)
10 Feb 2021-Nature
TL;DR: The Trans-Omics for Precision Medicine (TOPMed) programme aims to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases.
Abstract: The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes)¹. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%. The goals, resources and design of the NHLBI Trans-Omics for Precision Medicine (TOPMed) programme are described, and analyses of rare variants detected in the first 53,831 samples provide insights into mutational processes and recent human evolutionary history.

801 citations


Journal Article
TL;DR: MitoCarta3.0, a catalogue of over 1000 genes encoding the mammalian mitochondrial proteome, is introduced and includes manually curated annotations of sub-mitochondrial localization and MitoPathway annotations, spanning seven broad functional categories relevant to mitochondria.
Abstract: The mammalian mitochondrial proteome is under dual genomic control, with 99% of proteins encoded by the nuclear genome and 13 originating from the mitochondrial DNA (mtDNA). We previously developed MitoCarta, a catalogue of over 1000 genes encoding the mammalian mitochondrial proteome. This catalogue was compiled using a Bayesian integration of multiple sequence features and experimental datasets, notably protein mass spectrometry of mitochondria isolated from fourteen murine tissues. Here, we introduce MitoCarta3.0. Beginning with the MitoCarta2.0 inventory, we performed manual review to remove 100 genes and introduce 78 additional genes, arriving at an updated inventory of 1136 human genes. We now include manually curated annotations of sub-mitochondrial localization (matrix, inner membrane, intermembrane space, outer membrane) as well as assignment to 149 hierarchical 'MitoPathways' spanning seven broad functional categories relevant to mitochondria. MitoCarta3.0, including sub-mitochondrial localization and MitoPathway annotations, is freely available at http://www.broadinstitute.org/mitocarta and should serve as a continued community resource for mitochondrial biology and medicine.

526 citations


Journal Article
TL;DR: It is shown how the popular workflow management system Snakemake can be used to guarantee reproducibility, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.
Abstract: Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.

519 citations


Journal Article
04 May 2021-eLife
TL;DR: bioBakery 3 is a set of integrated, improved methods for taxonomic, strain-level, functional, and phylogenetic profiling of metagenomes, newly developed to build on the largest set of reference sequences now available.
Abstract: Culture-independent analyses of microbial communities have progressed dramatically in the last decade, particularly due to advances in methods for biological profiling via shotgun metagenomics. Opportunities for improvement continue to accelerate, with greater access to multi-omics, microbial reference genomes, and strain-level diversity. To leverage these, we present bioBakery 3, a set of integrated, improved methods for taxonomic, strain-level, functional, and phylogenetic profiling of metagenomes newly developed to build on the largest set of reference sequences now available. Compared to current alternatives, MetaPhlAn 3 increases the accuracy of taxonomic profiling, and HUMAnN 3 improves that of functional potential and activity. These methods detected novel disease-microbiome links in applications to CRC (1262 metagenomes) and IBD (1635 metagenomes and 817 metatranscriptomes). Strain-level profiling of an additional 4077 metagenomes with StrainPhlAn 3 and PanPhlAn 3 unraveled the phylogenetic and functional structure of the common gut microbe Ruminococcus bromii, previously described by only 15 isolate genomes. With open-source implementations and cloud-deployable reproducible workflows, the bioBakery 3 platform can help researchers deepen the resolution, scale, and accuracy of multi-omic profiling for microbial community studies.

500 citations


Journal Article
TL;DR: Slide-seqV2, a technology that enables transcriptome-wide detection of RNAs with a spatial resolution of 10 μm, is reported, which combines improvements in library generation, bead synthesis and array indexing to reach an RNA capture efficiency ~50% that of single-cell RNA-seq data (~10-fold greater than Slide-seq).
Abstract: Measurement of the location of molecules in tissues is essential for understanding tissue formation and function. Previously, we developed Slide-seq, a technology that enables transcriptome-wide detection of RNAs with a spatial resolution of 10 μm. Here we report Slide-seqV2, which combines improvements in library generation, bead synthesis and array indexing to reach an RNA capture efficiency ~50% that of single-cell RNA-seq data (~10-fold greater than Slide-seq), approaching the detection efficiency of droplet-based single-cell RNA-seq techniques. First, we leverage the detection efficiency of Slide-seqV2 to identify dendritically localized mRNAs in neurons of the mouse hippocampus. Second, we integrate the spatial information of Slide-seqV2 data with single-cell trajectory analysis tools to characterize the spatiotemporal development of the mouse neocortex, identifying underlying genetic programs that were poorly sampled with Slide-seq. The combination of near-cellular resolution and high transcript detection efficiency makes Slide-seqV2 useful across many experimental contexts. An improved method for spatial transcriptomics with detection efficiency approaching that of droplet-based single-cell RNA-seq techniques.

435 citations


Journal Article
TL;DR: In this article, a genome-wide association study of self-reported daytime napping in the UK Biobank and Mendelian randomization was performed to explore causal associations with cardiometabolic outcomes.
Abstract: Daytime napping is a common, heritable behavior, but its genetic basis and causal relationship with cardiometabolic health remain unclear. Here, we perform a genome-wide association study of self-reported daytime napping in the UK Biobank (n = 452,633) and identify 123 loci of which 61 replicate in the 23andMe research cohort (n = 541,333). Findings include missense variants in established drug targets for sleep disorders (HCRTR1, HCRTR2), genes with roles in arousal (TRPC6, PNOC), and genes suggesting an obesity-hypersomnolence pathway (PNOC, PATJ). Association signals are concordant with accelerometer-measured daytime inactivity duration and 33 loci colocalize with loci for other sleep phenotypes. Cluster analysis identifies three distinct clusters of nap-promoting mechanisms with heterogeneous associations with cardiometabolic outcomes. Mendelian randomization shows potential causal links between more frequent daytime napping and higher blood pressure and waist circumference. The genetic basis of daytime napping and the directional effect of daytime napping on cardiometabolic health are unknown. Here, the authors perform a genome-wide association study on self-reported daytime napping in the UK Biobank and Mendelian randomization to explore causal associations.

393 citations


Journal Article
07 Jan 2021-Nature
TL;DR: A high-resolution map of coding regions in the SARS-CoV-2 genome enables the identification of 23 unannotated open reading frames and quantification of the expression of canonical viral open reading frames, and it is shown that viral mRNAs are not translated more efficiently than host mRNAs; instead, virus translation dominates host translation because of the high levels of viral transcripts.
Abstract: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of the ongoing coronavirus disease 2019 (COVID-19) pandemic¹. To understand the pathogenicity and antigenic potential of SARS-CoV-2 and to develop therapeutic tools, it is essential to profile the full repertoire of its expressed proteins. The current map of SARS-CoV-2 coding capacity is based on computational predictions and relies on homology with other coronaviruses. As the protein complement varies among coronaviruses, especially in regard to the variety of accessory proteins, it is crucial to characterize the specific range of SARS-CoV-2 proteins in an unbiased and open-ended manner. Here, using a suite of ribosome-profiling techniques²⁻⁴, we present a high-resolution map of coding regions in the SARS-CoV-2 genome, which enables us to accurately quantify the expression of canonical viral open reading frames (ORFs) and to identify 23 unannotated viral ORFs. These ORFs include upstream ORFs that are likely to have a regulatory role, several in-frame internal ORFs within existing ORFs, resulting in N-terminally truncated products, as well as internal out-of-frame ORFs, which generate novel polypeptides. We further show that viral mRNAs are not translated more efficiently than host mRNAs; instead, virus translation dominates host translation because of the high levels of viral transcripts. Our work provides a resource that will form the basis of future functional studies. A high-resolution map of coding regions in the SARS-CoV-2 genome enables the identification of 23 unannotated open reading frames and quantification of the expression of canonical viral open reading frames.

381 citations


Journal Article
Toni Delorey, Carly G. K. Ziegler, Graham Heimberg, Rachelly Normand, Yiming Yang, Asa Segerstolpe, Domenic Abbondanza, Stephen J. Fleming, Ayshwarya Subramanian, Daniel T. Montoro, Karthik A. Jagadeesh, Kushal K. Dey, Pritha Sen, Michal Slyper, Yered Pita-Juárez, Devan Phillips, Jana Biermann, Zohar Bloom-Ackermann, Nikolaos Barkas, Andrea Ganna, James Gomez, Johannes C. Melms, Igor Katsyv, Erica Normandin, Pourya Naderi, Yury Popov, Siddharth S. Raju, Sebastian Niezen, Linus T.-Y. Tsai, Katherine J. Siddle, Malika Sud, Victoria M. Tran, Shamsudheen K. Vellarikkal, Yiping Wang, Liat Amir-Zilberstein, Deepak Atri, Joseph M. Beechem, Olga R. Brook, Jonathan H. Chen, Prajan Divakar, Phylicia Dorceus, Jesse M. Engreitz, Adam Essene, Donna M. Fitzgerald, Robin Fropf, Steven Gazal, Joshua Gould, John Grzyb, Tyler Harvey, Jonathan L. Hecht, Tyler Hether, Judit Jané-Valbuena, Michael Leney-Greene, Hui Ma, Cristin McCabe, Daniel E. McLoughlin, Eric M. Miller, Christoph Muus, Mari Niemi, Robert F. Padera, Liuliu Pan, Deepti Pant, Carmel Pe’er, Jenna Pfiffner-Borges, Christopher J. Pinto, Jacob Plaisted, Jason Reeves, Marty Ross, Melissa Rudy, Erroll H. Rueckert, Michelle Siciliano, Alexander Sturm, Ellen Todres, Avinash Waghray, Sarah Warren, Shuting Zhang, Daniel R. Zollinger, Lisa A. Cosimi, Rajat M. Gupta, Nir Hacohen, Hanina Hibshoosh, Winston Hide, Alkes L. Price, Jayaraj Rajagopal, Purushothama Rao Tata, Stefan Riedel, Gyongyi Szabo, Timothy L. Tickle, Patrick T. Ellinor, Deborah T. Hung, Pardis C. Sabeti, Richard M. Novak, Robert S. Rogers, Donald E. Ingber, Z. Gordon Jiang, Dejan Juric, Mehrtash Babadi, Samouil L. Farhi, Benjamin Izar, James R. Stone, Ioannis S. Vlachos, Isaac H. Solomon, Orr Ashenberg, Caroline B. M. Porter, Bo Li, Alex K. Shalek, Alexandra-Chloé Villani, Orit Rozenblatt-Rosen, Aviv Regev
29 Apr 2021-Nature
TL;DR: In this article, single-cell analysis of lung, heart, kidney and liver autopsy samples shows the molecular and cellular changes and immune response resulting from severe SARS-CoV-2 infection.
Abstract: COVID-19, which is caused by SARS-CoV-2, can result in acute respiratory distress syndrome and multiple organ failure¹⁻⁴, but little is known about its pathophysiology. Here we generated single-cell atlases of 24 lung, 16 kidney, 16 liver and 19 heart autopsy tissue samples and spatial atlases of 14 lung samples from donors who died of COVID-19. Integrated computational analysis uncovered substantial remodelling in the lung epithelial, immune and stromal compartments, with evidence of multiple paths of failed tissue regeneration, including defective alveolar type 2 differentiation and expansion of fibroblasts and putative TP63+ intrapulmonary basal-like progenitor cells. Viral RNAs were enriched in mononuclear phagocytic and endothelial lung cells, which induced specific host programs. Spatial analysis in lung distinguished inflammatory host responses in lung regions with and without viral RNA. Analysis of the other tissue atlases showed transcriptional alterations in multiple cell types in heart tissue from donors with COVID-19, and mapped cell types and genes implicated with disease severity based on COVID-19 genome-wide association studies. Our foundational dataset elucidates the biological effect of severe SARS-CoV-2 infection across the body, a key step towards new treatments. Single-cell analysis of lung, heart, kidney and liver autopsy samples shows the molecular and cellular changes and immune response resulting from severe COVID-19 infection.

380 citations


Journal Article
TL;DR: The authors performed a genome-wide association study of 41,917 bipolar disorder cases and 371,549 controls of European ancestry, which identified 64 associated genomic loci, including genes encoding targets of antipsychotics, calcium channel blockers, antiepileptics and anesthetics.
Abstract: Bipolar disorder is a heritable mental illness with complex etiology. We performed a genome-wide association study of 41,917 bipolar disorder cases and 371,549 controls of European ancestry, which identified 64 associated genomic loci. Bipolar disorder risk alleles were enriched in genes in synaptic signaling pathways and brain-expressed genes, particularly those with high specificity of expression in neurons of the prefrontal cortex and hippocampus. Significant signal enrichment was found in genes encoding targets of antipsychotics, calcium channel blockers, antiepileptics and anesthetics. Integrating expression quantitative trait locus data implicated 15 genes robustly linked to bipolar disorder via gene expression, encoding druggable targets such as HTR6, MCHR1, DCLK3 and FURIN. Analyses of bipolar disorder subtypes indicated high but imperfect genetic correlation between bipolar disorder type I and II and identified additional associated loci. Together, these results advance our understanding of the biological etiology of bipolar disorder, identify novel therapeutic leads and prioritize genes for functional follow-up studies.

378 citations


Journal Article
TL;DR: MaAsLin 2 (Microbiome Multivariable Associations with Linear Models) uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types with or without covariates and repeated measurements.
Abstract: It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2's linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.


Journal Article
TL;DR: In this article, a clustering-constrained-attention multiple-instance learning (CLAM) method is proposed that uses attention-based learning to identify subregions of high diagnostic value to accurately classify whole slides, and instance-level clustering over the identified representative regions to constrain and refine the feature space.
Abstract: Deep-learning methods for computational pathology require either manual annotation of gigapixel whole-slide images (WSIs) or large datasets of WSIs with slide-level labels and typically suffer from poor domain adaptation and interpretability. Here we report an interpretable weakly supervised deep-learning method for data-efficient WSI processing and learning that only requires slide-level labels. The method, which we named clustering-constrained-attention multiple-instance learning (CLAM), uses attention-based learning to identify subregions of high diagnostic value to accurately classify whole slides and instance-level clustering over the identified representative regions to constrain and refine the feature space. By applying CLAM to the subtyping of renal cell carcinoma and non-small-cell lung cancer as well as the detection of lymph node metastasis, we show that it can be used to localize well-known morphological features on WSIs without the need for spatial labels, that it outperforms standard weakly supervised classification algorithms and that it is adaptable to independent test cohorts, smartphone microscopy and varying tissue content.

Journal Article
TL;DR: The authors performed deep metagenomic sequencing of 1,203 gut microbiomes from 1,098 individuals enrolled in the Personalised Responses to Dietary Composition Trial (PREDICT 1) study, for whom detailed long-term diet information, as well as hundreds of fasting and same-meal postprandial cardiometabolic blood marker measurements, were available.
Abstract: The gut microbiome is shaped by diet and influences host metabolism; however, these links are complex and can be unique to each individual. We performed deep metagenomic sequencing of 1,203 gut microbiomes from 1,098 individuals enrolled in the Personalised Responses to Dietary Composition Trial (PREDICT 1) study, whose detailed long-term diet information, as well as hundreds of fasting and same-meal postprandial cardiometabolic blood marker measurements were available. We found many significant associations between microbes and specific nutrients, foods, food groups and general dietary indices, which were driven especially by the presence and diversity of healthy and plant-based foods. Microbial biomarkers of obesity were reproducible across external publicly available cohorts and in agreement with circulating blood metabolites that are indicators of cardiovascular disease risk. While some microbes, such as Prevotella copri and Blastocystis spp., were indicators of favorable postprandial glucose metabolism, overall microbiome composition was predictive for a large panel of cardiometabolic blood markers including fasting and postprandial glycemic, lipemic and inflammatory indices. The panel of intestinal species associated with healthy dietary habits overlapped with those associated with favorable cardiometabolic and postprandial markers, indicating that our large-scale resource can potentially stratify the gut microbiome into generalizable health levels in individuals without clinically manifest disease.


Journal Article
27 May 2021-Cell
TL;DR: BioPlex 3.0 is a cell-line-specific interaction network that includes 118,162 interactions among 14,586 proteins in 293T cells, alongside a second network built from 5,522 immunoprecipitations in HCT116 cells.

Journal Article
26 Aug 2021
TL;DR: This Primer provides an introduction to genome-wide association studies (GWAS), techniques for deriving functional inferences from the results and applications of GWAS in understanding disease risk and trait architecture, and discusses important ethical considerations when considering GWAS populations and data.
Abstract: Genome-wide association studies (GWAS) test hundreds of thousands of genetic variants across many genomes to find those statistically associated with a specific trait or disease. This methodology has generated a myriad of robust associations for a range of traits and diseases, and the number of associated variants is expected to grow steadily as GWAS sample sizes increase. GWAS results have a range of applications, such as gaining insight into a phenotype’s underlying biology, estimating its heritability, calculating genetic correlations, making clinical risk predictions, informing drug development programmes and inferring potential causal relationships between risk factors and health outcomes. In this Primer, we provide the reader with an introduction to GWAS, explaining their statistical basis and how they are conducted, describe state-of-the-art approaches and discuss limitations and challenges, concluding with an overview of the current and future applications for GWAS results. Uffelmann et al. describe the key considerations and best practices for conducting genome-wide association studies (GWAS), techniques for deriving functional inferences from the results and applications of GWAS in understanding disease risk and trait architecture. The Primer also provides information on the best practices for data sharing and discusses important ethical considerations when considering GWAS populations and data.

Journal Article
TL;DR: In this paper, the authors conducted 220 deep-phenotype genome-wide association studies (diseases, biomarkers and medication usage) in BioBank Japan (n = 179,000), by incorporating past medical history and text-mining of electronic medical records.
Abstract: Current genome-wide association studies do not yet capture sufficient diversity in populations and scope of phenotypes. To expand an atlas of genetic associations in non-European populations, we conducted 220 deep-phenotype genome-wide association studies (diseases, biomarkers and medication usage) in BioBank Japan (n = 179,000), by incorporating past medical history and text-mining of electronic medical records. Meta-analyses with the UK Biobank and FinnGen (total n = 628,000) identified ~5,000 new loci, which improved the resolution of the genomic map of human traits. This atlas elucidated the landscape of pleiotropy as represented by the major histocompatibility complex locus, where we conducted HLA fine-mapping. Finally, we performed statistical decomposition of matrices of phenome-wide summary statistics, and identified latent genetic components, which pinpointed responsible variants and biological mechanisms underlying current disease classifications across populations. The decomposed components enabled genetically informed subtyping of similar diseases (for example, allergic diseases). Our study suggests a potential avenue for hypothesis-free re-investigation of human diseases through genetics.

Journal Article
02 Apr 2021-Science
TL;DR: In this article, the authors present 64 assembled haplotypes from 32 diverse human genomes, which integrate all forms of genetic variation, even across complex loci, and identify 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing.
Abstract: Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
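The contiguity figure in the abstract above ("minimum contig length needed to cover 50% of the genome") is the statistic usually called N50. A minimal sketch of that calculation is given below, assuming contig lengths are available as a plain list of integers; the function name, the optional `genome_size` parameter and the toy lengths are illustrative and are not taken from the paper.

```python
def n50(contig_lengths, genome_size=None):
    """Smallest contig length such that contigs at least this long
    together cover half of the total.

    If genome_size is given, half of that value is the target instead of
    half of the summed contig lengths (often reported as NG50).
    """
    total = genome_size if genome_size is not None else sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contigs do not cover half of the requested size")


# Toy lengths, invented for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))                   # half of 290 is reached at the 70 contig
print(n50(toy_contigs, genome_size=400))  # half of 400 is reached at the 50 contig
```

The abstract reports this value averaged over the assembled haplotypes, so a per-assembly figure like the one computed here would be averaged across assemblies.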

Journal Article
29 Apr 2021-Nature
TL;DR: In this paper, the authors performed single-nucleus RNA sequencing of about 116,000 nuclei from the lungs of nineteen individuals who died of COVID-19 and underwent rapid autopsy and seven control individuals.
Abstract: Respiratory failure is the leading cause of death in patients with severe SARS-CoV-2 infection¹,², but the host response at the lung tissue level is poorly understood. Here we performed single-nucleus RNA sequencing of about 116,000 nuclei from the lungs of nineteen individuals who died of COVID-19 and underwent rapid autopsy and seven control individuals. Integrated analyses identified substantial alterations in cellular composition, transcriptional cell states, and cell-to-cell interactions, thereby providing insight into the biology of lethal COVID-19. The lungs from individuals with COVID-19 were highly inflamed, with dense infiltration of aberrantly activated monocyte-derived macrophages and alveolar macrophages, but had impaired T cell responses. Monocyte/macrophage-derived interleukin-1β and epithelial cell-derived interleukin-6 were unique features of SARS-CoV-2 infection compared to other viral and bacterial causes of pneumonia. Alveolar type 2 cells adopted an inflammation-associated transient progenitor cell state and failed to undergo full transition into alveolar type 1 cells, resulting in impaired lung regeneration. Furthermore, we identified expansion of recently described CTHRC1+ pathological fibroblasts³ contributing to rapidly ensuing pulmonary fibrosis in COVID-19. Inference of protein activity and ligand–receptor interactions identified putative drug targets to disrupt deleterious circuits. This atlas enables the dissection of lethal COVID-19, may inform our understanding of long-term complications of COVID-19 survivors, and provides an important resource for therapeutic development. Lung samples collected soon after death from COVID-19 are used to provide a single-cell atlas of SARS-CoV-2 infection and the ensuing molecular changes.

Journal ArticleDOI
07 Jan 2021-Cell
TL;DR: Evidence from in vitro and in vivo experiments supports a model in which RNAs produced during early steps in transcription initiation stimulate condensate formation, whereas the burst of RNAs produced during elongation stimulates condensate dissolution.

Journal ArticleDOI
Douglas P Wightman1, Iris E. Jansen1, Jeanne E. Savage1, Alexey A. Shadrin2, Shahram Bahrami3, Shahram Bahrami2, Dominic Holland4, Arvid Rongve5, Sigrid Børte2, Sigrid Børte6, Sigrid Børte3, Bendik S. Winsvold6, Bendik S. Winsvold3, Ole Kristian Drange6, Amy E Martinsen2, Amy E Martinsen3, Amy E Martinsen6, Anne Heidi Skogholt6, Cristen J. Willer7, Geir Bråthen6, Ingunn Bosnes8, Ingunn Bosnes6, Jonas B. Nielsen7, Jonas B. Nielsen9, Jonas B. Nielsen6, Lars G. Fritsche7, Laurent F. Thomas6, Linda M. Pedersen3, Maiken Elvestad Gabrielsen6, Marianne Bakke Johnsen2, Marianne Bakke Johnsen6, Marianne Bakke Johnsen3, Tore Wergeland Meisingset6, Wei Zhou7, Wei Zhou10, Petroula Proitsi11, Angela Hodges11, Richard Dobson, Latha Velayudhan11, Karl Heilbron, Adam Auton, Julia M. Sealock12, Lea K. Davis12, Nancy L. Pedersen13, Chandra A. Reynolds14, Ida K. Karlsson13, Ida K. Karlsson15, Sigurdur H. Magnusson16, Hreinn Stefansson16, Steinunn Thordardottir, Palmi V. Jonsson17, Jon Snaedal, Anna Zettergren18, Ingmar Skoog18, Ingmar Skoog19, Silke Kern19, Silke Kern18, Margda Waern19, Margda Waern18, Henrik Zetterberg, Kaj Blennow19, Kaj Blennow18, Eystein Stordal8, Eystein Stordal6, Kristian Hveem6, John-Anker Zwart2, John-Anker Zwart6, John-Anker Zwart3, Lavinia Athanasiu3, Lavinia Athanasiu2, Per Selnes20, Ingvild Saltvedt6, Sigrid Botne Sando6, Ingun Ulstein3, Srdjan Djurovic3, Srdjan Djurovic5, Tormod Fladby20, Tormod Fladby2, Dag Aarsland21, Dag Aarsland11, Geir Selbæk3, Geir Selbæk2, Stephan Ripke22, Stephan Ripke10, Stephan Ripke23, Kari Stefansson16, Ole A. Andreassen3, Ole A. Andreassen2, Danielle Posthuma24, Danielle Posthuma1 
TL;DR: This paper identified seven previously unidentified genetic loci contributing to Alzheimer's disease and highlighted microglia, immune cells and protein catabolism as relevant to late-onset Alzheimer's disease, while identifying and prioritizing previously unidentified genes of potential interest.
Abstract: Late-onset Alzheimer's disease is a prevalent age-related polygenic disease that accounts for 50-70% of dementia cases. Currently, only a fraction of the genetic variants underlying Alzheimer's disease have been identified. Here we show that increased sample sizes allowed identification of seven previously unidentified genetic loci contributing to Alzheimer's disease. This study highlights microglia, immune cells and protein catabolism as relevant to late-onset Alzheimer's disease, while identifying and prioritizing previously unidentified genes of potential interest. We anticipate that these results can be included in larger meta-analyses of Alzheimer's disease to identify further genetic variants that contribute to Alzheimer's pathology.

Journal ArticleDOI
TL;DR: CellProfiler 4 is a new version of the free, open-source image analysis software, with user interface refinements based on user feedback, new modules, and targeted optimizations that reduce the time and cost of running large-scale analysis pipelines.
Abstract: Background: Imaging data contains a substantial amount of information which can be difficult to evaluate by eye. With the expansion of high throughput microscopy methodologies producing increasingly large datasets, automated and objective analysis of the resulting images is essential to effectively extract biological information from this data. CellProfiler is a free, open source image analysis program which enables researchers to generate modular pipelines with which to process microscopy images into interpretable measurements. Results: Herein we describe CellProfiler 4, a new version of this software with expanded functionality. Based on user feedback, we have made several user interface refinements to improve the usability of the software. We introduced new modules to expand the capabilities of the software. We also evaluated performance and made targeted optimizations to reduce the time and cost associated with running common large-scale analysis pipelines. Conclusions: CellProfiler 4 provides significantly improved performance in complex workflows compared to previous versions. This release will ensure that researchers will have continued access to CellProfiler's powerful computational tools in the coming years.

Journal ArticleDOI
TL;DR: In this article, the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals) was evaluated. The results delineate the genetic basis of biomarkers and their causal influences on diseases, and improve genetic risk stratification for common diseases.
Abstract: Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1 s.d.) protein-altering, human leukocyte antigen (HLA) and copy number variant (CNV) associations. Through Mendelian randomization (MR) analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build 'multi-PRS' models for diseases using 35 PRSs simultaneously, which improved chronic kidney disease, type 2 diabetes, gout and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n = 135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.
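Illustration: the polygenic risk scores described in the abstract above are commonly computed as a weighted sum of allele dosages. The sketch below shows only that idea. The variant IDs, weights, and the linear form of the 'multi-PRS' combination are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of allele dosages (0 to 2) over the variants that have a weight."""
    return sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())


def multi_prs(scores: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Linear combination of several per-biomarker scores (assumed form)."""
    return sum(coefficients[name] * scores[name] for name in coefficients)


# Hypothetical variants and effect weights for two biomarkers.
urate_weights = {"rs_a": 0.5, "rs_b": -0.25}
cystatin_weights = {"rs_b": 0.1, "rs_c": 0.2}

# Hypothetical allele dosages for one individual.
person = {"rs_a": 2.0, "rs_b": 1.0, "rs_c": 0.0}

scores = {
    "urate": polygenic_score(person, urate_weights),
    "cystatin_c": polygenic_score(person, cystatin_weights),
}

# Hypothetical coefficients; in practice these would be fit on training data.
risk = multi_prs(scores, {"urate": 1.0, "cystatin_c": 0.5})
print(scores, risk)
```

In this toy setup, variants missing from an individual's record contribute zero to the score, which is one simple convention among several.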

Journal ArticleDOI
TL;DR: In this paper, the authors propose robust cell type decomposition (RCTD), a computational method that uses cell type profiles learned from single-cell RNA-seq to decompose cell type mixtures in spatial transcriptomics data. RCTD detects mixtures and identifies cell types on simulated datasets, and accurately reproduces known cell type and subtype localization patterns in Slide-seq and Visium datasets.
Abstract: A limitation of spatial transcriptomics technologies is that individual measurements may contain contributions from multiple cells, hindering the discovery of cell-type-specific spatial patterns of localization and expression. Here, we develop robust cell type decomposition (RCTD), a computational method that leverages cell type profiles learned from single-cell RNA-seq to decompose cell type mixtures while correcting for differences across sequencing technologies. We demonstrate the ability of RCTD to detect mixtures and identify cell types on simulated datasets. Furthermore, RCTD accurately reproduces known cell type and subtype localization patterns in Slide-seq and Visium datasets of the mouse brain. Finally, we show how RCTD's recovery of cell type localization enables the discovery of genes within a cell type whose expression depends on spatial environment. Spatial mapping of cell types with RCTD enables the spatial components of cellular identity to be defined, uncovering new principles of cellular organization in biological tissue. RCTD is publicly available as an open-source R package at https://github.com/dmcable/RCTD.
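Illustration: the general idea of decomposing a mixed measurement into cell type contributions can be shown with a toy example. The sketch below is not the RCTD algorithm and does not use the RCTD package. It assumes a plain Poisson model, two hypothetical cell type profiles, and a grid search over one mixing weight, and it omits the cross-technology correction the abstract describes.

```python
import math
from typing import Sequence


def poisson_log_likelihood(counts: Sequence[float], rates: Sequence[float]) -> float:
    """Poisson log-likelihood up to a constant (the log k! term is dropped)."""
    return sum(k * math.log(r) - r for k, r in zip(counts, rates))


def estimate_mixture_weight(
    counts: Sequence[float],
    profile_a: Sequence[float],
    profile_b: Sequence[float],
    steps: int = 100,
) -> float:
    """Grid-search the weight of profile_a that best explains the observed counts.

    Profiles are per-gene expression fractions and must be strictly positive
    so that every candidate rate has a defined logarithm.
    """
    total = sum(counts)
    best_w, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        w = i / steps
        rates = [total * (w * a + (1 - w) * b) for a, b in zip(profile_a, profile_b)]
        ll = poisson_log_likelihood(counts, rates)
        if ll > best_ll:
            best_w, best_ll = w, ll
    return best_w


# Hypothetical three-gene profiles for two cell types (each sums to 1).
neuron_profile = [0.7, 0.2, 0.1]
astrocyte_profile = [0.1, 0.2, 0.7]

# Hypothetical counts observed at one spatial measurement.
spot_counts = [40, 20, 40]

weight = estimate_mixture_weight(spot_counts, neuron_profile, astrocyte_profile)
print(f"estimated neuron fraction: {weight:.2f}")
```

A grid search is used here only because it is easy to read. A real method would use a proper optimizer and handle more than two cell types.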

Journal ArticleDOI
05 Feb 2021-Science
TL;DR: Investigating the introduction and spread of severe acute respiratory syndrome coronavirus 2 in the Boston area across the first wave of the pandemic provides powerful evidence of the importance of superspreading events in shaping the course of this pandemic and illustrates how some introductions, when amplified under unfortunate circumstances, can have an outsized effect with devastating consequences that extend far beyond the initial events themselves.
Abstract: Analysis of 772 complete severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes from early in the Boston-area epidemic revealed numerous introductions of the virus, a small number of which led to most cases. The data revealed two superspreading events. One, in a skilled nursing facility, led to rapid transmission and significant mortality in this vulnerable population but little broader spread, whereas other introductions into the facility had little effect. The second, at an international business conference, produced sustained community transmission and was exported, resulting in extensive regional, national, and international spread. The two events also differed substantially in the genetic variation they generated, suggesting varying transmission dynamics in superspreading events. Our results show how genomic epidemiology can help to understand the link between individual clusters and wider community spread.

Journal ArticleDOI
07 Apr 2021-Nature
TL;DR: In this article, the activity-by-contact (ABC) model was applied to create enhancer-gene maps in 131 human cell types and tissues, and these maps were used to interpret the functions of GWAS variants.
Abstract: Genome-wide association studies (GWAS) have identified thousands of noncoding loci that are associated with human diseases and complex traits, each of which could reveal insights into the mechanisms of disease1. Many of the underlying causal variants may affect enhancers2,3, but we lack accurate maps of enhancers and their target genes to interpret such variants. We recently developed the activity-by-contact (ABC) model to predict which enhancers regulate which genes and validated the model using CRISPR perturbations in several cell types4. Here we apply this ABC model to create enhancer–gene maps in 131 human cell types and tissues, and use these maps to interpret the functions of GWAS variants. Across 72 diseases and complex traits, ABC links 5,036 GWAS signals to 2,249 unique genes, including a class of 577 genes that appear to influence multiple phenotypes through variants in enhancers that act in different cell types. In inflammatory bowel disease (IBD), causal variants are enriched in predicted enhancers by more than 20-fold in particular cell types such as dendritic cells, and ABC achieves higher precision than other regulatory methods at connecting noncoding variants to target genes. These variant-to-function maps reveal an enhancer that contains an IBD risk variant and that regulates the expression of PPIF to alter the membrane potential of mitochondria in macrophages. Our study reveals principles of genome regulation, identifies genes that affect IBD and provides a resource and generalizable strategy to connect risk variants of common diseases to their molecular and cellular functions. Mapping enhancer regulation across human cell types and tissues illuminates genome function and provides a resource to connect risk variants for common diseases to their molecular and cellular functions.
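Illustration: the abstract above does not spell out how the ABC model scores an enhancer-gene pair. A minimal sketch follows, assuming the commonly described form in which an element's score for a gene is its activity multiplied by its contact frequency with the gene, divided by the sum of the same product over all candidate elements near that gene. The element names and values are hypothetical.

```python
from typing import Dict


def abc_scores(activity: Dict[str, float], contact: Dict[str, float]) -> Dict[str, float]:
    """Return each element's share of (activity x contact) for a single gene."""
    products = {element: activity[element] * contact[element] for element in activity}
    total = sum(products.values())
    if total == 0:
        # No element shows any activity or contact, so no effect is attributed.
        return {element: 0.0 for element in products}
    return {element: value / total for element, value in products.items()}


# Hypothetical activity (e.g. an accessibility or acetylation signal) and
# contact frequency with the gene's promoter, for three candidate elements.
activity = {"enh1": 10.0, "enh2": 4.0, "promoter": 20.0}
contact = {"enh1": 0.5, "enh2": 0.25, "promoter": 1.0}

gene_scores = abc_scores(activity, contact)
for element, score in sorted(gene_scores.items(), key=lambda item: -item[1]):
    print(f"{element}: {score:.3f}")
```

Because the scores for one gene sum to 1, each can be read as the estimated fraction of regulatory input contributed by that element under this simplified form.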


Journal ArticleDOI
Trygve E. Bakken1, Nikolas L. Jorstad1, Qiwen Hu2, Blue B. Lake3, Wei Tian4, Brian E. Kalmbach5, Brian E. Kalmbach1, Megan Crow6, Rebecca D. Hodge1, Fenna M. Krienen2, Staci A. Sorensen1, Jeroen Eggermont7, Zizhen Yao1, Brian D. Aevermann8, Andrew Aldridge4, Anna Bartlett4, Darren Bertagnolli1, Tamara Casper1, Rosa Castanon4, Kirsten Crichton1, Tanya L. Daigle1, Rachel A. Dalley1, Nick Dee1, Nikolai C. Dembrow5, Nikolai C. Dembrow9, Dinh Diep3, Songlin Ding1, Weixiu Dong3, Rongxin Fang3, Stephan Fischer6, Melissa Goldman2, Jeff Goldy1, Lucas T. Graybuck1, Brian R. Herb10, Xiaomeng Hou3, Jayaram Kancherla11, Matthew Kroll1, Kanan Lathia1, Baldur van Lew7, Yang Eric Li12, Yang Eric Li3, Christine S. Liu3, Christine S. Liu13, Hanqing Liu4, Jacinta Lucero4, Anup Mahurkar10, Delissa McMillen1, Jeremy A. Miller1, Marmar Moussa14, Joseph R. Nery4, Philip R. Nicovich1, Sheng-Yong Niu3, Sheng-Yong Niu4, Joshua Orvis10, Julia K. Osteen4, Scott F. Owen1, C. Palmer13, C. Palmer3, Thanh Pham1, Nongluk Plongthongkum3, Olivier Poirion3, Nora Reed2, Christine Rimorin1, Angeline Rivkin4, William J. Romanow13, Adriana E. Sedeno-Cortes1, Kimberly Siletti15, Saroja Somasundaram1, Josef Sulc1, Michael Tieu1, Amy Torkelson1, Herman Tung1, Xinxin Wang16, Fangming Xie3, Anna Marie Yanny1, Renee Zhang8, Seth A. Ament10, M. Margarita Behrens4, Héctor Corrada Bravo11, Jerold Chun13, Alexander Dobin6, Jesse Gillis6, Ronna Hertzano10, Patrick R. Hof17, Thomas Höllt18, Gregory D. Horwitz5, C. Dirk Keene5, Peter V. Kharchenko2, Andrew L. Ko5, Andrew L. Ko19, Boudewijn P. F. Lelieveldt7, Boudewijn P. F. Lelieveldt18, Chongyuan Luo20, Eran A. Mukamel3, Antonio Pinto-Duarte4, Sebastian Preissl3, Aviv Regev21, Bing Ren12, Bing Ren3, Richard H. Scheuermann8, Richard H. Scheuermann22, Richard H. Scheuermann3, Kimberly A. Smith1, William J. Spain9, William J. Spain5, Owen White10, Christof Koch1, Michael Hawrylycz1, Bosiljka Tasic1, Evan Z. Macosko21, Steven A. McCarroll2, Steven A. McCarroll21, Jonathan T. Ting1, Jonathan T. Ting5, Hongkui Zeng1, Kun Zhang3, Guoping Feng23, Guoping Feng21, Guoping Feng24, Joseph R. Ecker4, Sten Linnarsson15, Ed S. Lein1
01 Oct 2021-Nature
TL;DR: Using high-throughput transcriptomic and epigenomic profiling of more than 450,000 single nuclei from the primary motor cortex (M1) of humans, marmoset monkeys and mice, this paper demonstrates a broadly conserved cellular makeup of the region and generates a cross-species consensus classification of cell types.
Abstract: The primary motor cortex (M1) is essential for voluntary fine-motor control and is functionally conserved across mammals1. Here, using high-throughput transcriptomic and epigenomic profiling of more than 450,000 single nuclei in humans, marmoset monkeys and mice, we demonstrate a broadly conserved cellular makeup of this region, with similarities that mirror evolutionary distance and are consistent between the transcriptome and epigenome. The core conserved molecular identities of neuronal and non-neuronal cell types allow us to generate a cross-species consensus classification of cell types, and to infer conserved properties of cell types across species. Despite the overall conservation, however, many species-dependent specializations are apparent, including differences in cell-type proportions, gene expression, DNA methylation and chromatin state. Few cell-type marker genes are conserved across species, revealing a short list of candidate genes and regulatory mechanisms that are responsible for conserved features of homologous cell types, such as the GABAergic chandelier cells. This consensus transcriptomic classification allows us to use patch-seq (a combination of whole-cell patch-clamp recordings, RNA sequencing and morphological characterization) to identify corticospinal Betz cells from layer 5 in non-human primates and humans, and to characterize their highly specialized physiology and anatomy. These findings highlight the robust molecular underpinnings of cell-type diversity in M1 across mammals, and point to the genes and regulatory pathways responsible for the functional identity of cell types and their species-specific adaptations.

Journal ArticleDOI
TL;DR: A common framework is established that describes the experimental standards for defining trained immunity in both in vitro and in vivo settings, as well as in experimental models and human subjects.
Abstract: The similarities and differences between trained immunity and other immune processes are the subject of intense interrogation. Therefore, a consensus on the definition of trained immunity in both in vitro and in vivo settings, as well as in experimental models and human subjects, is necessary for advancing this field of research. Here we aim to establish a common framework that describes the experimental standards for defining trained immunity.