Author
Frederik Otzen Bagger
Other affiliations: Boston Children's Hospital, Copenhagen University Hospital, Swiss Institute of Bioinformatics ...
Bio: Frederik Otzen Bagger is an academic researcher from University of Copenhagen. The author has contributed to research in topics: Haematopoiesis & Cellular differentiation. The author has an h-index of 23 and has co-authored 52 publications receiving 4,252 citations. Previous affiliations of Frederik Otzen Bagger include Boston Children's Hospital & Copenhagen University Hospital.
Papers
••
University of Copenhagen, University Hospital Regensburg, University of Birmingham, University of North Carolina at Chapel Hill, Harvard University, Aarhus University, University of Edinburgh, Lawrence Berkeley National Laboratory, European Bioinformatics Institute, Karolinska Institutet, VU University Medical Center
TL;DR: It is shown that enhancers share properties with CpG-poor messenger RNA promoters but produce bidirectional, exosome-sensitive, relatively short unspliced RNAs, the generation of which is strongly related to enhancer activity.
Abstract: Enhancers control the correct temporal and cell-type-specific activation of gene expression in multicellular eukaryotes. Knowing their properties, regulatory activity and targets is crucial to understand the regulation of differentiation and homeostasis. Here we use the FANTOM5 panel of samples, covering the majority of human tissues and cell types, to produce an atlas of active, in vivo-transcribed enhancers. We show that enhancers share properties with CpG-poor messenger RNA promoters but produce bidirectional, exosome-sensitive, relatively short unspliced RNAs, the generation of which is strongly related to enhancer activity. The atlas is used to compare regulatory programs between different cells at unprecedented depth, to identify disease-associated regulatory single nucleotide polymorphisms, and to classify cell-type-specific and ubiquitous enhancers. We further explore the utility of enhancer redundancy, which explains gene expression strength rather than expression patterns. The online FANTOM5 enhancer atlas represents a unique resource for studies on cell-type-specific enhancers and gene regulation.
2,260 citations
••
TL;DR: This work presents a generic approach for processing scRNA-seq data and detecting low quality cells, using a curated set of over 20 biological and technical features, which improves classification accuracy by over 30% compared to traditional methods.
Abstract: Single-cell RNA sequencing (scRNA-seq) has broad applications across biomedical research. One of the key challenges is to ensure that only single, live cells are included in downstream analysis, as the inclusion of compromised cells inevitably affects data interpretation. Here, we present a generic approach for processing scRNA-seq data and detecting low quality cells, using a curated set of over 20 biological and technical features. Our approach improves classification accuracy by over 30% compared to traditional methods when tested on over 5,000 cells, including CD4+ T cells, bone marrow dendritic cells, and mouse embryonic stem cells.
526 citations
••
Wellcome Trust Sanger Institute, University of Cambridge, McGill University, European Bioinformatics Institute, Pompeu Fabra University, University College London, NHS Blood and Transplant, Radboud University Nijmegen, Max Planck Society, University of Geneva, New York University, British Heart Foundation, Newcastle University, University of Amsterdam
TL;DR: High-resolution genetic, epigenetic, and transcriptomic profiling in three major human immune cell types from up to 197 individuals yields insights into cell-type-specific correlation between diverse genomic inputs, more generalizable correlations between these inputs, and defines molecular events that may underpin complex disease risk.
504 citations
••
TL;DR: The developmental trajectories of TH1 and TFH (T follicular helper) cells during blood-stage Plasmodium infection in mice were reconstructed and it was found that precursor TH cells were coached toward a TH1 but not a TFH fate by inflammatory monocytes.
Abstract: Differentiation of naive CD4+ T cells into functionally distinct T helper subsets is crucial for the orchestration of immune responses. Due to extensive heterogeneity and multiple overlapping transcriptional programs in differentiating T cell populations, this process has remained a challenge for systematic dissection in vivo. By using single-cell transcriptomics and computational analysis using a temporal mixtures of Gaussian processes model, termed GPfates, we reconstructed the developmental trajectories of Th1 and Tfh cells during blood-stage Plasmodium infection in mice. By tracking clonality using endogenous TCR sequences, we first demonstrated that Th1/Tfh bifurcation had occurred at both population and single-clone levels. Next, we identified genes whose expression was associated with Th1 or Tfh fates, and demonstrated a T-cell intrinsic role for Galectin-1 in supporting Th1 differentiation. We also revealed the close molecular relationship between Th1 and IL-10-producing Tr1 cells in this infection. Th1 and Tfh fates emerged from a highly proliferative precursor that upregulated aerobic glycolysis and accelerated cell cycling as cytokine expression began. Dynamic gene expression of chemokine receptors around bifurcation predicted roles for cell-cell interaction in driving Th1/Tfh fates. In particular, we found that precursor Th cells were coached towards a Th1 but not a Tfh fate by inflammatory monocytes. Thus, by integrating genomic and computational approaches, our study has provided two unique resources: a database, www.PlasmoTH.org, which facilitates discovery of novel factors controlling Th1/Tfh fate commitment, and more generally, GPfates, a modelling framework for characterizing cell differentiation towards multiple fates.
234 citations
••
TL;DR: The database now includes 23 high-quality curated data sets relevant to normal and malignant blood formation and, in addition, it has assembled and built a unique integrated data set, BloodPool, which contains more than 2000 samples assembled from six independent studies on acute myeloid leukemia.
Abstract: Research on human and murine haematopoiesis has resulted in a vast number of gene-expression data sets that can potentially answer questions regarding normal and aberrant blood formation. To researchers and clinicians with limited bioinformatics experience, these data have remained available, yet largely inaccessible. Current databases provide information about gene expression but fail to answer key questions regarding co-regulation, genetic programs or effect on patient survival. To address these shortcomings, we present BloodSpot (www.bloodspot.eu), which includes and greatly extends our previously released database HemaExplorer, a database of gene expression profiles from FACS sorted healthy and malignant haematopoietic cells. A revised interactive interface simultaneously provides a plot of gene expression along with a Kaplan-Meier analysis and a hierarchical tree depicting the relationship between different cell types in the database. The database now includes 23 high-quality curated data sets relevant to normal and malignant blood formation and, in addition, we have assembled and built a unique integrated data set, BloodPool. BloodPool contains more than 2000 samples assembled from six independent studies on acute myeloid leukemia. Furthermore, we have devised a robust sample integration procedure that allows for sensitive comparison of user-supplied patient samples in a well-defined haematopoietic cellular space.
226 citations
Cited by
••
TL;DR: An analytical strategy for integrating scRNA-seq data sets based on common sources of variation is introduced, enabling the identification of shared populations across data sets and downstream comparative analysis.
Abstract: Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.
7,741 citations
•
TL;DR: The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance.
Abstract: UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.
5,390 citations
01 Feb 2015
TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.
Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.
4,409 citations
••
TL;DR: A new method is introduced, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers, which is computationally tractable at very large sample sizes and leverages genome-wide information.
Abstract: Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here we analyze a broad set of functional elements, including cell type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits with an average sample size of 73,599. To enable this analysis, we introduce a new method, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers. This new method is computationally tractable at very large sample sizes and leverages genome-wide information. Our findings include a large enrichment of heritability in conserved regions across many traits, a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers and many cell type-specific enrichments, including significant enrichment of central nervous system cell types in the heritability of body mass index, age at menarche, educational attainment and smoking behavior.
1,939 citations
••
Alistair R. R. Forrest, Hideya Kawaji, Michael Rehli, J Kenneth Baillie and 277 more authors (63 institutions)
TL;DR: The authors mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body.
Abstract: Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly 'housekeeping', whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles. TSSs specific to different cell types evolve at different rates, whereas promoters of broadly expressed genes are the most conserved. Promoter-based expression analysis reveals key transcription factors defining cell states and links them to binding-site motifs. The functions of identified novel transcripts can be predicted by coexpression and sample ontology enrichment analyses. The functional annotation of the mammalian genome 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type-specific transcriptomes with wide applications in biomedical research.
1,715 citations