
A rule-based data-informed cellular consensus map of the human mononuclear phagocyte cell space

TL;DR: A rule-based data-informed approach to build next generation cellular consensus maps, using the human dendritic-cell and monocyte compartment in peripheral blood as an example, and providing a generalizable method for building consensus maps for the life sciences.
Abstract: Single-cell genomic techniques are opening new avenues to understand the basic units of life. Large international efforts, such as those to derive a Human Cell Atlas, are driving progress in this area; here, cellular map generation is key. To expedite the inevitable iterations of these underlying maps, we have developed a rule-based data-informed approach to build next generation cellular consensus maps. Using the human dendritic-cell and monocyte compartment in peripheral blood as an example, we performed computational integration of previous, partially overlapping maps using an approach we termed ‘backmapping’, combined with multi-color flow cytometry and index-sorting-based single-cell RNA-sequencing. Our general strategy can be applied to any atlas generation for humans and other species.

Graphical Highlights

  • Defining a consensus of the human myeloid cell compartment in peripheral blood
  • 3 monocyte subsets, pDC, cDC1, DC2, DC3 and precursor DC make up the compartment
  • Distinguishing the myeloid cell compartment from other cell spaces, e.g. the NK cell space
  • Providing a generalizable method for building consensus maps for the life sciences

Summary

Introduction

  • Such single-cell technologies allow for a fully data-driven analysis to establish cell maps of an organism, such as those proposed by the Human Cell Atlas consortium (Rozenblatt-Rosen et al., 2017).
  • Reliable consensus maps are a prerequisite for reconciling conflicting data that may have been generated with different data-generating approaches (Edney, 2019; Monmonier, 2015).
  • To establish a consensus map of the human mononuclear myeloid cell compartment, the authors integrate prior knowledge by defining a priori criteria for the cellular compartment under study, which increases resolution and allows a consensus map to be built.

Results

  • Integrated phenotypic characterization of the myeloid cell compartment in human peripheral blood.
  • bioRxiv preprint, this version posted June 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
  • To integrate the DC subsets identified in map 1 and map 2 with each other, the authors computed a UMAP topology from the original map 1 single-cell transcriptome data comprising the DC cell space and overlaid the signatures of the map 2 DC subsets (pDC, cDC1, cDC2, pre-DC).
  • This analysis showed that if the entire Lin-CD16+ compartment is mapped back onto the Lin- UMAP topology, NK cells (CD56+), monocytes (CD56-CD16+/-) and granulocyte fractions (CD16high) are included in this cellular compartment.

Discussion

  • Consensus maps are an important instrument within an iterative process of producing cellular maps of all organs and tissues in different species, including humans.
  • Because the authors propose to include prior knowledge in the respective scientific field into the algorithm for generating such consensus maps, they define the overall strategy as being ‘data-informed’, combining prior knowledge and data-driven technologies including single-cell omics.
  • The consensus map thereby provides the next iteration of this particular subspace in the myeloid cell map of human peripheral blood.

Acknowledgments:

  • The authors thank Jessica Tamanini for critical review and editing of the manuscript.
  • This work was supported by the German Research Foundation to JLS (GRK 2168, INST 217/577-1, EXC2151/1), by the HGF grant sparse2big to JLS, by the FASTGenomics grant of the German Federal Ministry for Economic Affairs and Energy to JLS, and by the EU project SYSCID under grant number 733100.
  • F.G. is an EMBO YIP awardee and is supported by Singapore Immunology Network (SIgN) and Shanghai Institute of Immunology core funding.
  • The authors declare that there are no competing interests.

Figure Legends

  • Generating a new consensus map of the mononuclear myeloid cell compartment in human peripheral blood.
  • (B) Visualization of ~1.4 million live CD45+Lin(CD3, CD19, CD20, CD56)- cells after UMAP dimensionality reduction of the flow cytometry panel introduced in A (left panel), the mononuclear myeloid cell compartment (second panel), an overlay of index-sorted cells (third panel), and the UMAP topology of the index-sorted cells based on their single-cell transcriptome data.
  • (B) Heatmap of 10 most significant marker genes for each of the 11 clusters identified and visualized in Figure 2A.
  • (G) UMAP topology of scRNA-seq data derived from the map1 DC and mono subsets (left panel) and overlay of the NK cell signature onto this UMAP topology.

Tables S1:

  • Table S1: Cell types classified in the respective studies.
  • Data Table S1 (Data Table S1.csv): Gene signatures of the 11 clusters identified in the new scRNA-seq consensus map.

Data Table S2:

  • Cell types classified in the respective studies.

EXPERIMENTAL MODEL AND SUBJECT DETAILS

  • Peripheral blood mononuclear cells (PBMC): Buffy coats or venipuncture blood were obtained from healthy donors (University Hospital Bonn, local ethics vote 203/09) after written consent was given in accordance with the Declaration of Helsinki.
  • PBMC were isolated from buffy coats by Pancoll (PAN-Biotech) density centrifugation.

METHOD DETAILS

  • Whole blood or buffy coat was diluted in room-temperature PBS (1:2 or 1:5, respectively) and layered onto polysucrose solution (Pancoll; PAN Biotech, Germany) to enrich mononuclear cells by density gradient centrifugation according to the manufacturer's instructions.
  • Washed cells were incubated with the live/dead marker DRAQ7 (BioLegend, USA) for 5 min at room temperature before acquisition and sorting on a BD FACSAria III (BD Biosciences, USA).
  • The authors' new index-sorted single-cell transcriptome dataset was generated with the Smart-Seq2 protocol (Picelli et al., 2013).
  • cDNA was diluted to an average of 200 pg/µl, and 100 pg of cDNA from each cell was tagmented by adding 1 µl TD and 0.5 µl ATM from a Nextera XT DNA Library Preparation Kit to 0.5 µl of diluted cDNA in each well of a fresh 384-well plate.
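As a quick check of the tagmentation arithmetic above: 0.5 µl of cDNA at 200 pg/µl delivers 100 pg per well, in a total reaction volume of 1 + 0.5 + 0.5 = 2 µl. The short Python sketch below only encodes those numbers; the 1,000 pg/µl stock used to illustrate the C1·V1 = C2·V2 dilution step is a hypothetical value, not one given in the text.

```python
# Worked arithmetic for the tagmentation set-up described in the text.
CDNA_CONC_PG_PER_UL = 200.0  # average concentration of diluted cDNA (pg/µl)
CDNA_VOLUME_UL = 0.5         # diluted cDNA added per well (µl)
TD_VOLUME_UL = 1.0           # TD added per well (µl)
ATM_VOLUME_UL = 0.5          # ATM added per well (µl)


def input_mass_pg(conc_pg_per_ul: float, volume_ul: float) -> float:
    """cDNA mass delivered to a well: concentration x volume."""
    return conc_pg_per_ul * volume_ul


def reaction_volume_ul(*volumes_ul: float) -> float:
    """Total tagmentation volume: sum of all components pipetted into the well."""
    return sum(volumes_ul)


def stock_volume_ul(stock_pg_per_ul: float, target_pg_per_ul: float,
                    final_volume_ul: float) -> float:
    """C1*V1 = C2*V2: volume of stock needed to reach the target concentration."""
    return target_pg_per_ul * final_volume_ul / stock_pg_per_ul


if __name__ == "__main__":
    print(input_mass_pg(CDNA_CONC_PG_PER_UL, CDNA_VOLUME_UL))               # 100.0
    print(reaction_volume_ul(TD_VOLUME_UL, ATM_VOLUME_UL, CDNA_VOLUME_UL))  # 2.0
    # Hypothetical: dilute a 1,000 pg/µl stock to 10 µl at 200 pg/µl.
    print(stock_volume_ul(1000.0, CDNA_CONC_PG_PER_UL, 10.0))               # 2.0
```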

Cytospin preparation and May-Grünwald/Giemsa staining

  • Cell populations of interest were sorted into 1.5 ml reaction tubes containing 200 µl FACS buffer using a BD FACSAria III (BD Biosciences, USA).
  • Whole blood was diluted in room-temperature PBS (1:2) and layered onto polysucrose solution (Pancoll; PAN Biotech, Germany) to enrich mononuclear cells by density gradient centrifugation according to the manufacturer's instructions.
  • Sequenced single-cell data were demultiplexed using bcl2fastq2 v2.20.
  • Based on the pseudoalignment estimated by Kallisto, transcript levels were quantified as transcripts per million (TPM).
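TPM values of the kind described above are normally derived from per-transcript estimated counts and effective lengths, which kallisto reports in its abundance output. A minimal Python/NumPy sketch of the standard TPM formula, TPM_i = (count_i / efflen_i) / Σ_j (count_j / efflen_j) × 10^6, assuming those counts and lengths are already at hand; it illustrates the formula and is not the authors' pipeline.

```python
import numpy as np


def tpm(est_counts, eff_lengths):
    """Transcripts per million from estimated counts and effective lengths.

    TPM_i = (count_i / efflen_i) / sum_j(count_j / efflen_j) * 1e6
    """
    counts = np.asarray(est_counts, dtype=float)
    lengths = np.asarray(eff_lengths, dtype=float)
    rate = counts / lengths          # reads per base for each transcript
    total = rate.sum()
    if total == 0:
        return np.zeros_like(rate)   # no reads assigned: every TPM is zero
    return rate / total * 1e6        # rescale so that TPMs sum to one million


if __name__ == "__main__":
    # Two transcripts with equal read density per base receive equal TPM.
    print(tpm([10, 20], [1000.0, 2000.0]))
```

Dividing by length before normalizing is what separates TPM from a plain counts-per-million scaling: a transcript twice as long that receives twice the reads ends up with the same TPM.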

Quality control

  • For their new index-sorted, Smart-Seq2-based single-cell transcriptome dataset, the authors applied the following quality-control scheme, based on various meta information, to obtain high-quality transcriptome data: 1) genes detected in fewer than 6 cells (0.2 percent of cells) were removed, and 2) cells with fewer than 1,000 uniquely detected genes were removed.
  • Next, the authors filtered out further outlier cells with 3) fewer than 50,000 unique reads, 4) less than 30% of reads pseudoaligned to the transcriptome, 5) an endogenous-to-mitochondrial count ratio lower than 2, and 6) …
  • To reduce the influence of variation in sequencing depth among samples, the authors log-normalized the data, scaling each cell's gene expression profile to a total count of 10,000.
  • The residuals of this regression were scaled, centered and used for downstream analysis.
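The filtering and normalization steps above can be sketched as follows. This is a minimal Python/NumPy illustration, not the authors' R code; it assumes a genes × cells count matrix, per-cell unique-read counts, per-cell pseudoalignment fractions and a boolean mask marking mitochondrial genes (all hypothetical inputs). "Uniquely detected genes" is interpreted as genes with a non-zero count, the endogenous-to-mitochondrial ratio is assumed to be computed on the unfiltered matrix, and the truncated sixth criterion is not implemented. Log-normalization follows a scale-to-10,000-then-log1p scheme.

```python
import numpy as np

# Thresholds quoted in the text.
MIN_CELLS_PER_GENE = 6        # 1) gene must be detected in >= 6 cells
MIN_GENES_PER_CELL = 1_000    # 2) cell must detect >= 1,000 genes
MIN_UNIQUE_READS = 50_000     # 3) cell must have >= 50,000 unique reads
MIN_PSEUDOALIGN_FRAC = 0.30   # 4) >= 30% of reads pseudoaligned
MIN_ENDO_TO_MITO = 2.0        # 5) endogenous-to-mitochondrial ratio >= 2
SCALE_FACTOR = 10_000         # per-cell total after scaling


def qc_filter(counts, unique_reads, pseudoalign_frac, mito_mask):
    """Apply criteria 1-5 to a genes x cells count matrix.

    Returns (filtered_counts, gene_keep, cell_keep).
    """
    counts = np.asarray(counts, dtype=float)
    mito_mask = np.asarray(mito_mask, dtype=bool)

    # 1) drop rarely detected genes
    gene_keep = (counts > 0).sum(axis=1) >= MIN_CELLS_PER_GENE
    kept = counts[gene_keep]

    # 2) detected genes per cell, counted after the gene filter
    genes_per_cell = (kept > 0).sum(axis=0)

    # 5) endogenous vs mitochondrial counts (assumption: unfiltered matrix)
    mito = counts[mito_mask].sum(axis=0)
    endo = counts[~mito_mask].sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(mito > 0, endo / mito, np.inf)

    cell_keep = (
        (genes_per_cell >= MIN_GENES_PER_CELL)
        & (np.asarray(unique_reads) >= MIN_UNIQUE_READS)          # 3)
        & (np.asarray(pseudoalign_frac) >= MIN_PSEUDOALIGN_FRAC)  # 4)
        & (ratio >= MIN_ENDO_TO_MITO)                             # 5)
    )
    return kept[:, cell_keep], gene_keep, cell_keep


def log_normalize(counts, scale_factor=SCALE_FACTOR):
    """Scale each cell (column) to scale_factor total counts, then log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0, keepdims=True)
    return np.log1p(counts / totals * scale_factor)
```

Gene filtering runs first so that the per-cell gene count in step 2 reflects only genes that survive step 1, matching the order given in the text.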

Dimensionality reduction and clustering

  • This resulted in a total of 2491 genes, which were used as input for a principal component (PC) analysis.
  • To test for cellular heterogeneity, the authors used a shared nearest neighbor (SNN)-graph based clustering algorithm implemented in the Seurat package.
  • The authors used the first 10 principal components for constructing the SNN-graph and set the resolution to 1.
  • Monocle was used to infer differentiation trajectories using Louvain clustering, UMAP dimensionality reduction and the SimplePPT algorithm (Qiu et al., 2017).
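A rough Python/NumPy sketch of the PCA and shared-nearest-neighbor (SNN) graph construction described above, assuming a cells × genes matrix of scaled expression values. Edge weights are the Jaccard overlap of k-nearest-neighbor sets, which is our understanding of how Seurat builds its SNN graph; the neighbor count and the 1/15 pruning cutoff are assumptions, not values from the text. Seurat then runs Louvain community detection on this graph (resolution 1 in the text). To keep the sketch short, a plain connected-components pass stands in for that step and should not be read as equivalent to Louvain.

```python
import numpy as np

N_PCS = 10            # first 10 principal components, as in the text
K_NEIGHBORS = 20      # assumption: neighbour count (not stated in the text)
PRUNE_BELOW = 1 / 15  # assumption: drop weak SNN edges below this Jaccard value


def pca_scores(x, n_pcs=N_PCS):
    """Project a cells x genes matrix onto its first n_pcs principal components."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def snn_graph(embedding, k=K_NEIGHBORS, prune_below=PRUNE_BELOW):
    """Jaccard-weighted shared-nearest-neighbour graph (dense, for small n)."""
    n = embedding.shape[0]
    sq_dist = ((embedding[:, None, :] - embedding[None, :, :]) ** 2).sum(axis=-1)
    knn = np.argsort(sq_dist, axis=1)[:, :k]   # k nearest cells, including self
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], knn] = 1
    shared = member @ member.T                  # size of neighbourhood overlap
    weights = shared / (2 * k - shared)         # Jaccard: overlap / union
    weights[weights < prune_below] = 0.0
    return weights


def connected_components(adjacency):
    """Label connected components (a crude stand-in for Louvain clustering)."""
    n = adjacency.shape[0]
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adjacency[i]):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

Weighting edges by neighborhood overlap rather than raw distance makes the graph robust to cells that sit close to a cluster boundary: two cells are strongly linked only when they also share most of their neighbors.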

Additional analysis

  • Differentially expressed (DE) genes were defined using a Wilcoxon-based test for differential gene expression built in the Seurat pipeline (v.2.3.4) (Data Table S1).
  • The top 10 DE genes were visualized as a heatmap of hierarchically clustered gene expression profiles.
  • Gene signature enrichment analysis: single-cell RNA-seq data are inherently sparse, and the high dropout rate limits the use of single marker genes to identify cell populations.
  • To increase power, the authors used both up- and downregulated gene signatures to calculate gene expression scores.
  • The difference between the two scores was scaled and visualized.
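One way to read the up/down signature scoring above, as a hedged Python/NumPy sketch: average the log-normalized expression of the upregulated signature genes in each cell, subtract the average of the downregulated genes, and z-scale the difference across cells. The use of plain means and z-scaling is an assumption; the text only states that the difference is scaled.

```python
import numpy as np


def signature_score(expr, genes, up_genes, down_genes):
    """Per-cell score: mean(up) - mean(down), z-scaled across cells.

    expr: genes x cells matrix of log-normalized expression.
    genes: gene names matching the rows of expr.
    Signature genes absent from `genes` are ignored.
    """
    expr = np.asarray(expr, dtype=float)
    row = {g: i for i, g in enumerate(genes)}
    up_idx = [row[g] for g in up_genes if g in row]
    down_idx = [row[g] for g in down_genes if g in row]
    if not up_idx or not down_idx:
        raise ValueError("both signatures need at least one gene present in the data")
    raw = expr[up_idx].mean(axis=0) - expr[down_idx].mean(axis=0)
    sd = raw.std()
    return (raw - raw.mean()) / sd if sd > 0 else np.zeros_like(raw)
```

Averaging over many genes dampens the effect of dropouts in any single marker, and subtracting the downregulated set rewards cells that both gain and lose the expected genes.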

Publicly available single-cell RNA-seq data of human dendritic cells and monocytes

  • To assess the single-cell RNA-seq data of human dendritic cells and monocytes publicly available under the Gene Expression Omnibus accession number GSE94820, the authors applied the processing …
  • Next, the authors followed the general data analysis scheme described on the Seurat package webpage (https://satijalab.org/seurat/get_started_v1_4.html).
  • Briefly, the authors imported the filtered cell-gene matrix provided by 10x Genomics and performed the analysis with the Seurat package.

Backmapping

  • To compare the transcriptome profiles of monocytes from the dataset derived from GSE94820 (Villani et al., 2017) with the comprehensive PBMC dataset, the authors used the previously introduced canonical correlation analysis (CCA) alignment to combine datasets (Butler et al., 2018).
  • The authors determined the mutual highly variable genes as the overlap of the 4,000 genes with the highest dispersion in each dataset.
  • The authors treated the different batches of the HCA dataset as individual datasets and normalized them and the expression table of the consensus map …
  • First, the authors repeated the steps above but without integration of the new consensus map data.
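The selection of mutual highly variable genes described above can be sketched in Python/NumPy as below. For simplicity, dispersion is taken here as each gene's variance-to-mean ratio; Seurat's own variable-gene selection uses a binned, normalized dispersion, so treat this as an approximation of the idea rather than a reproduction. Apart from the default of 4,000 genes taken from the text, the inputs are hypothetical.

```python
import numpy as np


def top_dispersion_genes(expr, genes, n_top):
    """Names of the n_top genes with the highest variance-to-mean ratio.

    expr: genes x cells expression matrix; genes: matching row names.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1)
    var = expr.var(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        dispersion = np.where(mean > 0, var / mean, 0.0)
    order = np.argsort(-dispersion, kind="stable")[:n_top]
    return {genes[i] for i in order}


def mutual_hvgs(datasets, n_top=4_000):
    """Genes ranked among the n_top most dispersed in every dataset.

    datasets: iterable of (expr, genes) pairs.
    """
    per_dataset = [top_dispersion_genes(expr, genes, n_top) for expr, genes in datasets]
    return set.intersection(*per_dataset)
```

Restricting the alignment to genes that are variable in every dataset keeps the shared correlation structure from being driven by genes that vary in only one study.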

Data visualization

  • In general, the ggplot2 package was used to generate figures (Wickham, 2016).

QUANTIFICATION AND STATISTICAL ANALYSIS

  • Statistical analysis was performed using the R programming language.
  • Statistical tests used are described in the figure legends or the Methods section.
  • Differentially expressed genes were identified using a Wilcoxon-based test for differential gene expression.
  • Unless otherwise stated, a significance level of 0.1 was applied to Benjamini-Hochberg-adjusted p-values.
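For reference, a minimal pure-Python implementation of the Benjamini-Hochberg adjustment combined with the 0.1 cutoff stated above. The input p-values would come from per-gene Wilcoxon rank-sum tests (for example `scipy.stats.mannwhitneyu` in Python); here they are simply assumed to be given. Applying the cutoff as a strict inequality is our assumption.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvals, alpha=0.1):
    """Flag tests whose adjusted p-value falls below alpha (0.1 in the text)."""
    return [q < alpha for q in benjamini_hochberg(pvals)]
```

For example, raw p-values of 0.01, 0.04, 0.03 and 0.2 adjust to 0.04, about 0.053, about 0.053 and 0.2, so the first three tests pass the 0.1 cutoff.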

DATA AND SOFTWARE AVAILABILITY

  • Processed and raw scRNA-seq datasets are available through the Gene Expression Omnibus (GSE126422).
  • Additional data tables are provided as Excel tables (Data S1, S2). Data Table S1 (Data Table S1.csv): gene signatures of the 11 clusters identified in the new scRNA-seq consensus map.

ADDITIONAL RESOURCES

  • In addition, the authors provide an interactive web tool to visualize the single-cell RNA-Seq data together with the flow cytometry data at https://paguen.shinyapps.io/DC_MONO/ (external database S1).

References
TL;DR: The Gene Set Enrichment Analysis (GSEA) method as discussed by the authors focuses on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.


Book, 13 Aug 2009
TL;DR: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics.
Abstract: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics. With ggplot2, it's easy to: produce handsome, publication-quality plots, with automatic legends created from the plot specification; superpose multiple layers (points, lines, maps, tiles, box plots to name a few) from different data sources, with automatically adjusted common scales; add customisable smoothers that use the powerful modelling capabilities of R, such as loess, linear models, generalised additive models and robust regression; save any ggplot2 plot (or part thereof) for later modification or reuse; create custom themes that capture in-house or journal style requirements, and that can easily be applied to multiple plots; approach your graph from a visual perspective, thinking about how each component of the data is represented on the final plot. This book will be useful to everyone who has struggled with displaying their data in an informative and attractive way. You will need some basic knowledge of R (i.e. you should be able to get your data into R), but ggplot2 is a mini-language specifically tailored for producing graphics, and you'll learn everything you need in the book. After reading this book you'll be able to produce graphics customized precisely for your problems, and you'll find it easy to get graphics out of your head and on to the screen or page.


TL;DR: An analytical strategy for integrating scRNA-seq data sets based on common sources of variation is introduced, enabling the identification of shared populations across data sets and downstream comparative analysis.
Abstract: Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.


TL;DR: Kallisto pseudoaligns reads to a reference, producing a list of transcripts that are compatible with each read while avoiding alignment of individual bases, which removes a major computational bottleneck in RNA-seq analysis.
Abstract: We present kallisto, an RNA-seq quantification program that is two orders of magnitude faster than previous approaches and achieves similar accuracy. Kallisto pseudoaligns reads to a reference, producing a list of transcripts that are compatible with each read while avoiding alignment of individual bases. We use kallisto to analyze 30 million unaligned paired-end RNA-seq reads in <10 min on a standard laptop computer. This removes a major computational bottleneck in RNA-seq analysis.


Cell, 21 May 2015
TL;DR: Drop-seq will accelerate biological discovery by enabling routine transcriptional profiling at single-cell resolution by separating them into nanoliter-sized aqueous droplets, associating a different barcode with each cell's RNAs, and sequencing them all together.

