A rule-based data-informed cellular consensus map of the human mononuclear phagocyte cell space
Summary
Introduction
- Such single-cell technologies allow for a fully data-driven analysis to establish cell maps of an organism, such as those proposed by the Human Cell Atlas consortium (Rozenblatt-Rosen et al., 2017).
- Reliable consensus maps are a prerequisite to reconcile conflicting data that might have been generated based on different data generating approaches (Edney, 2019; Monmonier, 2015).
- To establish a consensus map of the human mononuclear myeloid cell compartment, the authors integrate prior knowledge by defining a priori criteria for the cellular compartment under study, which increases resolution and allows a consensus map to be built.
Results
- Integrated phenotypic characterization of the myeloid cell compartment in human peripheral blood.
- This version of the preprint was posted June 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
- To integrate the identified DC subsets in map 1 and map 2 with each other, the authors computed a UMAP topology from the original map 1 single-cell transcriptome data comprising the DC cell space and overlaid the signatures of the map 2 DC subsets (pDC, cDC1, cDC2, pre-DC).
- This analysis showed that if the totality of the Lin-CD16+ compartment is mapped back onto the Lin- UMAP topology, NK cells (CD56+), monocytes (CD56-CD16+/-) and granulocyte fractions (CD16high) are included in this cellular compartment.
Discussion
- Consensus maps are an important instrument within an iterative process of producing cellular maps of all organs and tissues in different species, including humans.
- Because the authors propose to include prior knowledge in the respective scientific field into the algorithm for generating such consensus maps, they define the overall strategy as being ‘data-informed’, combining prior knowledge and data-driven technologies including single-cell omics.
- [...] providing the next iteration of this particular subspace in the myeloid cell map of human peripheral blood.
Acknowledgments:
- The authors thank Jessica Tamanini for critical review and editing of the manuscript.
- This work was supported by the German Research Foundation to JLS (GRK 2168, INST 217/577-1, EXC2151/1), by the HGF grant sparse2big to JLS, by the FASTGenomics grant of the German Federal Ministry for Economic Affairs and Energy to JLS, and by the EU project SYSCID under grant number 733100.
- F.G. is an EMBO YIP awardee and is supported by Singapore Immunology Network (SIgN) and Shanghai Institute of Immunology core funding.
- The authors declare that there are no competing interests.
Figure Legends
- Generating a new consensus map of the mononuclear myeloid cell compartment in human peripheral blood.
- (B) Visualization of ~1.4 million live CD45+Lin(CD3, CD19, CD20, CD56)- cells after UMAP dimensionality reduction of the flow cytometry panel introduced in A (left panel), mononuclear myeloid cell compartment (second panel), overlay of index-sorted cells (third panel), and UMAP topology of the index-sorted cells based on the single-cell transcriptome data.
- (B) Heatmap of 10 most significant marker genes for each of the 11 clusters identified and visualized in Figure 2A.
- (G) UMAP topology of scRNA-seq data derived from the map1 DC and mono subsets (left panel) and overlay of the NK cell signature onto this UMAP topology.
Table S1:
- Cell types classified in the respective studies.
Data Table S1:
- Data Table S1.csv: gene signatures of the 11 clusters identified in their new scRNA-seq consensus map.
Data Table S2:
- Cell types classified in the respective studies.
EXPERIMENTAL MODEL AND SUBJECT DETAILS
- Peripheral blood mononuclear cells (PBMC): buffy coats or venipuncture blood were obtained from healthy donors (University Hospital Bonn, local ethics vote 203/09) after written consent was given according to the Declaration of Helsinki.
- Peripheral blood mononuclear cells (PBMC) were isolated by Pancoll (PAN-Biotech) density centrifugation from buffy coats.
METHOD DETAILS
- Whole blood or buffy coat was diluted in room temperature PBS (1:2 or 1:5, respectively) and layered onto polysucrose solution (Pancoll; PAN Biotech, Germany) for the enrichment of mononuclear cells by density gradient centrifugation according to the manufacturer's instructions.
- Washed cells were incubated with the live/dead marker DRAQ7 (BioLegend, USA) for 5 min at room temperature before acquisition and sorting of the cells using a BD FACSAria III (BD Biosciences, USA).
- The authors' new index-sorted single-cell transcriptome dataset was based on the Smart-Seq2 protocol (Picelli et al., 2013).
- The cDNA was diluted to an average of 200 pg/µl, and 100 pg cDNA from each cell was tagmented by adding 1 µl TD and 0.5 µl ATM from a Nextera XT DNA Library Preparation Kit to 0.5 µl diluted cDNA in each well of a fresh 384-well plate.
Cytospin preparation and May-Grünwald/Giemsa staining
- Cell populations of interest were sorted into 1.5 ml reaction tubes containing 200 µl FACS buffer using a BD FACSAria III (BD Biosciences, USA).
- Whole blood was diluted in room temperature PBS (1:2) and layered onto polysucrose solution (Pancoll; PAN Biotech, Germany) for the enrichment of mononuclear cells by density gradient centrifugation according to the manufacturer's instructions.
- Sequenced single-cell data was demultiplexed using bcl2fastq2 v2.20.
- Based on the pseudoalignment estimated by kallisto, transcript levels were quantified as transcripts per million (TPM).
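The TPM quantification mentioned above can be illustrated with a minimal sketch. It assumes the standard definition of TPM (estimated counts divided by effective transcript length, rescaled so that all values sum to one million); the function name `tpm` and the toy numbers are hypothetical and are not taken from the authors' pipeline.

```python
def tpm(est_counts, eff_lengths):
    """Convert estimated counts and effective lengths into TPM values."""
    # Read rate per base of effective transcript length
    rates = [c / l for c, l in zip(est_counts, eff_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Rescale so that the values of one cell sum to one million
    return [r / total * 1e6 for r in rates]


# Toy example: three transcripts of one cell (illustrative values only)
values = tpm([100, 300, 0], [1000.0, 1500.0, 800.0])
```

Because the values of each cell sum to one million, TPM profiles are comparable across cells that were sequenced to different depths.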
Quality control
- For their new index-sorted, Smart-Seq2-based single-cell transcriptome dataset, the authors applied the following quality control scheme, using various meta information, to obtain high-quality transcriptome data: 1) they removed genes detected in fewer than 6 cells (0.2 percent of cells), and 2) they removed cells with fewer than 1,000 uniquely detected genes.
- Next, the authors filtered further outlier cells with 3) fewer than 50,000 unique reads, 4) less than 30% pseudoalignment of reads to the transcriptome, 5) an endogenous-to-mitochondrial count ratio lower than 2, 6) [...].
- To reduce the influence of variation in sequencing depth among samples, the authors applied a log-normalization to the data and scaled each cell's gene expression profile to a total count of 10,000.
- The residuals of this regression are scaled and centered and used for further downstream analysis.
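The first two filtering steps and the log-normalization described in this section can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes that "detected" means a count above zero, that detected genes per cell are counted after gene filtering, and that log-normalization means `log(1 + count / total * 10,000)`. The function name and the toy matrix are hypothetical, and the read-depth, pseudoalignment and mitochondrial filters are omitted.

```python
import math


def filter_and_normalize(counts, min_cells=6, min_genes=1000, scale=10_000):
    """counts: list of cells, each a list of per-gene counts.

    Returns (kept_cell_indices, kept_gene_indices, normalized_matrix).
    """
    n_genes = len(counts[0]) if counts else 0
    # 1) keep genes detected (count > 0) in at least `min_cells` cells
    genes = [g for g in range(n_genes)
             if sum(1 for cell in counts if cell[g] > 0) >= min_cells]
    # 2) keep cells with at least `min_genes` detected genes
    cells = [i for i, cell in enumerate(counts)
             if sum(1 for g in genes if cell[g] > 0) >= min_genes]
    normalized = []
    for i in cells:
        row = [counts[i][g] for g in genes]
        total = sum(row)
        if total == 0:
            normalized.append([0.0] * len(row))
            continue
        # Scale each cell to `scale` total counts, then log-transform
        normalized.append([math.log1p(x / total * scale) for x in row])
    return cells, genes, normalized


# Toy matrix (4 cells x 3 genes) with thresholds lowered for illustration
toy = [[5, 0, 3], [2, 0, 0], [4, 1, 6], [0, 0, 9]]
kept_cells, kept_genes, norm = filter_and_normalize(toy, min_cells=2, min_genes=2)
```

With the thresholds from the text (6 cells, 1,000 genes) the defaults can be used unchanged on a full count matrix.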
Dimensionality reduction and clustering
- This resulted in a total of 2491 genes, which were used as input for a principal component (PC) analysis.
- To test for cellular heterogeneity, the authors used a shared nearest neighbor (SNN)-graph based clustering algorithm implemented in the Seurat package.
- The authors used the first 10 principal components for constructing the SNN-graph and set the resolution to 1.
- Monocle was used to infer differentiation trajectories by using the Louvain clustering method, UMAP dimensionality reduction and the SimplePPT algorithm (Qiu et al., 2017).
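To make the idea of a shared nearest neighbor (SNN) graph more concrete, the sketch below builds one from cell coordinates (for example, the first principal components). It is a simplified stand-in and not the Seurat implementation: it assumes Euclidean distance, weights each edge by the Jaccard overlap of the two cells' k-nearest-neighbour sets, and stops before the community detection step. All names and the toy coordinates are hypothetical.

```python
import math


def knn(points, k):
    """Return, for each point, the index set of its k nearest points (itself included)."""
    neighbours = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        neighbours.append(set(order[:k]))
    return neighbours


def snn_graph(points, k=3, prune=0.0):
    """Edge weights = Jaccard overlap of the two cells' neighbour sets."""
    nn = knn(points, k)
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            weight = len(nn[i] & nn[j]) / len(nn[i] | nn[j])
            if weight > prune:  # drop weak or empty overlaps
                edges[(i, j)] = weight
    return edges


# Two well-separated toy groups of three cells each
coords = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = snn_graph(coords, k=3)
```

In this toy example only cells of the same group share neighbours, so the graph splits into two components; a clustering algorithm run on such a graph would recover the two groups.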
Additional analysis
- Differentially expressed (DE) genes were defined using a Wilcoxon-based test for differential gene expression built in the Seurat pipeline (v.2.3.4) (Data Table S1).
- The top 10 DE genes were visualized using a heatmap of hierarchically clustered gene expression profiles.
- Gene signature enrichment analysis: single-cell RNA-Seq data is inherently sparse, and a high dropout rate limits the use of single marker genes to identify cell populations.
- To increase the power, the authors use both up- and downregulated gene signatures for the calculation of the gene expression scores.
- The difference between these two is scaled and visualized.
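The signature scoring described above can be sketched as follows. The exact scoring and scaling used by the authors are not given in this summary, so this example assumes the per-cell mean expression of the upregulated genes minus that of the downregulated genes, followed by z-scoring across cells. The function name and the gene names `A`, `B`, `C` are hypothetical.

```python
from statistics import mean, pstdev


def signature_scores(expr, up_genes, down_genes):
    """expr: dict mapping gene name -> list of per-cell expression values."""
    n_cells = len(next(iter(expr.values())))

    def mean_per_cell(genes):
        present = [g for g in genes if g in expr]
        if not present:
            return [0.0] * n_cells
        return [mean(expr[g][c] for g in present) for c in range(n_cells)]

    up = mean_per_cell(up_genes)
    down = mean_per_cell(down_genes)
    diff = [u - d for u, d in zip(up, down)]
    # Scale the difference across cells (z-score; an assumption)
    mu, sd = mean(diff), pstdev(diff)
    if sd == 0:
        return [0.0] * n_cells
    return [(x - mu) / sd for x in diff]


# Toy expression table for three cells
toy_expr = {"A": [5, 0, 1], "B": [3, 1, 0], "C": [0, 4, 2]}
scores = signature_scores(toy_expr, up_genes=["A", "B"], down_genes=["C"])
```

Averaging over several genes makes the score less sensitive to dropout in any single marker, which is the motivation given in the text.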
- To assess the single-cell RNA-Seq data of human dendritic cells and monocytes publicly available under the Gene Expression Omnibus accession number GSE94820, the authors applied the processing [...]
- Next, the authors followed the general data analysis scheme described at the Seurat package webpage (https://satijalab.org/seurat/get_started_v1_4.html).
- Briefly, the authors imported the filtered cell-gene matrix provided by 10x Genomics and performed the analysis with the Seurat package.
Backmapping
- To compare the transcriptome profiles of monocytes isolated from the dataset derived from GSE94820 (Villani et al., 2017) with the comprehensive PBMC dataset, the authors used the previously introduced canonical correlation alignment to combine datasets (Butler et al., 2018).
- The authors determined the mutual highly variable genes as the overlap of the 4,000 genes with the highest dispersion from each dataset.
- The authors treated the different batches of the HCA dataset as individual datasets and normalized them and the expression table of the consensus map.
- First, the authors repeated the steps above but without integration of the new consensus map data.
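The selection of mutual highly variable genes described above can be sketched as an intersection of per-dataset rankings. This example assumes dispersion is the variance-to-mean ratio of each gene; the dispersion estimate actually used by the authors may differ (for example, it may be log-transformed or binned by mean expression). Function names and gene names are hypothetical.

```python
from statistics import mean, pvariance


def dispersion(values):
    """Variance-to-mean ratio of one gene across cells (0 for unexpressed genes)."""
    m = mean(values)
    return pvariance(values) / m if m > 0 else 0.0


def top_dispersed(expr, n):
    """expr: dict gene -> per-cell values. Returns the n genes with highest dispersion."""
    ranked = sorted(expr, key=lambda g: dispersion(expr[g]), reverse=True)
    return set(ranked[:n])


def mutual_variable_genes(expr_a, expr_b, n=4000):
    """Genes that are among the top-n dispersed genes in both datasets."""
    return top_dispersed(expr_a, n) & top_dispersed(expr_b, n)


# Toy datasets with three genes each; n lowered for illustration
data_a = {"g1": [0, 10, 0, 10], "g2": [5, 5, 5, 5], "g3": [1, 9, 2, 8]}
data_b = {"g1": [0, 8, 0, 8], "g2": [1, 7, 1, 7], "g3": [4, 4, 4, 4]}
shared = mutual_variable_genes(data_a, data_b, n=2)
```

Restricting the alignment to genes that vary in both datasets keeps dataset-specific noise out of the shared correlation structure.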
Data visualization
- In general, the ggplot2 package was used to generate figures (Wickham, 2016).
QUANTIFICATION AND STATISTICAL ANALYSIS
- Statistical analysis was performed using the R programming language.
- Statistical tests used are described in the figure legend or methods part, respectively.
- Differentially expressed genes have been identified using a Wilcoxon-based test for differential gene expression.
- If not otherwise stated, a significance level of 0.1 was applied to adjusted p-values (Benjamini-Hochberg).
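For readers unfamiliar with the Benjamini-Hochberg adjustment named above, the following minimal sketch shows the standard step-up procedure and the 0.1 threshold from the text. The p-values are made up for illustration and the function name is hypothetical; the authors performed their analysis in R.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values for four hypothetical genes
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [p < 0.1 for p in adj]
```

Here the first three genes pass the 0.1 threshold on the adjusted scale, while the fourth does not.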
DATA AND SOFTWARE AVAILABILITY
- Processed and raw scRNA-seq datasets are available through the Gene Expression Omnibus (GSE126422).
- Additional data tables are provided in the form of Excel tables (Data S1, S2). Data Table S1 (Data Table S1.csv): gene signatures of the 11 clusters identified in their new scRNA-seq consensus map.
ADDITIONAL RESOURCES
- In addition, the authors provide an interactive web tool to visualize the single-cell RNA-Seq data together with the flow cytometry data at https://paguen.shinyapps.io/DC_MONO/ (external database S1).