Single-cell mapper (scMappR): using scRNA-seq to infer cell-type specificities of
differentially expressed genes
Dustin J. Sokolowski 1,2*, Mariela Faykoo-Martinez 2,3, Lauren Erdman 2,4, Huayun Hou 1,2, Cadia Chan 1,2, Helen Zhu 5,6, Melissa M. Holmes 3,7, Anna Goldenberg 2,4,8,9, Michael D. Wilson 1,2*
1 Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
2 Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, M5G 0A4, Canada
3 Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
4 Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada
5 Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada
6 Princess Margaret Cancer Center, University Health Network, Toronto, ON, M5G 2C1, Canada
7 Department of Psychology, University of Toronto Mississauga, Mississauga, ON, L5L 1C6, Canada
8 Vector Institute for Artificial Intelligence, MaRS Centre, Toronto, ON, M5G 1M1, Canada
9 CIFAR, MaRS Centre, Toronto, ON, M5G 1M1, Canada
* To whom correspondence should be addressed. Tel: 416.813.7654; Fax: ; 416.813.4931 | Ext: 328699; Email: dustin.sokolowski@sickkids.ca and michael.wilson@sickkids.ca
Keywords: RNA sequencing, single-cell RNA sequencing, differential gene expression, R package, cell-type specificity
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.265298; this version posted August 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Abstract
RNA sequencing (RNA-seq) is widely used to identify differentially expressed genes (DEGs) and reveal the mechanisms underlying complex biological processes. RNA-seq is often performed on heterogeneous samples, and the resulting DEGs do not necessarily indicate the cell types in which the differential expression occurred. While single-cell RNA-seq (scRNA-seq) methods solve this problem, technical and cost constraints currently limit their widespread use.
Here we present single-cell mapper (scMappR), a method that assigns cell-type specificity scores to DEGs obtained from bulk RNA-seq by integrating cell-type expression data generated by scRNA-seq with existing deconvolution methods. After benchmarking scMappR using RNA-seq data obtained from sorted blood cells, we asked whether scMappR could reveal known cell-type specific changes that occur during kidney regeneration. We found that scMappR appropriately assigned DEGs to cell types involved in kidney regeneration, including a relatively small proportion of immune cells. While scMappR can work with any user-supplied scRNA-seq data, we curated scRNA-seq expression matrices for ~100 human and mouse tissues to facilitate its use with bulk RNA-seq data alone. Overall, scMappR is a user-friendly R package, available on CRAN, that complements traditional differential expression analysis.
Highlights:
• scMappR integrates scRNA-seq and bulk RNA-seq data to recalibrate bulk differentially expressed genes (DEGs).
• scMappR correctly identified immune-cell-expressed DEGs from a bulk RNA-seq analysis of mouse kidney regeneration.
• scMappR is deployed as a user-friendly R package available on CRAN.
Introduction
RNA-seq is a powerful and widely used technology for measuring transcript abundance and structure in biological samples (1). RNA-seq analyses typically compare transcript abundance between conditions to identify differentially expressed genes (DEGs) (2, 3). When RNA-seq is performed on a whole tissue (bulk RNA-seq), it is often challenging to determine the extent to which changes in gene expression are due to changes in cell-type proportions (4). Single-cell RNA-seq (scRNA-seq) methods address this challenge by measuring gene expression at single-cell resolution. Despite many advances, technical limitations (e.g., low gene detection per cell, optimization of cell dissociation) and cost currently limit the use of scRNA-seq for hard-to-dissociate cell types and large study designs (5, 6). Importantly, bioinformatics methods that integrate bulk RNA-seq and scRNA-seq demonstrate the highly complementary nature of these two technologies (7–16).
Single-cell RNA-seq experiments readily reveal combinations of genes that are involved in the biological functions altered in an experiment or clinical condition. The value of these data is reflected in the growing number of repositories containing publicly available, reprocessed scRNA-seq data, such as PanglaoDB (17), scRNAseqDB (18), SCPortalen (19), the Single Cell Expression Atlas (20) and the Human Cell Atlas (21), which provide a consistent, tissue-aware reference for the cell-type specificity of individual genes. Indeed, such datasets can be used to interrogate cell-type specific gene expression and enhance bulk RNA-seq analyses in the absence of a matched scRNA-seq experiment (12, 22).
Several methods exist to integrate bulk RNA-seq and scRNA-seq, with the most common
class of tools being cell-type deconvolution (12, 14, 15, 23–25). Cell-type deconvolution
leverages cell-type specific expression within a scRNA-seq dataset to estimate the relative cell-type proportions within a bulk RNA-seq sample. Estimated cell-type proportions can then be directly compared between conditions to identify alterations in cell-type composition (26, 27). Bioinformatic tools such as csSAM (4) and the subsequently released Bseq-sc (28) use estimated cell-type proportions to identify DEGs that were not detected by bulk differential expression analysis alone (2, 3, 29). While powerful, these tools require more samples than are typically collected in exploratory studies looking for DEGs (4, 28). For this reason, new methods that leverage scRNA-seq to interpret the results of typical bulk RNA-seq experiments are valuable, especially given the growing number of scRNA-seq reference datasets.
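To make the deconvolution idea concrete, the following is a minimal sketch, assuming a linear mixing model in which a bulk expression vector is approximated by a gene-by-cell-type signature matrix multiplied by a vector of non-negative cell-type proportions. The function name `estimate_proportions` and the toy data are hypothetical, and the solver (multiplicative updates for non-negative least squares) is a stand-in chosen for brevity; it is not the algorithm implemented in DeconRNAseq, csSAM, or Bseq-sc.

```python
# Minimal deconvolution sketch (illustrative, not DeconRNAseq's algorithm).
# Assumed model: bulk ~= S @ p, with S a gene-by-cell-type signature matrix
# and p a vector of non-negative cell-type proportions.


def estimate_proportions(signature, bulk, n_iter=2000):
    """Estimate cell-type proportions for one bulk sample.

    signature: list of rows, one per gene; each row lists non-negative
               expression values, one per cell type.
    bulk:      list of non-negative expression values, one per gene.
    Solves non-negative least squares with multiplicative updates, then
    rescales the result to sum to 1.
    """
    n_types = len(signature[0])
    # S^T b does not change between iterations, so compute it once.
    stb = [sum(row[k] * b for row, b in zip(signature, bulk))
           for k in range(n_types)]
    # S^T S: cell-type-by-cell-type Gram matrix.
    sts = [[sum(row[i] * row[j] for row in signature) for j in range(n_types)]
           for i in range(n_types)]
    p = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(n_iter):
        denom = [sum(sts[k][j] * p[j] for j in range(n_types))
                 for k in range(n_types)]
        # Multiplicative update: entries that start non-negative stay non-negative.
        p = [p[k] * stb[k] / denom[k] if denom[k] > 0 else 0.0
             for k in range(n_types)]
    total = sum(p)
    return [x / total for x in p] if total > 0 else p


# Toy signature matrix (made-up values): 4 genes x 3 cell types.
signature = [
    [10.0, 1.0, 1.0],  # marker gene for cell type A
    [1.0, 10.0, 1.0],  # marker gene for cell type B
    [1.0, 1.0, 10.0],  # marker gene for cell type C
    [5.0, 5.0, 1.0],   # gene shared by A and B
]
true_p = [0.5, 0.3, 0.2]
# Simulate a noise-free bulk sample as a proportion-weighted mixture.
bulk = [sum(s * q for s, q in zip(row, true_p)) for row in signature]
print([round(x, 3) for x in estimate_proportions(signature, bulk)])
# -> [0.5, 0.3, 0.2]
```

Because a multiplicative update never changes the sign of an entry, the estimates stay non-negative without an explicit constraint, and the final rescaling turns them into relative proportions. Real bulk samples are noisy, so the recovered proportions would only approximate the true composition.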
Here we present a tool called single-cell mapper (scMappR) that is designed to infer which cell types are responsible for DEGs identified using common bulk RNA-seq experimental designs. The purpose of scMappR is to assign cell-type specificity scores to DEGs previously obtained from bulk RNA-seq experiments. Starting with a reference scRNA-seq dataset, scMappR integrates cell-type proportions and cell-type specific expression to compute and visualize the putative cell-type origins of DEGs identified in bulk RNA-seq analysis. We first demonstrate that scMappR can identify validated cell-type specific gene expression by taking advantage of a reference dataset (23) in which bulk RNA-seq was performed on cell-sorted samples. We then show that scMappR can identify bona fide differential gene expression changes emanating from a minority cell population present in the mouse kidney during regeneration (13). Overall, scMappR is a free R package, available on CRAN, that adds cell-type context to a set of user-provided DEGs.
Materials and methods
R Package: scMappR
We built an R package, scMappR, to compute and visualize the contributions of different cell types to DEGs identified by bulk RNA-seq. scMappR also contains a bioinformatic pipeline that processes scRNA-seq data from a count matrix into formats compatible with scMappR. scMappR is currently available on CRAN (https://cran.r-project.org/web/packages/scMappR/index.html). Reprocessed scRNA-seq cell-type matrices are stored in a separate GitHub repository (https://github.com/wilsonlabgroup/scMappR_Data).
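The internals of this processing pipeline are not described here. As an illustration of the kind of reduction involved, the sketch below collapses a normalized cell-level expression matrix into a gene-by-cell-type matrix of mean expression and a simple fold-change of each cell type against all other cells. The function `cell_type_matrices`, the pseudocount, and the toy values are hypothetical; scMappR's own pipeline may normalize, cluster, and score genes differently.

```python
# Sketch: summarize a cell-level scRNA-seq matrix by cell type (illustrative).
# Assumptions: values are already normalized, cell-type labels are already
# assigned, and specificity is a simple mean fold-change of one cell type
# versus all other cells. scMappR's own pipeline may differ.
from statistics import mean


def cell_type_matrices(counts, labels, pseudocount=1.0):
    """Collapse cells into cell types.

    counts: dict mapping gene -> list of expression values, one per cell.
    labels: list of cell-type labels aligned with each gene's value list.
    Returns (mean_expr, fold_change), each a dict gene -> {cell type: value}.
    """
    cell_types = sorted(set(labels))
    mean_expr, fold_change = {}, {}
    for gene, values in counts.items():
        mean_expr[gene], fold_change[gene] = {}, {}
        for ct in cell_types:
            inside = [v for v, lab in zip(values, labels) if lab == ct]
            outside = [v for v, lab in zip(values, labels) if lab != ct]
            m_in = mean(inside)
            m_out = mean(outside) if outside else 0.0
            mean_expr[gene][ct] = m_in
            # Pseudocount keeps the ratio finite when a gene is absent elsewhere.
            fold_change[gene][ct] = (m_in + pseudocount) / (m_out + pseudocount)
    return mean_expr, fold_change


# Toy data (made-up values): 5 cells, 2 cell types, 2 genes.
labels = ["T cell", "T cell", "B cell", "B cell", "B cell"]
counts = {
    "Cd3e": [9.0, 7.0, 0.0, 1.0, 0.0],
    "Cd19": [0.0, 0.0, 6.0, 8.0, 7.0],
}
mean_expr, fold_change = cell_type_matrices(counts, labels)
print(fold_change["Cd19"])  # -> {'B cell': 8.0, 'T cell': 0.125}
```

The mean-expression table plays the role of a signature matrix for deconvolution, while the one-versus-rest fold-change is one simple way to express how specific a gene is to each cell type.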
Computation and visualization of cell-type contextualized DEGs and cell-type specific pathway analysis
scMappR combines differential expression, cell-type expression, and cell-type proportions to generate cell-weighted fold-changes (cwFold-changes, $cw\Delta$). Specifically, scMappR reweights the fold-changes ($\Delta$) of bulk DE genes by the fold-change of cell-type specificity ($\xi$; e.g., one cell type vs. all other cell types) identified in the reference scRNA-seq dataset, and by the estimated cell-type proportions. These proportions are estimated through RNA-seq deconvolution with the input gene's expression removed from the count and signature matrices. A signature matrix is defined as a gene-by-cell-type matrix populated with the relative expression of each gene in each cell type. Cell-type proportions ($\pi$) are estimated with DeconRNAseq (15), and cell types with an estimated proportion >1% are used in subsequent
analyses (Figure 1). Then, estimated cell-type proportions are made independent from the cell-