Single-cell mapper (scMappR): using scRNA-seq to infer cell-type specificities of differentially expressed genes

Dustin J. Sokolowski [1,2,*], Mariela Faykoo-Martinez [2,3], Lauren Erdman [2,4], Huayun Hou [1,2], Cadia Chan [1,2], Helen Zhu [5,6], Melissa M. Holmes [3,7], Anna Goldenberg [2,4,8,9], Michael D. Wilson [1,2,*]
[1] Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
[2] Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, M5G 0A4, Canada
[3] Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
[4] Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada
[5] Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada
[6] Princess Margaret Cancer Center, University Health Network, Toronto, ON, M5G 2C1, Canada
[7] Department of Psychology, University of Toronto Mississauga, Mississauga, ON, L5L 1C6, Canada
[8] Vector Institute for Artificial Intelligence, MaRS Centre, Toronto, ON, M5G 1M1
[9] CIFAR, MaRS Centre, Toronto, ON, M5G 1M1
[*] To whom correspondence should be addressed. Tel: 416.813.7654; Fax: 416.813.4931 | Ext: 328699; Email: dustin.sokolowski@sickkids.ca and michael.wilson@sickkids.ca
Keywords: RNA sequencing, single cell RNA sequencing, Differential gene expression, R package, Cell-type specificity
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.265298; this version posted August 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Abstract
RNA sequencing (RNA-seq) is widely used to identify differentially expressed genes (DEGs)
and reveal biological mechanisms underlying complex biological processes. RNA-seq is often
performed on heterogeneous samples and the resulting DEGs do not necessarily indicate the cell
types where the differential expression occurred. While single-cell RNA-seq (scRNA-seq)
methods solve this problem, technical and cost constraints currently limit their widespread use.
Here we present single cell Mapper (scMappR), a method that assigns cell-type specificity scores
to DEGs obtained from bulk RNA-seq by integrating cell-type expression data generated by
scRNA-seq and existing deconvolution methods. After benchmarking scMappR using RNA-seq
data obtained from sorted blood cells, we asked if scMappR could reveal known cell-type
specific changes that occur during kidney regeneration. We found that scMappR appropriately
assigned DEGs to cell-types involved in kidney regeneration, including a relatively small
proportion of immune cells. While scMappR can work with any user-supplied scRNA-seq data,
we curated scRNA-seq expression matrices for ~100 human and mouse tissues to facilitate its
use with bulk RNA-seq data alone. Overall, scMappR is a user-friendly R package, available on
CRAN, that complements traditional differential expression analysis.
Highlights:
scMappR integrates scRNA-seq and bulk RNA-seq to re-calibrate bulk differentially
expressed genes (DEGs).
scMappR correctly identified immune-cell expressed DEGs from a bulk RNA-seq analysis of
mouse kidney regeneration.
scMappR is deployed as a user-friendly R package available at CRAN.
Introduction
RNA-seq is a powerful and widely used technology to measure transcript abundance and
structure in biological samples (1). RNA-seq analyses typically compare transcript abundance
between conditions by identifying differentially expressed genes (DEGs) (2, 3). When RNA-seq
of a whole tissue (bulk RNA-seq) is performed, it is often a challenge to determine the extent to
which changes in gene expression are due to changes in cell-type proportion (4). This challenge
is addressed by single-cell RNA-seq (scRNA-seq) methods that measure gene expression at a
single-cell resolution. Despite many advances, technical limitations (e.g., low gene detection per
cell, cell dissociation optimization) and cost currently limit the use of scRNA-seq for hard-to-
dissociate cell types and large study designs (5, 6). Importantly, bioinformatics methods that
integrate bulk RNA-seq and scRNA-seq demonstrate the highly complementary nature of these
two technologies (7–16).
Single-cell RNA-seq experiments readily indicate combinations of genes that are
involved in the biological functions altered in an experiment or clinical condition. The value of
these data is reflected in the growing number of repositories containing publicly available
reprocessed scRNA-seq data such as PanglaoDB (17), scRNAseqDB (18), SCPortalen (19),
Single Cell Expression Atlas (20) and the Human Cell Atlas (21), which allow for a consistent,
tissue-aware reference to the cell-type specificity of individual genes. Indeed, such datasets can
be used to interrogate cell-type specific gene expression and enhance bulk RNA-seq analyses in
the absence of a matched scRNA-seq experiment (12, 22).
Several methods exist to integrate bulk RNA-seq and scRNA-seq, with the most common
class of tools being cell-type deconvolution (12, 14, 15, 23–25). Cell-type deconvolution
leverages cell-type specific expression within a scRNA-seq dataset to estimate the relative cell-
type proportions within a bulk RNA-seq sample. Estimated cell-type proportions can then be
directly compared between conditions to identify alterations in cell-type composition (26, 27).
Bioinformatic tools such as csSAM (4) and the subsequently released Bseq-sc (28) utilize estimated
cell-type proportions to identify DEGs that were not considered differentially expressed with
bulk differential analysis alone (2, 3, 29). While powerful, these tools require a larger number of
samples than is typically available in exploratory studies looking for DEGs (4, 28). For this
reason, new methods that leverage scRNA-seq to interpret the results from typical bulk RNA-seq
experiments are of value, especially considering the growing number of scRNA-seq reference
datasets.
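The idea behind cell-type deconvolution described above can be illustrated with a small sketch. It treats a bulk sample as a mixture of cell-type expression profiles and recovers the mixing proportions by non-negative least squares using multiplicative updates. This solver is a generic stand-in chosen for brevity, not the algorithm of any tool cited here, and the signature values, cell-type names, and the function name `estimate_proportions` are all invented for illustration.

```python
# Toy signature matrix (rows: genes, columns: cell types).
# All numbers are invented for illustration only.
CELL_TYPES = ["type_1", "type_2", "type_3"]
SIGNATURE = [
    [10.0, 1.0, 1.0],   # marker-like gene for type_1
    [1.0, 10.0, 1.0],   # marker-like gene for type_2
    [1.0, 1.0, 10.0],   # marker-like gene for type_3
    [5.0, 5.0, 5.0],    # gene expressed equally in all types
]
TRUE_PROPORTIONS = [0.6, 0.3, 0.1]


def mix(signature, proportions):
    """Simulate bulk expression as a proportion-weighted sum of cell-type profiles."""
    return [sum(s * p for s, p in zip(row, proportions)) for row in signature]


BULK = mix(SIGNATURE, TRUE_PROPORTIONS)


def estimate_proportions(signature, bulk, n_iter=500):
    """Estimate cell-type proportions with multiplicative-update NNLS.

    Assumes all signature and bulk values are non-negative. The estimate is
    rescaled at the end so that the proportions sum to one.
    """
    n_types = len(signature[0])
    p = [1.0 / n_types] * n_types  # strictly positive starting point

    # Precompute S^T b and S^T S, which do not change between iterations.
    stb = [sum(row[c] * b for row, b in zip(signature, bulk))
           for c in range(n_types)]
    sts = [[sum(row[i] * row[j] for row in signature) for j in range(n_types)]
           for i in range(n_types)]

    for _ in range(n_iter):
        stsp = [sum(sts[i][j] * p[j] for j in range(n_types))
                for i in range(n_types)]
        # Each update keeps p non-negative and does not increase the squared error.
        p = [p[i] * stb[i] / stsp[i] if stsp[i] > 0 else 0.0
             for i in range(n_types)]

    total = sum(p)
    return [x / total for x in p]


estimated = estimate_proportions(SIGNATURE, BULK)
for name, value in zip(CELL_TYPES, estimated):
    print(f"{name}: {value:.3f}")
```

Because the toy bulk profile is an exact, noise-free mixture, the estimate matches the simulated proportions closely. Real bulk data are noisy, and proportions estimated this way would be correspondingly less certain.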
Here we present a tool called single-cell mapper (scMappR) that is designed to infer
which cell-types are responsible for DEGs generated using common bulk RNA-seq experimental
designs. The purpose of scMappR is to assign cell-type specificity scores to DEGs previously
obtained from bulk RNA-seq experiments. Starting with a reference scRNA-seq dataset,
scMappR integrates cell-type proportions and cell-type specific expression to compute and
visualize the putative cell-type origins of DEGs identified in bulk RNA-seq analysis. We first
demonstrate that scMappR can identify validated cell-type specific gene expression by taking
advantage of a reference dataset (23) where bulk RNA-seq was performed on cell-sorted
samples. We then show that scMappR can identify bona fide differential gene expression changes
emanating from a minority cell population present in the mouse kidney during regeneration (13).
Overall, scMappR is a freely available R package (available on CRAN) that adds
cell-type specificity information to a set of user-provided DEGs.
Materials and methods
R Package: scMappR
We built an R package, scMappR, to compute and visualize the contribution of different
cell-types to DEGs identified in bulk RNA-seq. The package includes a bioinformatic
pipeline that processes scRNA-seq data from a count matrix into the formats scMappR requires.
scMappR is available on CRAN (https://cran.r-project.org/web/packages/scMappR/index.html).
Reprocessed scRNA-seq cell-type matrices are stored in a separate GitHub repository
(https://github.com/wilsonlabgroup/scMappR_Data).
Computation and visualization of cell-type contextualized DEGs and cell-type specific pathway analysis
scMappR combines differential expression, cell-type expression, and cell-type
proportions to generate cell-weighted fold-changes (cwFold-changes, 𝑐𝑤𝛥). Specifically,
scMappR reweighs the fold-changes of bulk DE genes (𝛥) by the fold-change of cell-type
specificity (e.g., cell-type vs. other cell-types) identified in the reference scRNA-seq dataset
(𝜉) and by the estimated cell-type proportion. These proportions are estimated through RNA-seq
deconvolution with the input gene's expression removed from the count and signature
matrices. A signature matrix is defined as a gene-by-cell-type matrix populated with the relative
expression of a gene in each cell-type. Cell-type proportions (𝜋) are estimated with
DeconRNASeq (15), and cell-types with an estimated proportion >1% are used in subsequent
analyses (Figure 1). Then, estimated cell-type proportions are made independent from the cell-
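The reweighting described in this section can be sketched as follows, under several loudly stated assumptions. The score is taken to be the plain product 𝛥 × 𝜉 × 𝜋, which is one simple reading of "reweighs" and omits any further adjustment the method applies. The deconvolution step is an ordinary least-squares fit, clipped at zero and rescaled, standing in for DeconRNASeq rather than reproducing it. Gene names, cell-type names, all numeric values, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def estimate_proportions(rows, bulk):
    """Least-squares stand-in for deconvolution (NOT DeconRNASeq itself)."""
    n = len(rows[0])
    sts = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    stb = [sum(r[i] * b for r, b in zip(rows, bulk)) for i in range(n)]
    raw = [max(v, 0.0) for v in solve_linear(sts, stb)]  # clip negatives
    total = sum(raw)
    return [v / total for v in raw]


def cw_fold_changes(bulk_fc, specificity, signature, bulk, min_prop=0.01):
    """Assumed score: cwDelta[gene][cell type] = Delta * xi * pi."""
    cell_types = sorted(next(iter(signature.values())))
    scores = {}
    for gene, delta in bulk_fc.items():
        # Leave the scored gene out of both the signature and the bulk profile.
        kept = [g for g in signature if g != gene]
        rows = [[signature[g][c] for c in cell_types] for g in kept]
        props = estimate_proportions(rows, [bulk[g] for g in kept])
        scores[gene] = {
            c: delta * specificity[gene][c] * pi
            for c, pi in zip(cell_types, props)
            if pi > min_prop  # keep only cell types above 1%
        }
    return scores


# Hypothetical toy inputs.
SIGNATURE = {
    "GeneA": {"immune": 10.0, "epithelial": 1.0, "endothelial": 1.0},
    "GeneB": {"immune": 1.0, "epithelial": 10.0, "endothelial": 1.0},
    "GeneC": {"immune": 1.0, "epithelial": 1.0, "endothelial": 10.0},
    "GeneD": {"immune": 5.0, "epithelial": 5.0, "endothelial": 5.0},
}
TRUE_PI = {"immune": 0.05, "epithelial": 0.945, "endothelial": 0.005}
BULK = {g: sum(v[c] * TRUE_PI[c] for c in v) for g, v in SIGNATURE.items()}
BULK_FC = {"GeneA": 4.0}
XI = {"GeneA": {"immune": 3.0, "epithelial": 0.5, "endothelial": 0.5}}

RESULT = cw_fold_changes(BULK_FC, XI, SIGNATURE, BULK)
print(RESULT)
```

In this toy case the endothelial cell type falls below the 1% threshold and receives no score for GeneA. Under the plain-product assumption the abundant epithelial cell type still outscores the immune cell type despite lower specificity, which shows how strongly the proportion term can weigh on an unadjusted score.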