scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Generalizing RNA velocity to transient cell states through dynamical modeling.

03 Aug 2020-Nature Biotechnology (Springer Science and Business Media LLC)-Vol. 38, Iss: 12, pp 1408-1414
TL;DR: ScVelo reconstructs transient cell states and differentiation pathways from single-cell RNA-sequencing data, and infer gene-specific rates of transcription, splicing and degradation, recover each cell’s position in the underlying differentiation processes and detect putative driver genes.
Abstract: RNA velocity has opened up new ways of studying cellular differentiation in single-cell RNA-sequencing data. It describes the rate of gene expression change for an individual gene at a given time point based on the ratio of its spliced and unspliced messenger RNA (mRNA). However, errors in velocity estimates arise if the central assumptions of a common splicing rate and the observation of the full splicing dynamics with steady-state mRNA levels are violated. Here we present scVelo, a method that overcomes these limitations by solving the full transcriptional dynamics of splicing kinetics using a likelihood-based dynamical model. This generalizes RNA velocity to systems with transient cell states, which are common in development and in response to perturbations. We apply scVelo to disentangling subpopulation kinetics in neurogenesis and pancreatic endocrinogenesis. We infer gene-specific rates of transcription, splicing and degradation, recover each cell's position in the underlying differentiation processes and detect putative driver genes. scVelo will facilitate the study of lineage decisions and gene regulation.

Summary (3 min read)

Introduction

  • Single-cell transcriptomics has enabled the unbiased study of biological processes such as cellular differentiation and lineage choice at single cell resolution1,2.
  • These assays are not straightforward to set up and are technically limited in many systems, such as human tissues.
  • After inferring the ratio of unspliced to spliced mRNA abundance that is in a constant transcriptional steady state, velocities are determined as the deviation of the observed ratio from its steady-state ratio.
  • Here, their inferred latent time is able to reconstruct the temporal sequence of transcriptomic events and cellular fates.
  • The authors illustrate its considerable improvement over the steady-state model while being as efficient in computation time.

Results

  • Solving the full gene-wise transcription dynamics at single-cell resolution.
  • Hence, scVelo’s latent time yields faithful gene expression time-courses to delineate dynamical processes, and to extract gene cascades.
  • While they successfully linked transient intermediate states to neuroblast stages and mature granule cells, the commitment of radial glia-like cells could not be conclusively determined.
  • The stochastic steady-state model is capable of capturing the results of the full dynamical model to a greater extent than the deterministic steady-state model (Suppl. Fig. 7a).

Discussion

  • ScVelo enables velocity estimation without assuming either the presence of steady states or a common splicing rate across genes.
  • These assumptions might be violated in practice and can be addressed by extending scVelo towards more complex regulations:.
  • On the gene level, full-length scRNA-seq protocols, such as Smart-seq236, allow accounting for gene structure, alternative splicing and state-dependent degradation rates.
  • This additional readout can be easily included into the dynamical model, incorporating varying labeling lengths as additional prior.
  • Beyond the identification of trajectories and the dynamics of single genes, the dynamic activation of pathways is of central importance.

Methods

  • Preparing the scRNA-seq data for velocity estimation.
  • The authors included samples from two experimental time points, P12 and P35.
  • The raw dataset of pancreatic endocrinogenesis20 has been deposited under the accession number GSE132188.
  • These are the default procedures in scVelo.
  • After velocity estimation, the gene space can be further restricted to genes that pass a minimum threshold for the coefficient of determination (R2, derived from the steady-state model) or gene likelihood (P ((u, s)|(θ, η)), derived from the dynamical model).

Modeling transcriptional dynamics

  • On the basis of the dynamical model of transcription shown in Fig. 1, the authors developed a computational framework for robust and scalable inference of RNA velocity.
  • Assuming splicing and degradation rates to be constant (time-independent), the authors obtain the gene- specific rate equations du(t) dt = α(k)(t)− βu(t), ds(t) dt = βu(t)− γs(t), (1) which describe how the mRNA abundances evolve over time.
  • The time derivative of mature spliced mRNA, termed RNA velocity, is denoted as ν(t) = ds(t)dt .
  • In general, the sampled population is not time-resolved and t is a latent variable.
  • Likewise, the cell’s transcriptional state k is a latent variable that is not known, and the rates α(k), β and γ are usually not experimentally measured.

Steady-state model

  • Under the assumption that the authors observe both transcriptional phases of induction and repression, and that these phases last sufficiently long to reach a transcribing and a silenced (in) steady-state equilibrium, velocity estimation can be simplified as follows:.
  • In steady states, the authors obtain on average a constant transcriptional state where dsdt = 0 which, by solving Eq. 1, yields γ ′ = γβ as the steady-state ratio of unspliced to spliced mRNA.
  • Hence, the ratio can be approximated by a linear regression on these extreme quantiles.
  • Taken together, under this simplified model, velocities are estimated along two simple equations as steady-state deviations.
  • The cumbersome problem of estimating latent time is circumvented.

Model description

  • In recognition that steady states are not always captured and that splicing rates differ between genes, the authors establish a framework that does not rely on these restrictions.
  • Gene activity is orchestrated by transcriptional regulation, implying that gene up- or down-regulation is inscribed by alterations in the state-dependent transcription rate α(k).
  • That is, α(k) can have multiple configurations each encoding one transcriptional state.
  • Not only α(k) but also the initial conditions u(k)0 , s (k) 0 are state-dependent, as well as the time point of switching states t(k)0 .

Parameter inference

  • In the following, the authors consider four phases, induction (k=1), and repression (k=0) each with an associated potential steady state (k=ss1, k=ss0).
  • For latent time, the authors adopt an explicit formula that approximates the optimal time assignment for each cell.
  • Rd predicts the change in gene expression of cell si ∈ Rd. Cell si is expected to have a high probability of transitioning towards cell sj when the corresponding change in gene expression δij = sj−si matches the predicted change according to the velocity vector vi.
  • The resulting similarity matrix π encodes a graph, which the authors refer to as velocity graph.

Validation metrics

  • To validate the coherence of the velocity vector field, the authors define a consistency score for each cell i as the mean correlation of its velocity νi with velocities from neighboring cells, ci = 〈corr(νi, νj)〉j , where cell j is neighboring cell i. (22).
  • To validate the contribution of a selection of genes (e.g. top likelihood-ranked genes) to the overall inferred dynamics, the authors define a reconstructability score as follows:.
  • The velocity graph consisting of correlations between velocities and cell-to-cell transitions (see previous sections), is computed once (i) including all genes yielding π, and once (ii) only including the selection of genes yielding π′.
  • The reconstructability score is defined as the median correlation of outgoing transitions from cell i to all cells that it can potentially transition to, i.e., r = mediani corr(πi, π′i) across all cells i. (23).

Contributions

  • VB designed and developed the method, implemented scVelo and analyzed the data.
  • FJT conceived the study with contributions from VB and FAW.
  • VB, FAW and FJT wrote the manuscript with contributions from the coauthors.
  • SP contributed to developing scVelo, and ML contributed to developing validation metrics.
  • All authors read and approved the final manuscript.

Did you find this useful? Give us your feedback

Figures (3)

Content maybe subject to copyright    Report

October 28, 2019
Generalizing RNA velocity to transient cell states
through dynamical modeling
Volker Bergen
1,2
, Marius Lange
1,2
, Stefan Peidli
2
, F. Alexander Wolf
1*
, Fabian J. Theis
1,2*
1 Institute of Computational Biology, Helmholtz Center Munich, Germany.
2 Department of Mathematics, TU Munich, Germany.
*Corresponding authors: alex.wolf@helmholtz-muenchen.de, fabian.theis@helmholtz-muenchen.de
Abstract
The introduction of RNA velocity in single cells has opened up new ways of studying cellular dif-
ferentiation. The originally proposed framework obtains velocities as the deviation of the observed
ratio of spliced and unspliced mRNA from an inferred steady state. Errors in velocity estimates
arise if the central assumptions of a common splicing rate and the observation of the full splicing
dynamics with steady-state mRNA levels are violated. With scVelo (https://scvelo.org), we
address these restrictions by solving the full transcriptional dynamics of splicing kinetics using a
likelihood-based dynamical model. This generalizes RNA velocity to a wide variety of systems com-
prising transient cell states, which are common in development and in response to perturbations.
We infer gene-specific rates of transcription, splicing and degradation, and recover the latent time
of the underlying cellular processes. This latent time represents the cell’s internal clock and is based
only on its transcriptional dynamics. Moreover, scVelo allows us to identify regimes of regulatory
changes such as stages of cell fate commitment and, therein, systematically detects putative driver
genes. We demonstrate that scVelo enables disentangling heterogeneous subpopulation kinetics with
unprecedented resolution in hippocampal dentate gyrus neurogenesis and pancreatic endocrinogen-
esis. We anticipate that scVelo will greatly facilitate the study of lineage decisions, gene regulation,
and pathway activity identification.
Introduction
Single-cell transcriptomics has enabled the unbiased study of biological processes such as cellular
differentiation and lineage choice at single cell resolution
1,2
. The resulting computational problem
is known as trajectory inference. Starting from a population of cells at different stages of a devel-
opmental process, trajectory inference algorithms aim to reconstruct the developmental sequence
of transcriptional changes leading to potential cell fates. A multitude of such methods have been
developed, commonly modeling the dynamics as the progression of cells along an idealized, poten-
tially branching trajectory
38
. A central challenge in trajectory inference is the destructive nature of
single-cell RNA-seq, which only reveals static snapshots of cellular states. To move from descriptive
towards predictive trajectory models, additional information is required to constrain the space of
possible dynamics that could give rise to the same trajectory
9,10
. As such, lineage-tracing assays can
add information via genetic modification to enable the reconstruction of lineage relationships
1117
.
However, these assays are not straightforward to set up and are technically limited in many systems,
such as human tissues.
The concept of RNA velocity has enabled the recovery of directed dynamic information by lever-
aging the fact that newly transcribed, unspliced pre-mRNAs and mature, spliced mRNAs can be
distinguished in common single-cell RNA-seq protocols, the former detectable by the presence of in-
trons
18
. Assuming a simple per-gene reaction model that relates abundance of unspliced and spliced
mRNA, the change in mRNA abundance, termed RNA velocity, can be inferred. The combination
1
was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (whichthis version posted October 29, 2019. ; https://doi.org/10.1101/820936doi: bioRxiv preprint

of velocities across genes can then be used to estimate the future state of an individual cell. The
original model
18
estimates velocities under the assumption that the transcriptional phases of induc-
tion and repression of gene expression last sufficiently long to reach both an actively transcribing
and an inactive silenced steady-state equilibrium. After inferring the ratio of unspliced to spliced
mRNA abundance that is in a constant transcriptional steady state, velocities are determined as the
deviation of the observed ratio from its steady-state ratio. Inferring the steady-state ratio makes
two fundamental assumptions, namely that (i) on the gene level, the full splicing dynamics with
transcriptional induction, repression and steady-state mRNA levels are captured; and (ii) on the
cellular level, all genes share a common splicing rate. These assumptions are often violated, in par-
ticular when a population comprises multiple heterogeneous subpopulations with different kinetics.
We refer to this modeling approach as the “steady-state model”.
To resolve the above restrictions, we developed scVelo, a likelihood-based dynamical model that
solves the full gene-wise transcriptional dynamics. It thereby generalizes RNA velocity estimation
to transient systems and systems with heterogeneous subpopulation kinetics. We infer the gene-
specific reaction rates of transcription, splicing and degradation, and an underlying gene-shared
latent time in an efficient expectation-maximization framework. The inferred latent time represents
the cell’s internal clock, which accurately describes the cell’s position in the underlying biological
process. In contrast to existing similarity-based pseudotime methods, this latent time is grounded
only on transcriptional dynamics and accounts for speed and direction of motion.
We demonstrate the capabilities of the dynamical model on various cell lineages in hippocampal den-
tate gyrus neurogenesis
19
and pancreatic endocrinogenesis
20
. The dynamical model generally yields
more consistent velocity estimates across neighboring cells and accurately identifies transcriptional
states as opposed to the steady-state model. It provides fine-grained insights into the cell states
of cycling pancreatic endocrine precursor cells, including their lineage commitment, cell-cycle exit,
and finally endocrine cell differentiation. Here, our inferred latent time is able to reconstruct the
temporal sequence of transcriptomic events and cellular fates. Moreover, scVelo identifies regimes
of regulatory changes such as transition states and stages of cell fate commitment. Herein, scVelo
identifies putative driver genes of these transcriptional changes. Driver genes display pronounced dy-
namic behaviour and are systematically detected via their characterization by high likelihoods in the
dynamic model. This procedure presents a dynamics-based alternative to the standard differential
expression paradigm.
Finally, we propose to further account for stochasticity in gene expression, obtained by treating
transcription, splicing and degradation as probabilistic events. We show how this can be achieved
for the steady-state model and demonstrate its capability of capturing the directionality inferred
from the full dynamical model to a large extent. We illustrate its considerable improvement over the
steady-state model while being as efficient in computation time. The dynamical, the stochastic as
well as the steady-state model are available within scVelo as a robust and scalable implementation
(https://scvelo.org). For the latter two scVelo achieves a ten-fold speedup over the original
implementation (velocyto)
18
.
2
was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (whichthis version posted October 29, 2019. ; https://doi.org/10.1101/820936doi: bioRxiv preprint

a
c
α
on/off
β
γ
u (t)
s(t)
transcription
splicing
b
s(t)
t
1 hour
3 hours
steady state
early switch
inference by maximizing joint likelihood
P
(
(u
i
,s
i
)
|
(θ,t
i
,k
i
)
)
learned kinetics
previous iteration
time assignment
t
i
0
1
latent time
u
s
state likelihood
off
steady
on
k
i
state assignment parameter update
θ = (α
on/off
,β,γ)
d
Figure 1 | Solving the full splicing kinetics generalizes RNA velocity to transient populations.
a. Modeling transcriptional dynamics captures transcriptional induction and repression (‘on’ and ‘off phase)
of unspliced pre-mRNAs, their conversion into mature, spliced mRNAs and their eventual degradation.
b. An actively transcribed and an inactive silenced steady state is reached when the transcriptional phases
of induction and repression last sufficiently long, respectively. In particular in transient cell populations,
however, steady states are often not reached as, e.g., induction may terminate before mRNA level saturation,
displaying an ‘early switching’ behavior.
c. We propose scVelo, a likelihood-based model that solves the full gene-wise transcriptional dynamics of
splicing kinetics, which is governed by two sets of parameters: (i) reaction rates of transcription, splicing
and degradation, and (ii) cell-specific latent variables of transcriptional state and time. The parameters
are inferred iteratively via expectation-maximization. For a given estimate of reaction rate parameters, time
points are assigned to each cell by minimizing its distance to the current phase trajectory. The transcriptional
states are assigned by associating a likelihood to respective segments on the trajectory, i.e. induction,
repression, active and inactive steady state.
d. The overall likelihood is then optimized by updating the model parameters of reaction rates. The dashed
purple line links the inferred (unobserved) inactive with the active steady state.
Results
Solving the full gene-wise transcription dynamics at single-cell resolution
As in the original framework
18
, we model transcriptional dynamics (Fig. 1a) using the basic reaction
kinetics described by
du(t)
dt
= α
k
(t) βu(t),
ds(t)
dt
= βu(t) γs(t),
for each gene, independent of all other genes. As opposed to the original framework, to account
for non-observed steady states (Fig. 1b), we solve these equations explicitly and infer the splicing
kinetics that is governed by two sets of parameters: (i) the reaction rates of transcription α
k
(t),
splicing β, and degradation γ; and (ii) cell-specific latent variables, i.e., a discrete transcriptional
state k
i
and a continuous time t
i
, where i represents a single observed cell. The parameters of
the reaction rates can be obtained if the latent variables are given, and vice versa. Hence, we
infer the parameters by expectation-maximization, iteratively estimating the reaction rates and
latent variables via maximum likelihood. In the expectation step, for a given model estimate of
3
was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (whichthis version posted October 29, 2019. ; https://doi.org/10.1101/820936doi: bioRxiv preprint

the unspliced/spliced phase trajectory, X =
ˆu(t), ˆs(t)
t
, a latent time t
i
is assigned to an observed
mRNA value x
i
= (u
i
, s
i
) by minimizing its distance to the phase trajectory X (Fig. 1c). The
transcriptional states k
i
are then assigned by associating a likelihood to respective segments on the
phase trajectory X, i.e., k
i
{on, off, ss
on
, ss
off
} labeling induction, repression, active and inactive
steady states. In the maximization step, the overall likelihood is then optimized by updating the
parameters of reaction rates (Fig. 1d, Supp. Fig. 5, Methods).
The resulting gene-specific trajectory X, parametrized by interpretable parameters of reaction rates
and transcriptional states, explicitly describes how mRNA levels evolve over latent time. While
the steady-state model uses linear regression to fit assumed steady states and fails if these are not
observed, the dynamical model resolves the full dynamics of unspliced and spliced mRNA abundances
and thus enables unobserved steady states to also be faithfully captured (Supp. Fig. 1). RNA velocity
is then explicitly given by the derivative of spliced mRNA abundance, parametrized by the inferred
variables.
In order to make the inferred parameters of reaction rates relatable across genes, the gene-wise
latent times are coupled to a universal, gene-shared latent time that proxies a cell’s internal clock
(Suppl. Fig. 2, Methods). This universal time allows us to resolve the cell’s relative position in
a biological process with support from the splicing dynamics of all genes. Also transcriptional
states can be identified more confidently by sharing information between genes. On simulated
splicing kinetics, latent time is able to reconstruct the underlying real time at near perfect correlation
and correct scale, clearly outperforming pseudotime. In contrast to pseudotime methods
3,21
, our
latent time is grounded on transcriptional dynamics and internally accounts for speed and direction
of motion. Hence, scVelo’s latent time yields faithful gene expression time-courses to delineate
dynamical processes, and to extract gene cascades.
Further, the coupling to a universal latent time allows us to identify the kinetic rates up to a global
gene-shared scale parameter. Employing the overall timescale of the developmental process as prior
information, the absolute values of kinetic rates can eventually be identified (Supp. Fig. 3).
Identifying reaction rates in transient cell populations
To validate the sensitivity of both models with respect to varying parameters in simulated splicing
kinetics, we randomly sampled 2,000 log-normally distributed parameters for each reaction rate and
time events following the Poisson law. The total time spent in a transcriptional state is varied
between two and ten hours.
The ratio inferred by the steady-state model yields a systematic error as the time of transcriptional
induction decreases such that mRNA levels are less likely to reach steady-state equilibrium lev-
els (Suppl. Fig. 3a). By contrast, the dynamical model yields a consistently smaller error and is
completely insensitive with respect to variability in induction duration. Furthermore, the Pearson
correlation between the true and inferred steady-state ratio increases from 0.71 to 0.97 when using
the dynamical model. Imposing the overall timescale of the splicing dynamics of 20 hours as prior
information, the dynamical model reliably recovers the true parameters of the simulated splicing
kinetics, achieving correlations of 0.85 and higher (Supp. Fig. 3b).
Resolving the heterogeneous population kinetics in dentate gyrus development
To test whether scVelo’s velocity estimates allow identification of more complex population kinetics,
we considered a scRNA-seq experiment from the developing mouse dentate gyrus
19
(DG) comprising
two time points (P12 and P35) measured using droplet-based scRNA-seq (10x Genomics Chromium
Single Cell Kit V1, see Methods). The original publication aimed to elucidate the relationship be-
tween developmental and adult dentate gyrus neurogenesis. While they successfully linked transient
4
was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (whichthis version posted October 29, 2019. ; https://doi.org/10.1101/820936doi: bioRxiv preprint

gene likelihood
0
0.6
(log) # genes
a b
c
dynamic-driving genes
α
γ
Tmsb10 likelihood
Tmsb10
Hn1
Ppp3ca
Dlg2
non-dynamic genes
Tcea1 likelihood
Tcea1
Herc2
Rab11fip3
Capn15
Tmsb10
steady-state model
inferred dynamics
steady-state ratio
steady dyn.
dynamical model
u
s
α
γ
steady
dynamical
steady dynamical
steady
dynamical
u
s
Fam155a
0 0.1
0 0.7
u
s
u
s
Granule !
immature
Granule mature
Neuroblast
nIPC
Radial
Glia
Astrocytes
CR
GABA
Endothelial
Microglia
OPC
OL
Mossy
Cck
steady
dyn.
1
0
consistency
scores
Figure 2 | Resolving subpopulation kinetics and identifying dynamical genes in neurogenesis.
a. Velocities derived from the dynamical model for dentate gyrus neurogenesis
19
are projected into a UMAP-
based embedding. The main gene-averaged flow visualized by velocity streamlines corresponds to the granule
lineage, in which neuroblasts develop into granule cells. The remaining populations form distinct cell types
that are either differentiated, e.g., Cajal Retzius (CR) cells, or cell types that form sub-lineages, e.g., the
GABA and oligodendrocyte lineages (OPC to OL). When zooming into the cell types to examine single-cell
velocities, fundamental differences between the velocities derived from the steady-state and dynamical model
become apparent. Only the dynamical model identifies CR cells to be terminal by assigning no velocity
and indicates that OPCs indeed differentiate into OLs. By contrast, the steady-state model displays a high
velocity in CR cells and points OPCs away from OLs. Overall, the dynamical models yields a more coherent
velocity vector field as illustrated by the consistency scores (in the top right corner, defined for each cell as
the correlation of its velocity with the velocities of neighboring cells).
b. Gene-resolved velocities allow further interpreting the inferred directionality on the cellular level. For
instance, Tmsb10 is the major contributor to the gene-averaged flow that describes neuroblasts as differenti-
ating into granule cells. With Fam155a, the incongruous CR velocities from the steady-state model become
evident. By reducing velocity estimation to steady-state deviations, this model is biased to assign high veloc-
ities to outlier cells, such as the CR population. In contrast, the dynamical model assign CR cells to a steady
state with high likelihoods as they are not well explained by the overall kinetics and cannot be confidently
linked to the transient induction state.
c. The dynamical model allows to systematically identify putative driver genes as genes characterized by high
likelihoods. While genes selected by high likelihoods (upper row) display pronounced dynamic behaviour,
expression of low-likelihood genes (lower row) is governed by noise or non-existing transient states.
intermediate states to neuroblast stages and mature granule cells, the commitment of radial glia-like
cells could not be conclusively determined.
After basic preprocessing, we apply both the steady-state and the dynamical model and display
the vector fields using streamline plots
22
in a UMAP-based embedding
23
of the data (Fig. 2a).
The dominating structure is the granule cell lineage, in which neuroblasts develop into granule
5
was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (whichthis version posted October 29, 2019. ; https://doi.org/10.1101/820936doi: bioRxiv preprint

Citations
More filters
Journal ArticleDOI
24 Sep 2020-Nature
TL;DR: The state-of-the-art analyses of large-scale single-cell and single-nucleus transcriptomes are used to construct a cellular atlas of the human heart that will aid further research into cardiac physiology and disease and provides a valuable reference for future studies.
Abstract: Cardiovascular disease is the leading cause of death worldwide. Advanced insights into disease mechanisms and therapeutic strategies require a deeper understanding of the molecular processes involved in the healthy heart. Knowledge of the full repertoire of cardiac cells and their gene expression profiles is a fundamental first step in this endeavour. Here, using state-of-the-art analyses of large-scale single-cell and single-nucleus transcriptomes, we characterize six anatomical adult heart regions. Our results highlight the cellular heterogeneity of cardiomyocytes, pericytes and fibroblasts, and reveal distinct atrial and ventricular subsets of cells with diverse developmental origins and specialized properties. We define the complexity of the cardiac vasculature and its changes along the arterio-venous axis. In the immune compartment, we identify cardiac-resident macrophages with inflammatory and protective transcriptional signatures. Furthermore, analyses of cell-to-cell interactions highlight different networks of macrophages, fibroblasts and cardiomyocytes between atria and ventricles that are distinct from those of skeletal muscle. Our human cardiac cell atlas improves our understanding of the human heart and provides a valuable reference for future studies.

703 citations

Journal ArticleDOI
Dominik Pfister1, Dominik Pfister2, Nicolás Gonzalo Núñez3, Roser Pinyol4, Olivier Govaere5, Matthias Pinter6, Marta Szydlowska1, Revant Gupta7, Mengjie Qiu8, Aleksandra Deczkowska9, Assaf Weiner9, Florian Müller1, Ankit Sinha10, Ankit Sinha11, Ekaterina Friebel3, Thomas Engleitner1, Thomas Engleitner11, Daniela Lenggenhager3, Anja Moncsek3, Danijela Heide1, Kristin Stirm1, Jan Kosla1, Eleni Kotsiliti1, Valentina Leone1, Michael Dudek11, Suhail Yousuf8, Donato Inverso1, Donato Inverso12, Indrabahadur Singh1, Ana Teijeiro, Florian Castet4, Carla Montironi4, Philipp K. Haber13, Dina Tiniakos5, Dina Tiniakos14, Pierre Bedossa5, Simon Cockell5, Ramy Younes15, Ramy Younes5, Michele Vacca16, Fabio Marra17, Jörn M. Schattenberg, Michael Allison16, Elisabetta Bugianesi15, Vlad Ratziu18, Tiziana Pressiani, Antonio D'Alessio, Nicola Personeni19, Lorenza Rimassa19, Ann K. Daly5, Bernhard Scheiner6, Katharina Pomej6, Martha M. Kirstein20, Arndt Vogel20, Markus Peck-Radosavljevic, F. Hucke, Fabian Finkelmeier, Oliver Waidmann, Jörg Trojan, Kornelius Schulze21, Henning Wege21, Sandra Koch22, Arndt Weinmann22, Marco Bueter3, Fabian Rössler3, Alexander Siebenhüner3, Sara De Dosso, Jan-Philipp Mallm1, Viktor Umansky12, Viktor Umansky1, Manfred Jugold1, Tom Luedde23, Andrea Schietinger24, Andrea Schietinger25, Peter Schirmacher8, Brinda Emu1, Hellmut G. Augustin1, Hellmut G. Augustin12, Adrian T. Billeter8, Beat P. Müller-Stich8, Hiroto Kikuchi26, Dan G. Duda26, Fabian Kütting27, Dirk Waldschmidt27, Matthias P. Ebert12, Nuh N. Rahbari12, Henrik E. Mei28, Axel Schulz28, Marc Ringelhan11, Nisar P. Malek, Stephan Spahn, Michael Bitzer, Marina Ruiz de Galarreta13, Amaia Lujambio13, Jean-François Dufour29, Thomas U. Marron13, Thomas U. Marron30, Ahmed Kaseb31, Masatoshi Kudo32, Yi Hsiang Huang33, Yi Hsiang Huang34, Nabil Djouder, Katharina Wolter7, Lars Zender1, Lars Zender7, Parice N. Marche35, Parice N. Marche36, Thomas Decaens36, Thomas Decaens35, David J. Pinato37, Roland Rad11, Roland Rad1, Joachim C. Mertens3, Achim Weber3, Kristian Unger, Felix Meissner10, Susanne Roth8, Zuzana Macek Jilkova36, Zuzana Macek Jilkova35, Zuzana Macek Jilkova37, Manfred Claassen7, Quentin M. Anstee5, Ido Amit9, Percy A. Knolle11, Burkhard Becher3, Josep M. Llovet13, Josep M. Llovet4, Josep M. Llovet38, Mathias Heikenwalder1 
15 Apr 2021-Nature
TL;DR: The progressive accumulation of exhausted, unconventionally activated CD8+PD1+ T cells in NASH-affected livers provides a rationale for stratification of patients with HCC according to underlying aetiology in studies of immunotherapy as a primary or adjuvant treatment.
Abstract: Hepatocellular carcinoma (HCC) can have viral or non-viral causes1-5. Non-alcoholic steatohepatitis (NASH) is an important driver of HCC. Immunotherapy has been approved for treating HCC, but biomarker-based stratification of patients for optimal response to therapy is an unmet need6,7. Here we report the progressive accumulation of exhausted, unconventionally activated CD8+PD1+ T cells in NASH-affected livers. In preclinical models of NASH-induced HCC, therapeutic immunotherapy targeted at programmed death-1 (PD1) expanded activated CD8+PD1+ T cells within tumours but did not lead to tumour regression, which indicates that tumour immune surveillance was impaired. When given prophylactically, anti-PD1 treatment led to an increase in the incidence of NASH-HCC and in the number and size of tumour nodules, which correlated with increased hepatic CD8+PD1+CXCR6+, TOX+, and TNF+ T cells. The increase in HCC triggered by anti-PD1 treatment was prevented by depletion of CD8+ T cells or TNF neutralization, suggesting that CD8+ T cells help to induce NASH-HCC, rather than invigorating or executing immune surveillance. We found similar phenotypic and functional profiles in hepatic CD8+PD1+ T cells from humans with NAFLD or NASH. A meta-analysis of three randomized phase III clinical trials that tested inhibitors of PDL1 (programmed death-ligand 1) or PD1 in more than 1,600 patients with advanced HCC revealed that immune therapy did not improve survival in patients with non-viral HCC. In two additional cohorts, patients with NASH-driven HCC who received anti-PD1 or anti-PDL1 treatment showed reduced overall survival compared to patients with other aetiologies. Collectively, these data show that non-viral HCC, and particularly NASH-HCC, might be less responsive to immunotherapy, probably owing to NASH-related aberrant T cell activation causing tissue damage that leads to impaired immune surveillance. Our data provide a rationale for stratification of patients with HCC according to underlying aetiology in studies of immunotherapy as a primary or adjuvant treatment.

526 citations

Journal ArticleDOI
TL;DR: Slide-seqV2, a technology that enables transcriptome-wide detection of RNAs with a spatial resolution of 10 μm, is reported, which combines improvements in library generation, bead synthesis and array indexing to reach an RNA capture efficiency ~50% that of single-cell RNA-seq data (~10-fold greater than Slide-seq).
Abstract: Measurement of the location of molecules in tissues is essential for understanding tissue formation and function. Previously, we developed Slide-seq, a technology that enables transcriptome-wide detection of RNAs with a spatial resolution of 10 μm. Here we report Slide-seqV2, which combines improvements in library generation, bead synthesis and array indexing to reach an RNA capture efficiency ~50% that of single-cell RNA-seq data (~10-fold greater than Slide-seq), approaching the detection efficiency of droplet-based single-cell RNA-seq techniques. First, we leverage the detection efficiency of Slide-seqV2 to identify dendritically localized mRNAs in neurons of the mouse hippocampus. Second, we integrate the spatial information of Slide-seqV2 data with single-cell trajectory analysis tools to characterize the spatiotemporal development of the mouse neocortex, identifying underlying genetic programs that were poorly sampled with Slide-seq. The combination of near-cellular resolution and high transcript detection efficiency makes Slide-seqV2 useful across many experimental contexts. An improved method for spatial transcriptomics with detection efficiency approaching that of droplet-based single-cell RNA-seq techniques.

435 citations

Journal ArticleDOI
04 Mar 2022-Science
TL;DR: A human reference atlas comprising nearly 500,000 cells from 24 different tissues and organs, many from the same donor, enabled molecular characterization of more than 400 cell types, their distribution across tissues and tissue specific variation in gene expression.
Abstract: Molecular characterization of cell types using single cell transcriptome sequencing is revolutionizing cell biology and enabling new insights into the physiology of human organs. We created a human reference atlas comprising nearly 500,000 cells from 24 different tissues and organs, many from the same donor. This atlas enabled molecular characterization of more than 400 cell types, their distribution across tissues and tissue specific variation in gene expression. Using multiple tissues from a single donor enabled identification of the clonal distribution of T cells between tissues, the tissue specific mutation rate in B cells, and analysis of the cell cycle state and proliferative potential of shared cell types across tissues. Cell type specific RNA splicing was discovered and analyzed across tissues within an individual.

254 citations

Journal ArticleDOI
TL;DR: In this paper, the authors identify various immunophenotypes and associated gene sets that are positively or negatively correlated with T cell expansion following anti-PD1 treatment and shed light on the heterogeneity in treatment response to anti-drugs in breast cancer.
Abstract: Immune-checkpoint blockade (ICB) combined with neoadjuvant chemotherapy improves pathological complete response in breast cancer. To understand why only a subset of tumors respond to ICB, patients with hormone receptor-positive or triple-negative breast cancer were treated with anti-PD1 before surgery. Paired pre- versus on-treatment biopsies from treatment-naive patients receiving anti-PD1 (n = 29) or patients receiving neoadjuvant chemotherapy before anti-PD1 (n = 11) were subjected to single-cell transcriptome, T cell receptor and proteome profiling. One-third of tumors contained PD1-expressing T cells, which clonally expanded upon anti-PD1 treatment, irrespective of tumor subtype. Expansion mainly involved CD8+ T cells with pronounced expression of cytotoxic-activity (PRF1, GZMB), immune-cell homing (CXCL13) and exhaustion markers (HAVCR2, LAG3), and CD4+ T cells characterized by expression of T-helper-1 (IFNG) and follicular-helper (BCL6, CXCR5) markers. In pre-treatment biopsies, the relative frequency of immunoregulatory dendritic cells (PD-L1+), specific macrophage phenotypes (CCR2+ or MMP9+) and cancer cells exhibiting major histocompatibility complex class I/II expression correlated positively with T cell expansion. Conversely, undifferentiated pre-effector/memory T cells (TCF7+, GZMK+) or inhibitory macrophages (CX3CR1+, C3+) were inversely correlated with T cell expansion. Collectively, our data identify various immunophenotypes and associated gene sets that are positively or negatively correlated with T cell expansion following anti-PD1 treatment. We shed light on the heterogeneity in treatment response to anti-PD1 in breast cancer. Transcriptomic and proteomic profiling of breast cancer biopsies identifies baseline features of the tumor immune microenvironment associated with T cell clonal expansion following neoadjuvant anti-PD-1 treatment.

207 citations

References
More filters
Journal ArticleDOI
TL;DR: Matplotlib is a 2D graphics package used for Python for application development, interactive scripting, and publication-quality image generation across user interfaces and operating systems.
Abstract: Matplotlib is a 2D graphics package used for Python for application development, interactive scripting,and publication-quality image generation across user interfaces and operating systems

23,312 citations

Journal ArticleDOI
TL;DR: This work proposes a heuristic method that is shown to outperform all other known community detection methods in terms of computation time and the quality of the communities detected is very good, as measured by the so-called modularity.
Abstract: We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection method in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks. .

13,519 citations

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a simple method to extract the community structure of large networks based on modularity optimization, which is shown to outperform all other known community detection methods in terms of computation time.
Abstract: We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2 million customers and by analysing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad hoc modular networks.

11,078 citations

Journal ArticleDOI
21 May 2015-Cell
TL;DR: Drop-seq will accelerate biological discovery by enabling routine transcriptional profiling at single-cell resolution by separating them into nanoliter-sized aqueous droplets, associating a different barcode with each cell's RNAs, and sequencing them all together.

5,506 citations

Posted Content
TL;DR: The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance.
Abstract: UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology The result is a practical scalable algorithm that applies to real world data The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance Furthermore, UMAP has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning

5,390 citations

Related Papers (5)
Frequently Asked Questions (17)
Q1. What are the main fates of endocrine commitment in the mouse pancreas?

Endocrine commitment terminates in four major fates: glucagonproducing α-cells, insulin-producing β-cells, somatostatin-producing δ-cells and ghrelin-producing -cells27. 

The introduction of RNA velocity in single cells has opened up new ways of studying cellular differentiation. The authors demonstrate that scVelo enables disentangling heterogeneous subpopulation kinetics with unprecedented resolution in hippocampal dentate gyrus neurogenesis and pancreatic endocrinogenesis. The authors anticipate that scVelo will greatly facilitate the study of lineage decisions, gene regulation, and pathway activity identification. 

Assuming a simple per-gene reaction model that relates abundance of unspliced and spliced mRNA, the change in mRNA abundance, termed RNA velocity, can be inferred. 

Metabolic labeling, e.g. using scSLAM-seq41,42, enables the quantification of total RNA levels together with newly transcribed RNA. 

Endocrine cells are derived from endocrine progenitors located in the pancreatic epithelium, marked by transient expression of the transcription factor Ngn3. 

Inferring the steady-state ratio makes two fundamental assumptions, namely that (i) on the gene level, the full splicing dynamics with transcriptional induction, repression and steady-state mRNA levels are captured; and (ii) on the cellular level, all genes share a common splicing rate. 

By imposing an overall timescale of 20 hours as prior information, the dynamical model recovers the absolute values of the reaction rates. 

The resulting Markov jump process is commonly approximated by moment equations34, which can be solved in closed form in the linear ODE system under consideration. 

The projection of velocities into a lower-dimensional embedding (e.g. UMAP23) for a cell i is obtained on the basis of a transition matrix π̃ (see previous section) which contains probabilities of cell-to-cell transitions that are in accordance with the corresponding velocity vectors,π̃ij = 1zi exp( cos∠(xj − xi,νi)σ2i) ,with row normalization factors zi = ∑j exp( πij σ2i ) and kernel width parameters σi. 

the authors infer the parameters by expectation-maximization, iteratively estimating the reaction rates and latent variables via maximum likelihood. 

The top 2,000 highly variable genes are selected out of those that pass a minimum threshold of 20 expressed counts commonly for spliced and unspliced mRNA. 

After basic preprocessing, the authors apply both the steady-state and the dynamical model and display the vector fields using streamline plots22 in a UMAP-based embedding23 of the data (Fig. 2a). 

With the assumption of the gene-specific σ to be constant across cells within one transcriptional state, and the observations to be i.i.d., the likelihood-based framework is derived in the following. 

In order to make the inferred parameters of reaction rates relatable across genes, the gene-wise latent times are coupled to a universal, gene-shared latent time that proxies a cell’s internal clock (Suppl. Fig. 2, Methods). 

Velocities derived from the dynamical model are more consistent across velocities of neighboring cells than those derived from the steady-state model which results in a higher overall coherence of the velocity vector field (Fig. 2a top right, Supp. Fig. 7).Both the steady-state and the dynamical model yield additional dynamic flow within the mature compartment of granule cells, which was expected to be terminal and may be worthwhile to follow up experimentally. 

As it scales linearly with the number of cells and genes, its runtime is exceeded by velocyto’s quadratic runtime on large cell numbers of 35k and higher. 

The negative log-likelihood to be minimized is given byl(θ) = log( √ 2πσ2) + 12σ2 ∑ i ∥∥∥xti(θ)− xobsi ∥∥∥2 , (7)where θ = (α(k), β, γ).