
Generalizing RNA velocity to transient cell states through dynamical modeling.

Volker Bergen¹, Marius Lange¹, Stefan Peidli¹, F. Alexander Wolf, Fabian J. Theis¹

03 Aug 2020-Nature Biotechnology (Springer Science and Business Media LLC)-Vol. 38, Iss: 12, pp 1408-1414

TL;DR: scVelo reconstructs transient cell states and differentiation pathways from single-cell RNA-sequencing data, infers gene-specific rates of transcription, splicing and degradation, recovers each cell’s position in the underlying differentiation processes and detects putative driver genes.


Abstract: RNA velocity has opened up new ways of studying cellular differentiation in single-cell RNA-sequencing data. It describes the rate of gene expression change for an individual gene at a given time point based on the ratio of its spliced and unspliced messenger RNA (mRNA). However, errors in velocity estimates arise if the central assumptions of a common splicing rate and the observation of the full splicing dynamics with steady-state mRNA levels are violated. Here we present scVelo, a method that overcomes these limitations by solving the full transcriptional dynamics of splicing kinetics using a likelihood-based dynamical model. This generalizes RNA velocity to systems with transient cell states, which are common in development and in response to perturbations. We apply scVelo to disentangling subpopulation kinetics in neurogenesis and pancreatic endocrinogenesis. We infer gene-specific rates of transcription, splicing and degradation, recover each cell's position in the underlying differentiation processes and detect putative driver genes. scVelo will facilitate the study of lineage decisions and gene regulation.


Summary (3 min read)


Introduction

Single-cell transcriptomics has enabled the unbiased study of biological processes such as cellular differentiation and lineage choice at single cell resolution1,2.
Lineage-tracing assays are not straightforward to set up and are technically limited in many systems, such as human tissues.
After inferring the ratio of unspliced to spliced mRNA abundance that is in a constant transcriptional steady state, velocities are determined as the deviation of the observed ratio from its steady-state ratio.
Here, their inferred latent time is able to reconstruct the temporal sequence of transcriptomic events and cellular fates.
The authors illustrate its considerable improvement over the steady-state model while being as efficient in computation time.

Results

Solving the full gene-wise transcription dynamics at single-cell resolution.
Hence, scVelo’s latent time yields faithful gene expression time-courses to delineate dynamical processes, and to extract gene cascades.
While the original publication successfully linked transient intermediate states to neuroblast stages and mature granule cells, the commitment of radial glia-like cells could not be conclusively determined.
The stochastic steady-state model is capable of capturing the results of the full dynamical model to a greater extent than the deterministic steady-state model (Suppl. Fig. 7a).

Discussion

scVelo enables velocity estimation without assuming either the presence of steady states or a common splicing rate across genes.
These assumptions might be violated in practice and can be addressed by extending scVelo towards more complex regulations.
On the gene level, full-length scRNA-seq protocols, such as Smart-seq2 (ref. 36), allow accounting for gene structure, alternative splicing and state-dependent degradation rates.
This additional readout can be easily included into the dynamical model, incorporating varying labeling lengths as additional prior.
Beyond the identification of trajectories and the dynamics of single genes, the dynamic activation of pathways is of central importance.

Methods

Preparing the scRNA-seq data for velocity estimation.
The authors included samples from two experimental time points, P12 and P35.
The raw dataset of pancreatic endocrinogenesis20 has been deposited under the accession number GSE132188.
These are the default procedures in scVelo.
After velocity estimation, the gene space can be further restricted to genes that pass a minimum threshold for the coefficient of determination ($R^2$, derived from the steady-state model) or gene likelihood ($P((u, s) \mid (\theta, \eta))$, derived from the dynamical model).

Modeling transcriptional dynamics

On the basis of the dynamical model of transcription shown in Fig. 1, the authors developed a computational framework for robust and scalable inference of RNA velocity.
Assuming splicing and degradation rates to be constant (time-independent), the authors obtain the gene-specific rate equations
$$\frac{du(t)}{dt} = \alpha^{(k)}(t) - \beta u(t), \qquad \frac{ds(t)}{dt} = \beta u(t) - \gamma s(t), \tag{1}$$
which describe how the mRNA abundances evolve over time.
The time derivative of mature spliced mRNA, termed RNA velocity, is denoted as $\nu(t) = \frac{ds(t)}{dt}$.
In general, the sampled population is not time-resolved and t is a latent variable.
Likewise, the cell’s transcriptional state k is a latent variable that is not known, and the rates α(k), β and γ are usually not experimentally measured.
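For a constant transcription rate within one transcriptional state, the rate equations of Eq. 1 are linear and admit a closed-form solution. The sketch below writes that solution out in plain Python and evaluates the velocity as βu − γs. The function names and the numerical rate values are illustrative assumptions, not part of scVelo, and the formula as written requires β ≠ γ.

```python
import math


def splicing_solution(t, alpha, beta, gamma, u0=0.0, s0=0.0):
    """Closed-form (u(t), s(t)) for constant rates; requires beta != gamma."""
    exp_b = math.exp(-beta * t)
    exp_g = math.exp(-gamma * t)
    u = u0 * exp_b + alpha / beta * (1.0 - exp_b)
    s = (s0 * exp_g
         + alpha / gamma * (1.0 - exp_g)
         + (alpha - beta * u0) / (gamma - beta) * (exp_g - exp_b))
    return u, s


def rna_velocity(u, s, beta, gamma):
    """RNA velocity ds/dt = beta * u - gamma * s, as in Eq. 1."""
    return beta * u - gamma * s


# Illustrative rates (assumed values, not estimates from the paper).
alpha, beta, gamma = 2.0, 1.0, 0.5

# After a long induction the abundances approach alpha/beta and alpha/gamma,
# and the velocity approaches zero.
u_late, s_late = splicing_solution(50.0, alpha, beta, gamma)
v_late = rna_velocity(u_late, s_late, beta, gamma)
```

Repression can be sketched with the same function by setting `alpha=0.0` and passing the abundances at the switching time as `u0` and `s0`.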

Steady-state model

Under the assumption that the authors observe both transcriptional phases of induction and repression, and that these phases last sufficiently long to reach an actively transcribing and an inactive silenced steady-state equilibrium, velocity estimation can be simplified as follows.
In steady states, the authors obtain on average a constant transcriptional state where $ds/dt = 0$ which, by solving Eq. 1, yields $\gamma' = \gamma/\beta$ as the steady-state ratio of unspliced to spliced mRNA.
Hence, the ratio can be approximated by a linear regression on these extreme quantiles.
Taken together, under this simplified model, velocities are estimated along two simple equations as steady-state deviations.
The cumbersome problem of estimating latent time is circumvented.
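A minimal sketch of the steady-state procedure, under stated assumptions: the ratio is fitted as a least-squares slope through the origin using only cells in the extreme quantiles of total abundance, and the velocity of each cell is its deviation from that line, written here in units where the common splicing rate is 1. The quantile threshold, the function names and the selection by `u + s` are assumptions for illustration and may differ from the velocyto or scVelo implementations.

```python
def steady_state_ratio(u, s, quantile=0.95):
    """Slope through the origin fitted on cells in the extreme quantiles of u + s."""
    total = [ui + si for ui, si in zip(u, s)]
    ranked = sorted(total)
    n = len(ranked)
    upper = ranked[min(n - 1, int(quantile * n))]
    lower = ranked[int((1.0 - quantile) * n)]
    selected = [(ui, si) for ui, si, ti in zip(u, s, total)
                if ti >= upper or ti <= lower]
    numerator = sum(ui * si for ui, si in selected)
    denominator = sum(si * si for _, si in selected)
    if denominator == 0:
        raise ValueError("no spliced signal in the selected cells")
    return numerator / denominator


def steady_state_velocity(u, s, ratio):
    """Velocity as deviation from the steady-state line: v = u - ratio * s."""
    return [ui - ratio * si for ui, si in zip(u, s)]


# Toy data lying exactly on the line u = 0.5 * s (assumed values).
s_toy = [float(k) for k in range(20)]
u_toy = [0.5 * sk for sk in s_toy]
ratio_toy = steady_state_ratio(u_toy, s_toy)
v_toy = steady_state_velocity(u_toy, s_toy, ratio_toy)
```

Cells above the fitted line get a positive velocity (induction) and cells below it a negative one (repression), which is why this shortcut needs no latent time.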

Model description

In recognition that steady states are not always captured and that splicing rates differ between genes, the authors establish a framework that does not rely on these restrictions.
Gene activity is orchestrated by transcriptional regulation, implying that gene up- or down-regulation is inscribed by alterations in the state-dependent transcription rate α(k).
That is, α(k) can have multiple configurations each encoding one transcriptional state.
Not only $\alpha^{(k)}$ but also the initial conditions $u_0^{(k)}$, $s_0^{(k)}$ are state-dependent, as well as the time point of switching states $t_0^{(k)}$.

Parameter inference

In the following, the authors consider four phases: induction (k=1) and repression (k=0), each with an associated potential steady state (k=ss1, k=ss0).
For latent time, the authors adopt an explicit formula that approximates the optimal time assignment for each cell.
The velocity vector $v_i \in \mathbb{R}^d$ predicts the change in gene expression of cell $s_i \in \mathbb{R}^d$. Cell $s_i$ is expected to have a high probability of transitioning towards cell $s_j$ when the corresponding change in gene expression $\delta_{ij} = s_j - s_i$ matches the predicted change according to the velocity vector $v_i$.
The resulting similarity matrix π encodes a graph, which the authors refer to as velocity graph.
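The construction of the velocity graph can be sketched as follows. The summary does not state here which measure is used to decide whether the expression change matches the velocity, so cosine similarity is assumed; the function names and the neighbor dictionary are likewise hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def velocity_graph(S, V, neighbors):
    """pi[i][j] compares delta_ij = s_j - s_i with the velocity v_i of cell i."""
    pi = {}
    for i, nbrs in neighbors.items():
        pi[i] = {}
        for j in nbrs:
            delta_ij = [sj - si for sj, si in zip(S[j], S[i])]
            pi[i][j] = cosine(delta_ij, V[i])
    return pi


# Toy example: cell 0 moves along the first gene axis (assumed values).
S = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
V = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
pi = velocity_graph(S, V, {0: [1, 2, 3]})
```

In the toy example, cell 1 lies in the direction of the velocity of cell 0 and receives the highest score, cell 2 is orthogonal to it, and cell 3 lies in the opposite direction.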

Validation metrics

To validate the coherence of the velocity vector field, the authors define a consistency score for each cell $i$ as the mean correlation of its velocity $\nu_i$ with velocities from neighboring cells,
$$c_i = \langle \mathrm{corr}(\nu_i, \nu_j) \rangle_j, \tag{22}$$
where cell $j$ is neighboring cell $i$.
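The consistency score can be computed directly from its definition. The sketch below assumes Pearson correlation for corr and takes the neighborhood of each cell as given; the helper names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def consistency_scores(V, neighbors):
    """c_i = mean over neighbors j of corr(v_i, v_j)."""
    scores = []
    for i in range(len(V)):
        corrs = [pearson(V[i], V[j]) for j in neighbors[i]]
        scores.append(sum(corrs) / len(corrs))
    return scores


# Toy velocities: cells 0 and 1 agree, cell 2 points the opposite way.
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
scores = consistency_scores(V, {0: [1], 1: [0], 2: [0, 1]})
```

A score near 1 indicates that a cell moves in the same direction as its neighbors, while a low or negative score flags a locally incoherent velocity field.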
To validate the contribution of a selection of genes (e.g. top likelihood-ranked genes) to the overall inferred dynamics, the authors define a reconstructability score as follows.
The velocity graph consisting of correlations between velocities and cell-to-cell transitions (see previous sections), is computed once (i) including all genes yielding π, and once (ii) only including the selection of genes yielding π′.
The reconstructability score is defined as the median correlation of outgoing transitions from cell $i$ to all cells that it can potentially transition to, i.e.,
$$r = \mathrm{median}_i \, \mathrm{corr}(\pi_i, \pi'_i) \tag{23}$$
across all cells $i$.
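A minimal sketch of the reconstructability score, assuming each row of the two velocity graphs is available as a plain list over the same candidate target cells and that corr is the Pearson correlation; both are assumptions made for illustration.

```python
import math
from statistics import median


def pearson(a, b):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def reconstructability(pi_all, pi_selected):
    """r = median over cells i of corr(pi_i, pi'_i)."""
    return median([pearson(row, row_sel)
                   for row, row_sel in zip(pi_all, pi_selected)])


# Toy graphs (assumed values): rows are cells, columns are candidate targets.
pi_all = [[0.9, 0.1, -0.5], [0.2, 0.8, 0.0], [0.5, -0.5, 0.1]]
r_same = reconstructability(pi_all, pi_all)
r_flipped = reconstructability(pi_all, [[-x for x in row] for row in pi_all])
```

A gene selection that reproduces the transitions of the full graph gives a score near 1; a selection that reverses them gives a score near −1.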

Contributions

VB designed and developed the method, implemented scVelo and analyzed the data.
FJT conceived the study with contributions from VB and FAW.
VB, FAW and FJT wrote the manuscript with contributions from the coauthors.
SP contributed to developing scVelo, and ML contributed to developing validation metrics.
All authors read and approved the final manuscript.


Figures (3)

Figure 1 | Solving the full splicing kinetics generalizes RNA velocity to transient populations. a. Modeling transcriptional dynamics captures transcriptional induction and repression (‘on’ and ‘off’ phase) of unspliced pre-mRNAs, their conversion into mature, spliced mRNAs and their eventual degradation. b. An actively transcribed and an inactive silenced steady state is reached when the transcriptional phases of induction and repression last sufficiently long, respectively. In particular in transient cell populations, however, steady states are often not reached as, e.g., induction may terminate before mRNA level saturation, displaying an ‘early switching’ behavior. c. We propose scVelo, a likelihood-based model that solves the full gene-wise transcriptional dynamics of splicing kinetics, which is governed by two sets of parameters: (i) reaction rates of transcription, splicing and degradation, and (ii) cell-specific latent variables of transcriptional state and time. The parameters are inferred iteratively via expectation-maximization. For a given estimate of reaction rate parameters, time points are assigned to each cell by minimizing its distance to the current phase trajectory. The transcriptional states are assigned by associating a likelihood to respective segments on the trajectory, i.e. induction, repression, active and inactive steady state. d. The overall likelihood is then optimized by updating the model parameters of reaction rates. The dashed purple line links the inferred (unobserved) inactive with the active steady state.

Figure 2 | Resolving subpopulation kinetics and identifying dynamical genes in neurogenesis. a. Velocities derived from the dynamical model for dentate gyrus neurogenesis19 are projected into a UMAP-based embedding. The main gene-averaged flow visualized by velocity streamlines corresponds to the granule lineage, in which neuroblasts develop into granule cells. The remaining populations form distinct cell types that are either differentiated, e.g., Cajal Retzius (CR) cells, or cell types that form sub-lineages, e.g., the GABA and oligodendrocyte lineages (OPC to OL). When zooming into the cell types to examine single-cell velocities, fundamental differences between the velocities derived from the steady-state and dynamical model become apparent. Only the dynamical model identifies CR cells to be terminal by assigning no velocity and indicates that OPCs indeed differentiate into OLs. By contrast, the steady-state model displays a high velocity in CR cells and points OPCs away from OLs. Overall, the dynamical model yields a more coherent velocity vector field as illustrated by the consistency scores (in the top right corner, defined for each cell as the correlation of its velocity with the velocities of neighboring cells). b. Gene-resolved velocities allow further interpretation of the inferred directionality on the cellular level. For instance, Tmsb10 is the major contributor to the gene-averaged flow that describes neuroblasts as differentiating into granule cells. With Fam155a, the incongruous CR velocities from the steady-state model become evident. By reducing velocity estimation to steady-state deviations, this model is biased to assign high velocities to outlier cells, such as the CR population. In contrast, the dynamical model assigns CR cells to a steady state with high likelihoods as they are not well explained by the overall kinetics and cannot be confidently linked to the transient induction state. c. The dynamical model allows systematic identification of putative driver genes as genes characterized by high likelihoods. While genes selected by high likelihoods (upper row) display pronounced dynamic behaviour, expression of low-likelihood genes (lower row) is governed by noise or non-existing transient states.

Figure 3 | Delineating cycling progenitors, lineage commitment and disentangling cell fates and regimes of transcriptional activity through latent time in pancreatic endocrinogenesis. a. Velocities derived from the dynamical model for pancreatic endocrinogenesis20 are visualized as streamlines in a UMAP-based embedding. The dynamical model accurately delineates the cycling population of endocrine progenitors (EP), their lineage commitment, cell-cycle exit, and endocrine differentiation. Inferred S and G2M phases based on cell-cycle scores affirm the cell cycle identified by the dynamical model. b. The steady-state model does not capture the cycle and yields incongruous backflows directed against the lineage in later endocrine stages. c. Single-gene velocities illustrate the limitations of the steady-state model. Incongruous backflows in α-cells can be traced back to false state identifications, e.g., in Cpe assigning α-cells in parts to both induction and repression phase. d. scVelo’s latent time is based only on transcriptional dynamics and represents the cell’s internal clock. It captures aspects of the actual time better than similarity-based diffusion pseudotime, as observed in the chronology of endocrine cell fates: α-cells are produced earlier in actual time (prior to E12.5) while β-cells are produced later (E12.5 – E15.5). While latent time enables the temporal relation of the two fates, pseudotime does not distinguish their temporal position. e. By using latent time to infer and count switching points between transcriptional states (e.g., from induction to homeostasis), lineage commitment and branching points become apparent. f. Gene expression dynamics resolved along latent time show a clear cascade of transcription in the top 300 likelihood-ranked genes. g. Putative driver genes are identified by high likelihoods. Phase portraits (top) and expression dynamics along latent time (bottom) for these driver genes characterize their activity. While Actn4 switches at cycle exit and endocrine commitment, the three other genes switch or start to express at the branching points.


October 28, 2019

Generalizing RNA velocity to transient cell states through dynamical modeling

Volker Bergen1,2, Marius Lange1,2, Stefan Peidli, F. Alexander Wolf, Fabian J. Theis1,2*

1 Institute of Computational Biology, Helmholtz Center Munich, Germany.

2 Department of Mathematics, TU Munich, Germany.

*Corresponding authors: alex.wolf@helmholtz-muenchen.de, fabian.theis@helmholtz-muenchen.de

Abstract

The introduction of RNA velocity in single cells has opened up new ways of studying cellular differentiation. The originally proposed framework obtains velocities as the deviation of the observed ratio of spliced and unspliced mRNA from an inferred steady state. Errors in velocity estimates arise if the central assumptions of a common splicing rate and the observation of the full splicing dynamics with steady-state mRNA levels are violated. With scVelo (https://scvelo.org), we address these restrictions by solving the full transcriptional dynamics of splicing kinetics using a likelihood-based dynamical model. This generalizes RNA velocity to a wide variety of systems comprising transient cell states, which are common in development and in response to perturbations. We infer gene-specific rates of transcription, splicing and degradation, and recover the latent time of the underlying cellular processes. This latent time represents the cell’s internal clock and is based only on its transcriptional dynamics. Moreover, scVelo allows us to identify regimes of regulatory changes such as stages of cell fate commitment and, therein, systematically detects putative driver genes. We demonstrate that scVelo enables disentangling heterogeneous subpopulation kinetics with unprecedented resolution in hippocampal dentate gyrus neurogenesis and pancreatic endocrinogenesis. We anticipate that scVelo will greatly facilitate the study of lineage decisions, gene regulation, and pathway activity identification.

Introduction

Single-cell transcriptomics has enabled the unbiased study of biological processes such as cellular differentiation and lineage choice at single cell resolution1,2. The resulting computational problem is known as trajectory inference. Starting from a population of cells at different stages of a developmental process, trajectory inference algorithms aim to reconstruct the developmental sequence of transcriptional changes leading to potential cell fates. A multitude of such methods have been developed, commonly modeling the dynamics as the progression of cells along an idealized, potentially branching trajectory3–8. A central challenge in trajectory inference is the destructive nature of single-cell RNA-seq, which only reveals static snapshots of cellular states. To move from descriptive towards predictive trajectory models, additional information is required to constrain the space of possible dynamics that could give rise to the same trajectory9,10. As such, lineage-tracing assays can add information via genetic modification to enable the reconstruction of lineage relationships11–17. However, these assays are not straightforward to set up and are technically limited in many systems, such as human tissues.

The concept of RNA velocity has enabled the recovery of directed dynamic information by leveraging the fact that newly transcribed, unspliced pre-mRNAs and mature, spliced mRNAs can be distinguished in common single-cell RNA-seq protocols, the former detectable by the presence of introns. Assuming a simple per-gene reaction model that relates abundance of unspliced and spliced mRNA, the change in mRNA abundance, termed RNA velocity, can be inferred. The combination of velocities across genes can then be used to estimate the future state of an individual cell. The original model estimates velocities under the assumption that the transcriptional phases of induction and repression of gene expression last sufficiently long to reach both an actively transcribing and an inactive silenced steady-state equilibrium. After inferring the ratio of unspliced to spliced mRNA abundance that is in a constant transcriptional steady state, velocities are determined as the deviation of the observed ratio from its steady-state ratio. Inferring the steady-state ratio makes two fundamental assumptions, namely that (i) on the gene level, the full splicing dynamics with transcriptional induction, repression and steady-state mRNA levels are captured; and (ii) on the cellular level, all genes share a common splicing rate. These assumptions are often violated, in particular when a population comprises multiple heterogeneous subpopulations with different kinetics. We refer to this modeling approach as the “steady-state model”.

bioRxiv preprint doi: https://doi.org/10.1101/820936; this version posted October 29, 2019.

To resolve the above restrictions, we developed scVelo, a likelihood-based dynamical model that solves the full gene-wise transcriptional dynamics. It thereby generalizes RNA velocity estimation to transient systems and systems with heterogeneous subpopulation kinetics. We infer the gene-specific reaction rates of transcription, splicing and degradation, and an underlying gene-shared latent time in an efficient expectation-maximization framework. The inferred latent time represents the cell’s internal clock, which accurately describes the cell’s position in the underlying biological process. In contrast to existing similarity-based pseudotime methods, this latent time is grounded only on transcriptional dynamics and accounts for speed and direction of motion.

We demonstrate the capabilities of the dynamical model on various cell lineages in hippocampal dentate gyrus neurogenesis and pancreatic endocrinogenesis. The dynamical model generally yields more consistent velocity estimates across neighboring cells and accurately identifies transcriptional states as opposed to the steady-state model. It provides fine-grained insights into the cell states of cycling pancreatic endocrine precursor cells, including their lineage commitment, cell-cycle exit, and finally endocrine cell differentiation. Here, our inferred latent time is able to reconstruct the temporal sequence of transcriptomic events and cellular fates. Moreover, scVelo identifies regimes of regulatory changes such as transition states and stages of cell fate commitment. Herein, scVelo identifies putative driver genes of these transcriptional changes. Driver genes display pronounced dynamic behaviour and are systematically detected via their characterization by high likelihoods in the dynamic model. This procedure presents a dynamics-based alternative to the standard differential expression paradigm.

Finally, we propose to further account for stochasticity in gene expression, obtained by treating transcription, splicing and degradation as probabilistic events. We show how this can be achieved for the steady-state model and demonstrate its capability of capturing the directionality inferred from the full dynamical model to a large extent. We illustrate its considerable improvement over the steady-state model while being as efficient in computation time. The dynamical, the stochastic as well as the steady-state model are available within scVelo as a robust and scalable implementation (https://scvelo.org). For the latter two scVelo achieves a ten-fold speedup over the original implementation (velocyto).


Figure 1 | Solving the full splicing kinetics generalizes RNA velocity to transient populations. Labels recovered from the figure: transcription (on/off), splicing and degradation acting on u(t) and s(t); steady state versus early switch (1 hour, 3 hours); inference by maximizing the joint likelihood; learned kinetics and previous iteration; time assignment (latent time), state assignment (state likelihood) and parameter update of θ = (α^(on/off), β, γ).

Results

Solving the full gene-wise transcription dynamics at single-cell resolution

As in the original framework, we model transcriptional dynamics (Fig. 1a) using the basic reaction kinetics described by

$$\frac{du(t)}{dt} = \alpha^{(k)}(t) - \beta u(t), \qquad \frac{ds(t)}{dt} = \beta u(t) - \gamma s(t),$$

for each gene, independent of all other genes. As opposed to the original framework, to account for non-observed steady states (Fig. 1b), we solve these equations explicitly and infer the splicing kinetics that is governed by two sets of parameters: (i) the reaction rates of transcription $\alpha^{(k)}(t)$, splicing $\beta$, and degradation $\gamma$; and (ii) cell-specific latent variables, i.e., a discrete transcriptional state $k_i$ and a continuous time $t_i$, where $i$ represents a single observed cell. The parameters of the reaction rates can be obtained if the latent variables are given, and vice versa. Hence, we infer the parameters by expectation-maximization, iteratively estimating the reaction rates and latent variables via maximum likelihood. In the expectation step, for a given model estimate of the unspliced/spliced phase trajectory, $X = (\hat{u}(t), \hat{s}(t))$, a latent time $t_i$ is assigned to an observed mRNA value $x_i = (u_i, s_i)$ by minimizing its distance to the phase trajectory $X$ (Fig. 1c). The transcriptional states $k_i$ are then assigned by associating a likelihood to respective segments on the phase trajectory $X$, i.e., $k_i \in \{\mathrm{on}, \mathrm{off}, \mathrm{ss}_{\mathrm{on}}, \mathrm{ss}_{\mathrm{off}}\}$ labeling induction, repression, active and inactive steady states. In the maximization step, the overall likelihood is then optimized by updating the parameters of reaction rates (Fig. 1d, Suppl. Fig. 5, Methods).
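The time-assignment part of the expectation step can be illustrated with a small sketch. It assumes a single induction phase starting from u = s = 0 with known rates, and it replaces scVelo's own time-assignment procedure with a brute-force search over a time grid; the function names, grid size and rate values are hypothetical.

```python
import math


def induction_trajectory(t, alpha, beta, gamma):
    """Model estimate (u_hat(t), s_hat(t)) during induction from u = s = 0."""
    exp_b = math.exp(-beta * t)
    exp_g = math.exp(-gamma * t)
    u = alpha / beta * (1.0 - exp_b)
    s = (alpha / gamma * (1.0 - exp_g)
         + alpha / (gamma - beta) * (exp_g - exp_b))
    return u, s


def assign_latent_time(u_obs, s_obs, alpha, beta, gamma,
                       t_max=20.0, n_grid=2001):
    """Return the grid time whose trajectory point is closest to the observation."""
    best_t, best_dist = 0.0, float("inf")
    for k in range(n_grid):
        t = t_max * k / (n_grid - 1)
        u_hat, s_hat = induction_trajectory(t, alpha, beta, gamma)
        dist = (u_hat - u_obs) ** 2 + (s_hat - s_obs) ** 2
        if dist < best_dist:
            best_t, best_dist = t, dist
    return best_t


# A noise-free observation generated at t = 3 should be assigned a time near 3.
u_obs, s_obs = induction_trajectory(3.0, 2.0, 1.0, 0.5)
t_hat = assign_latent_time(u_obs, s_obs, 2.0, 1.0, 0.5)
```

In a full expectation-maximization loop this assignment would alternate with an update of the rate parameters, and the repression segment and steady states would be handled in the same way with their own likelihoods.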

The resulting gene-specific trajectory X, parametrized by interpretable parameters of reaction rates and transcriptional states, explicitly describes how mRNA levels evolve over latent time. While the steady-state model uses linear regression to fit assumed steady states and fails if these are not observed, the dynamical model resolves the full dynamics of unspliced and spliced mRNA abundances and thus enables unobserved steady states to also be faithfully captured (Suppl. Fig. 1). RNA velocity is then explicitly given by the derivative of spliced mRNA abundance, parametrized by the inferred variables.

In order to make the inferred parameters of reaction rates relatable across genes, the gene-wise latent times are coupled to a universal, gene-shared latent time that proxies a cell’s internal clock (Suppl. Fig. 2, Methods). This universal time allows us to resolve the cell’s relative position in a biological process with support from the splicing dynamics of all genes. Transcriptional states can also be identified more confidently by sharing information between genes. On simulated splicing kinetics, latent time is able to reconstruct the underlying real time at near perfect correlation and correct scale, clearly outperforming pseudotime. In contrast to pseudotime methods3,21, our latent time is grounded on transcriptional dynamics and internally accounts for speed and direction of motion. Hence, scVelo’s latent time yields faithful gene expression time-courses to delineate dynamical processes, and to extract gene cascades.

Further, the coupling to a universal latent time allows us to identify the kinetic rates up to a global gene-shared scale parameter. Employing the overall timescale of the developmental process as prior information, the absolute values of kinetic rates can eventually be identified (Suppl. Fig. 3).

Identifying reaction rates in transient cell populations

To validate the sensitivity of both models with respect to varying parameters in simulated splicing kinetics, we randomly sampled 2,000 log-normally distributed parameters for each reaction rate and time events following the Poisson law. The total time spent in a transcriptional state is varied between two and ten hours.

The ratio inferred by the steady-state model yields a systematic error as the time of transcriptional induction decreases such that mRNA levels are less likely to reach steady-state equilibrium levels (Suppl. Fig. 3a). By contrast, the dynamical model yields a consistently smaller error and is completely insensitive with respect to variability in induction duration. Furthermore, the Pearson correlation between the true and inferred steady-state ratio increases from 0.71 to 0.97 when using the dynamical model. Imposing the overall timescale of the splicing dynamics of 20 hours as prior information, the dynamical model reliably recovers the true parameters of the simulated splicing kinetics, achieving correlations of 0.85 and higher (Suppl. Fig. 3b).
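Why a short induction biases the steady-state ratio can be seen from the rate equations alone. The sketch below does not reproduce the simulation study; it only evaluates, for assumed illustrative rates, the unspliced-to-spliced ratio of a cell sampled at the end of an induction phase of a given duration and compares it with the true ratio γ/β.

```python
import math


def induction_ratio(duration, alpha, beta, gamma):
    """u/s at the end of an induction of the given duration, starting from u = s = 0."""
    exp_b = math.exp(-beta * duration)
    exp_g = math.exp(-gamma * duration)
    u = alpha / beta * (1.0 - exp_b)
    s = (alpha / gamma * (1.0 - exp_g)
         + alpha / (gamma - beta) * (exp_g - exp_b))
    return u / s


# Illustrative rates (assumed values): the true steady-state ratio is gamma / beta.
alpha, beta, gamma = 2.0, 1.0, 0.5
true_ratio = gamma / beta
ratio_short = induction_ratio(2.0, alpha, beta, gamma)
ratio_long = induction_ratio(10.0, alpha, beta, gamma)
ratio_saturated = induction_ratio(40.0, alpha, beta, gamma)
```

During induction the spliced abundance is still rising, so βu exceeds γs and the observed ratio lies above γ/β; the gap shrinks as the induction lasts longer. A regression fitted to such cells as if they were at steady state therefore inherits this bias.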

Resolving the heterogeneous population kinetics in dentate gyrus development

To test whether scVelo’s velocity estimates allow identiﬁcation of more complex population kinetics,

we considered a scRNA-seq experiment from the developing mouse dentate gyrus

(DG) comprising

two time points (P12 and P35) measured using droplet-based scRNA-seq (10x Genomics Chromium

Single Cell Kit V1, see Methods). The original publication aimed to elucidate the relationship be-

tween developmental and adult dentate gyrus neurogenesis. While they successfully linked transient

The copyright holder for this preprint (whichthis version posted October 29, 2019. ; https://doi.org/10.1101/820936doi: bioRxiv preprint

gene likelihood

0.6

(log) # genes

a b

dynamic-driving genes

Tmsb10 likelihood

Tmsb10

Hn1

Ppp3ca

Dlg2

non-dynamic genes

Tcea1 likelihood

Tcea1

Herc2

Rab11fip3

Capn15

Tmsb10

steady-state model

inferred dynamics

steady-state ratio

steady dyn.

dynamical model

steady

dynamical

steady dynamical

steady

dynamical

Fam155a

0 0.1

0 0.7

Granule !

immature

Granule mature

Neuroblast

nIPC

Radial

Glia

Astrocytes

GABA

Endothelial

Microglia

OPC

Mossy

Cck

steady

dyn.

consistency

scores

Figure 2 | Resolving subpopulation kinetics and identifying dynamical genes in neurogenesis.

a. Velocities derived from the dynamical model for dentate gyrus neurogenesis are projected into a UMAP-based embedding. The main gene-averaged flow visualized by velocity streamlines corresponds to the granule lineage, in which neuroblasts develop into granule cells. The remaining populations form distinct cell types that are either differentiated, e.g., Cajal-Retzius (CR) cells, or form sub-lineages, e.g., the GABA and oligodendrocyte lineages (OPC to OL). When zooming into the cell types to examine single-cell velocities, fundamental differences between the velocities derived from the steady-state and the dynamical model become apparent. Only the dynamical model identifies CR cells as terminal by assigning them no velocity, and indicates that OPCs indeed differentiate into OLs. By contrast, the steady-state model displays a high velocity in CR cells and points OPCs away from OLs. Overall, the dynamical model yields a more coherent velocity vector field, as illustrated by the consistency scores (in the top right corner, defined for each cell as the correlation of its velocity with the velocities of neighboring cells).

b. Gene-resolved velocities allow further interpretation of the inferred directionality at the cellular level. For instance, Tmsb10 is the major contributor to the gene-averaged flow that describes neuroblasts as differentiating into granule cells. With Fam155a, the incongruous CR velocities from the steady-state model become evident. By reducing velocity estimation to steady-state deviations, this model is biased to assign high velocities to outlier cells, such as the CR population. In contrast, the dynamical model assigns CR cells to a steady state with high likelihoods, as they are not well explained by the overall kinetics and cannot be confidently linked to the transient induction state.

c. The dynamical model allows putative driver genes to be identified systematically as genes characterized by high likelihoods. While genes selected by high likelihoods (upper row) display pronounced dynamic behaviour, expression of low-likelihood genes (lower row) is governed by noise or non-existing transient states.

intermediate states to neuroblast stages and mature granule cells, the commitment of radial glia-like cells could not be conclusively determined.

After basic preprocessing, we apply both the steady-state and the dynamical model and display the vector fields using streamline plots in a UMAP-based embedding of the data (Fig. 2a). The dominating structure is the granule cell lineage, in which neuroblasts develop into granule

bioRxiv preprint; this version posted October 29, 2019; doi: https://doi.org/10.1101/820936


Frequently Asked Questions (17)

Q1. What are the main fates of endocrine commitment in the mouse pancreas?

Endocrine commitment terminates in four major fates: glucagon-producing α-cells, insulin-producing β-cells, somatostatin-producing δ-cells and ghrelin-producing ε-cells [27].

Q2. What have the authors contributed in "Generalizing RNA velocity to transient cell states through dynamical modeling"?

The introduction of RNA velocity in single cells has opened up new ways of studying cellular differentiation. The authors demonstrate that scVelo enables disentangling heterogeneous subpopulation kinetics with unprecedented resolution in hippocampal dentate gyrus neurogenesis and pancreatic endocrinogenesis. The authors anticipate that scVelo will greatly facilitate the study of lineage decisions, gene regulation, and pathway activity identification.

Q3. What is the simplest way to infer mRNA velocity?

Assuming a simple per-gene reaction model that relates abundance of unspliced and spliced mRNA, the change in mRNA abundance, termed RNA velocity, can be inferred.
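For reference, the per-gene reaction model can be written as a pair of linear ODEs. This is a sketch in the notation used elsewhere on this page, with rates θ = (α^(k), β, γ); the symbols u and s for unspliced and spliced abundance are a labeling assumption made here.

```latex
\frac{du(t)}{dt} = \alpha^{(k)}(t) - \beta\, u(t), \qquad
\frac{ds(t)}{dt} = \beta\, u(t) - \gamma\, s(t)
```

Under this model, RNA velocity is the time derivative of spliced abundance, $v(t) = ds(t)/dt = \beta\, u(t) - \gamma\, s(t)$.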

Q4. What is the way to measure RNA levels?

Metabolic labeling, e.g. using scSLAM-seq [41,42], enables the quantification of total RNA levels together with newly transcribed RNA.

Q5. What is the role of the transcription factor Ngn3 in the endocrine development?

Endocrine cells are derived from endocrine progenitors located in the pancreatic epithelium, marked by transient expression of the transcription factor Ngn3.

Q6. What is the main idea behind the steady-state ratio?

Inferring the steady-state ratio makes two fundamental assumptions, namely that (i) on the gene level, the full splicing dynamics with transcriptional induction, repression and steady-state mRNA levels are captured; and (ii) on the cellular level, all genes share a common splicing rate.
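A minimal sketch of how a steady-state ratio could be estimated for a single gene, and how velocities follow as deviations from it. The function name, the quantile cutoff, and the choice of a least-squares line through the origin are illustrative assumptions, not the published implementation; the splicing rate is fixed to 1, which is exactly assumption (ii) above.

```python
import numpy as np

def steady_state_velocity(u, s, quantile=0.95):
    """Estimate a steady-state ratio and residual velocities for one gene.

    u, s: 1-D arrays of unspliced / spliced abundance per cell.
    Assumption: cells in the extreme quantiles of s are near steady state,
    and the splicing rate is fixed to 1 so that gamma acts as a ratio.
    """
    u = np.asarray(u, dtype=float)
    s = np.asarray(s, dtype=float)
    # Select cells in the lower and upper tails of spliced abundance.
    lo, hi = np.quantile(s, [1 - quantile, quantile])
    extreme = (s <= lo) | (s >= hi)
    # Least-squares line through the origin on those cells: u ~ gamma * s.
    gamma = np.dot(u[extreme], s[extreme]) / np.dot(s[extreme], s[extreme])
    # Velocity is the deviation of each cell from the steady-state line.
    velocity = u - gamma * s
    return gamma, velocity
```

If the extreme cells are not actually at steady state (assumption (i) fails), the fitted ratio is biased and so are all velocities derived from it.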

Q7. How does the dynamical model recover the absolute values of the reaction rates?

By imposing an overall timescale of 20 hours as prior information, the dynamical model recovers the absolute values of the reaction rates.

Q8. How is the resulting Markov jump process solved?

The resulting Markov jump process is commonly approximated by moment equations [34], which can be solved in closed form in the linear ODE system under consideration.
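To make "closed form" concrete, here is a sketch of the analytic solution for the mean unspliced and spliced abundances, assuming a constant transcription rate within one transcriptional state and β ≠ γ. The function names and the initial-condition arguments `u0` and `s0` are illustrative.

```python
import numpy as np

def unspliced(t, alpha, beta, u0=0.0):
    """Solution of du/dt = alpha - beta * u with u(0) = u0."""
    return u0 * np.exp(-beta * t) + alpha / beta * (1 - np.exp(-beta * t))

def spliced(t, alpha, beta, gamma, u0=0.0, s0=0.0):
    """Solution of ds/dt = beta * u - gamma * s with s(0) = s0.

    Valid only for beta != gamma (the last term divides by gamma - beta).
    """
    exp_b = np.exp(-beta * t)
    exp_g = np.exp(-gamma * t)
    return (s0 * exp_g
            + alpha / gamma * (1 - exp_g)
            + (alpha - beta * u0) / (gamma - beta) * (exp_g - exp_b))
```

For large t the expressions approach the steady-state levels α/β and α/γ, which is the regime the steady-state model relies on.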

Q9. How are velocities projected into the embedding?

The projection of velocities into a lower-dimensional embedding (e.g. UMAP [23]) for a cell i is obtained on the basis of a transition matrix π̃ (see previous section), which contains probabilities of cell-to-cell transitions that are in accordance with the corresponding velocity vectors, $\tilde{\pi}_{ij} = \frac{1}{z_i} \exp\left(\frac{\cos\angle(x_j - x_i, \nu_i)}{\sigma_i^2}\right)$, with row normalization factors $z_i = \sum_j \exp\left(\frac{\pi_{ij}}{\sigma_i^2}\right)$, where $\pi_{ij} = \cos\angle(x_j - x_i, \nu_i)$, and kernel width parameters $\sigma_i$.
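The transition matrix described above can be sketched as follows. This is an illustration under simplifying assumptions: a single global kernel width `sigma` stands in for the per-cell σ_i, neighbors are passed in explicitly, and the names `transition_matrix`, `X`, `V` and `neighbors` are hypothetical. It covers only the transition probabilities, not the subsequent projection step.

```python
import numpy as np

def transition_matrix(X, V, neighbors, sigma=0.1):
    """Row-normalized transition probabilities from velocity directions.

    X: (n_cells, n_genes) expression states.
    V: (n_cells, n_genes) velocity vectors.
    neighbors: sequence of index arrays, one per cell.
    sigma: global kernel width (assumption; the text uses per-cell widths).
    """
    n = X.shape[0]
    T = np.zeros((n, n))
    for i in range(n):
        nbrs = np.asarray(neighbors[i])
        delta = X[nbrs] - X[i]  # displacement to each neighbor
        norms = np.linalg.norm(delta, axis=1) * np.linalg.norm(V[i])
        # Cosine of the angle between displacement and velocity (0 if undefined).
        cos = np.divide(delta @ V[i], norms,
                        out=np.zeros(len(nbrs)), where=norms > 0)
        weights = np.exp(cos / sigma**2)
        T[i, nbrs] = weights / weights.sum()  # division by z_i
    return T
```

A cell with zero velocity gets uniform transition probabilities over its neighbors, since every cosine is set to 0.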

Q10. How do the authors infer the parameters of the splicing kinetics?

The authors infer the parameters by expectation-maximization, iteratively estimating the reaction rates and latent variables via maximum likelihood.

Q11. How many highly variable genes are selected out of those that pass the minimum threshold?

The top 2,000 highly variable genes are selected out of those that pass a minimum threshold of 20 expressed counts commonly for spliced and unspliced mRNA.
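A rough sketch of this two-step gene selection on raw count matrices. Two simplifications are assumptions made here: "expressed counts commonly for spliced and unspliced" is approximated by requiring the total count in each layer to reach the threshold, and variability is ranked by a plain variance-to-mean ratio rather than whatever normalized dispersion the authors use. The function name is hypothetical.

```python
import numpy as np

def select_genes(spliced, unspliced, min_counts=20, n_top=2000):
    """Return sorted indices of selected genes.

    spliced, unspliced: (n_cells, n_genes) count matrices.
    """
    spliced = np.asarray(spliced, dtype=float)
    unspliced = np.asarray(unspliced, dtype=float)
    # Step 1: keep genes with enough counts in both layers (simplified).
    passed = ((spliced.sum(axis=0) >= min_counts)
              & (unspliced.sum(axis=0) >= min_counts))
    # Step 2: rank the surviving genes by variance-to-mean ratio.
    mean = spliced.mean(axis=0)
    var = spliced.var(axis=0)
    dispersion = np.divide(var, mean, out=np.zeros_like(mean), where=mean > 0)
    dispersion[~passed] = -np.inf  # filtered genes sort last
    top = np.argsort(dispersion)[::-1][:n_top]
    return np.sort(top[passed[top]])
```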

Q12. How do the authors display the vector fields?

After basic preprocessing, the authors apply both the steady-state and the dynamical model and display the vector fields using streamline plots [22] in a UMAP-based embedding [23] of the data (Fig. 2a).

Q13. What is the likelihood-based framework for splicing?

With the assumption of the gene-specific σ to be constant across cells within one transcriptional state, and the observations to be i.i.d., the likelihood-based framework is derived in the following.

Q14. How can the authors infer the latent variables of the splicing kinetics?

In order to make the inferred parameters of reaction rates relatable across genes, the gene-wise latent times are coupled to a universal, gene-shared latent time that proxies a cell’s internal clock (Suppl. Fig. 2, Methods).

Q15. What is the difference between the steady-state and the dynamical model?

Velocities derived from the dynamical model agree more closely with the velocities of neighboring cells than those derived from the steady-state model, which results in a higher overall coherence of the velocity vector field (Fig. 2a top right, Supp. Fig. 7). Both the steady-state and the dynamical model yield additional dynamic flow within the mature compartment of granule cells, which was expected to be terminal and may be worthwhile to follow up experimentally.
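The coherence comparison rests on a per-cell consistency score, described in the Figure 2 caption as the correlation of a cell's velocity with the velocities of neighboring cells. A minimal sketch, assuming the per-neighbor Pearson correlations are averaged (the averaging and the function name are assumptions):

```python
import numpy as np

def consistency_scores(V, neighbors):
    """Mean Pearson correlation of each cell's velocity with its neighbors'.

    V: (n_cells, n_genes) velocity vectors.
    neighbors: sequence of index arrays, one per cell.
    """
    scores = np.empty(V.shape[0])
    for i, nbrs in enumerate(neighbors):
        # Correlate cell i's velocity vector with each neighbor's vector.
        corrs = [np.corrcoef(V[i], V[j])[0, 1] for j in nbrs]
        scores[i] = np.mean(corrs)
    return scores
```

Scores near 1 indicate that neighboring cells point in the same direction; constant velocity vectors yield undefined (NaN) correlations and would need separate handling.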

Q16. How does scVelo scale with the number of cells and genes?

As scVelo scales linearly with the number of cells and genes, its runtime falls below velocyto's, which scales quadratically, for cell numbers of 35k and higher.

Q17. What is the negative log-likelihood to be minimized?

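The negative log-likelihood to be minimized is given by $l(\theta) = \log\left(\sqrt{2\pi\sigma^2}\right) + \frac{1}{2\sigma^2} \sum_i \left\lVert x_{t_i}(\theta) - x_i^{\mathrm{obs}} \right\rVert^2$ (7), where $\theta = (\alpha^{(k)}, \beta, \gamma)$.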
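As a sketch, the objective in Eq. (7) can be evaluated directly once the model states at each cell's assigned time are available. The function name and the array layout (one row per cell, columns for unspliced and spliced abundance) are assumptions; the expression follows the formula as printed here.

```python
import numpy as np

def negative_log_likelihood(x_model, x_obs, sigma):
    """Evaluate Eq. (7) for one gene.

    x_model: (n_cells, 2) model states (u, s) at each cell's assigned time.
    x_obs:   (n_cells, 2) observed (u, s) per cell.
    sigma:   gene-specific standard deviation, assumed constant across cells.
    """
    x_model = np.asarray(x_model, dtype=float)
    x_obs = np.asarray(x_obs, dtype=float)
    # Sum over cells of the squared Euclidean distance to the model curve.
    sq_dist = np.sum((x_model - x_obs) ** 2)
    return np.log(np.sqrt(2 * np.pi * sigma**2)) + sq_dist / (2 * sigma**2)
```

For fixed σ, minimizing this over θ is equivalent to minimizing the summed squared distance between observations and the model trajectory.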
