Generalizing RNA velocity to transient cell states through dynamical modeling.
Summary (3 min read)
Introduction
- Single-cell transcriptomics has enabled the unbiased study of biological processes such as cellular differentiation and lineage choice at single cell resolution1,2.
- These assays are not straightforward to set up and are technically limited in many systems, such as human tissues.
- After inferring the ratio of unspliced to spliced mRNA abundance that is in a constant transcriptional steady state, velocities are determined as the deviation of the observed ratio from its steady-state ratio.
- Here, their inferred latent time is able to reconstruct the temporal sequence of transcriptomic events and cellular fates.
- The authors illustrate its considerable improvement over the steady-state model while being as efficient in computation time.
Results
- Solving the full gene-wise transcription dynamics at single-cell resolution.
- Hence, scVelo’s latent time yields faithful gene expression time-courses to delineate dynamical processes, and to extract gene cascades.
- While they successfully linked transient intermediate states to neuroblast stages and mature granule cells, the commitment of radial glia-like cells could not be conclusively determined.
- The stochastic steady-state model is capable of capturing the results of the full dynamical model to a greater extent than the deterministic steady-state model (Suppl. Fig. 7a).
Discussion
- ScVelo enables velocity estimation without assuming either the presence of steady states or a common splicing rate across genes.
- These assumptions might be violated in practice and can be addressed by extending scVelo towards more complex regulations:.
- On the gene level, full-length scRNA-seq protocols, such as Smart-seq236, allow accounting for gene structure, alternative splicing and state-dependent degradation rates.
- This additional readout can be easily included into the dynamical model, incorporating varying labeling lengths as additional prior.
- Beyond the identification of trajectories and the dynamics of single genes, the dynamic activation of pathways is of central importance.
Methods
- Preparing the scRNA-seq data for velocity estimation.
- The authors included samples from two experimental time points, P12 and P35.
- The raw dataset of pancreatic endocrinogenesis20 has been deposited under the accession number GSE132188.
- These are the default procedures in scVelo.
- After velocity estimation, the gene space can be further restricted to genes that pass a minimum threshold for the coefficient of determination (R2, derived from the steady-state model) or gene likelihood (P ((u, s)|(θ, η)), derived from the dynamical model).
Modeling transcriptional dynamics
- On the basis of the dynamical model of transcription shown in Fig. 1, the authors developed a computational framework for robust and scalable inference of RNA velocity.
- Assuming splicing and degradation rates to be constant (time-independent), the authors obtain the gene- specific rate equations du(t) dt = α(k)(t)− βu(t), ds(t) dt = βu(t)− γs(t), (1) which describe how the mRNA abundances evolve over time.
- The time derivative of mature spliced mRNA, termed RNA velocity, is denoted as ν(t) = ds(t)dt .
- In general, the sampled population is not time-resolved and t is a latent variable.
- Likewise, the cell’s transcriptional state k is a latent variable that is not known, and the rates α(k), β and γ are usually not experimentally measured.
Steady-state model
- Under the assumption that the authors observe both transcriptional phases of induction and repression, and that these phases last sufficiently long to reach a transcribing and a silenced (in) steady-state equilibrium, velocity estimation can be simplified as follows:.
- In steady states, the authors obtain on average a constant transcriptional state where dsdt = 0 which, by solving Eq. 1, yields γ ′ = γβ as the steady-state ratio of unspliced to spliced mRNA.
- Hence, the ratio can be approximated by a linear regression on these extreme quantiles.
- Taken together, under this simplified model, velocities are estimated along two simple equations as steady-state deviations.
- The cumbersome problem of estimating latent time is circumvented.
Model description
- In recognition that steady states are not always captured and that splicing rates differ between genes, the authors establish a framework that does not rely on these restrictions.
- Gene activity is orchestrated by transcriptional regulation, implying that gene up- or down-regulation is inscribed by alterations in the state-dependent transcription rate α(k).
- That is, α(k) can have multiple configurations each encoding one transcriptional state.
- Not only α(k) but also the initial conditions u(k)0 , s (k) 0 are state-dependent, as well as the time point of switching states t(k)0 .
Parameter inference
- In the following, the authors consider four phases, induction (k=1), and repression (k=0) each with an associated potential steady state (k=ss1, k=ss0).
- For latent time, the authors adopt an explicit formula that approximates the optimal time assignment for each cell.
- Rd predicts the change in gene expression of cell si ∈ Rd. Cell si is expected to have a high probability of transitioning towards cell sj when the corresponding change in gene expression δij = sj−si matches the predicted change according to the velocity vector vi.
- The resulting similarity matrix π encodes a graph, which the authors refer to as velocity graph.
Validation metrics
- To validate the coherence of the velocity vector field, the authors define a consistency score for each cell i as the mean correlation of its velocity νi with velocities from neighboring cells, ci = 〈corr(νi, νj)〉j , where cell j is neighboring cell i. (22).
- To validate the contribution of a selection of genes (e.g. top likelihood-ranked genes) to the overall inferred dynamics, the authors define a reconstructability score as follows:.
- The velocity graph consisting of correlations between velocities and cell-to-cell transitions (see previous sections), is computed once (i) including all genes yielding π, and once (ii) only including the selection of genes yielding π′.
- The reconstructability score is defined as the median correlation of outgoing transitions from cell i to all cells that it can potentially transition to, i.e., r = mediani corr(πi, π′i) across all cells i. (23).
Contributions
- VB designed and developed the method, implemented scVelo and analyzed the data.
- FJT conceived the study with contributions from VB and FAW.
- VB, FAW and FJT wrote the manuscript with contributions from the coauthors.
- SP contributed to developing scVelo, and ML contributed to developing validation metrics.
- All authors read and approved the final manuscript.
Did you find this useful? Give us your feedback
Citations
703 citations
526 citations
435 citations
254 citations
207 citations
References
23,312 citations
13,519 citations
11,078 citations
5,506 citations
5,390 citations
Related Papers (5)
Frequently Asked Questions (17)
Q2. What have the authors contributed in "Generalizing rna velocity to transient cell states through dynamical modeling" ?
The introduction of RNA velocity in single cells has opened up new ways of studying cellular differentiation. The authors demonstrate that scVelo enables disentangling heterogeneous subpopulation kinetics with unprecedented resolution in hippocampal dentate gyrus neurogenesis and pancreatic endocrinogenesis. The authors anticipate that scVelo will greatly facilitate the study of lineage decisions, gene regulation, and pathway activity identification.
Q3. What is the simplest way to infer mRNA velocity?
Assuming a simple per-gene reaction model that relates abundance of unspliced and spliced mRNA, the change in mRNA abundance, termed RNA velocity, can be inferred.
Q4. What is the way to measure RNA levels?
Metabolic labeling, e.g. using scSLAM-seq41,42, enables the quantification of total RNA levels together with newly transcribed RNA.
Q5. What is the role of the transcription factor Ngn3 in the endocrine development?
Endocrine cells are derived from endocrine progenitors located in the pancreatic epithelium, marked by transient expression of the transcription factor Ngn3.
Q6. What is the main idea behind the steady-state ratio?
Inferring the steady-state ratio makes two fundamental assumptions, namely that (i) on the gene level, the full splicing dynamics with transcriptional induction, repression and steady-state mRNA levels are captured; and (ii) on the cellular level, all genes share a common splicing rate.
Q7. how long does the dynamical model take to recover the absolute values of the reaction rates?
By imposing an overall timescale of 20 hours as prior information, the dynamical model recovers the absolute values of the reaction rates.
Q8. How is the resulting Markov jump process solved?
The resulting Markov jump process is commonly approximated by moment equations34, which can be solved in closed form in the linear ODE system under consideration.
Q9. how is the projection of velocities into the embedding?
The projection of velocities into a lower-dimensional embedding (e.g. UMAP23) for a cell i is obtained on the basis of a transition matrix π̃ (see previous section) which contains probabilities of cell-to-cell transitions that are in accordance with the corresponding velocity vectors,π̃ij = 1zi exp( cos∠(xj − xi,νi)σ2i) ,with row normalization factors zi = ∑j exp( πij σ2i ) and kernel width parameters σi.
Q10. How do the authors infer the parameters of the splicing kinetics?
the authors infer the parameters by expectation-maximization, iteratively estimating the reaction rates and latent variables via maximum likelihood.
Q11. How many highly variable genes are selected out of those that pass the minimum threshold?
The top 2,000 highly variable genes are selected out of those that pass a minimum threshold of 20 expressed counts commonly for spliced and unspliced mRNA.
Q12. How do the authors display the vector fields?
After basic preprocessing, the authors apply both the steady-state and the dynamical model and display the vector fields using streamline plots22 in a UMAP-based embedding23 of the data (Fig. 2a).
Q13. What is the likelihood-based framework for splicing?
With the assumption of the gene-specific σ to be constant across cells within one transcriptional state, and the observations to be i.i.d., the likelihood-based framework is derived in the following.
Q14. How can the authors infer the latent variables of the splicing kinetics?
In order to make the inferred parameters of reaction rates relatable across genes, the gene-wise latent times are coupled to a universal, gene-shared latent time that proxies a cell’s internal clock (Suppl. Fig. 2, Methods).
Q15. What is the difference between the steady-state and the dynamical model?
Velocities derived from the dynamical model are more consistent across velocities of neighboring cells than those derived from the steady-state model which results in a higher overall coherence of the velocity vector field (Fig. 2a top right, Supp. Fig. 7).Both the steady-state and the dynamical model yield additional dynamic flow within the mature compartment of granule cells, which was expected to be terminal and may be worthwhile to follow up experimentally.
Q16. How does scVelo scale with the number of cells and genes?
As it scales linearly with the number of cells and genes, its runtime is exceeded by velocyto’s quadratic runtime on large cell numbers of 35k and higher.
Q17. What is the negative log-likelihood to be minimized?
The negative log-likelihood to be minimized is given byl(θ) = log( √ 2πσ2) + 12σ2 ∑ i ∥∥∥xti(θ)− xobsi ∥∥∥2 , (7)where θ = (α(k), β, γ).