A case study on the use of scale separation-based analytic propagators for parameter inference in stochastic gene regulation
Summary (3 min read)
1. INTRODUCTION
- Gene expression is a complex and highly regulated multi-step process that is responsible for the timely synthesis of proteins necessary for cellular function.
- A simple, “two-stage” model for stochastic gene expression consists of a constitutively active gene from which an mRNA molecule can be transcribed, and protein, the production of which depends on the instantaneous abundance of mRNA (see Fig. 1A).
- In recent work [14], the procedure developed in [2] was extended to capture departure from the assumption of perfect scale separation: the ratio of degradation rates of protein and mRNA, denoted ε, was taken to be small and positive instead of zero, as was the case in [2].
- While fluorescence microscopy yields only time-series for the intensity, these can nonetheless be converted into absolute protein numbers if a calibration factor of molecules per unit intensity can be estimated, see e.g. [19].
- For comparison, both propagators are also contrasted with an approximate solution of the CME that is computed using a finite state projection.
2.1. Two-stage Gene Expression Model
- The authors model gene expression as a two-stage process, whereby DNA is transcribed to mRNA, which is then translated into protein (see Fig. 1A).
- (1) Here,m and n denote mRNA and protein copy numbers, respectively, a is the non-dimensional transcription rate and b is the non-dimensional translation rate, while the degradation rates of mRNA and protein are given by γ and 1, respectively (cf. Fig. 1A).
- Finally, τ denotes a suitably nondimensionalized time variable.
- It follows that for ε sufficiently small, the dynamics of Eq. (1) will vary on two distinct time-scales: the long-term behavior of the system is naturally described on the “slow” τ -scale, while the “fast” transients evolve according to the rescaled time t := τε .
2.2.2. Uniform (First-Order) Propagator
- Here, ε denotes the per- turbation parameter, as before, while t is the fast time variable.
- The authors remark that the transition probability Pn|n0(τ, ε) contributes on the slow τ -scale in Eq. (3), while the t-dependent contribution in Eq. (3) accounts for the transient dynamics on the fast time-scale.
2.3. Special Cases of the Hypergeometric Functions
- Care must be taken when evaluating the hypergeometric function 2F1(a, b, c, z).
- The following special cases are of use [20].
2.4. Stochastic Simulation
- Stochastic simulations were performed using the StochKit 2.0 [21] simulation framework and the standard stochastic simulation algorithm [3], with a non-dimensionalized transcription rate a = 20 and a non-dimensionalized translation rate b = 2.5, corresponding to “regime I” as defined in [14].
- Each value of γ was simulated 20 times, and the resulting trajectories were used for computing the probability landscapes of the rescaled model parameters a and b.
- Protein quantities were observed without measurement noise at intervals of 0.1 time units.
- All simulation runs assumed zero molecules of mRNA and protein initially, i.e., m0 = 0 = n0.
2.5. Implementation
- Both the zeroth-order propagator Pn|n0 , Eq. (2), and the uniform propagator Pn|n0 , Eq. (3), were implemented in C++ with a Matlab mex-file interface.
- Special functions were evaluated using the GNU scientific library [22], the Hyp_2F1 function implementation of the Gauss hypergeometric function [23], and the Algorithm 910multiprecision special function library [24].
- While the difference of such numbers is potentially below a double precision machine error of approximately 10−13, they are nonetheless essential in the correct computation of the transition probabilities.
- Their C++ implementation is still inaccurate in some extreme cases, typically for very large protein numbers n, due to numerical differences which are sometimes as small as 10−370 in Eq. (7), but which unfortunately cannot be neglected as they are inflated by the remaining terms in the expression.
- Such inaccuracies are infrequent, though, and generally occur during transitions for which the uniform propagator yields nonphysical values; thus, they do not substantially affect their analysis, or the conclusions obtained in this study.
3. RESULTS AND DISCUSSION
- To assess the applicability of the zeroth-order propagator Pn|n0(τ, 0), Eq. (2), and the uniform propagator Pn|n0(τ, t, ε), Eq. (3), for parameter inference in the two-stage gene expression model, the authors simulated time-series with a specific parameter pair (a∗, b∗).
- Then, the authors computed the likelihood of the observed data set on the basis of the two propagators for a range of values for the parameters a and b.
- For simplicity, the authors assumed the scale separation γ between mRNA and protein lifetimes to be known (see Methods for definitions).
3.2. Parameter Inference
- The term ni represents the number of proteins at measurement time ti.
- Thus, the authors compute the logarithm of the probability of each transition, from ni−1 protein molecules at time ti−1 to ni molecules at time ti, in the sequence of observed measurements (see Fig. 1B, inset).
- In order to estimate the parameters a and b from simulated protein time-courses, Eq. (13) has to be evaluated very frequently.
- The authors thus developed a numerically stable expression for the uniform propagator Pn|n0 (see Section 2 2.2), and they used an efficient implementation in C++ for both propagators that results in reasonable runtimes; see Section 2 2.5 for details.
3.3. Comparison of Propagator Accuracy and Efficiency
- For each pair (a, b), the authors computed the log-likelihood L(a, b), thus obtaining a likelihood landscape that should ideally have its maximum, the maximum likelihood estimator (MLE), at the true parameter values (a∗, b∗).
- The authors immediately encountered the obstacle that the uniform propagator Pn|n0 yields negative transition probabilities, or even probabilities larger than one, for some choices of (a, b).
- The resulting non-physicality is discussed in [14], and is due to the fact that Pn|n0 is derived from an asymptotic approximation; nonetheless, it is problematic when computing the overall log-likelihood, as the definition in Eq. (13) becomes meaningless.
- In Fig. 3B, a typical protein time-course with γ = 10 is shown (top), along with the log-likelihood obtained from the uniform propagator Pn|n0 , Eq. (3), for the true parameter values , and for the MLE (cyan).
- From that plot, it is obvious that large regions of the transition space yield non-physical values, shown in gray.
4. CONCLUSION
- The authors have investigated the utility of a propagator-based approach for approximating the transition probabilities in a simple two-stage gene expression model by attempting parameter inference from protein time-series.
- The simulations were initialized with zero molecules of both mRNA and protein; this simplification, as compared to a typical biological setting, does not affect the subsequent analysis.
- Alternatively, one could use Markov Chain Monte Carlo (MCMC) techniques to sample directly from the posterior in order to obtain the log-likelihood landscape [26].
- The variance of the noise then constitutes an additional unknown parameter σ which would have to be inferred.
Did you find this useful? Give us your feedback
Citations
124 citations
26 citations
Cites background or methods from "A case study on the use of scale se..."
...An example realisation of the above procedure can e.g. be found in the work by Feigelman et al. (2015)....
[...]
...…be applied in a straightforward fashion in cases where Assumption 3.2 fails, which is particularly relevant in relation to previous work (Shahrezaei and Swain 2008a; Feigelman et al. 2015), where the CME system (3.3) is studied for parameter values far beyond the range implied by Assumption 3.2....
[...]
7 citations
7 citations
6 citations
References
2,687 citations
810 citations
19 citations
"A case study on the use of scale se..." refers methods in this paper
...The presence of the (singular) perturbation parameter ε allows for the application of asymptotic techniques, such as geometric singular perturbation theory [15] and matched asymptotic expansions [16]....
[...]