30 Sep 2020 - bioRxiv (Cold Spring Harbor Laboratory)
TL;DR: This work demonstrates mathematically that, in some cases, the same noise decomposition can be achieved at the transcriptional level with non-identical and not-necessarily independent reporters, and uses the result to show that generic reporters lying in the same biochemical pathways can replace dual reporters, enabling the noise decomposition to be obtained from only a single gene.
Abstract: Single-cell expression profiling is destructive, giving rise to only static snapshots of cellular states. This loss of temporal information presents significant challenges in inferring dynamics from population data. Here we provide a formal analysis of the extent to which dynamic variability from within individual systems ("intrinsic noise") is distinguishable from variability across the population ("extrinsic noise"). Our results mathematically formalise observations that it is impossible to identify these sources of variability from the transcript abundance distribution alone. Notably, we find that systems subject to population variation invariably inflate the apparent degree of burstiness of the underlying process. Such identifiability problems can be remedied by the dual-reporter method, which separates the total gene expression noise into intrinsic and extrinsic contributions. This noise decomposition, however, requires strictly independent and identical gene reporters integrated into the same cell, which can be difficult to implement experimentally in many systems. Here we demonstrate mathematically that, in some cases, the same noise decomposition can be achieved at the transcriptional level with non-identical and not-necessarily independent reporters. We use our result to show that generic reporters lying in the same biochemical pathways (e.g. mRNA and protein) can replace dual reporters, enabling the noise decomposition to be obtained from only a single gene. Stochastic simulations are used to support our theory, and show that our "pathway-reporter" method compares favourably to the dual-reporter method.
Sometimes it can afford evolutionary advantages, for example, in the context of bet-hedging strategies.
Such sources of variability contribute extrinsic noise, and reflect the variation in gene expression and transcription activity across the cell population.
Here the authors develop a widely applicable generalisation (and simplification) of the original dual-reporter approach [4].
The Telegraph Model
The Telegraph model was first introduced in [21], and has since been widely employed in the literature to model bursty gene expression in eukaryotic cells [22–25].
Throughout, the authors will refer to the probability mass function $\tilde{p}_T(n; \theta)$ as the Telegraph distribution with parameters $\theta$.
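As an illustration of the Telegraph distribution, the following minimal sketch evaluates the standard closed-form steady-state probability mass function, written with the confluent hypergeometric function 1F1. It assumes that λ is the gene switch-on rate, µ the switch-off rate, K the transcription rate and δ the mRNA degradation rate; the function names `hyp1f1_pos` and `telegraph_pmf` are hypothetical and not taken from the paper. Kummer's transformation is used so that the series has only positive terms.

```python
import math


def hyp1f1_pos(a, b, x, tol=1e-15, max_terms=10_000):
    """Power series for 1F1(a; b; x), assuming a, b > 0 and x >= 0."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) / (b + k) * x / (k + 1)
        total += term
        if term < tol * total:
            break
    return total


def telegraph_pmf(n, lam, mu, K, delta=1.0):
    """Steady-state probability of n transcripts in the Telegraph model.

    Assumed roles (illustrative): lam = switch-on rate, mu = switch-off
    rate, K = transcription rate, delta = degradation rate.
    """
    # Work with rates scaled by the degradation rate.
    lam, mu, K = lam / delta, mu / delta, K / delta
    # log of  lam^(n) / (lam+mu)^(n) * K^n / n!  (rising factorials via lgamma)
    log_prefactor = (
        math.lgamma(lam + n) - math.lgamma(lam)
        + math.lgamma(lam + mu) - math.lgamma(lam + mu + n)
        + n * math.log(K) - math.lgamma(n + 1)
    )
    # Kummer: 1F1(lam+n; lam+mu+n; -K) = exp(-K) * 1F1(mu; lam+mu+n; K)
    return math.exp(log_prefactor - K) * hyp1f1_pos(mu, lam + mu + n, K)


if __name__ == "__main__":
    probs = [telegraph_pmf(n, lam=2.0, mu=1.0, K=10.0) for n in range(200)]
    print("total probability:", sum(probs))
    print("mean copy number:", sum(n * p for n, p in enumerate(probs)))
```

Under these assumptions the probabilities should sum to one and the mean should equal Kλ/((λ+µ)δ), which gives a quick sanity check on any parameter set.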
Identifiability Considerations
Decoupling the effects of extrinsic noise from experimental measurements has been notoriously challenging.
In Fig. 2A (middle panel), the authors compare the representation obtained in (4) with the corresponding fixed-parameter negative binomial distribution for two different sets of parameters.
Thus, the distribution of any instantaneously bursty system with mean burst intensity b can be obtained from one with greater burst frequency, by varying the mean burst intensity θ according to a shifted beta prime distribution.
Noise on the transcription rate will invariably produce copy number data that is suggestive of a more bursty model.
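A textbook instance of this effect, given here only as an illustration and not as the paper's own result, is a constitutive (Poisson) gene whose transcription rate varies across cells according to a gamma distribution: the resulting copy number distribution is exactly negative binomial, the same form produced by a bursty model. The sketch below checks this numerically with a simple midpoint rule; the function names and the parameter values are assumptions chosen for the example.

```python
import math


def poisson_pmf(n, rate):
    """Poisson probability of n events with the given (positive) rate."""
    return math.exp(-rate + n * math.log(rate) - math.lgamma(n + 1))


def gamma_pdf(k, shape, scale):
    """Gamma density at k > 0 (shape/scale parameterisation)."""
    return math.exp(
        (shape - 1) * math.log(k) - k / scale
        - math.lgamma(shape) - shape * math.log(scale)
    )


def mixed_poisson_pmf(n, shape, scale, upper=100.0, steps=20_000):
    """Poisson pmf averaged over a gamma-distributed rate (midpoint rule)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        k = (i + 0.5) * h  # midpoint of the i-th sub-interval
        total += poisson_pmf(n, k) * gamma_pdf(k, shape, scale)
    return total * h


def negative_binomial_pmf(n, shape, scale):
    """Negative binomial pmf with size `shape` and mean shape * scale."""
    q = scale / (1.0 + scale)
    log_p = (
        math.lgamma(n + shape) - math.lgamma(n + 1) - math.lgamma(shape)
        + n * math.log(q) + shape * math.log(1.0 - q)
    )
    return math.exp(log_p)


if __name__ == "__main__":
    for n in (0, 3, 10):
        print(n, mixed_poisson_pmf(n, 3.0, 2.0), negative_binomial_pmf(n, 3.0, 2.0))
```

Because the two columns agree, the copy number distribution alone cannot tell a noisy constitutive gene apart from a bursty one with fixed parameters in this simple case.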
The truncated normal distribution is not chosen on the basis of biological relevance, but rather to demonstrate that even a symmetric noise distribution (except for truncation at 0) produces qualitatively similar results to the distributions used in the precise non-identifiability results.
Resolving Non-identifiability
The results of the previous section show that additional information, beyond the observed copy number distribution, is required to constrain the space of possible dynamics that could give rise to the same distribution.
The decomposition applies to dynamic noise [39], and generalises to higher moments in [40].
The dual-reporter method requires distinguishable measurements of transcripts or proteins from two independent and identically distributed reporter genes integrated into the same cell.
As the authors show in the next section, there are many situations where the random variable E(X;Z) is precisely the common part of E(Y;Z) and E(X;Z) (i.e., h(Z′) = E(X;Z)), and the normalised intrinsic contribution to the covariance is either zero or negligible.
In these cases, the normalised covariance of X and Y will identify precisely the extrinsic noise contribution $\eta^2_{\mathrm{ext}}$ to the total noise $\eta^2_X$.
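The role of the normalised covariance can be sketched with synthetic data. The example below assumes the simplest possible setting: each cell carries a shared gamma-distributed rate Z, and the two reporter counts X and Y are conditionally independent Poisson draws with mean Z. The plain moment estimators, the helper names and the parameter values are illustrative assumptions and are not claimed to be the estimators used by the authors.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for moderate rates."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_dual_reporters(n_cells, shape, scale, seed=0):
    """Draw paired reporter counts sharing one extrinsic rate per cell."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_cells):
        z = rng.gammavariate(shape, scale)  # shared extrinsic variable Z
        xs.append(sample_poisson(z, rng))
        ys.append(sample_poisson(z, rng))
    return xs, ys


def noise_decomposition(xs, ys):
    """Return (total, extrinsic, intrinsic) squared noise for reporter X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    eta2_tot = var_x / mean_x ** 2
    eta2_ext = cov_xy / (mean_x * mean_y)  # normalised covariance
    return eta2_tot, eta2_ext, eta2_tot - eta2_ext


if __name__ == "__main__":
    xs, ys = simulate_dual_reporters(20_000, shape=5.0, scale=2.0, seed=1)
    print(noise_decomposition(xs, ys))
```

With shape 5 and scale 2, the shared rate has squared coefficient of variation 1/5 and mean 10, so the extrinsic estimate should lie close to 0.2 and the intrinsic (Poisson) part close to 1/10.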
The Pathway-Reporter Method
The authors show that for some reporters X and Y belonging to the same biochemical pathway, the covariance of X and Y continues to identify the extrinsic, and subsequently intrinsic, noise contributions to the total noise.
Thus, again the NDP holds, and the normalised covariance of $E(X_N; Z)$ and $E(X_P; Z)$ will identify the noise on the transcriptional component $K_N \frac{\lambda}{\lambda+\mu}$.
The time series of copy numbers for each of nascent mRNA, mature mRNA and protein broadly follow each other, each with delay from its predecessor (Fig. 4B).
The parameter $K_N$ is given the noise distribution Beta(3, 6), which has a slightly higher squared coefficient of variation, $\eta^2 = 0.2$.
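The value 0.2 can be checked directly from the standard moments of a Beta(a, b) distribution with a = 3 and b = 6, reading η² as the variance divided by the squared mean:

```latex
\[
\mathbb{E}[K_N] = \frac{a}{a+b} = \frac{3}{9} = \frac{1}{3}, \qquad
\operatorname{Var}(K_N) = \frac{ab}{(a+b)^2(a+b+1)} = \frac{18}{81 \cdot 10} = \frac{1}{45},
\]
\[
\eta^2 = \frac{\operatorname{Var}(K_N)}{\mathbb{E}[K_N]^2}
       = \frac{1/45}{1/9} = \frac{1}{5} = 0.2 .
\]
```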
The results for the nascent mRNA–protein reporters, case (c), given in Table 4 show comparable performance to dual reporters, with only modest overshoot; even in the worst performing case of λ = 0.5, µ = 1 the result of the pathway reporters is within one standard deviation, in a very tight distribution.
Discussion
The ability to extract transcriptional dynamics from measured distributions of mRNA copy numbers is limited.
It is therefore necessary to collect further information, beyond measurements of the transcripts alone, in order to constrain the number of possible theoretical models of gene activity that could represent the system.
The authors have developed a theoretical framework for estimating levels of extrinsic noise, which can assist in resolving the non-identifiability problems.
The dual reporter method of Swain et al. [4] already provides one such approach; but it is experimentally challenging to set up in many systems, and requires strictly identical and independent pairs of gene reporters.
The authors have exploited this to yield reliable estimates of noise strength, which they are confident will assist in setting better practices for model fitting and inference in the analysis of single-cell data.
Acknowledgments
The authors gratefully acknowledge Rowan D. Brackston for helpful discussions in the early stages of this research.
The authors also wish to thank Arjun Raj for providing valuable feedback on this work.
L.H. and M.P.H.S. were supported by the University of Melbourne DVCR fund.
Data Availability
Simulations of the models used in the paper are performed using Gillespie’s Stochastic Simulation Algorithm (SSA) implemented in Julia.
The simulation code is available in the GitHub repository https://github.com/leham/PathwayReporters.
The data used in the paper are provided in the supplementary datasets.
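For readers unfamiliar with the SSA, the following is a minimal Python sketch of Gillespie's direct method applied to the Telegraph model. It is not the authors' Julia implementation; the function name, the reaction ordering and the parameter roles (λ switch-on, µ switch-off, K transcription, δ degradation) are assumptions made for illustration.

```python
import random


def simulate_telegraph(lam, mu, K, delta, t_end, seed=0):
    """Gillespie direct method; returns the time-averaged mRNA count."""
    rng = random.Random(seed)
    t, gene_on, mrna = 0.0, False, 0
    weighted_sum = 0.0  # integral of the mRNA count over time
    while True:
        rates = [
            0.0 if gene_on else lam,  # gene switches on
            mu if gene_on else 0.0,   # gene switches off
            K if gene_on else 0.0,    # transcription (only when on)
            delta * mrna,             # degradation of one transcript
        ]
        total = sum(rates)
        wait = rng.expovariate(total)  # time to the next reaction
        if t + wait >= t_end:
            weighted_sum += mrna * (t_end - t)
            break
        weighted_sum += mrna * wait
        t += wait
        # Choose which reaction fires, proportionally to its rate.
        pick = rng.random() * total
        if pick < rates[0]:
            gene_on = True
        elif pick < rates[0] + rates[1]:
            gene_on = False
        elif pick < rates[0] + rates[1] + rates[2]:
            mrna += 1
        else:
            mrna -= 1
    return weighted_sum / t_end


if __name__ == "__main__":
    print(simulate_telegraph(lam=2.0, mu=1.0, K=10.0, delta=1.0, t_end=2000.0, seed=1))
```

For a long run the time average should approach the stationary mean Kλ/((λ+µ)δ), which is roughly 6.67 for the parameters used here.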
Author Contributions
L.H. and M.J. conceptualised the research, with support from M.P.H.S.
All authors provided critical feedback and helped shape the research.
TL;DR: Systems biology for forecasting biological system dynamics from multi-omic data represents the future of cell biology, empowering a new generation of technology-driven predictive medicine.
Abstract: As the single cell field races to characterize each cell type, state, and behavior, the complexity of the computational analysis approaches the complexity of the biological systems. Single cell and imaging technologies now enable unprecedented measurements of state transitions in biological systems, providing high-throughput data that capture tens-of-thousands of measurements on hundreds-of-thousands of samples. Thus, the definition of cell type and state is evolving to encompass the broad range of biological questions now attainable. To answer these questions requires the development of computational tools for integrated multi-omics analysis. Merged with mathematical models, these algorithms will be able to forecast future states of biological systems, going from statistical inferences of phenotypes to time course predictions of the biological systems with dynamic maps analogous to weather systems. Thus, systems biology for forecasting biological system dynamics from multi-omic data represents the future of cell biology empowering a new generation of technology-driven predictive medicine.
TL;DR: In this article, the authors derive the analytical time-dependent solution of an extended telegraph model that explicitly considers the doubling of gene copy numbers upon DNA replication, dependence of the mRNA synthesis rate on cellular volume, gene dosage compensation, partitioning of molecules during cell division, cell-cycle duration variability, and cell-size control strategies.
Abstract: The standard model describing the fluctuations of mRNA numbers in single cells is the telegraph model which includes synthesis and degradation of mRNA, and switching of the gene between active and inactive states. While commonly used, this model does not describe how fluctuations are influenced by the cell cycle phase, cellular growth and division, and other crucial aspects of cellular biology. Here we derive the analytical time-dependent solution of an extended telegraph model that explicitly considers the doubling of gene copy numbers upon DNA replication, dependence of the mRNA synthesis rate on cellular volume, gene dosage compensation, partitioning of molecules during cell division, cell-cycle duration variability, and cell-size control strategies. Based on the time-dependent solution, we obtain the analytical distributions of transcript numbers for lineage and population measurements in steady-state growth and also find a linear relation between the Fano factor of mRNA fluctuations and cell volume fluctuations. We show that generally the lineage and population distributions in steady-state growth cannot be accurately approximated by the steady-state solution of extrinsic noise models, i.e. a telegraph model with parameters drawn from probability distributions. This is because the mRNA lifetime is often not small enough compared to the cell cycle duration to erase the memory of division and replication. Accurate approximations are possible when this memory is weak, e.g. for genes with bursty expression and for which there is sufficient gene dosage compensation when replication occurs.
TL;DR: This mini-review perspective describes two possible scenarios of cell fate decisions based on the current knowledge about gene regulatory networks and how cellular environments are established and points out further possible research directions.
Abstract: Precise coordination of cell fate decisions is a hallmark of multicellular organisms. Especially in tissues with non-stereotypic anatomies, dynamic communication between developing cells is vital for ensuring functional tissue organization. Radial plant growth is driven by a plant stem cell niche known as vascular cambium, usually strictly producing secondary xylem (wood) inward and secondary phloem (bast) outward, two important structures serving as much-needed CO2 depositories and building materials. Because of its bidirectional nature and its developmental plasticity, the vascular cambium serves as an instructive paradigm for investigating principles of tissue patterning. Although genes and hormones involved in xylem and phloem formation have been identified, we have a yet incomplete picture of the initial steps of cell fate transitions of stem cell daughters into xylem and phloem progenitors. In this mini-review perspective, we describe two possible scenarios of cell fate decisions based on the current knowledge about gene regulatory networks and how cellular environments are established. In addition, we point out further possible research directions.
TL;DR: Monod integrates unspliced and spliced count matrices and provides a route to identifying and studying differential expression patterns that do not cause changes in average gene expression; it may be extended to more sophisticated models of variation and further experimental observables.
Abstract: We present the Python package Monod for the analysis of single-cell RNA sequencing count data through biophysical modeling. Monod naturally “integrates” unspliced and spliced count matrices, and provides a route to identifying and studying differential expression patterns that do not cause changes in average gene expression. The Monod framework is open-source and modular, and may be extended to more sophisticated models of variation and further experimental observables. The Monod package can be installed from the command line using pip install monod. The source code is available and maintained at https://github.com/pachterlab/monod. A separate repository, which contains sample data and Python notebooks for analysis with Monod, is accessible at https://github.com/pachterlab/monod_examples/. Structured documentation and tutorials are hosted at https://monod-examples.readthedocs.io/.
TL;DR: A simple mathematical model of internalization is developed that captures the dynamical behaviour, cell-to-cell variation, and extrinsic noise introduced by flow cytometry and is broadly applicable to identify biological variability in single-cell data from internalization assays and similar experiments that probe cellular dynamical processes.
Abstract: Biological heterogeneity is a primary contributor to the variation observed in experiments that probe dynamical processes, such as the internalization of material by cells. Given that internalization is a critical process by which many therapeutics and viruses reach their intracellular site of action, quantifying cell-to-cell variability in internalization is of high biological interest. Yet, it is common for studies of internalization to neglect cell-to-cell variability. We develop a simple mathematical model of internalization that captures the dynamical behaviour, cell-to-cell variation, and extrinsic noise introduced by flow cytometry. We calibrate our model through a novel distribution-matching approximate Bayesian computation algorithm to flow cytometry data of internalization of anti-transferrin receptor antibody in a human B-cell lymphoblastoid cell line. This approach provides information relating to the region of the parameter space, and consequentially the nature of cell-to-cell variability, that produces model realizations consistent with the experimental data. Given that our approach is agnostic to sample size and signal-to-noise ratio, our modelling framework is broadly applicable to identify biological variability in single-cell data from internalization assays and similar experiments that probe cellular dynamical processes.
TL;DR: The Handbook of Mathematical Functions with Formulas, as mentioned in this paper, is a widely used handbook of mathematical functions and formulas.
Abstract: (1965). Handbook of Mathematical Functions with Formulas. Technometrics: Vol. 7, No. 1, pp. 78-79.
7,538 citations
"Pathway dynamics can delineate the ..." refers background in this paper
...Here θ denotes the parameter vector (μ, λ, K, δ), the function 1F1 is the confluent hypergeometric function [27], and, for real number x and positive integer n, the notation $x^{(n)}$ abbreviates the rising factorial of x (also known as the Pochhammer function)....
TL;DR: Using a quantitative model, the first genome-scale prediction of synthesis rates of mRNAs and proteins is obtained and it is found that the cellular abundance of proteins is predominantly controlled at the level of translation.
Abstract: Gene expression is a multistep process that involves the transcription, translation and turnover of messenger RNAs and proteins. Although it is one of the most fundamental processes of life, the entire cascade has never been quantified on a genome-wide scale. Here we simultaneously measured absolute mRNA and protein abundance and turnover by parallel metabolic pulse labelling for more than 5,000 genes in mammalian cells. Whereas mRNA and protein levels correlated better than previously thought, corresponding half-lives showed no correlation. Using a quantitative model we have obtained the first genome-scale prediction of synthesis rates of mRNAs and proteins. We find that the cellular abundance of proteins is predominantly controlled at the level of translation. Genes with similar combinations of mRNA and protein stability shared functional properties, indicating that half-lives evolved under energetic and dynamic constraints. Quantitative information about all stages of gene expression provides a rich resource and helps to provide a greater understanding of the underlying design principles.
5,635 citations
"Pathway dynamics can delineate the ..." refers background in this paper
...(13) Since mRNA tends to be less stable than protein, we have that δp < 1, and often δp ≪ 1 (45, 46)....
[...]
...For mammalian genes (46), it is reported that the median mRNA decay rate δM is (approximately) five times larger than the median protein decay rate δP , determined from 4,200 genes....
TL;DR: This work constructed strains of Escherichia coli that enable detection of noise and discrimination between the two mechanisms by which it is generated and reveals how low intracellular copy numbers of molecules can fundamentally limit the precision of gene regulation.
Abstract: Clonal populations of cells exhibit substantial phenotypic variation. Such heterogeneity can be essential for many biological processes and is conjectured to arise from stochasticity, or noise, in gene expression. We constructed strains of Escherichia coli that enable detection of noise and discrimination between the two mechanisms by which it is generated. Both stochasticity inherent in the biochemical process of gene expression (intrinsic noise) and fluctuations in other cellular components (extrinsic noise) contribute substantially to overall variation. Transcription rate, regulatory dynamics, and genetic factors control the amplitude of noise. These results establish a quantitative foundation for modeling noise in genetic networks and reveal how low intracellular copy numbers of molecules can fundamentally limit the precision of gene regulation.
5,209 citations
"Pathway dynamics can delineate the ..." refers background in this paper
...(13) Since mRNA tends to be less stable than protein, we have that δp < 1, and often δp ≪ 1 (45, 46)....
TL;DR: This handbook results from a 10-year project conducted by the National Institute of Standards and Technology with an international group of expert authors and validators and is destined to replace its predecessor, the classic but long-outdated Handbook of Mathematical Functions, edited by Abramowitz and Stegun.
Abstract: Modern developments in theoretical and applied science depend on knowledge of the properties of mathematical functions, from elementary trigonometric functions to the multitude of special functions. These functions appear whenever natural phenomena are studied, engineering problems are formulated, and numerical simulations are performed. They also crop up in statistics, financial models, and economic analysis. Using them effectively requires practitioners to have ready access to a reliable collection of their properties. This handbook results from a 10-year project conducted by the National Institute of Standards and Technology with an international group of expert authors and validators. Printed in full color, it is destined to replace its predecessor, the classic but long-outdated Handbook of Mathematical Functions, edited by Abramowitz and Stegun. Included with every copy of the book is a CD with a searchable PDF of each chapter.
TL;DR: A handbook of mathematical functions that is designed to provide scientific investigations with a comprehensive and self-contained summary of the mathematical functions arising in physical and engineering problems is presented in this article.
Abstract: A handbook of mathematical functions that is designed to provide scientific investigations with a comprehensive and self-contained summary of the mathematical functions that arise in physical and engineering problems.
Q1. What contributions have the authors mentioned in the paper "Pathway dynamics can delineate the sources of transcriptional noise in gene expression"?
Such identifiability problems can, in principle, be remedied by dual-reporter assays, which separate total gene expression noise into intrinsic and extrinsic contributions; unfortunately, however, this requires pairs of strictly independent and identical gene reporters to be integrated into the same cell, which is difficult to implement experimentally in most systems. Here the authors demonstrate mathematically that, in some cases, decomposition of transcriptional noise is possible with non-identical and not-necessarily independent reporters. The authors use their result to show that generic reporters lying in the same biochemical pathways (e.g. mRNA and protein) can replace dual reporters, enabling the noise decomposition to be obtained