Posted Content

Classification-based Inference of Dynamical Models of Gene Regulatory Networks

18 Jun 2019 · bioRxiv (Cold Spring Harbor Laboratory) · pp 673137
TL;DR: This work presents FIGR (Fast Inference of Gene Regulation), a novel classification-based inference approach to determining gene circuit parameters that is faster than global non-linear optimization by nearly three orders of magnitude and its computational complexity scales much better with GRN size.
Abstract: Cell-fate decisions during development are controlled by densely interconnected gene regulatory networks (GRNs) consisting of many genes. Inferring and predictively modeling these GRNs is crucial for understanding development and other physiological processes. Gene circuits, coupled differential equations that represent gene product synthesis with a switch-like function, provide a biologically realistic framework for modeling the time evolution of gene expression. However, their use has been limited to smaller networks due to the computational expense of inferring model parameters from gene expression data using global non-linear optimization. Here we show that the switch-like nature of gene regulation can be exploited to break the gene circuit inference problem into two simpler optimization problems that are amenable to computationally efficient supervised learning techniques. We present FIGR (Fast Inference of Gene Regulation), a novel classification-based inference approach to determining gene circuit parameters. We demonstrate FIGR's effectiveness on synthetic data as well as experimental data from the gap gene system of Drosophila. FIGR is faster than global non-linear optimization by nearly three orders of magnitude, and its computational complexity scales much better with GRN size. On a practical level, FIGR can accurately infer the biologically realistic gap gene network in under a minute on desktop-class hardware instead of requiring hours of parallel computing. We anticipate that FIGR will enable the inference of much larger biologically realistic GRNs than was possible before. FIGR source code is freely available at http://github.com/mlekkha/FIGR.
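The abstract's key idea is that a switch-like regulation function lets gene circuit inference split into a classification problem and a simpler fitting problem. The toy sketch below illustrates that idea under loudly stated assumptions; it is not FIGR's implementation (that lives in the linked repository). It assumes a simplified gene circuit with no diffusion or external inputs, dg_a/dt = R_a S(sum_b T_ab g_b + h_a) - lambda_a g_a, a logistic S, hypothetical parameter values, and exactly known derivatives. Step 1 labels each sample ON or OFF for gene a and fits a logistic regression, whose separating hyperplane estimates (T_a, h_a) only up to a positive scale factor. Step 2 replaces S by the 0/1 classifier output, which makes the equation linear in R_a and lambda_a, and solves it by least squares. The labeling rule (sign of dg_a/dt) is an assumption of this sketch, valid only when no sample sits near its saturation level R_a/lambda_a.

```python
import numpy as np

# Toy 2-gene circuit with HYPOTHETICAL parameters (not from the paper):
# gene 0 is repressed by gene 1, gene 1 is activated by gene 0.
# Large |T| entries make the regulation function S switch-like.
T_true = np.array([[0.0, -40.0],
                   [40.0, 0.0]])
h_true = np.array([20.0, -20.0])
R_true = np.array([2.0, 2.0])
lam_true = np.array([1.0, 1.0])


def S(u):
    """Logistic sigmoid, written with tanh so large |u| cannot overflow."""
    return 0.5 * (1.0 + np.tanh(0.5 * u))


def rhs(g):
    """Assumed gene circuit: dg_a/dt = R_a S(sum_b T_ab g_b + h_a) - lam_a g_a."""
    return R_true * S(g @ T_true.T + h_true) - lam_true * g


def fit_logistic(X, y, lr=0.5, n_iter=3000):
    """Gradient-descent logistic regression; returns (weights, bias)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = S(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)
    return w[:-1], w[-1]


# Synthetic "data": random expression states and exact derivatives
# (real data would need derivatives estimated from time series).
rng = np.random.default_rng(0)
g = rng.uniform(0.0, 1.0, size=(400, 2))
dg = rhs(g)

n_genes = g.shape[1]
T_hat = np.zeros((n_genes, n_genes))
h_hat = np.zeros(n_genes)
R_hat = np.zeros(n_genes)
lam_hat = np.zeros(n_genes)
for a in range(n_genes):
    # Step 1 (classification): call gene a ON where it is rising. Valid only if
    # R_a > lam_a * max(g_a), so ON samples never sit at saturation R_a / lam_a.
    y = (dg[:, a] > 0).astype(float)
    T_hat[a], h_hat[a] = fit_logistic(g, y)  # hyperplane, up to positive scale
    # Step 2 (regression): with S replaced by the 0/1 classifier output,
    # dg_a/dt = R_a * on - lam_a * g_a is linear in (R_a, lam_a).
    on = (g @ T_hat[a] + h_hat[a] > 0).astype(float)
    A = np.column_stack([on, -g[:, a]])
    R_hat[a], lam_hat[a] = np.linalg.lstsq(A, dg[:, a], rcond=None)[0]

print("T_hat (rows up to scale):\n", np.round(T_hat, 1))
print("R_hat:", np.round(R_hat, 2), "lam_hat:", np.round(lam_hat, 2))
```

Because the true S here is smooth rather than a perfect step, expect R_hat and lam_hat to land near, but not exactly on, the true values (2 and 1); the sign pattern of T_hat (gene 1 represses gene 0, gene 0 activates gene 1) is what the classification step recovers most directly.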
Citations
01 Jan 2011
TL;DR: This work profiled gene expression in 38 distinct purified populations of human hematopoietic cells and used probabilistic models of gene expression and analysis of cis-elements in gene promoters to decipher the general organization of their regulatory circuitry.
Abstract: Though many individual transcription factors are known to regulate hematopoietic differentiation, major aspects of the global architecture of hematopoiesis remain unknown. Here, we profiled gene expression in 38 distinct purified populations of human hematopoietic cells and used probabilistic models of gene expression and analysis of cis-elements in gene promoters to decipher the general organization of their regulatory circuitry. We identified modules of highly coexpressed genes, some of which are restricted to a single lineage but most of which are expressed at variable levels across multiple lineages. We found densely interconnected cis-regulatory circuits and a large number of transcription factors that are differentially expressed across hematopoietic states. These findings suggest a more complex regulatory system for hematopoiesis than previously assumed.

49 citations

Journal Article
TL;DR: In this article, the authors developed and applied a network inference method, exploiting the ability to infer dynamic information from single-cell snapshot expression data based on expression profiles of 48 genes in 2,167 blood stem and progenitor cells.

18 citations

Journal Article
09 Jul 2020
TL;DR: The great variation of genome sequence and regulatory elements of the genome architecture is exploited in studies of genome-wide association with disease, in the framework of Precision Medicine and in general of Genomic Medicine.
Abstract: Determination of the DNA sequence of the human genome, revealing extensive genetic variation, and the mapping of the genes and the various regulatory elements of genome function within the genomic DNA, has revolutionized the way we view the states of health and disease in our time. Genetic complexity of the genome is manifested on different levels. The first level refers to the expression of protein-coding genes, as regulated by their individual promoters in linear proximity. The next level of genetic complexity involves long-distance action by far-away enhancers, interacting with promoters through DNA looping. This three-dimensional (3D) regulation is further developed by chromosome folding into the so-called transcription factories, for fully physiological expression. Chromosome folding, mediated by specific genetic elements (insulators), adds to the genetic complexity by facilitating movements of chromatin of specific genomic regions, the so-called topologically associated domains (TADs), in support of transcription and other cellular functions. Further genetic complexity has emerged with the finding that over 75% of the genome is transcribed and, in addition to the coding genes, a plethora of RNA transcripts are produced, the non-coding RNAs, which have important regulatory roles in the gene expression context. The great variation of genome sequence and regulatory elements of the genome architecture is exploited in studies of genome-wide association with disease, in the framework of Precision Medicine and in general of Genomic Medicine.

1 citation

References
Journal Article
13 May 1983 · Science
TL;DR: There is a deep and useful connection between statistical mechanics and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters), and a detailed analogy with annealing in solids provides a framework for optimization of very large and complex systems.
Abstract: There is a deep and useful connection between statistical mechanics (the behavior of systems with many degrees of freedom in thermal equilibrium at a finite temperature) and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters). A detailed analogy with annealing in solids provides a framework for optimization of the properties of very large and complex systems. This connection to statistical mechanics exposes new information and provides an unfamiliar perspective on traditional optimization problems and methods.

41,772 citations
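Simulated annealing, introduced in this paper, is one widely used form of the global non-linear optimization that the FIGR abstract compares against. A minimal, generic annealing loop is sketched below to make the statistical-mechanics analogy concrete: uphill moves are accepted with a Boltzmann probability that shrinks as the "temperature" is lowered. The Gaussian proposal step, geometric cooling schedule, iteration count, and toy quadratic objective are all assumptions of this sketch, not the schedule of Kirkpatrick et al. or of any gene circuit study.

```python
import math
import random


def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.995, n_iter=5000, seed=0):
    """Minimize f over real vectors with Metropolis moves and geometric cooling."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(n_iter):
        # Propose a random neighbour by Gaussian perturbation of every coordinate.
        cand = [xi + rng.gauss(0.0, step) for xi in x]
        fc = f(cand)
        # Metropolis rule: always accept downhill moves; accept uphill moves
        # with probability exp(-(fc - fx) / t), the Boltzmann factor at temperature t.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling  # slowly lower the temperature ("annealing")
    return best_x, best_f


def objective(x):
    """Toy objective with its global minimum at (3, -1)."""
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


best_x, best_f = simulated_annealing(objective, [0.0, 0.0])
print(best_x, best_f)
```

Tracking the best state separately from the current one matters because, at high temperature, the walker deliberately leaves good regions; the slow cooling is what lets it escape local minima early and settle late.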

Book
28 Jul 2013
TL;DR: In this book, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
Abstract: During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting (the first comprehensive treatment of this topic in any book). This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.

19,261 citations

Journal Article
TL;DR: The Elements of Statistical Learning: Data Mining, Inference, and Prediction is a popular book on data mining and machine learning.
Abstract: (2004). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Journal of the American Statistical Association: Vol. 99, No. 466, pp. 567-567.

10,549 citations

Journal Article
TL;DR: This approach should enhance the ability to use microarray data to elucidate functional mechanisms that underlie cellular processes and to identify molecular targets of pharmacological compounds in mammalian cellular networks.
Abstract: Elucidating gene regulatory networks is crucial for understanding normal cell physiology and complex pathologic phenotypes. Existing computational methods for the genome-wide "reverse engineering" of such networks have been successful only for lower eukaryotes with simple genomes. Here we present ARACNE, a novel algorithm, using microarray expression profiles, specifically designed to scale up to the complexity of regulatory networks in mammalian cells, yet general enough to address a wider range of network deconvolution problems. This method uses an information theoretic approach to eliminate the majority of indirect interactions inferred by co-expression methods. We prove that ARACNE reconstructs the network exactly (asymptotically) if the effect of loops in the network topology is negligible, and we show that the algorithm works well in practice, even in the presence of numerous loops and complex topologies. We assess ARACNE's ability to reconstruct transcriptional regulatory networks using both a realistic synthetic dataset and a microarray dataset from human B cells. On synthetic datasets ARACNE achieves very low error rates and outperforms established methods, such as Relevance Networks and Bayesian Networks. Application to the deconvolution of genetic networks in human B cells demonstrates ARACNE's ability to infer validated transcriptional targets of the cMYC proto-oncogene. We also study the effects of misestimation of mutual information on network reconstruction, and show that algorithms based on mutual information ranking are more resilient to estimation errors. ARACNE shows promise in identifying direct transcriptional interactions in mammalian cellular networks, a problem that has challenged existing reverse engineering algorithms. This approach should enhance our ability to use microarray data to elucidate functional mechanisms that underlie cellular processes and to identify molecular targets of pharmacological compounds in mammalian cellular networks.

2,533 citations
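The abstract does not spell out how indirect interactions are eliminated. In the published ARACNE method this is done with the data processing inequality (DPI): in every triplet of genes whose three pairwise mutual information (MI) values all pass a threshold, the weakest edge is treated as indirect and removed. The sketch below assumes the MI matrix has already been estimated (ARACNE's MI estimator is not reproduced); the names `dpi_prune`, `threshold`, and `tol`, and the exact form of the tolerance test, are hypothetical.

```python
import itertools

import numpy as np


def dpi_prune(mi, threshold=0.0, tol=0.0):
    """Threshold an MI matrix, then drop the weakest edge of every closed triangle."""
    adj = mi > threshold
    np.fill_diagonal(adj, False)
    to_remove = set()
    for i, j, k in itertools.combinations(range(mi.shape[0]), 3):
        pairs = [(i, j), (i, k), (j, k)]
        if not all(adj[p] for p in pairs):
            continue  # DPI is applied only to fully connected triplets
        pairs.sort(key=lambda p: mi[p])
        weakest, second = pairs[0], pairs[1]
        # Hypothetical tolerance: only prune if clearly weaker than the next edge.
        if mi[weakest] < (1.0 - tol) * mi[second]:
            to_remove.add(weakest)
    # Decisions use the original MI values, so removals are applied at the end.
    pruned = adj.copy()
    for a, b in to_remove:
        pruned[a, b] = pruned[b, a] = False
    return pruned


# Chain A -> B -> C: the A-C dependence is indirect and weaker than A-B and B-C.
mi = np.array([[0.0, 0.8, 0.3],
               [0.8, 0.0, 0.7],
               [0.3, 0.7, 0.0]])
pruned = dpi_prune(mi)
print(pruned.astype(int))  # the A-C entries (0, 2) and (2, 0) are now 0
```

This is what lets ARACNE keep a co-expression edge only when no third gene explains it better. Note that the pruned network is still undirected: it says nothing about causal direction or expression dynamics.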


"Classification-based Inference of Dynamical Models of Gene Regulatory Networks" cites this paper for background:

  • ...GRNs can be reverse engineered from gene expression data sampled in time, in different cell types, or in different mutant backgrounds with a variety of statistical approaches [20, 26, 33]....

  • ...Among higher-throughput statistical approaches for the inference of GRNs, some, such as ARACNe [26] and Module Networks [33], are limited to the inference, do not infer causality, and are incapable of predicting gene expression levels....

Journal Article
TL;DR: The procedure identifies modules of coregulated genes, their regulators and the conditions under which regulation occurs, generating testable hypotheses in the form 'regulator X regulates module Y under conditions W'.
Abstract: Much of a cell’s activity is organized as a network of interacting modules: sets of genes coregulated to respond to different conditions. We present a probabilistic method for identifying regulatory modules from gene expression data. Our procedure identifies modules of coregulated genes, their regulators and the conditions under which regulation occurs, generating testable hypotheses in the form ‘regulator X regulates module Y under conditions W’. We applied the method to a Saccharomyces cerevisiae expression data set, showing its ability to identify functionally coherent modules and their correct regulators. We present microarray experiments supporting three novel predictions, suggesting regulatory roles for previously uncharacterized proteins.

1,820 citations


"Classification-based Inference of Dynamical Models of Gene Regulatory Networks" cites this paper for background:

  • ...GRNs can be reverse engineered from gene expression data sampled in time, in different cell types, or in different mutant backgrounds with a variety of statistical approaches [20, 26, 33]....

  • ...Among higher-throughput statistical approaches for the inference of GRNs, some, such as ARACNe [26] and Module Networks [33], are limited to the inference, do not infer causality, and are incapable of predicting gene expression levels....
