Proceedings Article

Reveal, a general reverse engineering algorithm for inference of genetic network architectures

01 Jan 1998-Vol. 3, pp 18-29
TL;DR: This study investigates whether a complex regulatory network architecture can be completely inferred from input/output patterns of its variables, using binary models of genetic networks, and finds the problem to be tractable within the conditions tested so far.
Abstract: Given the imminent gene expression mapping covering whole genomes during development, health and disease, we seek computational methods to maximize functional inference from such large data sets. Is it possible, in principle, to completely infer a complex regulatory network architecture from input/output patterns of its variables? We investigated this possibility using binary models of genetic networks. Trajectories, or state transition tables of Boolean nets, resemble time series of gene expression. By systematically analyzing the mutual information between input states and output states, one is able to infer the sets of input elements controlling each element or gene in the network. This process is unequivocal and exact for complete state transition tables. We implemented this REVerse Engineering ALgorithm (REVEAL) in a C program, and found the problem to be tractable within the conditions tested so far. For n = 50 (elements) and k = 3 (inputs per element), the analysis of incomplete state transition tables (100 state transition pairs out of a possible 10^15) reliably produced the original rule and wiring sets. While this study is limited to synchronous Boolean networks, the algorithm is generalizable to include multi-state models, essentially allowing direct application to realistic biological data sets. The ability to adequately solve the inverse problem may enable in-depth analysis of complex dynamic systems in biology and other fields.
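
The mutual-information test described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' C program: the three-gene network, the function names (`entropy`, `infer_inputs`, `rule_table`) and the tolerance are assumptions made for illustration. The sketch searches input sets of increasing size and accepts the first one whose mutual information with a gene's next state equals that next state's entropy, which means the inputs fully determine the output.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of a sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * log2(c / total) for c in counts.values())


def infer_inputs(pairs, target, max_k=3, tol=1e-9):
    """Smallest input set whose mutual information with the target's next
    state equals the entropy of that next state, or None if none is found."""
    n = len(pairs[0][0])
    outputs = [nxt[target] for _, nxt in pairs]
    h_out = entropy(outputs)
    for k in range(1, max_k + 1):
        for subset in combinations(range(n), k):
            inputs = [tuple(cur[i] for i in subset) for cur, _ in pairs]
            # M(inputs, output) = H(inputs) + H(output) - H(inputs, output)
            mutual_info = entropy(inputs) + h_out - entropy(list(zip(inputs, outputs)))
            if abs(mutual_info - h_out) < tol:
                return subset
    return None


def rule_table(pairs, target, subset):
    """Read the Boolean rule for `target` off the table once its inputs are known."""
    return {tuple(cur[i] for i in subset): nxt[target] for cur, nxt in pairs}


def step(state):
    """Hypothetical three-gene synchronous Boolean network (illustration only)."""
    a, b, c = state
    return (b, a | c, (a & b) | (b & c) | (a & c))


# Complete state transition table: all 2**3 input states with their successors.
table = [(s, step(s)) for s in product((0, 1), repeat=3)]

for gene in range(3):
    wiring = infer_inputs(table, gene)
    print(gene, wiring, rule_table(table, gene, wiring))
```

For a gene whose output is constant the output entropy is zero, so this sketch returns the first single-gene subset; a fuller implementation would treat that case separately. For scale, a network with n = 50 binary elements has 2^50 (about 1.1 × 10^15) possible states, which is the "possible 10^15" figure quoted in the abstract.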


Citations
Journal ArticleDOI
TL;DR: This paper reviews formalisms that have been employed in mathematical biology and bioinformatics to describe genetic regulatory systems, in particular directed graphs, Bayesian networks, Boolean networks and their generalizations, ordinary and partial differential equations, qualitative differential equations, stochastic equations, and rule-based formalisms.
Abstract: The spatiotemporal expression of genes in an organism is determined by regulatory systems that involve a large number of genes connected through a complex network of interactions. As an intuitive understanding of the behavior of these systems is hard to obtain, computer tools for the modeling and simulation of genetic regulatory networks will be indispensable. This report reviews formalisms that have been employed in mathematical biology and bioinformatics to describe genetic regulatory systems, in particular directed graphs, Bayesian networks, ordinary and partial differential equations, stochastic equations, Boolean networks and their generalizations, qualitative differential equations, and rule-based formalisms. In addition, the report discusses how these formalisms have been used in the modeling and simulation of regulatory systems.

2,739 citations


Cites methods from "Reveal, a general reverse engineeri..."

  • ...Boolean networks were among the first formalisms for which model induction methods were proposed, the REVEAL algorithm developed by Liang et al. (1998) being an example (see Akutsu et al. [1998a, 1998b, 1999], Ideker et al. [2000], Karp et al. [1999b], Maki et al. [2001], Noda et al. [1998] for other…...

    [...]

Journal ArticleDOI
TL;DR: Probabilistic Boolean Networks (PBN) are introduced that share the appealing rule-based properties of Boolean networks, but are robust in the face of uncertainty.
Abstract: Motivation: Our goal is to construct a model for genetic regulatory networks such that the model class: (i) incorporates rule-based dependencies between genes; (ii) allows the systematic study of global network dynamics; (iii) is able to cope with uncertainty, both in the data and the model selection; and (iv) permits the quantification of the relative influence and sensitivity of genes in their interactions with other genes. Results: We introduce Probabilistic Boolean Networks (PBN) that share the appealing rule-based properties of Boolean networks, but are robust in the face of uncertainty. We show how the dynamics of these networks can be studied in the probabilistic context of Markov chains, with standard Boolean networks being special cases. Then, we discuss the relationship between PBNs and Bayesian networks—a family of graphical models that explicitly represent probabilistic relationships between variables. We show how probabilistic dependencies between a gene and its parent genes, constituting the basic building blocks of Bayesian networks, can be obtained from PBNs. Finally, we present methods for quantifying the influence of genes on other genes, within the context of PBNs. Examples illustrating the above concepts are presented throughout the paper.
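
The link between PBNs and Markov chains mentioned in the abstract can be made concrete with a small Python sketch. The two-gene network, the predictor functions and their probabilities are hypothetical, and the sketch assumes that each gene's predictor is selected independently at every step. It builds the state transition matrix of the resulting Markov chain.

```python
from itertools import product

# Hypothetical 2-gene PBN: for each gene, a list of (selection probability,
# predictor function) pairs. Each predictor maps the current state to a bit.
PREDICTORS = [
    [(0.7, lambda s: s[1]), (0.3, lambda s: 1 - s[1])],  # gene 0: two candidate rules
    [(1.0, lambda s: s[0] & s[1])],                       # gene 1: a single rule
]


def transition_matrix(predictors):
    """Markov-chain transition matrix over all 2**n network states."""
    n = len(predictors)
    states = list(product((0, 1), repeat=n))
    index = {s: i for i, s in enumerate(states)}
    matrix = [[0.0] * len(states) for _ in states]
    for s in states:
        # Enumerate every combination of one predictor per gene
        # (independent selection is an assumption of this sketch).
        for choice in product(*predictors):
            prob = 1.0
            nxt = []
            for p, f in choice:
                prob *= p
                nxt.append(f(s))
            matrix[index[s]][index[tuple(nxt)]] += prob
    return states, matrix


states, matrix = transition_matrix(PREDICTORS)
for s, row in zip(states, matrix):
    print(s, row)
```

If every gene has a single predictor with probability 1.0, each row contains a single 1 and the chain reduces to a deterministic Boolean network, which is the special case the abstract refers to.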

1,571 citations


Cites background from "Reveal, a general reverse engineeri..."

  • ...To that end, much recent work has gone into identifying the structure of gene regulatory networks from expression data (Liang et al., 1998; Akutsu et al., 1998, 1999; D’Haeseleer et al., 2000; Akutsu et al., 2000; Shmulevich et al., 2001)....

    [...]

  • ...Conversely, Boolean models encode rules of genetic regulation, are inherently dynamic, and lend themselves to tractable inference (Shmulevich et al., 2001; Liang et al., 1998; Akutsu et al., 1998, 1999)....

    [...]

Journal ArticleDOI
TL;DR: A novel mathematical and bioinformatics framework to construct ecological association networks named molecular ecological networks (MENs) through Random Matrix Theory (RMT)-based methods is described, which provides powerful tools to elucidate network interactions in microbial communities and their responses to environmental changes.
Abstract: Background: Understanding the interaction among different species within a community and their responses to environmental changes is a central goal in ecology. However, defining the network structure in a microbial community is very challenging due to their extremely high diversity and as-yet uncultivated status. Although recent advances in metagenomic technologies, such as high-throughput sequencing and functional gene arrays, provide revolutionary tools for analyzing microbial community structure, it is still difficult to examine network interactions in a microbial community based on high-throughput metagenomics data. Results: Here, we describe a novel mathematical and bioinformatics framework to construct ecological association networks named molecular ecological networks (MENs) through Random Matrix Theory (RMT)-based methods. Compared to other network construction methods, this approach is remarkable in that the network is automatically defined and robust to noise, thus providing excellent solutions to several common issues associated with high-throughput metagenomics data. We applied it to determine the network structure of microbial communities subjected to long-term experimental warming based on pyrosequencing data of 16S rRNA genes. We showed that the constructed MENs under both warming and unwarming conditions exhibited topological features of scale free, small world and modularity, which were consistent with previously described molecular ecological networks. Eigengene analysis indicated that the eigengenes represented the module profiles relatively well. Consistent with many other studies, several major environmental traits including temperature and soil pH were found to be important in determining network interactions in the microbial communities examined. To facilitate its application by the scientific community, all these methods and statistical tools have been integrated into a comprehensive Molecular Ecological Network Analysis Pipeline (MENAP), which is now openly accessible (http://ieg2.ou.edu/MENA). Conclusions: The RMT-based molecular ecological network analysis provides powerful tools to elucidate network interactions in microbial communities and their responses to environmental changes, which are fundamentally important for research in microbial ecology and environmental microbiology.

1,568 citations

Journal ArticleDOI
TL;DR: It is demonstrated that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
Abstract: In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance because the rule is either tested on tissue samples that were used in the first instance to select the genes being used in the rule or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.

1,387 citations

01 Jul 2012
TL;DR: A comprehensive blind assessment of over 30 network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae and in silico microarray data defines the performance, data requirements and inherent biases of different inference approaches, and provides guidelines for algorithm application and development.
Abstract: Reconstructing gene regulatory networks from high-throughput data is a long-standing challenge. Through the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we performed a comprehensive blind assessment of over 30 network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae and in silico microarray data. We characterize the performance, data requirements and inherent biases of different inference approaches, and we provide guidelines for algorithm application and development. We observed that no single inference method performs optimally across all data sets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse data sets. We thereby constructed high-confidence networks for E. coli and S. aureus, each comprising ∼1,700 transcriptional interactions at a precision of ∼50%. We experimentally tested 53 previously unobserved regulatory interactions in E. coli, of which 23 (43%) were supported. Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks.
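
The abstract does not say how predictions from multiple methods were integrated. As one illustration, the Python sketch below assumes a simple average-rank scheme: each method ranks candidate regulatory edges, an edge missing from a method's list is given the rank just past the end of that list, and edges are ordered by mean rank. The method outputs and gene names are hypothetical.

```python
def average_rank(rankings):
    """Combine edge rankings (each ordered from most to least confident)
    by mean rank; ties are broken by the edge tuple for determinism."""
    edges = set().union(*rankings)
    mean_rank = {}
    for edge in edges:
        total = 0
        for ranking in rankings:
            # Rank is the 1-based position; absent edges get len(ranking) + 1.
            total += ranking.index(edge) + 1 if edge in ranking else len(ranking) + 1
        mean_rank[edge] = total / len(rankings)
    return sorted(edges, key=lambda e: (mean_rank[e], e))


# Hypothetical outputs of three inference methods (regulator, target).
method_a = [("g1", "g2"), ("g2", "g3"), ("g1", "g3")]
method_b = [("g2", "g3"), ("g1", "g2"), ("g3", "g1")]
method_c = [("g1", "g2"), ("g1", "g3"), ("g2", "g3")]

consensus = average_rank([method_a, method_b, method_c])
print(consensus)
```

Averaging ranks rather than raw scores avoids having to put confidence values from different methods on a common scale, which is why it is a common choice for this kind of aggregation.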

1,355 citations

References
Journal ArticleDOI
TL;DR: The theory of communication is extended to include a number of new factors, in particular the effect of noise in the channel, and the savings possible due to the statistical structure of the original message and due to the nature of the final destination of the information.
Abstract: The recent development of various methods of modulation such as PCM and PPM which exchange bandwidth for signal-to-noise ratio has intensified the interest in a general theory of communication. A basis for such a theory is contained in the important papers of Nyquist and Hartley on this subject. In the present paper we will extend the theory to include a number of new factors, in particular the effect of noise in the channel, and the savings possible due to the statistical structure of the original message and due to the nature of the final destination of the information. The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design. If the number of messages in the set is finite then this number or any monotonic function of this number can be regarded as a measure of the information produced when one message is chosen from the set, all choices being equally likely. As was pointed out by Hartley the most natural choice is the logarithmic function. Although this definition must be generalized considerably when we consider the influence of the statistics of the message and when we have a continuous range of messages, we will in all cases use an essentially logarithmic measure. The logarithmic measure is more convenient for various reasons:

10,281 citations

Book
01 Jan 1993
TL;DR: The structure of rugged fitness landscapes and the structure of adaptive landscapes underlying protein evolution, and the architecture of genetic regulatory circuits and its evolution.
Abstract:
1. Conceptual outline of current evolutionary theory
PART I: ADAPTATION ON THE EDGE OF CHAOS
2. The structure of rugged fitness landscapes
3. Biological implications of rugged fitness landscapes
4. The structure of adaptive landscapes underlying protein evolution
5. Self organization and adaptation in complex systems
6. Coevolving complex systems
PART II: THE CRYSTALLIZATION OF LIFE
7. The origins of life: a new view
8. The origin of a connected metabolism
9. Autocatalytic polynucleotide systems: hypercycles, spin glasses and coding
10. Random grammars
PART III: ORDER AND ONTOGENY
11. The architecture of genetic regulatory circuits and its evolution
12. Differentiation: the dynamical behaviors of genetic regulatory networks
13. Selection for gene expression in cell type
14. Morphology, maps and the spatial ordering of integrated tissues

7,835 citations


"Reveal, a general reverse engineeri..." refers background in this paper

  • ...…as the “target area” of an organism, e.g. cell types at the end of development, repaired tissue following a response to injury, or even adaptation of metabolic gene expression following a change in nutrient environment in bacteria (see Kauffman, 1993; Somogyi and Sniegoski, 1996; Wuensche, 1992)....

    [...]

Journal ArticleDOI
TL;DR: An introduction to Boolean networks and their relevance to present-day experimental research is provided, bringing us closer to an understanding of complex molecular physiological processes like brain development and intractable medical problems of immediate importance.
Abstract: Molecular genetics presents an increasingly complex picture of the genome and biological function. Evidence is mounting for distributed function, redundancy, and combinatorial coding in the regulation of genes. Satisfactory explanation will require the concept of a parallel processing signaling network. Here we provide an introduction to Boolean networks and their relevance to present-day experimental research. Boolean network models exhibit global complex behavior, self-organization, stability, redundancy and periodicity, properties that deeply characterize biological systems. While the life sciences must inevitably face the issue of complexity, we may well look to cybernetics for a modeling language such as Boolean networks which can manageably describe parallel processing biological systems and provide a framework for the growing accumulation of data. We finally discuss experimental strategies and database systems that will enable mapping of genetic networks. The synthesis of these approaches holds an immense potential for new discoveries on the intimate nature of genetic networks, bringing us closer to an understanding of complex molecular physiological processes like brain development, and intractable medical problems of immediate importance, such as neurodegenerative disorders, cancer, and a variety of genetic diseases.

365 citations


"Reveal, a general reverse engineeri..." refers background in this paper

  • ...Effectively, genes turn each other on and off within a proximal genetic network of transcriptional regulators (Somogyi and Sniegoski, 1996)....

    [...]

  • ...All in all, the information stored in the DNA determines the dynamics of the extended genetic network, the state of which at a particular time point should be reflected in gene expression patterns (Somogyi and Sniegoski, 1996)....

    [...]

  • ...…as the “target area” of an organism, e.g. cell types at the end of development, repaired tissue following a response to injury, or even adaptation of metabolic gene expression following a change in nutrient environment in bacteria (see Kauffman, 1993; Somogyi and Sniegoski, 1996; Wuensche, 1992)....

    [...]

Journal ArticleDOI
TL;DR: Techniques are given to classify biological networks into classes having similar qualitative dynamics, illustrated by considering dynamic data from a number of biological systems and showing how the deep structure of each system can be determined.

256 citations


"Reveal, a general reverse engineeri..." refers background in this paper

  • ...This behavior can be approximated by asynchronous Boolean networks (reviewed in Thieffry & Thomas, 1998), or continuous differential equations that capture the structure of logical switching networks (Glass, 1975)....

    [...]

Proceedings Article
01 Jan 1998
TL;DR: This work presents a strategy for the analysis of large-scale quantitative gene expression measurement data from time course experiments that takes advantage of cluster analysis and graphical visualization methods to reveal correlated patterns of gene expression from time series data.
Abstract: The discovery of any new gene requires an analysis of the expression context for that gene. Now that the cDNA and genomic sequencing projects are progressing at such a rapid rate, high throughput gene expression screening approaches are beginning to appear to take advantage of that data. We present a strategy for the analysis of large-scale quantitative gene expression measurement data from time course experiments. Our approach takes advantage of cluster analysis and graphical visualization methods to reveal correlated patterns of gene expression from time series data. The coherence of these patterns suggests an order that conforms to a notion of shared pathways and control processes that can be experimentally verified.

221 citations


"Reveal, a general reverse engineeri..." refers methods in this paper

  • ...For example, integration of cluster analysis for the inference of shared inputs (currently applied to continuous, large scale gene expression data sets; see Michaels et al., 1998) could quickly identify wiring constraints and simplify the overall inference process....

    [...]