
Papers published in BMC Systems Biology in 2011


Journal Article
TL;DR: It is demonstrated for the first time that high partial correlation coefficients generally correspond to known metabolic reactions, and provide a valuable tool for the unbiased reconstruction of metabolic reactions from large-scale metabolomics data sets.
Abstract: With the advent of high-throughput targeted metabolic profiling techniques, the question of how to interpret and analyze the resulting vast amount of data becomes more and more important. In this work we address the reconstruction of metabolic reactions from cross-sectional metabolomics data, that is, without the requirement for time-resolved measurements or specific system perturbations. Previous studies in this area mainly focused on Pearson correlation coefficients, which, however, are generally incapable of distinguishing between direct and indirect metabolic interactions. In our new approach we propose the application of a Gaussian graphical model (GGM), an undirected probabilistic graphical model estimating the conditional dependence between variables. GGMs are based on partial correlation coefficients, that is, pairwise Pearson correlation coefficients conditioned against the correlation with all other metabolites. We first demonstrate the general validity of the method and its advantages over regular correlation networks with computer-simulated reaction systems. Then we estimate a GGM on data from a large human population cohort, covering 1020 fasting blood serum samples with 151 quantified metabolites. The GGM is much sparser than the correlation network, shows a modular structure with respect to metabolite classes, and is stable to the choice of samples in the data set. Using human fatty acid metabolism as an example, we demonstrate for the first time that high partial correlation coefficients generally correspond to known metabolic reactions. This feature is evaluated first manually, by investigating specific pairs of high-scoring metabolites, and then systematically, on a literature-curated model of fatty acid synthesis and degradation. Our method detects many known reactions along with possibly novel pathway interactions, representing candidates for further experimental examination.
In summary, we demonstrate strong signatures of intracellular pathways in blood serum data, and provide a valuable tool for the unbiased reconstruction of metabolic reactions from large-scale metabolomics data sets.

312 citations
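The central quantity of a GGM, the partial correlation, can be read off the inverse of the covariance matrix (the precision matrix P) via rho_ij = -P_ij / sqrt(P_ii * P_jj). The sketch below is a minimal illustration in Python with NumPy, assuming more samples than variables so that the covariance matrix is invertible; the study's own estimation procedure for 151 metabolites may differ. On a simulated linear pathway A → B → C, the Pearson correlation between A and C is high, while their partial correlation is close to zero, mirroring the direct-versus-indirect distinction described above.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation matrix for data of shape (samples, variables).

    Uses rho_ij = -P_ij / sqrt(P_ii * P_jj), where P is the inverse of the
    sample covariance matrix. Assumes more samples than variables.
    """
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.inv(cov)
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Simulated linear pathway A -> B -> C: A and C are linked only through B.
rng = np.random.default_rng(0)
n = 1000
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)
c = b + 0.5 * rng.normal(size=n)
data = np.column_stack([a, b, c])

pearson = np.corrcoef(data, rowvar=False)
pcor = partial_correlations(data)
print(f"Pearson r(A, C) = {pearson[0, 2]:.2f}")   # high: indirect association
print(f"partial r(A, C) = {pcor[0, 2]:.2f}")      # near zero: no direct link
```

In a GGM, edges are drawn only for pairs whose partial correlation is significantly non-zero, which is why the resulting network is much sparser than the correlation network.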


Journal Article
TL;DR: The network target-based approaches may adjust the current virtual screening mode and provide a systematic paradigm for facilitating the development of multicomponent therapeutics as well as the modernization of traditional Chinese medicine (TCM).
Abstract: Background: Multicomponent therapeutics offer bright prospects for the synergistic control of complex diseases. However, finding ways to screen synergistic combinations from numerous pharmacological agents is still an ongoing challenge.

304 citations


Journal Article
TL;DR: With SED-ML, software can exchange simulation experiment descriptions, enabling the validation and reuse of simulation experiments in different tools, and authors of papers reporting simulation experiments can make their simulation protocols available for other scientists to reproduce the results.
Abstract: The increasing use of computational simulation experiments to inform modern biological research creates new challenges to annotate, archive, share and reproduce such experiments. The recently published Minimum Information About a Simulation Experiment (MIASE) proposes a minimal set of information that should be provided to allow the reproduction of simulation experiments among users and software tools. In this article, we present the Simulation Experiment Description Markup Language (SED-ML). SED-ML encodes in a computer-readable exchange format the information required by MIASE to enable reproduction of simulation experiments. It has been developed as a community project, is defined in a detailed technical specification, and additionally provides an XML schema. The version of SED-ML described in this publication is Level 1 Version 1. It covers the description of the most frequent type of simulation experiments in the area, namely time course simulations. SED-ML documents specify which models to use in an experiment, modifications to apply to the models before using them, which simulation procedures to run on each model, what analysis results to output, and how the results should be presented. These descriptions are independent of the underlying model implementation. SED-ML is a software-independent format for encoding the description of simulation experiments; it is not specific to particular simulation tools. Here, we demonstrate that, with the growing software support for SED-ML, we can effectively exchange executable simulation descriptions. With SED-ML, software can exchange simulation experiment descriptions, enabling the validation and reuse of simulation experiments in different tools. Authors of papers reporting simulation experiments can make their simulation protocols available for other scientists to reproduce the results.
Because SED-ML is agnostic about the exact modeling language(s) used, experiments covering models from different fields of research can be accurately described and combined.

243 citations
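To make the structure of a SED-ML document concrete, the following Python sketch uses the standard-library ElementTree to assemble a skeleton for a single time course: a simulation, a model, a task binding the two, data generators for the outputs, and a 2D plot. Element and attribute names follow our reading of Level 1 Version 1 and should be validated against the official XML schema; the file name `model.xml`, the species `S1`, the SBML namespace bound to the `sbml` prefix, and all `id` values are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace of SED-ML Level 1 Version 1 (verify against the official schema).
SEDML_NS = "http://sed-ml.org/"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def sub(parent, tag, **attrs):
    """Append a child element in the SED-ML namespace."""
    return ET.SubElement(parent, f"{{{SEDML_NS}}}{tag}", attrs)


def build_time_course_experiment(model_file, species_id):
    ET.register_namespace("", SEDML_NS)
    root = ET.Element(f"{{{SEDML_NS}}}sedML", level="1", version="1")
    # Binds the "sbml" prefix used in the XPath below (assumes an SBML L2V4 model).
    root.set("xmlns:sbml", "http://www.sbml.org/sbml/level2/version4")

    # Which simulation procedure to run: a uniform time course solved with CVODE.
    sims = sub(root, "listOfSimulations")
    tc = sub(sims, "uniformTimeCourse", id="sim1", initialTime="0",
             outputStartTime="0", outputEndTime="100", numberOfPoints="1000")
    sub(tc, "algorithm", kisaoID="KISAO:0000019")

    # Which model to use; modifications (a listOfChanges) would be nested here.
    models = sub(root, "listOfModels")
    sub(models, "model", id="model1", language="urn:sedml:language:sbml",
        source=model_file)

    # A task binds a model to a simulation.
    tasks = sub(root, "listOfTasks")
    sub(tasks, "task", id="task1", modelReference="model1",
        simulationReference="sim1")

    # Data generators: which results to output (time and one species).
    species_xpath = ("/sbml:sbml/sbml:model/sbml:listOfSpecies/"
                     f"sbml:species[@id='{species_id}']")
    dgs = sub(root, "listOfDataGenerators")
    for dg_id, target in [("dg_time", {"symbol": "urn:sedml:symbol:time"}),
                          ("dg_species", {"target": species_xpath})]:
        dg = sub(dgs, "dataGenerator", id=dg_id)
        variables = sub(dg, "listOfVariables")
        sub(variables, "variable", id=f"v_{dg_id}", taskReference="task1", **target)
        math = ET.SubElement(dg, f"{{{MATHML_NS}}}math")
        ET.SubElement(math, f"{{{MATHML_NS}}}ci").text = f"v_{dg_id}"

    # Outputs: how the results are presented (a 2D plot of the species vs. time).
    outputs = sub(root, "listOfOutputs")
    plot = sub(outputs, "plot2D", id="plot1")
    curves = sub(plot, "listOfCurves")
    sub(curves, "curve", id="curve1", logX="false", logY="false",
        xDataReference="dg_time", yDataReference="dg_species")
    return root


root = build_time_course_experiment("model.xml", "S1")
print(ET.tostring(root, encoding="unicode"))
```

Because the model is referenced only by `source` and `language`, the same experiment description can in principle be pointed at a model in another supported format without touching the task, data generator, or output sections.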


Journal Article
TL;DR: The extended LSA analysis technique was demonstrated to reveal statistically significant local and potentially time-delayed association patterns in replicated time series data beyond that of ordinary correlation analysis.
Abstract: Background: The increasing availability of time series microbial community data from metagenomics and other molecular biological studies has enabled the analysis of large-scale microbial co-occurrence and association networks. Among the many analytical techniques available, the Local Similarity Analysis (LSA) method is unique in that it captures local and potentially time-delayed co-occurrence and association patterns in time series data that cannot otherwise be identified by ordinary correlation analysis. However, LSA, as originally developed, does not consider time series data with replicates, which hinders the full exploitation of available information. With replicates, it is possible to understand the variability of the local similarity (LS) score and to obtain its confidence interval.

207 citations
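The local similarity score at the heart of LSA can be computed with a Smith-Waterman-style dynamic program without gaps, restricted to alignments whose time shift is at most a chosen maximum delay D. The Python sketch below follows that scheme. How replicates are combined is an assumption here (each replicate is z-scored and the replicates are averaged), and the normal-score transform of the original method is replaced by a plain z-score for brevity. In the example, `y` lags `x` by two time steps, so allowing a delay should raise the score and recover the lag.

```python
import numpy as np


def summarize_replicates(reps):
    """Z-score each replicate series, then average across replicates (assumed summary)."""
    reps = np.atleast_2d(np.asarray(reps, dtype=float))
    z = (reps - reps.mean(axis=1, keepdims=True)) / reps.std(axis=1, keepdims=True)
    return z.mean(axis=0)


def local_similarity(x_reps, y_reps, max_delay):
    """Return (LS score, delay) for two series of shape (replicates, time points).

    pos[i, j] / neg[i, j] hold the best positively / negatively associated run
    ending with x[i-1] aligned to y[j-1]; only |i - j| <= max_delay is allowed.
    The delay is reported as j - i (positive when y lags behind x).
    """
    x, y = summarize_replicates(x_reps), summarize_replicates(y_reps)
    n = len(x)
    pos = np.zeros((n + 1, n + 1))
    neg = np.zeros((n + 1, n + 1))
    best_value, best_sign, best_delay = 0.0, 1, 0
    for i in range(1, n + 1):
        for j in range(max(1, i - max_delay), min(n, i + max_delay) + 1):
            prod = x[i - 1] * y[j - 1]
            pos[i, j] = max(0.0, pos[i - 1, j - 1] + prod)
            neg[i, j] = max(0.0, neg[i - 1, j - 1] - prod)
            for value, sign in ((pos[i, j], 1), (neg[i, j], -1)):
                if value > best_value:
                    best_value, best_sign, best_delay = value, sign, j - i
    return best_sign * best_value / n, best_delay


rng = np.random.default_rng(1)
x = rng.normal(size=(3, 50))                                 # 3 replicates, 50 time points
y = np.roll(x, 2, axis=1) + 0.1 * rng.normal(size=(3, 50))   # y lags x by 2 steps
score_no_delay, _ = local_similarity(x, y, max_delay=0)
score_delay, delay = local_similarity(x, y, max_delay=3)
print(f"LS without delay: {score_no_delay:.2f}, with delay <= 3: {score_delay:.2f} (delay {delay})")
```

Restricting the alignment band to |i - j| <= D keeps the cost at O(nD) useful cells per pair of series, which matters when all pairs of taxa in a community are scored.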


Journal Article
TL;DR: A novel multi-tissue type modeling approach was developed to integrate the metabolic functions for the three cell types, and subsequently used to simulate known integrated metabolic cycles to study intercellular interactions.
Abstract: Genome-scale metabolic reconstructions provide a biologically meaningful mechanistic basis for the genotype-phenotype relationship. The global human metabolic network, termed Recon 1, has recently been reconstructed, allowing the systems analysis of human metabolic physiology and pathology. Utilizing high-throughput data, Recon 1 has recently been tailored to different cells and tissues, including the liver, kidney, brain, and alveolar macrophage. These models have shown utility in the study of systems medicine. However, no integrated analysis between human tissues has been done. To describe tissue-specific functions, Recon 1 was tailored to describe metabolism in three human cells: adipocytes, hepatocytes, and myocytes. These cell-specific networks were manually curated and validated based on known cellular metabolic functions. To study intercellular interactions, a novel multi-tissue type modeling approach was developed to integrate the metabolic functions for the three cell types, and subsequently used to simulate known integrated metabolic cycles. In addition, the multi-tissue model was used to study diabetes: a pathology with systemic properties. High-throughput data was integrated with the network to determine differential metabolic activity between obese and type II diabetic obese gastric bypass patients in a whole-body context. The multi-tissue type modeling approach presented provides a platform to study integrated metabolic states. As more cell and tissue-specific models are released, it is critical to develop a framework in which to study their interdependencies.

179 citations
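The idea of linking cell-specific networks into one model can be illustrated with a toy flux balance problem. In this Python sketch (using SciPy's `linprog`), a hypothetical muscle compartment and a hypothetical liver compartment exchange glucose and lactate only through a shared blood compartment, and the objective is muscle ATP production. The stoichiometries are textbook simplifications (2 ATP per glucose from glycolysis, 6 ATP equivalents consumed per glucose in gluconeogenesis) and the bounds are arbitrary; the point is only that a Cori-cycle-like exchange emerges once the liver can supply energy, not to reproduce the published adipocyte-hepatocyte-myocyte model.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network. Suffixes: _b blood, _m muscle, _l liver.
# Tissues exchange metabolites only through the shared blood compartment.
metabolites = ["glc_b", "lac_b", "glc_m", "lac_m", "atp_m", "glc_l", "lac_l", "atp_l"]
reactions = {  # name: (stoichiometry, (lower bound, upper bound))
    "EX_glc":  ({"glc_b": 1}, (0, 1.0)),                              # dietary glucose
    "T_glc_m": ({"glc_b": -1, "glc_m": 1}, (0, None)),                # blood -> muscle
    "GLY_m":   ({"glc_m": -1, "lac_m": 2, "atp_m": 2}, (0, None)),    # glycolysis
    "T_lac_m": ({"lac_m": -1, "lac_b": 1}, (0, None)),                # muscle -> blood
    "T_lac_l": ({"lac_b": -1, "lac_l": 1}, (0, None)),                # blood -> liver
    "GNG_l":   ({"lac_l": -2, "atp_l": -6, "glc_l": 1}, (0, None)),   # gluconeogenesis
    "T_glc_l": ({"glc_l": -1, "glc_b": 1}, (0, None)),                # liver -> blood
    "ATP_l":   ({"atp_l": 1}, (0, 3.0)),                              # liver energy supply
    "EX_lac":  ({"lac_b": -1}, (0, None)),                            # lactate leaves system
    "ATPM_m":  ({"atp_m": -1}, (0, None)),                            # muscle ATP demand
}
names = list(reactions)
S = np.zeros((len(metabolites), len(names)))  # stoichiometric matrix
for j, name in enumerate(names):
    for met, coef in reactions[name][0].items():
        S[metabolites.index(met), j] = coef


def max_muscle_atp(liver_atp_supply):
    """Maximize the muscle ATP demand flux subject to steady state (S v = 0) and bounds."""
    bounds = [reactions[name][1] for name in names]
    bounds[names.index("ATP_l")] = (0, liver_atp_supply)
    c = np.zeros(len(names))
    c[names.index("ATPM_m")] = -1.0  # linprog minimizes, so negate to maximize
    res = linprog(c, A_eq=S, b_eq=np.zeros(len(metabolites)), bounds=bounds, method="highs")
    return -res.fun, dict(zip(names, res.x))


atp_without_liver, _ = max_muscle_atp(0.0)
atp_with_liver, fluxes = max_muscle_atp(3.0)
print(f"muscle ATP: {atp_without_liver:.2f} without, {atp_with_liver:.2f} with liver recycling")
print(f"gluconeogenesis flux: {fluxes['GNG_l']:.2f}")
```

With 3 units of liver energy, at most 0.5 glucose can be regenerated from lactate, so muscle receives 1.5 glucose and produces 3 ATP instead of 2; neither tissue model alone can represent this interdependency.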


Journal Article
TL;DR: Rule-based languages are a suitable starting point for developing a concise and compact language for multi-level modeling of cell biological systems and the combination of nesting species, assigning attributes, and constraining reactions according to these attributes is crucial in achieving the desired expressiveness.
Abstract: Proteins, individual cells, and cell populations denote different levels of an organizational hierarchy, each with its own dynamics. Multi-level modeling is concerned with describing a system at these different levels and relating their dynamics. Rule-based modeling has increasingly attracted attention because it enables a concise and compact description of biochemical systems. In addition, it allows different methods for model analysis, since more than one semantics can be defined for the same syntax. Multi-level modeling implies the hierarchical nesting of model entities and explicit support for downward and upward causation between different levels. Concepts to support multi-level modeling in a rule-based language are identified. These include rule schemata, hierarchical nesting of species, assigning attributes and solutions to species at each level, and preserving the content of nested species while applying rules. Further requirements are the ability to apply rules and flexibly define reaction rate kinetics and constraints on nested species as well as on species that are nested within others. An example model is presented that analyses the interplay of an intracellular control circuit with states at cell level, its relation to cell division, and connections to intercellular communication within a population of cells. The example is described in ML-Rules, a rule-based multi-level approach that has been realized within the plug-in-based modeling and simulation framework JAMES II. Rule-based languages are a suitable starting point for developing a concise and compact language for multi-level modeling of cell biological systems. The combination of nesting species, assigning attributes, and constraining reactions according to these attributes is crucial in achieving the desired expressiveness. Rule schemata allow a concise and compact description of complex models.
As a result, the presented approach facilitates developing and maintaining multi-level models that, for instance, interrelate intracellular and intercellular dynamics.

161 citations
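The modeling concepts listed above (nested species, attributes, rules constrained by attributes, and rules that preserve nested content) can be mimicked in a few lines of plain Python. The sketch below is not ML-Rules syntax or semantics; the `Cell` class, the `phase` attribute, the protein `P`, and the division threshold of 10 are hypothetical choices made only to show downward causation (a cell attribute gating an intracellular reaction), upward causation (intracellular content changing a cell attribute), and a division rule that partitions the nested content between daughters.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Cell:
    """A nested species: a cell with a cell-level attribute and its own molecular content."""
    phase: str = "G1"
    content: dict = field(default_factory=dict)  # nested species name -> count


def rule_synthesis(cell):
    # Downward causation: protein P is produced only in cells whose phase is "G1".
    if cell.phase == "G1":
        cell.content["P"] = cell.content.get("P", 0) + 1


def rule_phase(cell, threshold=10):
    # Upward causation: intracellular state changes the cell-level attribute.
    if cell.phase == "G1" and cell.content.get("P", 0) >= threshold:
        cell.phase = "M"


def rule_division(cell, rng):
    # Division: an M-phase cell becomes two G1 cells; the nested content is
    # preserved by assigning each molecule to one daughter at random.
    if cell.phase != "M":
        return [cell]
    left, right = {}, {}
    for species, count in cell.content.items():
        k = sum(rng.random() < 0.5 for _ in range(count))
        left[species], right[species] = k, count - k
    return [Cell("G1", left), Cell("G1", right)]


def step(population, rng):
    """Apply all rules once to every cell in the population."""
    next_population = []
    for cell in population:
        rule_synthesis(cell)
        rule_phase(cell)
        next_population.extend(rule_division(cell, rng))
    return next_population


rng = random.Random(42)
population = [Cell("G1")]
for _ in range(40):
    population = step(population, rng)
print(f"{len(population)} cells after 40 steps")
```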


Journal Article
TL;DR: This work describes a community-driven effort, in which more than 20 experts in S. Typhimurium biology and systems biology collaborated to reconcile and expand the S. Typhimurium BiGG knowledge-base, and uses the consensus MR to identify potential multi-target drug therapy approaches.
Abstract: Background: Metabolic reconstructions (MRs) are common denominators in systems biology and represent biochemical, genetic, and genomic (BiGG) knowledge-bases for target organisms by capturing currently available information in a consistent, structured manner. Salmonella enterica subspecies I serovar Typhimurium is a human pathogen that causes various diseases, and its increasing antibiotic resistance poses a public health problem. Results: Here, we describe a community-driven effort, in which more than 20 experts in S. Typhimurium biology and systems biology collaborated to reconcile and expand the S. Typhimurium BiGG knowledge-base. The consensus MR was obtained starting from two independently developed MRs for S. Typhimurium. Key results of this reconstruction jamboree include i) development and implementation of a community-based workflow for MR annotation and reconciliation; ii) incorporation of thermodynamic information; and iii) use of the consensus MR to identify potential multi-target drug therapy approaches. Conclusion: Taken together, with the growing number of parallel MRs, a structured, community-driven approach will be necessary to maximize quality while increasing adoption of MRs in experimental design and interpretation.

145 citations


Journal Article
TL;DR: It is demonstrated that the graph-clustering approach identifies tissue- and/or genotype-dependent metabolomic clusters related to biochemical pathways and that the obtained clusters were significantly enriched for metabolites included in biochemical pathways.
Abstract: Deciphering the metabolome is essential for a better understanding of the cellular metabolism as a system. Typical metabolomics data show few but significant correlations among metabolite levels when data sampling is repeated across individuals grown under strictly controlled conditions. Although several studies have assessed topologies in metabolomic correlation networks, it remains unclear whether highly connected metabolites in these networks have specific functions in known tissue- and/or genotype-dependent biochemical pathways. In our study of metabolite profiles, we subjected root tissues to gas chromatography-time-of-flight/mass spectrometry (GC-TOF/MS) and used published information on the aerial parts of three Arabidopsis genotypes (Col-0 wild-type, methionine over-accumulation 1 (mto1), and transparent testa4 (tt4)) to systematically compare the metabolomic correlations in samples of roots and aerial parts. We then applied graph clustering to the constructed correlation networks to extract densely connected metabolites and evaluated the clusters by biochemical-pathway enrichment analysis. We found that the number of significant correlations varied by tissue and genotype and that the obtained clusters were significantly enriched for metabolites included in biochemical pathways. We demonstrate that the graph-clustering approach identifies tissue- and/or genotype-dependent metabolomic clusters related to biochemical pathways. Metabolomic correlations complement information about changes in mean metabolite levels and may help to elucidate the organization of metabolically functional modules.

137 citations
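The analysis pipeline described above (a correlation network, graph clustering, then pathway-enrichment testing of each cluster) can be sketched in Python with NumPy and SciPy. For brevity, clusters are taken as connected components of the thresholded network, a simple stand-in for a dedicated graph-clustering algorithm, and enrichment is scored with a one-sided hypergeometric test. The correlation threshold, the simulated data, and the metabolite names `m0` to `m11` are all hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom


def correlation_clusters(data, names, threshold):
    """Connected components of the network linking metabolites with |r| >= threshold."""
    corr = np.corrcoef(data, rowvar=False)
    n = len(names)
    neighbors = {i: [j for j in range(n) if j != i and abs(corr[i, j]) >= threshold]
                 for i in range(n)}
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(names[node])
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(component))
    return clusters


def enrichment_p(cluster, pathway, universe_size):
    """One-sided hypergeometric P value that `cluster` is enriched for `pathway` members."""
    hits = len(set(cluster) & set(pathway))
    # P(X >= hits), X ~ Hypergeom(population=universe_size, successes=len(pathway), draws=len(cluster))
    return hypergeom.sf(hits - 1, universe_size, len(pathway), len(cluster))


rng = np.random.default_rng(0)
shared = rng.normal(size=60)  # common driver of a simulated pathway, 60 samples
data = np.column_stack(
    [shared + 0.3 * rng.normal(size=60) for _ in range(4)]   # m0-m3: pathway members
    + [rng.normal(size=60) for _ in range(8)]                # m4-m11: unrelated
)
names = [f"m{i}" for i in range(12)]
clusters = correlation_clusters(data, names, threshold=0.6)
largest = max(clusters, key=len)
p = enrichment_p(largest, ["m0", "m1", "m2", "m3"], universe_size=len(names))
print(largest, f"P = {p:.4f}")
```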


Journal Article
TL;DR: A comprehensive transcriptional analysis of individual carotenoid and isoprenoid-related biosynthesis pathway genes was performed in order to elucidate the role of transcriptional regulation in the coordinated synthesis of these compounds and to identify regulatory components that may mediate this process in Arabidopsis thaliana.
Abstract: The carotenoids are pure isoprenoids that are essential components of the photosynthetic apparatus and are coordinately synthesized with chlorophylls in chloroplasts. However, little is known about the mechanisms that regulate carotenoid biosynthesis or the mechanisms that coordinate this synthesis with that of chlorophylls and other plastidial synthesized isoprenoid-derived compounds, including quinones, gibberellic acid and abscisic acid. Here, a comprehensive transcriptional analysis of individual carotenoid and isoprenoid-related biosynthesis pathway genes was performed in order to elucidate the role of transcriptional regulation in the coordinated synthesis of these compounds and to identify regulatory components that may mediate this process in Arabidopsis thaliana. A global microarray expression correlation analysis revealed that the phytoene synthase gene, which encodes the first dedicated and rate-limiting enzyme of carotenogenesis, is highly co-expressed with many photosynthesis-related genes including many isoprenoid-related biosynthesis pathway genes. Chemical and mutant analysis revealed that induction of the co-expressed genes following germination was dependent on gibberellic acid and brassinosteroids (BR) but was inhibited by abscisic acid (ABA). Mutant analyses further revealed that expression of many of the genes is suppressed in dark grown plants by Phytochrome Interacting transcription Factors (PIFs) and activated by photoactivated phytochromes, which in turn degrade PIFs and mediate a coordinated induction of the genes. The promoters of PSY and the co-expressed genes were found to contain an enrichment in putative BR-auxin response elements and G-boxes, which bind PIFs, further supporting a role for BRs and PIFs in regulating expression of the genes. 
In osmotically stressed root tissue, transcription of Calvin cycle, methylerythritol 4-phosphate pathway and carotenoid biosynthesis genes is induced and uncoupled from that of chlorophyll biosynthesis genes in a manner that is consistent with the increased synthesis of carotenoid precursors for ABA biosynthesis. In all tissues examined, induction of β-carotene hydroxylase transcript levels is linked to an increased demand for ABA. This analysis provides compelling evidence to suggest that coordinated transcriptional regulation of isoprenoid-related biosynthesis pathway genes plays a major role in coordinating the synthesis of functionally related chloroplast-localized isoprenoid-derived compounds.

132 citations
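A guide-gene co-expression analysis of the kind described above ranks all genes by the correlation of their expression profiles with a gene of interest, here phytoene synthase (PSY). The Python sketch below uses simulated data and hypothetical gene names; a real analysis would use normalized microarray data across many conditions and a significance cut-off rather than a fixed top-k.

```python
import numpy as np


def rank_coexpressed(expr, genes, guide, top=5):
    """Rank genes by Pearson correlation with a guide gene.

    expr: array of shape (genes, conditions), one expression profile per row.
    Returns (gene, correlation) pairs, strongest positive correlation first.
    """
    corr = np.corrcoef(expr)
    g = genes.index(guide)
    order = np.argsort(-corr[g])
    return [(genes[i], round(float(corr[g, i]), 3)) for i in order if i != g][:top]


rng = np.random.default_rng(2)
signal = rng.normal(size=30)  # shared regulatory input across 30 conditions (simulated)
genes = ["PSY", "coexpressed_1", "coexpressed_2", "unrelated_1", "unrelated_2"]
expr = np.vstack([
    signal + 0.2 * rng.normal(size=30),
    signal + 0.3 * rng.normal(size=30),
    signal + 0.5 * rng.normal(size=30),
    rng.normal(size=30),
    rng.normal(size=30),
])
ranking = rank_coexpressed(expr, genes, "PSY", top=2)
print(ranking)
```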


Journal Article
TL;DR: This quantitative map of the epigenetic landscape underlying cell fate choice provides mechanistic insights into the "forces" that direct cellular differentiation in the context of physiological development, as well as during artificially induced cell lineage reprogramming.
Abstract: Background: The image of the "epigenetic landscape", with a series of branching valleys and ridges depicting stable cellular states and the barriers between those states, has been a popular visual metaphor for cell lineage specification, especially in light of the recent discovery that terminally differentiated adult cells can be reprogrammed into pluripotent stem cells or into alternative cell lineages. However, the question of whether the epigenetic landscape can be mapped out quantitatively to provide a predictive model of cellular differentiation remains largely unanswered.

130 citations


Journal Article
TL;DR: This work suggests that the metabolism of the bacterium has evolved both structurally and functionally toward an efficient but transitory utilization of methanol, and provides a basis for metabolic engineering to convert methanol into value-added products.
Abstract: Background: Methylotrophic microorganisms play a key role in biogeochemical processes, especially the global carbon cycle, and have gained interest for biotechnological purposes. Significant progress has been made in recent years in the biochemistry, genetics, genomics, and physiology of methylotrophic bacteria, showing that methylotrophy is much more widespread and versatile than initially assumed. Despite such progress, a system-level description of methylotrophic metabolism is currently lacking, and much remains to be understood regarding the network-scale organization and properties of methylotrophy, and how the methylotrophic capacity emerges from this organization, especially in facultative organisms.

Journal Article
TL;DR: This work presents a unified framework that integrates diverse techniques involved in the design of heterologous biosynthetic pathways through a retrosynthetic approach in the reaction signature space and enables the flexible design of industrial microorganisms for the efficient on-demand production of chemical compounds with therapeutic applications.
Abstract: Synthetic biology is used to develop cell factories for production of chemicals by constructively importing heterologous pathways into industrial microorganisms. In this work we present a retrosynthetic approach to the production of therapeutics with the goal of developing an in situ drug delivery device in host cells. Retrosynthesis, a concept originally proposed for synthetic chemistry, iteratively applies reversed chemical transformations (reversed enzyme-catalyzed reactions in the metabolic space) starting from a target product to reach precursors that are endogenous to the chassis. So far, a wider adoption of retrosynthesis into the manufacturing pipeline has been hindered by the complexity of enumerating all feasible biosynthetic pathways for a given compound. In our method, we efficiently address the complexity problem by coding substrates, products and reactions into molecular signatures. Metabolic maps are represented using hypergraphs and the complexity is controlled by varying the specificity of the molecular signature. Furthermore, our method enables candidate pathways to be ranked to determine which ones are best to engineer. The proposed ranking function can integrate data from different sources such as host compatibility for inserted genes, the estimation of steady-state fluxes from the genome-wide reconstruction of the organism's metabolism, or the estimation of metabolite toxicity from experimental assays. We use several machine-learning tools in order to estimate enzyme activity and reaction efficiency at each step of the identified pathways. Examples of production in bacteria and yeast for two antibiotics and for one antitumor agent, as well as for several essential metabolites are outlined. We present here a unified framework that integrates diverse techniques involved in the design of heterologous biosynthetic pathways through a retrosynthetic approach in the reaction signature space. 
Our engineering methodology enables the flexible design of industrial microorganisms for the efficient on-demand production of chemical compounds with therapeutic applications.
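Stripped of molecular signatures, hypergraph encoding, and pathway ranking, the core retrosynthetic loop is a search that applies reactions backwards from the target until only compounds endogenous to the chassis remain. The Python sketch below does this breadth-first over a hypothetical toy reaction set. It treats compounds as plain names, so it shows the reversal logic but none of the complexity control that the method obtains by varying the specificity of the molecular signatures.

```python
from collections import deque


def retrosynthesis(target, reactions, chassis, max_steps=5):
    """Breadth-first retrosynthetic search over a toy reaction set.

    reactions: dict name -> (substrates, products), written in the forward direction.
    Each step picks one compound that the chassis cannot supply and replaces it by
    the substrates of a reaction producing it (i.e. the reaction applied in reverse).
    Returns pathways in forward order, shortest first.
    """
    routes = []
    queue = deque([(frozenset([target]), [])])
    while queue:
        needed, route = queue.popleft()
        missing = needed - chassis
        if not missing:
            routes.append(list(reversed(route)))
            continue
        if len(route) >= max_steps:
            continue
        compound = min(missing)  # expand compounds in a fixed (alphabetical) order
        for name, (substrates, products) in reactions.items():
            if compound in products and name not in route:
                queue.append(((needed - {compound}) | frozenset(substrates), route + [name]))
    return routes


# Hypothetical reactions and chassis metabolites, for illustration only.
toy_reactions = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"T"}),
    "R3": ({"C"}, {"T"}),
    "R4": ({"D"}, {"C"}),
}
routes = retrosynthesis("T", toy_reactions, chassis=frozenset({"A", "D"}))
print(routes)
```

Each returned route is a candidate pathway that a ranking function (for example one scoring host compatibility, predicted fluxes, or toxicity, as described above) could then order for engineering.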

Journal Article
TL;DR: The first genome-scale metabolic model for C. beijerinckii was presented in this paper, containing 925 genes, 938 reactions, and 881 metabolites.
Abstract: Solventogenic clostridia offer a sustainable alternative to petroleum-based production of butanol--an important chemical feedstock and potential fuel additive or replacement. C. beijerinckii is an attractive microorganism for strain design to improve butanol production because it (i) naturally produces the highest recorded butanol concentrations as a byproduct of fermentation; and (ii) can co-ferment pentose and hexose sugars (the primary products from lignocellulosic hydrolysis). Interrogating C. beijerinckii metabolism from a systems viewpoint using constraint-based modeling allows for simulation of the global effect of genetic modifications. We present the first genome-scale metabolic model (iCM925) for C. beijerinckii, containing 925 genes, 938 reactions, and 881 metabolites. To build the model we employed a semi-automated procedure that integrated genome annotation information from KEGG, BioCyc, and The SEED, and utilized computational algorithms with manual curation to improve model completeness. Interestingly, we found only a 34% overlap in reactions collected from the three databases--highlighting the importance of evaluating the predictive accuracy of the resulting genome-scale model. To validate iCM925, we conducted fermentation experiments using the NCIMB 8052 strain, and evaluated the ability of the model to simulate measured substrate uptake and product production rates. Experimentally observed fermentation profiles were found to lie within the solution space of the model; however, under an optimal growth objective, additional constraints were needed to reproduce the observed profiles--suggesting the existence of selective pressures other than optimal growth. Notably, a significantly enriched fraction of actively utilized reactions in simulations--constrained to reflect experimental rates--originated from the set of reactions that overlapped between all three databases (P = 3.52 × 10⁻⁹, Fisher's exact test).
Inhibition of the hydrogenase reaction was found to have a strong effect on butanol formation--as experimentally observed. Microbial production of butanol by C. beijerinckii offers a promising, sustainable method for generation of this important chemical and potential biofuel. iCM925 is a predictive model that can accurately reproduce physiological behavior and provide insight into the underlying mechanisms of microbial butanol production. As such, the model will be instrumental in efforts to better understand, and metabolically engineer, this microorganism for improved butanol production.
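An enrichment statistic of the kind quoted above comes from a 2 × 2 contingency table that cross-classifies reactions by whether they were actively used in the constrained simulations and whether they belong to the three-database overlap. The Python sketch below shows how such a one-sided Fisher's exact test can be set up with SciPy. The counts are made up for illustration (they only sum to the model's 938 reactions) and are not the study's data, so the resulting P value means nothing beyond the example.

```python
from scipy.stats import fisher_exact

# Hypothetical counts (NOT from the study).
# Rows: used / not used in simulations; columns: in the overlap / not in the overlap.
table = [[150, 100],
         [170, 518]]
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(f"odds ratio = {odds_ratio:.2f}, one-sided P = {p_value:.2e}")
```

The one-sided alternative asks specifically whether overlap reactions are over-represented among the actively used ones, which matches the direction of the claim.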

Journal Article
TL;DR: The TIGER package provides a consistent platform for algorithm development and extending existing genome-scale metabolic models with regulatory networks and high-throughput data and converts a series of generalized, Boolean or multilevel rules into a set of mixed integer inequalities.
Abstract: Background: Several methods have been developed for analyzing genome-scale models of metabolism and transcriptional regulation. Many of these methods, such as Flux Balance Analysis, use constrained optimization to predict relationships between metabolic flux and the genes that encode and regulate enzyme activity. Recently, mixed integer programming has been used to encode these gene-protein-reaction (GPR) relationships into a single optimization problem, but these techniques are often of limited generality and lack a tool for automating the conversion of rules to a coupled regulatory/metabolic model.
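Converting Boolean gene-protein-reaction rules into mixed integer inequalities relies on standard linearizations of AND and OR over binary variables. The Python sketch below writes those inequalities for a hypothetical rule, "reaction active if (g1 AND g2) OR g3", using an auxiliary binary `z` for the AND term, and checks by enumeration that the feasible 0/1 assignments are exactly those satisfying the rule. It illustrates the idea only and is not TIGER's API; in a full model the reaction indicator `r` would then bound the flux, e.g. -vmax·r ≤ v ≤ vmax·r.

```python
from itertools import product

# Each constraint is (coefficients, rhs), meaning sum(coef * var) <= rhs.


def and_constraints(y, xs):
    """Force binary y == AND(xs): y <= x_i for all i, and sum(x_i) - y <= len(xs) - 1."""
    cons = [({y: 1, x: -1}, 0) for x in xs]
    coeffs = {x: 1 for x in xs}
    coeffs[y] = -1
    cons.append((coeffs, len(xs) - 1))
    return cons


def or_constraints(y, xs):
    """Force binary y == OR(xs): x_i <= y for all i, and y - sum(x_i) <= 0."""
    cons = [({x: 1, y: -1}, 0) for x in xs]
    coeffs = {x: -1 for x in xs}
    coeffs[y] = 1
    cons.append((coeffs, 0))
    return cons


def satisfied(cons, assignment):
    return all(sum(c * assignment[v] for v, c in coeffs.items()) <= rhs
               for coeffs, rhs in cons)


# Hypothetical GPR rule: reaction R is active (r = 1) if (g1 AND g2) OR g3.
constraints = and_constraints("z", ["g1", "g2"]) + or_constraints("r", ["z", "g3"])

variables = ["g1", "g2", "g3", "z", "r"]
exact = True
for values in product((0, 1), repeat=len(variables)):
    a = dict(zip(variables, values))
    rule_holds = a["z"] == (a["g1"] and a["g2"]) and a["r"] == (a["z"] or a["g3"])
    exact = exact and (satisfied(constraints, a) == rule_holds)
print("MILP constraints reproduce the Boolean rule:", exact)
```

Nested rules of any depth can be handled the same way by introducing one auxiliary binary per intermediate AND/OR term.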

Journal Article
TL;DR: Here it is demonstrated that the negative auto-regulation motif in the native arabinose system of Escherichia coli increases the range of arabinose signals over which the system can respond, which may contribute to explaining the common occurrence of negative auto-regulation in biological systems.
Abstract: Gene regulation networks are made of recurring regulatory patterns, called network motifs. One of the most common network motifs is negative auto-regulation, in which a transcription factor represses its own production. Negative auto-regulation has several potential functions: it can shorten the response time (time to reach halfway to steady-state), stabilize expression against noise, and linearize the gene's input-output response curve. This latter function of negative auto-regulation, which increases the range of input signals over which downstream genes respond, has been studied by theory and synthetic gene circuits. Here we ask whether negative auto-regulation preserves this function also in the context of a natural system, where it is embedded within many additional interactions. To address this, we studied the negative auto-regulation motif in the arabinose utilization system of Escherichia coli, in which negative auto-regulation is part of a complex regulatory network. We find that when negative auto-regulation is disrupted by placing the regulator araC under constitutive expression, the input dynamic range of the arabinose system is reduced by 10-fold. The apparent Hill coefficient of the induction curve changes from about n = 1 with negative auto-regulation, to about n = 2 when it is disrupted. We present a mathematical model that describes how negative auto-regulation can increase input dynamic-range, by coupling the transcription factor protein level to the input signal. Here we demonstrate that the negative auto-regulation motif in the native arabinose system of Escherichia coli increases the range of arabinose signals over which the system can respond. In this way, negative auto-regulation may help to increase the input dynamic-range while maintaining the specificity of cooperative regulatory systems. This function may contribute to explaining the common occurrence of negative auto-regulation in biological systems.
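The link between the apparent Hill coefficient and the input dynamic range can be made concrete with a short calculation. Defining the dynamic range as the ratio of inputs giving 90% and 10% of the maximal response (a common convention, assumed here), a Hill curve with coefficient n spans 81^(1/n)-fold: 81-fold for n = 1 and 9-fold for n = 2, a ratio of 9, of the same order as the roughly 10-fold reduction reported above. The Python sketch below computes this; it does not model the arabinose system itself.

```python
def hill(x, K=1.0, n=1.0):
    """Activating Hill function: fraction of maximal response at input x."""
    return x**n / (K**n + x**n)


def input_dynamic_range(n, low=0.1, high=0.9):
    """Ratio of inputs giving `high` vs `low` response for a Hill curve with coefficient n.

    Solving x**n / (K**n + x**n) = f gives x = K * (f / (1 - f)) ** (1 / n),
    so the ratio does not depend on K.
    """
    x_low = (low / (1 - low)) ** (1 / n)
    x_high = (high / (1 - high)) ** (1 / n)
    return x_high / x_low


for n in (1, 2):
    print(f"n = {n}: inputs span a {input_dynamic_range(n):.0f}-fold range (10%-90% response)")
```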

Journal Article
TL;DR: This study is the first systematic network and pathway analysis of candidate genes in MDD, providing abundant important information about gene interaction and regulation in a major psychiatric disease.
Abstract: Numerous genetic and genomic datasets related to complex diseases have been made available during the last decade. It is now a great challenge to assess such heterogeneous datasets to prioritize disease genes and perform follow-up functional analysis and validation. Among complex disease studies, psychiatric disorders such as major depressive disorder (MDD) are especially in need of robust integrative analysis because these diseases are more complex than others, with weak genetic factors at various levels, including genetic markers, transcription (gene expression), epigenetics (methylation), protein, pathways and networks. In this study, we proposed a comprehensive analysis framework at the systems level and demonstrated it in MDD using a set of candidate genes that have recently been prioritized based on multiple lines of evidence including association, linkage, gene expression (both human and animal studies), regulatory pathway, and literature search. In the network analysis, we explored the topological characteristics of these genes in the context of the human interactome and compared them with two other complex diseases. The network topological features indicated that MDD is more similar to schizophrenia than to cancer. In the functional analysis, we performed gene set enrichment analysis for both Gene Ontology categories and canonical pathways. Moreover, we proposed a unique pathway crosstalk approach to examine the dynamic interactions among biological pathways. Our pathway enrichment and crosstalk analyses revealed two unique pathway interaction modules that were significantly enriched with MDD genes. These two modules are related to neurotransmission and the immune system, supporting the neuropathology hypothesis of MDD. Finally, we constructed an MDD-specific subnetwork, which recruited novel candidate genes with association signals from a major MDD GWAS dataset.
This study is the first systematic network and pathway analysis of candidate genes in MDD, providing abundant important information about gene interaction and regulation in a major psychiatric disease. The results suggest potential functional components underlying the molecular mechanisms of MDD and, thus, facilitate generation of novel hypotheses in this disease. The systems biology based strategy in this study can be applied to many other complex diseases.
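Comparing the topology of candidate genes with the rest of an interactome, as in the network analysis above, typically starts from per-node statistics such as degree and the local clustering coefficient. The pure-Python sketch below computes both for a hypothetical toy network and candidate set (the gene names `G1` to `G4` and `P1` to `P3` are placeholders); a real analysis would use a full human protein interaction network and test the differences against random gene sets.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def topology(edges):
    """Degree and local clustering coefficient for every node of an undirected graph."""
    nbrs = defaultdict(set)
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    degree = {v: len(ns) for v, ns in nbrs.items()}
    clustering = {}
    for v, ns in nbrs.items():
        k = len(ns)
        links = sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
        clustering[v] = 2 * links / (k * (k - 1)) if k > 1 else 0.0
    return degree, clustering


def compare(candidates, edges):
    """Mean degree and clustering of candidate genes versus all other nodes."""
    degree, clustering = topology(edges)
    others = [v for v in degree if v not in candidates]
    return {
        "mean_degree_candidates": mean(degree[v] for v in candidates),
        "mean_degree_others": mean(degree[v] for v in others),
        "mean_clustering_candidates": mean(clustering[v] for v in candidates),
        "mean_clustering_others": mean(clustering[v] for v in others),
    }


edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G1", "P1"),
         ("G2", "G3"), ("P1", "P2"), ("P2", "P3")]
summary = compare({"G1", "G2", "G3"}, edges)
print(summary)
```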

Journal Article
TL;DR: A novel algorithm permits an efficient analysis of high-dimensional, nonconvex, and poorly connected viable spaces characteristic of complex biological circuitry and allows a systematic use of robustness as a tool for model discrimination.
Abstract: A biological system's robustness to mutations and its evolution are influenced by the structure of its viable space, the region of its space of biochemical parameters where it can exert its function. In systems with a large number of biochemical parameters, viable regions with potentially complex geometries fill a tiny fraction of the whole parameter space. This hampers explorations of the viable space based on "brute force" or Gaussian sampling. We here propose a novel algorithm to characterize viable spaces efficiently. The algorithm combines global and local explorations of a parameter space. The global exploration involves an out-of-equilibrium adaptive Metropolis Monte Carlo method aimed at identifying poorly connected viable regions. The local exploration then samples these regions in detail by a method we call multiple ellipsoid-based sampling. Our algorithm explores efficiently nonconvex and poorly connected viable regions of different test-problems. Most importantly, its computational effort scales linearly with the number of dimensions, in contrast to "brute force" sampling that shows an exponential dependence on the number of dimensions. We also apply this algorithm to a simplified model of a biochemical oscillator with positive and negative feedback loops. A detailed characterization of the model's viable space captures well known structural properties of circadian oscillators. Concretely, we find that model topologies with an essential negative feedback loop and a nonessential positive feedback loop provide the most robust fixed period oscillations. Moreover, the connectedness of the model's viable space suggests that biochemical oscillators with varying topologies can evolve from one another. Our algorithm permits an efficient analysis of high-dimensional, nonconvex, and poorly connected viable spaces characteristic of complex biological circuitry. It allows a systematic use of robustness as a tool for model discrimination.
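The two-stage strategy described above can be sketched in a toy two-dimensional setting with NumPy: a random walk that rejects moves leaving the viable region explores it globally, and an ellipsoid fitted to the visited points is then sampled uniformly, keeping only viable draws. This is a simplification under stated assumptions: the viable region (the upper half of an annulus) is hypothetical, the walk is a plain Metropolis walk with a uniform target rather than the out-of-equilibrium adaptive scheme, and a single ellipsoid stands in for the multiple ellipsoids used for poorly connected regions.

```python
import numpy as np


def is_viable(p):
    """Hypothetical non-convex viable region: the upper half of an annulus."""
    radius = np.hypot(p[0], p[1])
    return 0.8 <= radius <= 1.0 and p[1] >= 0.0


def explore(start, n_steps, step_size, rng):
    """Global stage: random walk that rejects proposals outside the viable region."""
    p = np.asarray(start, dtype=float)
    visited = []
    for _ in range(n_steps):
        proposal = p + step_size * rng.normal(size=p.size)
        if is_viable(proposal):
            p = proposal
        visited.append(p.copy())
    return np.array(visited)


def ellipsoid_sample(points, n_samples, rng, enlarge=1.2):
    """Local stage: uniform draws from an ellipsoid fitted to `points`, viable ones kept."""
    d = points.shape[1]
    center = points.mean(axis=0)
    # A uniform distribution on an ellipsoid with shape matrix L has covariance
    # L @ L.T / (d + 2), hence the sqrt(d + 2) factor; `enlarge` adds a margin.
    shape = np.linalg.cholesky(np.cov(points, rowvar=False)) * np.sqrt(d + 2) * enlarge
    directions = rng.normal(size=(n_samples, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.random(n_samples) ** (1.0 / d)   # gives uniform density in volume
    candidates = center + (directions * radii[:, None]) @ shape.T
    return np.array([c for c in candidates if is_viable(c)])


rng = np.random.default_rng(0)
walk = explore(start=(0.0, 0.9), n_steps=2000, step_size=0.05, rng=rng)
samples = ellipsoid_sample(walk, n_samples=1000, rng=rng)
print(f"{len(samples)} of 1000 ellipsoid draws are viable")
```

Sampling inside a fitted ellipsoid avoids the exponentially vanishing acceptance rate of uniform "brute force" sampling over the whole parameter box as the number of dimensions grows.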

Journal ArticleDOI
TL;DR: It is shown how the structural uniqueness and identifiability of the models can be guaranteed by carefully adding extra constraints, and that these important properties can be checked through appropriate computation methods.
Abstract: The inference of biological networks from high-throughput data has received considerable attention during the last decade and can be considered an important problem class in systems biology. However, it has been recognized that reliable network inference remains an unsolved problem. Most authors have identified lack of data and deficiencies in the inference algorithms as the main reasons for this situation. We claim that another major difficulty in solving these inference problems is the frequent lack of uniqueness of many of these networks, especially when prior assumptions have not been properly taken into account. Our contributions aid the distinguishability analysis of chemical reaction network (CRN) models with mass action dynamics. The novel methods are based on linear programming (LP) and therefore allow the efficient analysis of CRNs containing several hundred complexes and reactions. Using these new tools, and also previously published ones, to obtain the network structure of biological systems from the literature, we find that a unique topology often cannot be determined, even if the structure of the corresponding mathematical model is assumed to be known and all dynamical variables are measurable. In other words, certain mechanisms may remain undetected (or be falsely detected) while the inferred model is fully consistent with the measured data. It is also shown that sparsity-enforcing approaches for determining 'true' reaction structures are generally not enough without additional prior information. The inference of biological networks can be an extremely challenging problem even in the utopian case of perfect experimental information. Unfortunately, the practical situation is often more complex than that, since the measurements are typically incomplete, noisy and sometimes dynamically not rich enough, introducing further obstacles to the structure/parameter estimation process.
In this paper, we show how the structural uniqueness and identifiability of the models can be guaranteed by carefully adding extra constraints, and that these important properties can be checked through appropriate computation methods.
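The lack of uniqueness can be seen in a very small case. The sketch below is an illustrative example, not the authors' LP-based method. It evaluates the mass-action right-hand side of two hypothetical networks:

- Network 1: A -> 2B
- Network 2: A -> B and A -> A + B

All rate constants are set to an assumed common value `k`. Both networks give dA/dt = -kA and dB/dt = 2kA. Their vector fields therefore coincide, and perfect measurements of A and B cannot tell the two topologies apart.

```python
def mass_action_rhs(reactions, x):
    """Right-hand side dx/dt of a mass-action reaction network.

    Each reaction is (reactant_vector, product_vector, rate_constant).
    Its flux is k * prod(x_i ** reactant_i), and it changes the state
    by (product - reactant).
    """
    dx = [0.0] * len(x)
    for reactant, product, k in reactions:
        flux = k
        for xi, ai in zip(x, reactant):
            flux *= xi ** ai
        for i, (ai, bi) in enumerate(zip(reactant, product)):
            dx[i] += flux * (bi - ai)
    return dx


k = 1.3  # assumed rate constant, shared by all reactions
# Species order: (A, B).
net1 = [((1, 0), (0, 2), k)]                        # A -> 2B
net2 = [((1, 0), (0, 1), k), ((1, 0), (1, 1), k)]   # A -> B, A -> A + B

for x in ([0.5, 2.0], [3.0, 0.1]):
    print(mass_action_rhs(net1, x), mass_action_rhs(net2, x))
```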

Journal ArticleDOI
TL;DR: Validation on several signaling networks describing the immune response of mammals to bacteria, guard cell abscisic acid signaling in plants, and T cell receptor signaling shows that this method can effectively uncover the essentiality of components mediating a signal transduction process.
Abstract: Background Understanding how signals propagate through signaling pathways and networks is a central goal in systems biology. Quantitative dynamic models help to achieve this understanding, but are difficult to construct and validate because of the scarcity of known mechanistic details and kinetic parameters. Structural and qualitative analysis is emerging as a feasible and useful alternative for interpreting signal transduction.

Journal ArticleDOI
TL;DR: Networking the Human Infectome and Diseasome unravels the connectivity of viruses to a wide range of diseases, profiles the molecular basis of Hepatitis C Virus-induced diseases, and identifies 38 new candidate genetic predisposition factors involved in type 1 diabetes mellitus.
Abstract: Background: Comprehensive understanding of molecular mechanisms underlying viral infection is a major challenge towards the discovery of new antiviral drugs and susceptibility factors of human diseases. New advances in the field are expected from systems-level modelling and integration of the incessant torrent of high-throughput “-omics” data. Results: Here, we describe the Human Infectome protein interaction Network, a novel systems virology model of a virtual virus-infected human cell concerning 110 viruses. This in silico model was applied to comprehensively explore the molecular relationships between viruses and their associated diseases. This was done by merging virus-host and host-host physical protein-protein interactomes with the set of genes essential for viral replication and involved in human genetic diseases. This systems-level approach provides strong evidence that viral proteomes target a wide range of functional and inter-connected modules of proteins as well as highly central and bridging proteins within the human interactome. The high centrality of targeted proteins was correlated with their essentiality for the viral lifecycle, using functional genomic RNAi data. A stealth attack of viruses on proteins bridging cellular functions was demonstrated by simulation of cellular network perturbations, a property that could be essential in the molecular aetiology of some human diseases. Networking the Human Infectome and Diseasome unravels the connectivity of viruses to a wide range of diseases, profiles the molecular basis of Hepatitis C Virus-induced diseases, and identifies 38 new candidate genetic predisposition factors involved in type 1 diabetes mellitus. Conclusions: The Human Infectome and Diseasome Networks described here provide a unique gateway towards the comprehensive modelling and analysis of the systems-level properties associated with viral infection as well as candidate genes potentially involved in the molecular aetiology of human diseases.

Journal ArticleDOI
TL;DR: Through a series of large-scale leave-one-out cross-validation experiments, it is shown that the gene semantic similarity network can achieve not only higher coverage but also higher accuracy than the PPI network in the inference of disease genes.
Abstract: Motivation The inference of genes that are truly associated with inherited human diseases from a set of candidates resulting from genetic linkage studies has been one of the most challenging tasks in human genetics. Although several computational approaches have been proposed to prioritize candidate genes relying on protein-protein interaction (PPI) networks, these methods typically cover fewer than half of known human genes.
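The leave-one-out evaluation mentioned in the TL;DR can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's scoring method. Genes are scored by their summed similarity to the known disease genes, which is an assumed rule. Each known disease gene is hidden in turn and ranked among the non-seed genes. The genes `A` to `E` and their similarity values are made up for illustration.

```python
def neighbor_score(gene, seeds, sim):
    """Score a gene by its summed similarity to the seed (known disease) genes."""
    return sum(sim.get(frozenset((gene, s)), 0.0) for s in seeds if s != gene)


def loocv_ranks(disease_genes, all_genes, sim):
    """Leave-one-out: hide each known disease gene and rank it among the candidates."""
    ranks = {}
    for held_out in disease_genes:
        seeds = [g for g in disease_genes if g != held_out]
        candidates = [g for g in all_genes if g not in seeds]
        scores = {g: neighbor_score(g, seeds, sim) for g in candidates}
        # Rank 1 is best; ties are resolved in the held-out gene's favour.
        ranks[held_out] = 1 + sum(1 for g in candidates if scores[g] > scores[held_out])
    return ranks


# Hypothetical symmetric similarity values between five genes.
pairs = {
    ("A", "B"): 0.9, ("A", "C"): 0.2, ("B", "C"): 0.1,
    ("A", "D"): 0.6, ("B", "D"): 0.5, ("C", "D"): 0.1,
    ("A", "E"): 0.2, ("B", "E"): 0.0, ("C", "E"): 0.2, ("D", "E"): 0.5,
}
sim = {frozenset(p): v for p, v in pairs.items()}
ranks = loocv_ranks(["A", "B", "C"], ["A", "B", "C", "D", "E"], sim)
print(ranks)
```

In this toy network, C is only weakly similar to the other disease genes, so the unrelated gene D outranks it. Averaging such ranks over all held-out genes gives the accuracy that coverage and accuracy comparisons between networks rest on.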

Journal ArticleDOI
TL;DR: The impact of the results is to highlight that classical and simple control theory methods are extremely useful to characterize the behavior of biological networks analytically, and demonstrate that some biological networks are robust thanks to their structure and some qualitative properties of the interactions, regardless of the specific values of their parameters.
Abstract: Background: The molecular circuitry of living organisms performs remarkably robust regulatory tasks, despite the often intrinsic variability of its components. A large body of research has in fact highlighted that robustness is often a structural property of biological systems. However, there are few systematic methods to mathematically model and describe structural robustness. With a few exceptions, numerical studies are often the preferred approach to this type of investigation. Results: In this paper, we propose a framework to analyze robust stability of equilibria in biological networks. We employ Lyapunov and invariant sets theory, focusing on the structure of ordinary differential equation models. Without resorting to extensive numerical simulations, often necessary to explore the behavior of a model in its parameter space, we provide rigorous proofs of robust stability of known bio-molecular networks. Our results are in line with existing literature. Conclusions: The impact of our results is twofold: on the one hand, we highlight that classical and simple control theory methods are extremely useful to characterize the behavior of biological networks analytically. On the other hand, we are able to demonstrate that some biological networks are robust thanks to their structure and some qualitative properties of the interactions, regardless of the specific values of their parameters.
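A Lyapunov function can certify stability for every parameter value at once, as the following worked derivation shows. This is an illustrative textbook example, not one of the networks analysed in the paper. Consider a hypothetical two-node negative feedback loop, written as a linear model with positive but otherwise arbitrary parameters a, b, c, d:

```latex
\dot{x} = -a\,x - b\,y, \qquad \dot{y} = c\,x - d\,y, \qquad a,b,c,d > 0.

V(x,y) = c\,x^{2} + b\,y^{2} > 0 \quad \text{for } (x,y) \neq (0,0)

\dot{V} = 2c\,x\,\dot{x} + 2b\,y\,\dot{y}
        = 2c\,x(-a\,x - b\,y) + 2b\,y(c\,x - d\,y)
        = -2ac\,x^{2} - 2bc\,xy + 2bc\,xy - 2bd\,y^{2}
        = -2ac\,x^{2} - 2bd\,y^{2} < 0 \quad \text{for } (x,y) \neq (0,0)
```

The cross terms cancel because of the sign pattern of the feedback, not because of particular numerical values. For this model, the origin is therefore globally asymptotically stable for every choice of positive a, b, c, d.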

Journal ArticleDOI
TL;DR: A network-based approach for cancer biomarker identification, netSVM, is developed, resulting in an improved prediction performance with network biomarkers and several novel hub genes, which may provide new insight into the underlying mechanism of breast cancer metastasis.
Abstract: Background One of the major goals in gene and protein expression profiling of cancer is to identify biomarkers and build classification models for prediction of disease prognosis or treatment response. Many traditional statistical methods, based on microarray gene expression data alone and individual genes' discriminatory power, often fail to identify biologically meaningful biomarkers, resulting in poor prediction performance across data sets. Nonetheless, the variables in multivariable classifiers should synergistically interact to produce more effective classifiers than individual biomarkers.

Journal ArticleDOI
TL;DR: Network reconstructions provide a systematic means to integrate and analyze proteomic data in a biologically meaningful manner and reveal an unexpected level of complexity in the functional capabilities of human erythrocyte metabolism.
Abstract: The development of high-throughput technologies capable of whole cell measurements of genes, proteins, and metabolites has led to the emergence of systems biology. Integrated analysis of the resulting omic data sets has proved to be hard to achieve. Metabolic network reconstructions enable complex relationships amongst molecular components to be represented formally in a biologically relevant manner while respecting physical constraints. In silico models derived from such reconstructions can then be queried or interrogated through mathematical simulations. Proteomic profiling studies of the mature human erythrocyte have shown more metabolism-related proteins to be present than previously thought; however, the significance and the causal consequences of these findings have not been explored. Erythrocyte proteomic data was used to reconstruct the most expansive description of erythrocyte metabolism to date, following extensive manual curation, assessment of the literature, and functional testing. The reconstruction contains 281 enzymes representing functions from glycolysis to cofactor and amino acid metabolism. Such a comprehensive view of erythrocyte metabolism implicates the erythrocyte as a potential biomarker for different diseases as well as a 'cell-based' drug-screening tool. The analysis shows that 94 erythrocyte enzymes are implicated in morbid single nucleotide polymorphisms, representing 142 pathologies. In addition, over 230 FDA-approved and experimental pharmaceuticals have enzymatic targets in the erythrocyte. The advancement of proteomic technologies and increased generation of high-throughput proteomic data have created the need for a means to analyze these data in a coherent manner. Network reconstructions provide a systematic means to integrate and analyze proteomic data in a biologically meaningful manner.
Analysis of the red cell proteome has revealed an unexpected level of complexity in the functional capabilities of human erythrocyte metabolism.

Journal ArticleDOI
TL;DR: The present data suggest that miR-148a could be a potential prognostic biomarker of gastric cancer and function as a tumor suppressor by repressing the activity of its regulated PIN.
Abstract: Background MicroRNAs (miRNAs) are a class of endogenous, small and highly conserved noncoding RNAs that control gene expression either by degradation of target mRNAs or by inhibition of protein translation. They play important roles in cancer progression. A single miRNA can provoke a chain reaction and further affect the protein interaction network (PIN). Therefore, we developed a novel integrative approach to identify the functional roles and the regulated PIN of oncomirs.

Journal ArticleDOI
TL;DR: This work reconstructed a stoichiometric model capturing the central metabolism of three important representatives of PNSB (Rhodospirillum rubrum, Rhodobacter sphaeroides and Rhodopseudomonas palustris), revealing key metabolic constraints related to redox homeostasis in these bacteria.
Abstract: Purple nonsulfur bacteria (PNSB) are facultative photosynthetic bacteria and exhibit an extremely versatile metabolism. A central focus of research on PNSB has been the elucidation of the mechanisms by which they balance cellular redox under diverse conditions, in particular under photoheterotrophic growth. Given the complexity of the central metabolism of PNSB, metabolic modeling becomes crucial for an integrated analysis of the accumulated biological knowledge. We reconstructed a stoichiometric model capturing the central metabolism of three important representatives of PNSB (Rhodospirillum rubrum, Rhodobacter sphaeroides and Rhodopseudomonas palustris). Using flux variability analysis, the model reveals key metabolic constraints related to redox homeostasis in these bacteria. With the help of the model we can (i) give quantitative explanations for non-intuitive, partially species-specific phenomena of photoheterotrophic growth of PNSB, (ii) reproduce various quantitative experimental data, and (iii) formulate several new hypotheses. For example, model analysis of photoheterotrophic growth reveals that, despite a large number of utilizable catabolic pathways, substrate-specific biomass and CO2 yields are fixed constraints, irrespective of the assumption of optimal growth. Furthermore, our model explains quantitatively why a CO2-fixing pathway such as the Calvin cycle is required by PNSB for many substrates (even if CO2 is released). We also analyze the role of other pathways potentially involved in redox metabolism and how they quantitatively affect the required capacity of the Calvin cycle. Our model also enables us to discriminate between different acetate assimilation pathways that were proposed recently for R. sphaeroides and R. rubrum, both lacking the isocitrate lyase.
Finally, we demonstrate the value of the metabolic model also for potential biotechnological applications: we examine the theoretical capabilities of PNSB for photoheterotrophic hydrogen production and identify suitable genetic interventions to increase the hydrogen yield. Taken together, the metabolic model (i) explains various redox-related phenomena of the versatile metabolism of PNSB, (ii) delivers new hypotheses on the operation and relevance of several metabolic pathways, and (iii) holds significant potential as a tool for rational metabolic engineering of PNSB in biotechnological applications.

Journal ArticleDOI
TL;DR: Two parameter estimation methods combining spline theory with Linear Programming (LP) and Nonlinear Programming (NLP) are developed and have general application to identify unknown parameter values of a wide range of systems biology models.
Abstract: Background Mathematical models for revealing the dynamic and interaction properties of biological systems play an important role in computational systems biology. The inference of model parameter values from time-course data can be considered a "reverse engineering" process and is still one of the most challenging tasks. Many parameter estimation methods have been developed, but no single method is effective in all cases or outperforms all other approaches; instead, each method has its own advantages and disadvantages. It is therefore worthwhile to develop parameter estimation methods that are robust against noise, computationally efficient and flexible enough to meet different constraints.
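Reducing estimation to a linear problem can be illustrated by slope matching. The sketch below rests on two assumptions. First, slopes are estimated with central differences instead of a fitted spline. Second, parameters are fitted by ordinary least squares instead of linear programming. It is therefore an analogue of the spline-plus-LP method, not the authors' implementation. The test model dx/dt = k1 - k2*x, with x(0) = 0, is hypothetical. Because that model is linear in k1 and k2, matching estimated slopes to the model turns parameter estimation into a linear regression.

```python
import math


def simulate(k1, k2, ts):
    """Exact solution of dx/dt = k1 - k2*x with x(0) = 0 (hypothetical test model)."""
    return [(k1 / k2) * (1.0 - math.exp(-k2 * t)) for t in ts]


def estimate_k1_k2(ts, xs):
    """Slope-matching estimate of (k1, k2) from uniformly sampled data.

    1. Estimate dx/dt at interior points by central differences.
    2. The model is linear in its parameters, so regress the slopes on x:
       slope = k1 - k2 * x, fitted by ordinary least squares.
    """
    h = ts[1] - ts[0]  # assumes a uniform sampling interval
    slopes = [(xs[i + 1] - xs[i - 1]) / (2 * h) for i in range(1, len(xs) - 1)]
    xm = xs[1:-1]
    n = len(xm)
    mean_x = sum(xm) / n
    mean_s = sum(slopes) / n
    cov = sum((x - mean_x) * (s - mean_s) for x, s in zip(xm, slopes))
    var = sum((x - mean_x) ** 2 for x in xm)
    beta = cov / var                 # fitted slope, equal to -k2
    alpha = mean_s - beta * mean_x   # fitted intercept, equal to k1
    return alpha, -beta


ts = [0.1 * i for i in range(101)]
xs = simulate(2.0, 0.5, ts)
k1_hat, k2_hat = estimate_k1_k2(ts, xs)
print(round(k1_hat, 3), round(k2_hat, 3))
```

No ODE has to be integrated inside the fitting loop, which is the main computational appeal of derivative-based estimation. The trade-off is sensitivity to noise in the slope estimates, which is where smoothing splines help.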

Journal ArticleDOI
TL;DR: MetaDBSite will become a useful and integrative tool for protein DNA-binding residue prediction, and the comparison results show that metaDBSite outperforms each individual approach.
Abstract: Background: Protein-DNA interactions play an important role in many fundamental biological activities such as DNA replication, transcription and repair. Identifying the amino acid residues involved in DNA binding sites is critical for understanding the mechanisms of gene regulation. In the last decade, a number of computational approaches have been developed to predict protein-DNA binding sites based on protein sequence and/or structural information. Results: In this article, we present metaDBSite, a meta web server to predict DNA-binding residues for DNA-binding proteins. MetaDBSite integrates the prediction results from six available online web servers: DISIS, DNABindR, BindN, BindN-rf, DP-Bind and DBS-PRED, and it uses only the sequence information of proteins. A large dataset of DNA-binding proteins is constructed from the Protein Data Bank and serves as a gold-standard benchmark to evaluate the metaDBSite approach and the other six predictors. Conclusions: The comparison results show that metaDBSite outperforms each individual approach. We believe that metaDBSite will become a useful and integrative tool for protein DNA-binding residue prediction. The MetaDBSite web server is freely available at http://projects.biotec.tu-dresden.de/metadbsite/ and http://sysbio.zju.edu.cn/metadbsite.
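A meta-predictor combines the per-residue calls of several individual predictors. The simplest such scheme is voting, sketched below. The abstract does not say which combination rule metaDBSite uses, so the majority vote here is an assumption, not the server's method. The predictor names and their 0/1 calls are hypothetical.

```python
def consensus_binding_residues(predictions, min_votes=None):
    """Combine per-residue binary predictions (1 = DNA-binding) by voting.

    `predictions` maps a predictor name to a list of 0/1 calls of equal length.
    A residue is called binding when at least `min_votes` predictors agree;
    by default this is a strict majority.
    """
    calls = list(predictions.values())
    length = len(calls[0])
    if any(len(c) != length for c in calls):
        raise ValueError("all predictors must cover the same residues")
    if min_votes is None:
        min_votes = len(calls) // 2 + 1
    return [int(sum(c[i] for c in calls) >= min_votes) for i in range(length)]


# Made-up calls for a 6-residue fragment from three hypothetical predictors.
preds = {
    "predictor_a": [1, 1, 0, 0, 1, 0],
    "predictor_b": [1, 0, 0, 1, 1, 0],
    "predictor_c": [0, 1, 0, 0, 1, 1],
}
print(consensus_binding_residues(preds))
```

Raising `min_votes` trades sensitivity for precision. Only residues that all predictors agree on survive a unanimous threshold.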

Journal ArticleDOI
TL;DR: The construction of a cellular stress network model and its application towards the analysis of environmental stress using transcriptomic data is described and the ability of the network model to identify the mechanisms that are activated in response to CS, a broad inducer of cellular stress is tested.
Abstract: Humans and other organisms are equipped with a set of responses that can prevent damage from exposure to a multitude of endogenous and environmental stressors. If these stress responses are overwhelmed, this can result in pathogenesis of diseases, which is reflected by an increased development of, e.g., pulmonary and cardiac diseases in humans exposed to chronic levels of environmental stress, including inhaled cigarette smoke (CS). Systems biology data sets (e.g., transcriptomics, phosphoproteomics, metabolomics) could enable comprehensive investigation of the biological impact of these stressors. However, detailed mechanistic networks are needed to determine which specific pathways are activated in response to different stressors and to drive the qualitative and eventually quantitative assessment of these data. A current limiting step in this process is the availability of detailed mechanistic networks that can be used as an analytical substrate. We have built a detailed network model that captures the biology underlying the physiological cellular response to endogenous and exogenous stressors in non-diseased mammalian pulmonary and cardiovascular cells. The contents of the network model reflect several diverse areas of signaling, including oxidative stress, hypoxia, shear stress, endoplasmic reticulum stress, and xenobiotic stress, that are elicited in response to common pulmonary and cardiovascular stressors. We then tested the ability of the network model to identify the mechanisms that are activated in response to CS, a broad inducer of cellular stress. Using transcriptomic data from the lungs of mice exposed to CS, the network model identified a robust increase in the oxidative stress response, largely mediated by the anti-oxidant NRF2 pathways, consistent with previous reports on the impact of CS exposure in the mammalian lung. 
The results presented here describe the construction of a cellular stress network model and its application towards the analysis of environmental stress using transcriptomic data. The proof-of-principle analysis described here, coupled with the future development of additional network models covering distinct areas of biology, will help to further clarify the integrated biological responses elicited by complex environmental stressors such as CS, in pulmonary and cardiovascular cells.

Journal ArticleDOI
TL;DR: The architecture of brain transcript regulation is surveyed and preservation of gene co-expression modules in hippocampus and striatum is demonstrated, while also highlighting important differences.
Abstract: Our understanding of the genetic basis of learning and memory remains shrouded in mystery. To explore the genetic networks governing the biology of conditional fear, we used a systems genetics approach to analyze a hybrid mouse diversity panel (HMDP) with high mapping resolution. A total of 27 behavioral quantitative trait loci were mapped with a false discovery rate of 5%. By integrating fear phenotypes, transcript profiling data from hippocampus and striatum and also genotype information, two gene co-expression networks correlated with context-dependent immobility were identified. We prioritized the key markers and genes in these pathways using intramodular connectivity measures and structural equation modeling. Highly connected genes in the context fear modules included Psmd6, Ube2a and Usp33, suggesting an important role for ubiquitination in learning and memory. In addition, we surveyed the architecture of brain transcript regulation and demonstrated preservation of gene co-expression modules in hippocampus and striatum, while also highlighting important differences. Rps15a, Kif3a, Stard7, 6330503K22RIK, and Plvap were among the individual genes whose transcript abundance was strongly associated with fear phenotypes. Application of our multi-faceted mapping strategy permits an increasingly detailed characterization of the genetic networks underlying behavior.
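Intramodular connectivity can be sketched in its common unsigned, soft-thresholded form. Each gene's score is the sum of |cor(i, j)| ** beta over the other genes in its module. The abstract does not give the exact definition the authors used, so this form and the default beta = 6 are assumptions. The three expression profiles are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def intramodular_connectivity(expr, module, beta=6):
    """k_in(i) = sum over other module genes j of |cor(i, j)| ** beta."""
    return {
        i: sum(abs(pearson(expr[i], expr[j])) ** beta for j in module if j != i)
        for i in module
    }


# Hypothetical expression profiles across five samples.
expr = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10],
    "g3": [1, 3, 2, 5, 4],
}
k_in = intramodular_connectivity(expr, ["g1", "g2", "g3"])
print({g: round(v, 4) for g, v in k_in.items()})
```

Raising correlations to a power suppresses weak links: a correlation of 0.8 contributes only about 0.26. Genes tightly co-expressed with the rest of the module, such as g1 and g2 here, therefore rank as hubs.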