Showing papers on "Interaction network" published in 2005


Journal ArticleDOI
03 Feb 2005-Nature
TL;DR: Insight is provided into the function of previously uncharacterized bacterial proteins and the overall topology of a microbial interaction network, the core components of which are broadly conserved across Prokaryota.
Abstract: Proteins often function as components of multi-subunit complexes. Despite its long history as a model organism, no large-scale analysis of protein complexes in Escherichia coli has yet been reported. To this end, we have targeted DNA cassettes into the E. coli chromosome to create carboxy-terminal, affinity-tagged alleles of 1,000 open reading frames (approximately 23% of the genome). A total of 857 proteins, including 198 of the most highly conserved, soluble non-ribosomal proteins essential in at least one bacterial species, were tagged successfully, whereas 648 could be purified to homogeneity and their interacting protein partners identified by mass spectrometry. An interaction network of protein complexes involved in diverse biological processes was uncovered and validated by sequential rounds of tagging and purification. This network includes many new interactions as well as interactions predicted based solely on genomic inference or limited phenotypic data. This study provides insight into the function of previously uncharacterized bacterial proteins and the overall topology of a microbial interaction network, the core components of which are broadly conserved across Prokaryota.

1,175 citations


Journal ArticleDOI
TL;DR: A new centrality measure that characterizes the participation of each node in all subgraphs in a network, C(S)(i), which is better able to discriminate the nodes of a network than alternate measures such as degree, closeness, betweenness, and eigenvector centralities.
Abstract: We introduce a new centrality measure that characterizes the participation of each node in all subgraphs in a network. Smaller subgraphs are given more weight than larger ones, which makes this measure appropriate for characterizing network motifs. We show that the subgraph centrality [C(S)(i)] can be obtained mathematically from the spectra of the adjacency matrix of the network. This measure is better able to discriminate the nodes of a network than alternate measures such as degree, closeness, betweenness, and eigenvector centralities. We study eight real-world networks for which C(S)(i) displays useful and desirable properties, such as clear ranking of nodes and scale-free characteristics. Compared with the number of links per node, the ranking introduced by C(S)(i) (for the nodes in the protein interaction network of S. cerevisiae) is more highly correlated with the lethality of individual proteins removed from the proteome.
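
To make the weighting of smaller subgraphs concrete, here is a minimal sketch. It assumes the usual series form of subgraph centrality, in which the number of closed walks of length k starting at node i, (A^k)_ii, is weighted by 1/k!. The function name, the truncation at 30 terms, and the toy graphs are illustrative choices, not taken from the paper.

```python
import math

def subgraph_centrality(adj, terms=30):
    """Approximate SC(i) = sum_k (A^k)_ii / k! for each node i.

    adj is a symmetric 0/1 adjacency matrix given as a list of lists.
    The series is truncated after `terms` terms (an illustrative choice).
    """
    n = len(adj)
    # power holds A^k; start with A^0 = identity
    power = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    scores = [0.0] * n
    for k in range(terms):
        for i in range(n):
            scores[i] += power[i][i] / math.factorial(k)
        # advance to A^(k+1)
        power = [
            [sum(power[r][m] * adj[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)
        ]
    return scores

# Path graph 0 - 1 - 2: the middle node takes part in more closed walks.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
path_scores = subgraph_centrality(path)

# Single edge: only even-length closed walks exist, so the series sums to cosh(1).
edge_scores = subgraph_centrality([[0, 1], [1, 0]])
```

The spectral route mentioned in the abstract gives the same values from the eigenvalues and eigenvectors of the adjacency matrix; the truncated series is used here only to keep the sketch free of third-party dependencies.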

1,102 citations


Journal ArticleDOI
TL;DR: It is found that proteins with high betweenness are more likely to be essential and that evolutionary age of proteins is positively correlated with betweenness, which suggests the existence of some modular organization of the network, and that the high-betweenness, low-connectivity proteins may act as important links between these modules.
Abstract: Structural features found in biomolecular networks that are absent in random networks produced by simple algorithms can provide insight into the function and evolution of cell regulatory networks. Here we analyze “betweenness” of network nodes, a graph theoretical centrality measure, in the yeast protein interaction network. Proteins that have high betweenness, but low connectivity (degree), were found to be abundant in the yeast proteome. This finding is not explained by algorithms proposed to explain the scale-free property of protein interaction networks, where low-connectivity proteins also have low betweenness. These data suggest the existence of some modular organization of the network, and that the high-betweenness, low-connectivity proteins may act as important links between these modules. We found that proteins with high betweenness are more likely to be essential and that evolutionary age of proteins is positively correlated with betweenness. By comparing different models of genome evolution that generate scale-free networks, we show that rewiring of interactions via mutation is an important factor in the production of such proteins. The evolutionary and functional significance of these observations is discussed.
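
The high-betweenness, low-degree pattern is easy to reproduce on a toy graph. The sketch below is an illustration rather than the authors' pipeline: it computes shortest-path betweenness with a Brandes-style breadth-first search on a hypothetical network of two triangles joined by a single linker node `x`.

```python
from collections import deque

def betweenness(graph):
    """Shortest-path betweenness for an unweighted, undirected graph.

    graph maps each node to a set of neighbours.
    """
    score = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and recording predecessors
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint
    return {v: b / 2 for v, b in score.items()}

# Hypothetical toy network: modules {a, b, c} and {d, e, f} linked only through x.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "x"),
         ("x", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

bc = betweenness(graph)
```

In this toy graph `x` has only two neighbours, yet every path between the two modules passes through it, so it has the highest betweenness of any node.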

469 citations


Journal ArticleDOI
TL;DR: It is demonstrated that a probabilistic analysis integrating model organism interactome data, protein domain data, genome-wide gene expression data and functional annotation data predicts nearly 40,000 protein-protein interactions in humans—a result comparable to those obtained with experimental and computational approaches in model organisms.
Abstract: A catalog of all human protein-protein interactions would provide scientists with a framework to study protein deregulation in complex diseases such as cancer. Here we demonstrate that a probabilistic analysis integrating model organism interactome data, protein domain data, genome-wide gene expression data and functional annotation data predicts nearly 40,000 protein-protein interactions in humans, a result comparable to those obtained with experimental and computational approaches in model organisms. We validated the accuracy of the predictive model on an independent test set of known interactions and also experimentally confirmed two predicted interactions relevant to human cancer, implicating uncharacterized proteins into definitive pathways. We also applied the human interactome network to cancer genomics data and identified several interaction subnetworks activated in cancer. This integrative analysis provides a comprehensive framework for exploring the human protein interaction network.

446 citations


Journal ArticleDOI
TL;DR: These interactions and the accuracy benchmarks will aid interpretation of current functional genomics data and provide a basis for determining the quality of future large-scale human protein interaction assays.
Abstract: Background: Extensive protein interaction maps are being constructed for yeast, worm, and fly to ask how the proteins organize into pathways and systems, but no such genome-wide interaction map yet exists for the set of human proteins. To prepare for studies in humans, we wished to establish tests for the accuracy of future interaction assays and to consolidate the known interactions among human proteins. Results: We established two tests of the accuracy of human protein interaction datasets and measured the relative accuracy of the available data. We then developed and applied natural language processing and literature-mining algorithms to recover from Medline abstracts 6,580 interactions among 3,737 human proteins. A three-part algorithm was used: first, human protein names were identified in Medline abstracts using a discriminator based on conditional random fields, then interactions were identified by the co-occurrence of protein names across the set of Medline abstracts, filtering the interactions with a Bayesian classifier to enrich for legitimate physical interactions. These mined interactions were combined with existing interaction data to obtain a network of 31,609 interactions among 7,748 human proteins, accurate to the same degree as the existing datasets. Conclusion: These interactions and the accuracy benchmarks will aid interpretation of current functional genomics data and provide a basis for determining the quality of future large-scale human protein interaction assays. Projecting from the approximately 15 interactions per protein in the best-sampled interaction set to the estimated 25,000 human genes implies more than 375,000 interactions in the complete human protein interaction network. This set therefore represents no more than 10% of the complete network.
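
The projection at the end of this abstract can be checked in a few lines. The figures are taken directly from the abstract; the variable names are illustrative.

```python
interactions_per_protein = 15   # best-sampled interaction set (from the abstract)
human_genes = 25_000            # estimated number of human genes (from the abstract)
known_interactions = 31_609     # consolidated network size (from the abstract)

# Projected size of the complete network, following the abstract's arithmetic
projected_total = interactions_per_protein * human_genes

# Fraction of the projected network covered by the consolidated set
coverage = known_interactions / projected_total
```

Under these figures the consolidated set covers about 8.4% of the projected 375,000 interactions, consistent with the stated bound of no more than 10%.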

269 citations


Journal ArticleDOI
TL;DR: This work presents a method for inferring the mechanism most accurately capturing a given network topology, exploiting discriminative tools from machine learning.
Abstract: Naturally occurring networks exhibit quantitative features revealing underlying growth mechanisms. Numerous network mechanisms have recently been proposed to reproduce specific properties such as degree distributions or clustering coefficients. We present a method for inferring the mechanism most accurately capturing a given network topology, exploiting discriminative tools from machine learning. The Drosophila melanogaster protein network is confidently and robustly (to noise and training data subsampling) classified as a duplication–mutation–complementation network over preferential attachment, small-world, and a duplication–mutation mechanism without complementation. Systematic classification, rather than statistical study of specific properties, provides a discriminative approach to understand the design of complex networks.

234 citations


Journal ArticleDOI
TL;DR: Applying a core decomposition method which allows us to identify the inherent layer structure of the protein interaction network, it is found that the probability of nodes both being essential and evolutionarily conserved successively increases toward the innermost cores.
Abstract: A set of highly connected proteins (or hubs) plays an important role for the integrity of the protein interaction network of Saccharomyces cerevisiae by connecting the network's intrinsic modules [1, 2]. The importance of the hubs' central placement is further confirmed by their propensity to be lethal. However, although highly emphasized, little is known about the topological coherence among the hubs. Applying a core decomposition method which allows us to identify the inherent layer structure of the protein interaction network, we find that the probability of nodes both being essential and evolutionarily conserved successively increases toward the innermost cores. While connectivity alone is often not a sufficient criterion to assess a protein's functional, evolutionary and topological relevance, we classify nodes as globally and locally central depending on their appearance in the inner or outer cores. The observation that globally central proteins participate in a substantial number of protein complexes which display an elevated degree of evolutionary conservation allows us to hypothesize that globally central proteins serve as the evolutionary backbone of the proteome. Even though protein interaction data are extensively flawed, we find that our results are very robust against inaccurately determined protein interactions.
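
A core decomposition of this kind can be sketched as an iterative peeling of low-degree nodes. The code below is a minimal illustration under that assumption, not the authors' implementation. The toy graph is hypothetical: a hub `h` with five neighbours sits in an outer core, while the four members of a clique sit in the innermost core.

```python
def core_numbers(graph):
    """Return the core number of every node by iterative peeling.

    graph maps each node to a set of neighbours (undirected).
    """
    degree = {v: len(nbrs) for v, nbrs in graph.items()}
    remaining = set(graph)
    core = {}
    k = 0
    while remaining:
        # Nodes with at most k remaining neighbours cannot be in the (k+1)-core
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1
            continue
        for v in peel:
            core[v] = k
            remaining.discard(v)
            for w in graph[v]:
                if w in remaining:
                    degree[w] -= 1
    return core

# Hypothetical toy network: clique {p, q, r, s}; hub h touches p and four leaves.
edges = [("p", "q"), ("p", "r"), ("p", "s"), ("q", "r"), ("q", "s"), ("r", "s"),
         ("h", "p"), ("h", "l1"), ("h", "l2"), ("h", "l3"), ("h", "l4")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

core = core_numbers(graph)
```

Here `h` has the largest degree but a core number of 1, whereas the clique members have a core number of 3. This is the sense in which connectivity alone does not separate locally central from globally central nodes.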

231 citations


Journal ArticleDOI
TL;DR: An integrated Saccharomyces cerevisiae interaction network is assembled in which nodes represent genes (or their protein products) and differently colored links represent the aforementioned five biological interaction types and it is shown that most of the motifs form 'network themes' – classes of higher-order recurring interconnection patterns that encompass multiple occurrences of network motifs.
Abstract: Background: Large-scale studies have revealed networks of various biological interaction types, such as protein-protein interaction, genetic interaction, transcriptional regulation, sequence homology, and expression correlation. Recurring patterns of interconnection, or ‘network motifs’, have revealed biological insights for networks containing either one or two types of interaction. Results: To study more complex relationships involving multiple biological interaction types, we assembled an integrated Saccharomyces cerevisiae network in which nodes represent genes (or their protein products) and differently colored links represent the aforementioned five biological interaction types. We examined three- and four-node interconnection patterns containing multiple interaction types and found many enriched multi-color network motifs. Furthermore, we showed that most of the motifs form ‘network themes’ - classes of higher-order recurring interconnection patterns that encompass multiple occurrences of network motifs. Network themes can be tied to specific biological phenomena and may represent more fundamental network design principles. Examples of network themes include a pair of protein complexes with many inter-complex genetic interactions - the ‘compensatory complexes’ theme. Thematic maps - networks rendered in terms of such themes - can simplify an otherwise confusing tangle of biological relationships. We show this by mapping the S. cerevisiae network in terms of two specific network themes. Conclusions: Significantly enriched motifs in an integrated S. cerevisiae interaction network are often signatures of network themes, higher-order network structures that correspond to biological phenomena. Representing networks in terms of network themes provides a useful simplification of complex biological relationships.

200 citations


Journal ArticleDOI
TL;DR: A novel algorithm for automated prediction of protein-protein interactions that employs a unique bottom-up approach combining structure and sequence conservation in protein interfaces is presented.
Abstract: Motivation: Elucidation of the full network of protein-protein interactions is crucial for understanding of the principles of biological systems and processes. Thus, there is a need for in silico methods for predicting interactions. We present a novel algorithm for automated prediction of protein-protein interactions that employs a unique bottom-up approach combining structure and sequence conservation in protein interfaces. Results: Running the algorithm on a template dataset of 67 interfaces and a sequentially non-redundant dataset of 6170 protein structures, 62 616 potential interactions are predicted. These interactions are compared with the ones in two publicly available interaction databases (Database of Interacting Proteins and Biomolecular Interaction Network Database) and also the Protein Data Bank. A significant number of predictions are verified in these databases. The unverified ones may correspond to (1) interactions that are not covered in these databases but known in literature, (2) unknown interactions that actually occur in nature and (3) interactions that do not occur naturally but may possibly be realized synthetically in laboratory conditions. Some unverified interactions, supported significantly with studies found in the literature, are discussed. Availability: http://gordion.hpc.eng.ku.edu.tr/prism

193 citations


Journal ArticleDOI
TL;DR: It is found that for evolved complex networks as well as for the yeast protein–protein interaction network, synthetic lethal gene pairs consist mostly of redundant genes that lie close to each other and therefore within modules, while knockdown suppressor gene pairs are farther apart and often straddle modules, suggesting that knockdown rescue is mediated by alternative pathways or modules.
Abstract: Biological networks have evolved to be highly functional within uncertain environments while remaining extremely adaptable. One of the main contributors to the robustness and evolvability of biological networks is believed to be their modularity of function, with modules defined as sets of genes that are strongly interconnected but whose function is separable from those of other modules. Here, we investigate the in silico evolution of modularity and robustness in complex artificial metabolic networks that encode an increasing amount of information about their environment while acquiring ubiquitous features of biological, social, and engineering networks, such as scale-free edge distribution, small-world property, and fault-tolerance. These networks evolve in environments that differ in their predictability, and allow us to study modularity from topological, information-theoretic, and gene-epistatic points of view using new tools that do not depend on any preconceived notion of modularity. We find that for our evolved complex networks as well as for the yeast protein-protein interaction network, synthetic lethal gene pairs consist mostly of redundant genes that lie close to each other and therefore within modules, while knockdown suppressor gene pairs are farther apart and often straddle modules, suggesting that knockdown rescue is mediated by alternative pathways or modules. The combination of network modularity tools together with genetic interaction data constitutes a powerful approach to study and dissect the role of modularity in the evolution and function of biological networks.

186 citations


Proceedings ArticleDOI
01 Dec 2005
TL;DR: This work developed a computational method to rank-order AD-related proteins, based on an initial list of AD-related genes and public human protein interaction data, and showed that functionally relevant AD proteins were consistently ranked at the top.
Abstract: Huge unrealized post-genome opportunities remain in the understanding of detailed molecular mechanisms for Alzheimer Disease (AD). In this work, we developed a computational method to rank-order AD-related proteins, based on an initial list of AD-related genes and public human protein interaction data. In this method, we first collected an initial seed list of 65 AD-related genes from the OMIM database and mapped them to 70 AD seed proteins. We then expanded the seed proteins to an enriched AD set of 765 proteins using protein interactions from the Online Predicted Human Interaction Database (OPHID). We showed that the expanded AD-related proteins form a highly connected and statistically significant protein interaction sub-network. We further analyzed the sub-network to develop an algorithm, which can be used to automatically score and rank-order each protein for its biological relevance to AD pathway(s). Our results show that functionally relevant AD proteins were consistently ranked at the top: among the top 20 of 765 expanded AD proteins, 19 proteins are confirmed to belong to the original 70 AD seed protein set. Our method represents a novel use of protein interaction network data for Alzheimer disease studies and may be generalized for other disease areas in the future.

Journal ArticleDOI
TL;DR: An inferred human protein interaction network where interactions discovered in model organisms are mapped onto the corresponding human orthologs, based on the orthology table filtered by the domain architecture matching algorithm.
Abstract: Background: The application of high throughput approaches to the identification of protein interactions has offered for the first time a glimpse of the global interactome of some model organisms. Until now, however, such genome-wide approaches have not been applied to the human proteome. Results: In order to fill this gap we have assembled an inferred human protein interaction network where interactions discovered in model organisms are mapped onto the corresponding human orthologs. In addition to a stringent assignment to orthology classes based on the InParanoid algorithm, we have implemented a string matching algorithm to filter out orthology assignments of proteins whose global domain organization is not conserved. Finally, we have assessed the accuracy of our own, and related, inferred networks by benchmarking them against i) an assembled experimental interactome, ii) a network derived by mining of the scientific literature and iii) by measuring the enrichment of interacting protein pairs sharing common Gene Ontology annotation. Conclusion: The resulting networks are named HomoMINT and HomoMINT_filtered, the latter being based on the orthology table filtered by the domain architecture matching algorithm. They contain 9749 and 5203 interactions respectively and can be analyzed and viewed in the context of the experimentally verified interactions between human proteins stored in the MINT database. HomoMINT is constantly updated to take into account the growing information in the MINT database.
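
The mapping step described in this abstract can be illustrated with a small sketch. Everything here is hypothetical: the protein identifiers, the ortholog table, and the domain lists are invented for the example, and the exact-equality check on domain order is a crude stand-in for the domain architecture matching algorithm, whose details the abstract does not give.

```python
def infer_interactions(model_interactions, orthologs):
    """Transfer model-organism interactions to human via an ortholog table.

    model_interactions: iterable of (protein_a, protein_b) pairs.
    orthologs: dict mapping a model-organism protein to a set of human proteins.
    """
    inferred = set()
    for a, b in model_interactions:
        for ha in orthologs.get(a, ()):
            for hb in orthologs.get(b, ()):
                # Store undirected pairs in a canonical order
                inferred.add(tuple(sorted((ha, hb))))
    return inferred

def filter_orthologs(orthologs, domains):
    """Keep only assignments whose ordered domain lists are identical.

    A simplified stand-in for a domain architecture filter (assumption).
    """
    return {
        m: {h for h in humans if domains.get(h) == domains.get(m)}
        for m, humans in orthologs.items()
    }

# Hypothetical data: yeast proteins yA, yB, yC and human proteins hA, hB, hC.
model_interactions = [("yA", "yB"), ("yB", "yC")]
orthologs = {"yA": {"hA"}, "yB": {"hB"}, "yC": {"hC"}}
domains = {
    "yA": ["kinase"], "hA": ["kinase"],
    "yB": ["SH3", "SH2"], "hB": ["SH3", "SH2"],
    "yC": ["WD40"], "hC": ["WD40", "RING"],  # architecture not conserved
}

unfiltered = infer_interactions(model_interactions, orthologs)
filtered = infer_interactions(model_interactions, filter_orthologs(orthologs, domains))
```

As in the abstract, the filtered network is a subset of the unfiltered one: the pair involving `hC` is dropped because its domain organization differs from that of its yeast counterpart.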

Journal ArticleDOI
TL;DR: The present approach may be used to simplify a variety of directed and nondirected, natural and designed networks, and find that both biological and electronic networks are "self-dissimilar," with different network motifs at each level.
Abstract: Can complex engineered and biological networks be coarse-grained into smaller and more understandable versions in which each node represents an entire pattern in the original network? To address this, we define coarse-graining units as connectivity patterns which can serve as the nodes of a coarse-grained network and present algorithms to detect them. We use this approach to systematically reverse-engineer electronic circuits, forming understandable high-level maps from incomprehensible transistor wiring: first, a coarse-grained version in which each node is a gate made of several transistors is established. Then the coarse-grained network is itself coarse-grained, resulting in a high-level blueprint in which each node is a circuit module made of many gates. We apply our approach also to a mammalian protein signal-transduction network, to find a simplified coarse-grained network with three main signaling channels that resemble multi-layered perceptrons made of cross-interacting MAP-kinase cascades. We find that both biological and electronic networks are "self-dissimilar," with different network motifs at each level. The present approach may be used to simplify a variety of directed and nondirected, natural and designed networks.

Journal ArticleDOI
TL;DR: The results indicate that gene duplication has played a larger part in the network evolution of the eukaryote than in the prokaryote, and suggest that single gene duplications with immediate divergence alone may explain more than 60% of biological network data in both domains.
Abstract: Gene duplication with subsequent interaction divergence is one of the primary driving forces in the evolution of genetic systems. Yet little is known about the precise mechanisms and the role of duplication divergence in the evolution of protein networks from the prokaryote and eukaryote domains. We developed a novel, model-based approach for Bayesian inference on biological network data that centres on approximate Bayesian computation, or likelihood-free inference. Instead of computing the intractable likelihood of the protein network topology, our method summarizes key features of the network and, based on these, uses a MCMC algorithm to approximate the posterior distribution of the model parameters. This allowed us to reliably fit a flexible mixture model that captures hallmarks of evolution by gene duplication and subfunctionalization to protein interaction network data of Helicobacter pylori and Plasmodium falciparum. The 80% credible intervals for the duplication–divergence component are [0.64, 0.98] for H. pylori and [0.87, 0.99] for P. falciparum. The remaining parameter estimates are not inconsistent with sequence data. An extensive sensitivity analysis showed that incompleteness of PIN data does not largely affect the analysis of models of protein network evolution, and that the degree sequence alone barely captures the evolutionary footprints of protein networks relative to other statistics. Our likelihood-free inference approach enables a fully Bayesian analysis of a complex and highly stochastic system that is otherwise intractable at present. Modelling the evolutionary history of PIN data, it transpires that only the simultaneous analysis of several global aspects of protein networks enables credible and consistent inference to be made from available datasets. 
Our results indicate that gene duplication has played a larger part in the network evolution of the eukaryote than in the prokaryote, and suggest that single gene duplications with immediate divergence alone may explain more than 60% of biological network data in both domains.

Journal ArticleDOI
TL;DR: It is shown that eukaryotic species have rewired their interactomes at a fast rate of approximately 10^-5 interactions changed per protein pair, per million years of divergence, and proposed that the power law distribution observed in protein interaction networks could be partly explained by the cell's requirement for different degrees of protein binding specificity.
Abstract: Progress in uncovering the protein interaction networks of several species has led to questions of what underlying principles might govern their organization. Few studies have tried to determine the impact of protein interaction network evolution on the observed physiological differences between species. Using comparative genomics and structural information, we show here that eukaryotic species have rewired their interactomes at a fast rate of approximately 10^-5 interactions changed per protein pair, per million years of divergence. For Homo sapiens this corresponds to 10^3 interactions changed per million years. Additionally we find that the specificity of binding strongly determines the interaction turnover and that different biological processes show significantly different link dynamics. In particular, human proteins involved in immune response, transport, and establishment of localization show signs of positive selection for change of interactions. Our analysis suggests that a small degree of molecular divergence can give rise to important changes at the network level. We propose that the power law distribution observed in protein interaction networks could be partly explained by the cell's requirement for different degrees of protein binding specificity.

Journal ArticleDOI
TL;DR: A mathematical model of the G1 to S network is reported that newly takes into account nucleo/cytoplasmic localization, the role of the cyclin-dependent kinase inhibitor Sic1 in facilitating nuclear import of its cognate Cdk1-Clb5, Whi5 control, and carbon source regulation of Sic1 and Sic1-containing complexes.
Abstract: The eukaryotic cell cycle is the repeated sequence of events that enable the division of a cell into two daughter cells. It is divided into four phases: G1, S, G2, and M. Passage through the cell cycle is strictly regulated by a molecular interaction network, which involves the periodic synthesis and destruction of cyclins that bind and activate cyclin-dependent kinases that are present in nonlimiting amounts. Cyclin-dependent kinase inhibitors contribute to cell cycle control. Budding yeast is an established model organism for cell cycle studies, and several mathematical models have been proposed for its cell cycle. An area of major relevance in cell cycle control is the G1 to S transition. In any given growth condition, it is characterized by the requirement of a specific, critical cell size, PS, to enter S phase. The molecular basis of this control is still under discussion. The authors report a mathematical model of the G1 to S network that newly takes into account nucleo/cytoplasmic localization, the role of the cyclin-dependent kinase inhibitor Sic1 in facilitating nuclear import of its cognate Cdk1-Clb5, Whi5 control, and carbon source regulation of Sic1 and Sic1-containing complexes. The model was implemented by a set of ordinary differential equations that describe the temporal change of the concentration of the involved proteins and protein complexes. The model was tested by simulation in several genetic and nutritional setups and was found to be neatly consistent with experimental data. To estimate PS, the authors developed a hybrid model including a probabilistic component for firing of DNA replication origins. Sensitivity analysis of PS provides a novel relevant conclusion: PS is an emergent property of the G1 to S network that strongly depends on growth rate.

Journal ArticleDOI
TL;DR: It is hypothesized that these patterns, consisting of the domains and links preserved through evolution, may constitute nucleation kernels for the evolutionary increase in proteome complexity.
Abstract: The modeling of complex systems, as disparate as the World Wide Web and the cellular metabolism, as networks has recently uncovered a set of generic organizing principles: Most of these systems are scale-free while at the same time modular, resulting in a hierarchical architecture. The structure of the protein domain network, where individual domains correspond to nodes and their co-occurrences in a protein are interpreted as links, also falls into this category, suggesting that domains involved in the maintenance of increasingly developed, multicellular organisms accumulate links. Here, we take the next step by studying link based properties of the protein domain co-occurrence networks of the eukaryotes S. cerevisiae, C. elegans, D. melanogaster, M. musculus and H. sapiens. We construct the protein domain co-occurrence networks from the PFAM database and analyze them by applying a k-core decomposition method that isolates the globally central (highly connected domains in the central cores) from the locally central (highly connected domains in the peripheral cores) protein domains through an iterative peeling process. Furthermore, we compare the subnetworks thus obtained to the physical domain interaction network of S. cerevisiae. We find that the innermost cores of the domain co-occurrence networks gradually grow with increasing degree of evolutionary development in going from single cellular to multicellular eukaryotes. The comparison of the cores across all the organisms under consideration uncovers patterns of domain combinations that are predominately involved in protein functions such as cell-cell contacts and signal transduction. Analyzing a weighted interaction network of PFAM domains of Yeast, we find that domains having only a few partners frequently interact with these, while the converse is true for domains with a multitude of partners. 
Combining domain co-occurrence and interaction information, we observe that the co-occurrence of domains in the innermost cores (globally central domains) strongly coincides with physical interaction. The comparison of the multicellular eukaryotic domain co-occurrence networks with that of the single-celled S. cerevisiae (the overlap network) uncovers small, connected network patterns. We hypothesize that these patterns, consisting of the domains and links preserved through evolution, may constitute nucleation kernels for the evolutionary increase in proteome complexity. Combining co-occurrence and physical interaction data we argue that the driving force behind domain fusions is a collective effect caused by the number of interactions and not the individual interaction frequency.

Journal ArticleDOI
TL;DR: This software review looks at the utility of the Biomolecular Interaction Network Database (BIND) as a web database, which offers methods common to related biology databases and specialisations for its protein interaction data.
Abstract: This software review looks at the utility of the Biomolecular Interaction Network Database (BIND) as a web database. BIND offers methods common to related biology databases and specialisations for its protein interaction data. Searching and browsing this database is easy and well integrated with the underlying data and the needs of scientists. Interaction networks are visualised with software that offers many useful options. The innovative ontoglyphs are used throughout to provide visual cues to protein functions, localisation and other aspects one needs to know for this data set. One can expect to get useful results that may be well integrated with one's research needs.

Journal ArticleDOI
TL;DR: A novel computational model simulating this cellular decision-process leading up to either phenotype based on a molecular interaction network of genes and proteins shows that one can indeed simulate the dichotomy between cell migration and proliferation based solely on an EGFR decision network.

Journal ArticleDOI
TL;DR: By the analyses of high-throughput data in yeast Saccharomyces cerevisiae, it was found that a protein's dispensability had significant correlations with its evolutionary rate and duplication rate, as well as its connectivity in protein-protein interaction network and gene-expression correlation network.
Abstract: Motivation: Protein dispensability is fundamental to the understanding of gene function and evolution. Recent advances in generating high-throughput data such as genomic sequence data, protein-protein interaction data, gene-expression data and growth-rate data of mutants allow us to investigate protein dispensability systematically at the genome scale. Results: In our studies, protein dispensability is represented as a fitness score that is measured by the growth rate of gene-deletion mutants. By analyzing high-throughput data in the yeast Saccharomyces cerevisiae, we found that a protein's dispensability had significant correlations with its evolutionary rate and duplication rate, as well as its connectivity in the protein-protein interaction network and the gene-expression correlation network. A neural network and a support vector machine were applied to predict protein dispensability from high-throughput data. Our studies shed some light on global characteristics of protein dispensability and evolution. Availability: The original datasets for protein dispensability analysis and prediction, together with related scripts, are available at http://digbio.missouri.edu/~ychen/ProDispen/ Contact: xudong@missouri.edu
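
Editorial sketch: the abstract reports correlations between a fitness score and network connectivity but does not say which statistic was used. The example below assumes a Spearman rank correlation, which is one common choice for such data. The numbers are invented toy values, and the names `ranks`, `pearson` and `spearman` are illustrative only.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented toy data: interaction degree and growth-rate fitness of
# five hypothetical deletion mutants (not values from the paper).
degree = [1, 2, 3, 5, 8]
fitness = [0.90, 0.95, 0.60, 0.40, 0.10]
rho = spearman(degree, fitness)
print(round(rho, 3))
```

A negative value here would mean that deleting highly connected proteins tends to lower fitness more, which is the direction of effect usually discussed for connectivity and dispensability.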

Journal ArticleDOI
TL;DR: A new kernel-based method for supervised graph inference based on multiple types of biological datasets such as gene expression, phylogenetic profiles and amino acid sequences is proposed, which can integrate multiple datasets selectively and thereby exclude irrelevant and noisy datasets.
Abstract: Motivation: Inferring networks of proteins from biological data is a central issue of computational biology. Most network inference methods, including Bayesian networks, take unsupervised approaches in which the network is totally unknown in the beginning, and all the edges have to be predicted. A more realistic supervised framework, proposed recently, assumes that a substantial part of the network is known. We propose a new kernel-based method for supervised graph inference based on multiple types of biological datasets such as gene expression, phylogenetic profiles and amino acid sequences. Notably, our method assigns a weight to each type of dataset and thereby selects informative ones. Data selection is useful for reducing data collection costs. For example, when a similar network inference problem must be solved for other organisms, the dataset excluded by our algorithm need not be collected. Results: First, we formulate supervised network inference as a kernel matrix completion problem, where the inference of edges boils down to estimation of missing entries of a kernel matrix. Then, an expectation-maximization algorithm is proposed to simultaneously infer the missing entries of the kernel matrix and the weights of multiple datasets. By introducing the weights, we can integrate multiple datasets selectively and thereby exclude irrelevant and noisy datasets. Our approach is favorably tested in two biological networks: a metabolic network and a protein interaction network. Availability: Software is available on request. Contact: kato-tsuyoshi@aist.go.jp Supplementary information: A supplementary report including mathematical details is available at www.cbrc.jp/~kato/faem/faem.html

Journal ArticleDOI
TL;DR: MORPH is introduced, a new algorithm for predicting protein interaction partners between members of two protein families that are known to interact that reduces the search space by approximately 3 × 10^5-fold and at the same time increases the accuracy of predicting correct binding partners.
Abstract: Motivation: Uncovering the protein-protein interaction network is a fundamental step in the quest to understand the molecular machinery of a cell. This motivates the search for efficient computational methods for predicting such interactions. Among the available predictors are those that are based on the co-evolution hypothesis: "evolutionary trees of protein families (that are known to interact) are expected to have similar topologies". Many of these methods are limited by the fact that they can handle only a small number of protein sequences. Also, details on evolutionary tree topology are missing as they use similarity matrices in lieu of the trees. Results: We introduce MORPH, a new algorithm for predicting protein interaction partners between members of two protein families that are known to interact. Our approach can also be seen as a new method for searching the best superposition of the corresponding evolutionary trees based on the tree automorphism group. We discuss relevant facts related to the predictability of protein-protein interaction based on their co-evolution. When compared with related computational approaches, our method reduces the search space by ~3 × 10^5-fold and at the same time increases the accuracy of predicting correct binding partners. Contact: przytyck@mail.nih.gov

Book
01 May 2005
TL;DR: The ORFeome: the first step toward the interactome of C. elegans, and the future of NLP in Biomedicine.
Abstract: Preface; List of Contributors
SECTION I: INTRODUCTION - DATA DIVERSITY AND INTEGRATION
1 Integrative Data Analysis and Visualization: Introduction to Critical Problems, Goals and Challenges (Francisco Azuaje and Joaquin Dopazo). 1.1 Data Analysis and Visualization: An Integrative Approach; 1.2 Critical Design and Implementation Factors; 1.3 Overview of Contributions; References
2 Biological Databases: Infrastructure, Content and Integration (Allyson L Williams, Paul J Kersey, Manuela Pruess and Rolf Apweiler). 2.1 Introduction; 2.2 Data Integration; 2.3 Review of Molecular Biology Databases; 2.4 Conclusion; References
3 Data and Predictive Model Integration: an Overview of Key Concepts, Problems and Solutions (Francisco Azuaje, Joaquin Dopazo and Haiying Wang). 3.1 Integrative Data Analysis and Visualization: Motivation and Approaches; 3.2 Integrating Informational Views and Complexity for Understanding Function; 3.3 Integrating Data Analysis Techniques for Supporting Functional Analysis; 3.4 Final Remarks; References
SECTION II: INTEGRATIVE DATA MINING AND VISUALIZATION - EMPHASIS ON COMBINATION OF MULTIPLE DATA TYPES
4 Applications of Text Mining in Molecular Biology, from Name Recognition to Protein Interaction Maps (Martin Krallinger and Alfonso Valencia). 4.1 Introduction; 4.2 Introduction to Text Mining and NLP; 4.3 Databases and Resources for Biomedical Text Mining; 4.4 Text Mining and Protein-Protein Interactions; 4.5 Other Text-Mining Applications in Genomics; 4.6 The Future of NLP in Biomedicine; Acknowledgements; References
5 Protein Interaction Prediction by Integrating Genomic Features and Protein Interaction Network Analysis (Long J Lu, Yu Xia, Haiyuan Yu, Alexander Rives, Haoxin Lu, Falk Schubert and Mark Gerstein). 5.1 Introduction; 5.2 Genomic Features in Protein Interaction Predictions; 5.3 Machine Learning on Protein-Protein Interactions; 5.4 The Missing Value Problem; 5.5 Network Analysis of Protein Interactions; 5.6 Discussion; References
6 Integration of Genomic and Phenotypic Data (Amanda Clare). 6.1 Phenotype; 6.2 Forward Genetics and QTL Analysis; 6.3 Reverse Genetics; 6.4 Prediction of Phenotype from Other Sources of Data; 6.5 Integrating Phenotype Data with Systems Biology; 6.6 Integration of Phenotype Data in Databases; 6.7 Conclusions; References
7 Ontologies and Functional Genomics (Fatima Al-Shahrour and Joaquin Dopazo). 7.1 Information Mining in Genome-Wide Functional Analysis; 7.2 Sources of Information: Free Text Versus Curated Repositories; 7.3 Bio-Ontologies and the Gene Ontology in Functional Genomics; 7.4 Using GO to Translate the Results of Functional Genomic Experiments into Biological Knowledge; 7.5 Statistical Approaches to Test Significant Biological Differences; 7.6 Using FatiGO to Find Significant Functional Associations in Clusters of Genes; 7.7 Other Tools; 7.8 Examples of Functional Analysis of Clusters of Genes; 7.9 Future Prospects; References
8 The C. elegans Interactome: its Generation and Visualization (Alban Chesnau and Claude Sardet). 8.1 Introduction; 8.2 The ORFeome: the first step toward the interactome of C. elegans; 8.3 Large-Scale High-Throughput Yeast Two-Hybrid Screens to Map the C. elegans Protein-Protein Interaction (Interactome) Network: Technical Aspects; 8.4 Visualization and Topology of Protein-Protein Interaction Networks; 8.5 Cross-Talk Between the C. elegans Interactome and other Large-Scale Genomics and Post-Genomics Data Sets; 8.6 Conclusion: From Interactions to Therapies; References
SECTION III: INTEGRATIVE DATA MINING AND VISUALIZATION - EMPHASIS ON COMBINATION OF MULTIPLE PREDICTION MODELS AND METHODS
9 Integrated Approaches for Bioinformatic Data Analysis and Visualization - Challenges, Opportunities and New Solutions (Steve R Pettifer, James R Sinnott and Teresa K Attwood). 9.1 Introduction; 9.2 Sequence Analysis Methods and Databases; 9.3 A View Through a Portal; 9.4 Problems with Monolithic Approaches: One Size Does Not Fit All; 9.5 A Toolkit View; 9.6 Challenges and Opportunities; 9.7 Extending the Desktop Metaphor; 9.8 Conclusions; Acknowledgements; References
10 Advances in Cluster Analysis of Microarray Data (Qizheng Sheng, Yves Moreau, Frank De Smet, Kathleen Marchal and Bart De Moor). 10.1 Introduction; 10.2 Some Preliminaries; 10.3 Hierarchical Clustering; 10.4 k-Means Clustering; 10.5 Self-Organizing Maps; 10.6 A Wish List for Clustering Algorithms; 10.7 The Self-Organizing Tree Algorithm; 10.8 Quality-Based Clustering Algorithms; 10.9 Mixture Models; 10.10 Biclustering Algorithms; 10.11 Assessing Cluster Quality; 10.12 Open Horizons; References
11 Unsupervised Machine Learning to Support Functional Characterization of Genes: Emphasis on Cluster Description and Class Discovery (Olga G Troyanskaya). 11.1 Functional Genomics: Goals and Data Sources; 11.2 Functional Annotation by Unsupervised Analysis of Gene Expression Microarray Data; 11.3 Integration of Diverse Functional Data For Accurate Gene Function Prediction; 11.4 MAGIC - General Probabilistic Integration of Diverse Genomic Data; 11.5 Conclusion; References
12 Supervised Methods with Genomic Data: a Review and Cautionary View (Ramon Diaz-Uriarte). 12.1 Chapter Objectives; 12.2 Class Prediction and Class Comparison; 12.3 Class Comparison: Finding/Ranking Differentially Expressed Genes; 12.4 Class Prediction and Prognostic Prediction; 12.5 ROC Curves for Evaluating Predictors and Differential Expression; 12.6 Caveats and Admonitions; 12.7 Final Note: Source Code Should be Available; Acknowledgements; References
13 A Guide to the Literature on Inferring Genetic Networks by Probabilistic Graphical Models (Pedro Larranaga, Inaki Inza and Jose L Flores). 13.1 Introduction; 13.2 Genetic Networks; 13.3 Probabilistic Graphical Models; 13.4 Inferring Genetic Networks by Means of Probabilistic Graphical Models; 13.5 Conclusions; Acknowledgements; References
14 Integrative Models for the Prediction and Understanding of Protein Structure Patterns (Inge Jonassen). 14.1 Introduction; 14.2 Structure Prediction; 14.3 Classifications of Structures; 14.4 Comparing Protein Structures; 14.5 Methods for the Discovery of Structure Motifs; 14.6 Discussion and Conclusions; References
Index

Posted Content
TL;DR: In this paper, the authors compare the genomes of Saccharomyces cerevisiae and Caenorhabditis elegans with those of closely related species to elucidate the recent evolutionary history of their respective protein interaction networks.
Abstract: Protein interaction networks aim to summarize the complex interplay of proteins in an organism. Early studies suggested that the position of a protein in the network determines its evolutionary rate but there has been considerable disagreement as to what extent other factors, such as protein abundance, modify this reported dependence. We compare the genomes of Saccharomyces cerevisiae and Caenorhabditis elegans with those of closely related species to elucidate the recent evolutionary history of their respective protein interaction networks. Interaction and expression data are studied in the light of a detailed phylogenetic analysis. The underlying network structure is incorporated explicitly into the statistical analysis. The increased phylogenetic resolution, paired with high-quality interaction data, allows us to resolve the way in which protein interaction network structure and abundance of proteins affect the evolutionary rate. We find that expression levels are better predictors of the evolutionary rate than a protein's connectivity. Detailed analysis of the two organisms also shows that the evolutionary rates of interacting proteins are not sufficiently similar to be mutually predictive. It appears that meaningful inferences about the evolution of protein interaction networks require comparative analysis of reasonably closely related species. The signature of protein evolution is shaped by a protein's abundance in the organism and its function and the biological process it is involved in. Its position in the interaction networks and its connectivity may modulate this but they appear to have only minor influence on a protein's evolutionary rate.

Journal ArticleDOI
TL;DR: It appears that meaningful inferences about the evolution of protein interaction networks require comparative analysis of reasonably closely related species and it is found that expression levels are better predictors of the evolutionary rate than a protein's connectivity.
Abstract: Protein interaction networks aim to summarize the complex interplay of proteins in an organism. Early studies suggested that the position of a protein in the network determines its evolutionary rate but there has been considerable disagreement as to what extent other factors, such as protein abundance, modify this reported dependence. We compare the genomes of Saccharomyces cerevisiae and Caenorhabditis elegans with those of closely related species to elucidate the recent evolutionary history of their respective protein interaction networks. Interaction and expression data are studied in the light of a detailed phylogenetic analysis. The underlying network structure is incorporated explicitly into the statistical analysis. The increased phylogenetic resolution, paired with high-quality interaction data, allows us to resolve the way in which protein interaction network structure and abundance of proteins affect the evolutionary rate. We find that expression levels are better predictors of the evolutionary rate than a protein's connectivity. Detailed analysis of the two organisms also shows that the evolutionary rates of interacting proteins are not sufficiently similar to be mutually predictive. It appears that meaningful inferences about the evolution of protein interaction networks require comparative analysis of reasonably closely related species. The signature of protein evolution is shaped by a protein's abundance in the organism and its function and the biological process it is involved in. Its position in the interaction networks and its connectivity may modulate this but they appear to have only minor influence on a protein's evolutionary rate.

Proceedings ArticleDOI
01 Dec 2005
TL;DR: A random forest classifier is described which can effectively combine the structure-based prediction results and other functional annotations together to predict protein interactions and achieve a better performance than when structure information is not used.
Abstract: This paper presents a framework for predicting protein-protein interactions (PPI) that integrates structure-based information with other functional annotations, e.g. GO, co-expression and co-localization. Given two protein sequences, the structure-based interaction prediction technique threads these two sequences to all the protein complexes in the PDB and then chooses the best potential match. Based on this match, structural information is incorporated into logistic regression to evaluate the probability of these two proteins interacting. This paper also describes a random forest classifier which can effectively combine the structure-based prediction results and other functional annotations together to predict protein interactions. Experimental results indicate that the predictive power of the structure-based method is better than many other information sources. Also, combining the structure-based method with other information sources allows us to achieve a better performance than when structure information is not used. We also tested our method on a set of approximately 1000 yeast genes and, interestingly, the predicted interaction network is a scale-free network. Our method predicted some potential interactions involving yeast homologs of human disease-related proteins. Supplementary information: http://theory.csail.mit.edu/struct2net

Journal ArticleDOI
TL;DR: The connections established by yeast proteins are studied and a preferential attachment between essential proteins is discovered and it is proposed that this core exponential network may represent a generic scaffold around which organism-specific and taxon-specific proteins and interactions coalesce.
Abstract: Protein interactions in the budding yeast have been shown to form a scale-free network, a feature of other organized networks such as bacterial and archaeal metabolism and the World Wide Web. Here, we study the connections established by yeast proteins and discover a preferential attachment between essential proteins. The essential-essential connections are long ranged and form a subnetwork where the giant component includes 97% of these proteins. Unexpectedly, this subnetwork displays an exponential connectivity distribution, in sharp contrast to the scale-free topology of the complete network. Furthermore, the wide phylogenetic extent of these core proteins and interactions provides evidence that they represent the ancestral state of the yeast protein interaction network. Finally, we propose that this core exponential network may represent a generic scaffold around which organism-specific and taxon-specific proteins and interactions coalesce.
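
Editorial sketch: the "giant component" figure quoted above is the share of essential proteins that fall into the largest connected piece of the essential-essential subnetwork. The example below shows one way to compute such a fraction with a breadth-first search. The edge list, the set of "essential" nodes and the function name `giant_component_fraction` are hypothetical toy values, not data from the paper.

```python
from collections import defaultdict, deque


def giant_component_fraction(edges, keep):
    """Fraction of `keep` nodes in the largest connected component of the
    subgraph induced by `keep` (edges with both endpoints in `keep`)."""
    keep = set(keep)
    if not keep:
        return 0.0
    adj = defaultdict(set)
    for u, v in edges:
        if u in keep and v in keep and u != v:
            adj[u].add(v)
            adj[v].add(u)
    seen = set()
    largest = 0
    for start in keep:
        if start in seen:
            continue
        # Breadth-first search over one connected component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        largest = max(largest, size)
    return largest / len(keep)


# Toy interaction network (hypothetical): a chain of six proteins,
# four of which are labelled essential for illustration.
toy_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
toy_essential = {1, 2, 3, 5}
fraction = giant_component_fraction(toy_edges, toy_essential)
print(fraction)
```

Here proteins 1, 2 and 3 stay connected through essential-essential links, while protein 5 is isolated once its non-essential neighbours are dropped.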

Journal ArticleDOI
TL;DR: Experimental results demonstrate that a global, system-wide approach (such as IRAP, which considers the entire interaction network instead of merely local neighbors) is much more promising for assessing the reliability of PPIs.

Journal ArticleDOI
TL;DR: The network analysis reveals that ligand binding has “network-bridging effects” on the DHFR structure, with most of the interaction networks now passing through the cofactor, shortening the average shortest path.
Abstract: Residue interaction networks and loop motions are important for catalysis in dihydrofolate reductase (DHFR). Here, we investigate the effects of ligand binding and chain connectivity on network communication in DHFR. We carry out systematic network analysis and molecular dynamics simulations of the native DHFR and 19 of its circularly permuted variants by breaking the chain connections in ten folding element regions and in nine nonfolding element regions as observed by experiment. Our studies suggest that chain cleavage in folding element areas may deactivate DHFR due to large perturbations in the network properties near the active site. The protein active site is near or coincides with residues through which the shortest paths in the residue interaction network tend to go. Further, our network analysis reveals that ligand binding has "network-bridging effects" on the DHFR structure. Our results suggest that ligand binding leads to a modification, with most of the interaction networks now passing through the cofactor, shortening the average shortest path. Ligand binding at the active site has profound effects on the network centrality, especially the closeness.
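
Editorial sketch: the "network-bridging" effect can be illustrated on a toy graph by comparing the average shortest path and closeness centrality before and after adding one extra node that contacts two distant residues. The example assumes the ligand is modelled as an ordinary graph node. The six-residue chain, the node name "LIG" and the function names are hypothetical and do not reproduce the paper's DHFR analysis.

```python
from collections import deque


def build_adj(edges):
    """Build an undirected adjacency map from a list of edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_shortest_path(adj, nodes):
    """Mean hop distance over all ordered pairs of `nodes`.
    Assumes every pair is connected."""
    total = 0
    pairs = 0
    for source in nodes:
        dist = bfs_distances(adj, source)
        for target in nodes:
            if target != source:
                total += dist[target]
                pairs += 1
    return total / pairs


def closeness(adj, node):
    """Closeness centrality: (reachable nodes) / (sum of distances)."""
    dist = bfs_distances(adj, node)
    return (len(dist) - 1) / sum(dist.values())


# Toy residue network (hypothetical): six residues in a chain.
residues = [1, 2, 3, 4, 5, 6]
chain = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
# Assumed ligand contacts with the two chain ends.
bound = chain + [(1, "LIG"), (6, "LIG")]

apo_adj = build_adj(chain)
holo_adj = build_adj(bound)
apo_path = average_shortest_path(apo_adj, residues)
holo_path = average_shortest_path(holo_adj, residues)
apo_closeness = closeness(apo_adj, 1)
holo_closeness = closeness(holo_adj, 1)
print(apo_path, holo_path, apo_closeness, holo_closeness)
```

In this toy case the extra node shortens residue-to-residue paths and raises the closeness of the residues it contacts, which is the qualitative pattern the abstract describes.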

Journal ArticleDOI
TL;DR: The robustness of the p53 network is studied by analyzing its degeneration under two modes of attack, including mutational knockouts of proteins and the directed attacks mounted by tumour inducing viruses.