
Showing papers on "Interaction network" published in 2010


Journal ArticleDOI
TL;DR: A highly reliable functional interaction network is built upon expert-curated pathways and applied to the analysis of two genome-wide GBM data sets and several other cancer data sets; the resulting network patterns suggest common mechanisms in cancer biology.
Abstract: One challenge facing biologists is to tease out useful information from massive data sets for further analysis. A pathway-based analysis may shed light by projecting candidate genes onto protein functional relationship networks. We are building such a pathway-based analysis system. We have constructed a protein functional interaction network by extending curated pathways with non-curated sources of information, including protein-protein interactions, gene coexpression, protein domain interaction, Gene Ontology (GO) annotations and text-mined protein interactions, which cover close to 50% of the human proteome. By applying this network to two glioblastoma multiforme (GBM) data sets and projecting cancer candidate genes onto the network, we found that the majority of GBM candidate genes form a cluster and are closer than expected by chance, and the majority of GBM samples have sequence-altered genes in two network modules, one mainly comprising genes whose products are localized in the cytoplasm and plasma membrane, and another comprising gene products in the nucleus. Both modules are highly enriched in known oncogenes, tumor suppressors and genes involved in signal transduction. Similar network patterns were also found in breast, colorectal and pancreatic cancers. We have built a highly reliable functional interaction network upon expert-curated pathways and applied this network to the analysis of two genome-wide GBM and several other cancer data sets. The network patterns revealed from our results suggest common mechanisms in the cancer biology. Our system should provide a foundation for a network or pathway-based analysis platform for cancer and other diseases.

626 citations


Journal ArticleDOI
TL;DR: This paper characterizes the rate of convergence as a function of the structure of the interaction network and considers scenarios where the individuals’ behavior is the result of a strategic choice among competing alternatives based on the dynamics of coordination games.
Abstract: Which network structures favor the rapid spread of new ideas, behaviors, or technologies? This question has been studied extensively using epidemic models. Here we consider a complementary point of view and consider scenarios where the individuals’ behavior is the result of a strategic choice among competing alternatives. In particular, we study models that are based on the dynamics of coordination games. Classical results in game theory studying this model provide a simple condition for a new action or innovation to become widespread in the network. The present paper characterizes the rate of convergence as a function of the structure of the interaction network. The resulting predictions differ strongly from the ones provided by epidemic models. In particular, it appears that innovation spreads much more slowly on well-connected network structures dominated by long-range links than in low-dimensional ones dominated, for example, by geographic proximity.

393 citations
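
The contrast drawn in the abstract above, between well-connected networks and networks dominated by local links, can be illustrated with a toy threshold model. This is a minimal sketch and not the paper's model: it assumes a deterministic, monotone best-response rule in which a node adopts the new action once at least a fraction `q` of its neighbours play it, and adopters never revert. The function names and the parameter `q` are hypothetical.

```python
def best_response_step(adj, state, q):
    """One synchronous round of a simplified coordination game.

    adj   : dict mapping node -> set of neighbours
    state : dict mapping node -> 1 (new action) or 0 (old action)
    q     : assumed adoption threshold (fraction of neighbours on the new action)
    Adopters keep the new action (monotone simplification, an assumption).
    """
    new_state = {}
    for node, nbrs in adj.items():
        if state[node] == 1 or not nbrs:
            new_state[node] = state[node]
            continue
        frac = sum(state[n] for n in nbrs) / len(nbrs)
        new_state[node] = 1 if frac >= q else 0
    return new_state


def rounds_to_full_adoption(adj, seeds, q, max_rounds=100):
    """Return the number of rounds until every node adopts, or None if that
    does not happen within max_rounds."""
    state = {n: (1 if n in seeds else 0) for n in adj}
    for r in range(max_rounds + 1):
        if all(state.values()):
            return r
        state = best_response_step(adj, state, q)
    return None
```

On a six-node ring with two adjacent seeds and `q = 0.5`, the new action reaches every node in two rounds, while on a six-node complete graph with the same seeds it never spreads, because each non-adopter sees only two of five neighbours on the new action.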


Journal ArticleDOI
12 Feb 2010-PLOS ONE
TL;DR: The observation that GBM alterations tend to occur within specific functional modules, in spite of considerable patient-to-patient variation, is confirmed and extended, and two of the largest modules are shown to involve signaling via p53, Rb, PI3K and receptor protein kinases.
Abstract: Background: Glioblastoma multiforme (GBM) is the most common and aggressive type of brain tumor in humans and the first cancer with comprehensive genomic profiles mapped by The Cancer Genome Atlas (TCGA) project. A central challenge in large-scale genome projects, such as the TCGA GBM project, is the ability to distinguish cancer-causing “driver” mutations from passively selected “passenger” mutations. Principal Findings: In contrast to a purely frequency-based approach to identifying driver mutations in cancer, we propose an automated network-based approach for identifying candidate oncogenic processes and driver genes. The approach is based on the hypothesis that cellular networks contain functional modules, and that tumors target specific modules critical to their growth. Key elements in the approach include combined analysis of sequence mutations and DNA copy number alterations; use of a unified molecular interaction network consisting of both protein-protein interactions and signaling pathways; and identification and statistical assessment of network modules, i.e., cohesive groups of genes of interest with a higher density of interactions within groups than between groups. Conclusions: We confirm and extend the observation that GBM alterations tend to occur within specific functional modules, in spite of considerable patient-to-patient variation, and that two of the largest modules involve signaling via p53, Rb, PI3K and receptor protein kinases. We also identify new candidate drivers in GBM, including AGAP2/CENTG1, a putative oncogene and an activator of the PI3K pathway, and three additional significantly altered modules, including one involved in microtubule organization. To facilitate the application of our network-based approach to additional cancer types, we make the method freely available as part of a software tool called NetBox.

347 citations
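
The module criterion stated in the abstract above (a higher density of interactions within groups than between groups) can be checked directly on a small graph. The sketch below is an illustration of that definition only. It is not the NetBox implementation, and the function name and gene labels are hypothetical.

```python
from itertools import combinations


def edge_density(edges, group_a, group_b=None):
    """Fraction of possible node pairs that are connected.

    With one group: density among pairs inside group_a.
    With two groups: density among pairs spanning group_a and group_b.
    Edges are treated as undirected.
    """
    edge_set = {frozenset(e) for e in edges}
    if group_b is None:
        pairs = [frozenset(p) for p in combinations(sorted(group_a), 2)]
    else:
        pairs = [frozenset((a, b)) for a in group_a for b in group_b]
    if not pairs:
        return 0.0
    return sum(p in edge_set for p in pairs) / len(pairs)


# Hypothetical toy network: two triangles joined by a single bridge edge.
edges = [("G1", "G2"), ("G2", "G3"), ("G1", "G3"),
         ("G4", "G5"), ("G5", "G6"), ("G4", "G6"),
         ("G3", "G4")]
module_1 = {"G1", "G2", "G3"}
module_2 = {"G4", "G5", "G6"}
within = edge_density(edges, module_1)             # all 3 possible pairs present
between = edge_density(edges, module_1, module_2)  # 1 of 9 possible pairs present
```

Here the within-group density is 1.0 and the between-group density is 1/9, so the two triangles satisfy the stated criterion. Assessing whether such a difference is statistically significant, as the abstract describes, would need a null model that this sketch does not cover.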


Journal ArticleDOI
TL;DR: The state-of-the-art techniques for computational detection of protein complexes are reviewed, some promising research directions in this field are discussed, and experimental results with yeast protein interaction data show that the interaction subgraphs discovered by various computational methods matched well with actual protein complexes.
Abstract: Most proteins form macromolecular complexes to perform their biological functions. However, experimentally determined protein complex data, especially of those involving more than two protein partners, are relatively limited in the current state-of-the-art high-throughput experimental techniques. Nevertheless, many techniques (such as yeast-two-hybrid) have enabled systematic screening of pairwise protein-protein interactions en masse. Thus computational approaches for detecting protein complexes from protein interaction data are useful complements to the limited experimental methods. They can be used together with the experimental methods for mapping the interactions of proteins to understand how different proteins are organized into higher-level substructures to perform various cellular functions. Given the abundance of pairwise protein interaction data from high-throughput genome-wide experimental screenings, a protein interaction network can be constructed from protein interaction data by considering individual proteins as the nodes, and the existence of a physical interaction between a pair of proteins as a link. This binary protein interaction graph can then be used for detecting protein complexes using graph clustering techniques. In this paper, we review and evaluate the state-of-the-art techniques for computational detection of protein complexes, and discuss some promising research directions in this field. Experimental results with yeast protein interaction data show that the interaction subgraphs discovered by various computational methods matched well with actual protein complexes. In addition, the computational approaches have also improved in performance over the years. 
Further improvements could be achieved if the quality of the underlying protein interaction data can be considered adequately to minimize the undesirable effects from the irrelevant and noisy sources, and the various biological evidences can be better incorporated into the detection process to maximize the exploitation of the increasing wealth of biological knowledge available.

338 citations
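
The construction described in the abstract above (proteins as nodes, pairwise physical interactions as links, then a search for dense subgraphs) can be sketched with one of the simplest dense-subgraph searches, maximal clique enumeration. This is an assumed stand-in for illustration, not one of the specific methods the review evaluates. The protein labels are hypothetical, and the enumeration is the basic Bron–Kerbosch recursion without pivoting.

```python
def build_graph(pairs):
    """Build an undirected binary interaction graph from pairwise interactions."""
    adj = {}
    for a, b in pairs:
        if a == b:
            continue  # ignore self-interactions in this sketch
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already processed
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def candidate_complexes(pairs, min_size=3):
    """Treat maximal cliques of at least min_size proteins as candidate complexes."""
    adj = build_graph(pairs)
    return sorted(sorted(c) for c in maximal_cliques(adj) if len(c) >= min_size)
```

For example, the pairs P1-P2, P2-P3, P1-P3, P3-P4, P4-P5, P4-P6 and P5-P6 yield two candidate complexes, {P1, P2, P3} and {P4, P5, P6}. Real interaction data are noisy and incomplete, so strict cliques tend to miss complexes with unobserved interactions, which is one reason the methods reviewed use more tolerant density criteria.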


Journal ArticleDOI
TL;DR: A manifold regularization semi-supervised learning method is presented to tackle the issue of predicting drug-protein interactions from heterogeneous biological data sources by using labeled and unlabeled information which often generates better results than using the labeled data alone.
Abstract: Predicting drug-protein interactions from heterogeneous biological data sources is a key step for in silico drug discovery. The difficulty of this prediction task lies in the rarity of known drug-protein interactions and myriad unknown interactions to be predicted. To meet this challenge, a manifold regularization semi-supervised learning method is presented to tackle this issue by using labeled and unlabeled information which often generates better results than using the labeled data alone. Furthermore, our semi-supervised learning method integrates known drug-protein interaction network information as well as chemical structure and genomic sequence data. Using the proposed method, we predicted certain drug-protein interactions on the enzyme, ion channel, GPCRs, and nuclear receptor data sets. Some of them are confirmed by the latest publicly available drug targets databases such as KEGG. We report encouraging results of using our method for drug-protein interaction network reconstruction which may shed light on the molecular interaction inference and new uses of marketed drugs.

335 citations


Journal ArticleDOI
TL;DR: In STITCH 2, the number of relevant interactions is increased by incorporation of BindingDB, PharmGKB and the Comparative Toxicogenomics Database, and InChIKeys that allow identification of chemicals with a short, checksum-like string are adopted.
Abstract: Over the last years, the publicly available knowledge on interactions between small molecules and proteins has been steadily increasing. To create a network of interactions, STITCH aims to integrate the data dispersed over the literature and various databases of biological pathways, drug-target relationships and binding affinities. In STITCH 2, the number of relevant interactions is increased by incorporation of BindingDB, PharmGKB and the Comparative Toxicogenomics Database. The resulting network can be explored interactively or used as the basis for large-scale analyses. To facilitate links to other chemical databases, we adopt InChIKeys that allow identification of chemicals with a short, checksum-like string. STITCH 2.0 connects proteins from 630 organisms to over 74,000 different chemicals, including 2200 drugs. STITCH can be accessed at http://stitch.embl.de/.

222 citations


Journal ArticleDOI
TL;DR: It is shown that networks with best complete synchronization, least coupling cost, and maximum dynamical robustness, have arbitrary complexity but quantized total interaction strength, which constrains the allowed number of connections.
Abstract: Synchronization, in which individual dynamical units keep in pace with each other in a decentralized fashion, depends both on the dynamical units and on the properties of the interaction network. Yet, the role played by the network has resisted comprehensive characterization within the prevailing paradigm that interactions facilitating pairwise synchronization also facilitate collective synchronization. Here we challenge this paradigm and show that networks with best complete synchronization, least coupling cost, and maximum dynamical robustness, have arbitrary complexity but quantized total interaction strength, which constrains the allowed number of connections. It stems from this characterization that negative interactions as well as link removals can be used to systematically improve and optimize synchronization properties in both directed and undirected networks. These results extend the recently discovered compensatory perturbations in metabolic networks to the realm of oscillator networks and demonstrate why “less can be more” in network synchronization.

166 citations


Journal ArticleDOI
TL;DR: A binary protein–protein interaction map of core cell cycle proteins of Arabidopsis thaliana is created using two complementary interaction assays, yeast two-hybrid and bimolecular fluorescence complementation; the map constitutes a framework for further in-depth analysis of the cell cycle machinery.
Abstract: As in other eukaryotes, cell division in plants is highly conserved and regulated by cyclin-dependent kinases (CDKs) that are themselves predominantly regulated at the posttranscriptional level by their association with proteins such as cyclins. Although over the last years the knowledge of the plant cell cycle has considerably increased, little is known on the assembly and regulation of the different CDK complexes. To map protein-protein interactions between core cell cycle proteins of Arabidopsis thaliana, a binary protein-protein interactome network was generated using two complementary high-throughput interaction assays, yeast two-hybrid and bimolecular fluorescence complementation. Pairwise interactions among 58 core cell cycle proteins were tested, resulting in 357 interactions, of which 293 have not been reported before. Integration of the binary interaction results with cell cycle phase-dependent expression information and localization data allowed the construction of a dynamic interaction network. The obtained interaction map constitutes a framework for further in-depth analysis of the cell cycle machinery.

164 citations


Journal ArticleDOI
TL;DR: This work reviews the currently known features that are particular to hubs, possibly affecting their binding ability, and looks at the levels of intrinsic disorder, surface charge and domain distribution in hubs, as compared to non-hubs, along with differences in their functional domains.
Abstract: Hubs are proteins with a large number of interactions in a protein-protein interaction network. They are the principal agents in the interaction network and affect its function and stability. Their specific recognition of many different protein partners is of great interest from the structural viewpoint. Over the last few years, the structural properties of hubs have been extensively studied. We review the currently known features that are particular to hubs, possibly affecting their binding ability. Specifically, we look at the levels of intrinsic disorder, surface charge and domain distribution in hubs, as compared to non-hubs, along with differences in their functional domains.

145 citations


Journal ArticleDOI
TL;DR: In this review, recent advances in clustering methods for protein interaction networks will be presented in detail and the predictions of protein functions and interactions based on modules will be covered.
Abstract: The increasing availability of large-scale protein-protein interaction data has made it possible to understand the basic components and organization of cell machinery from the network level. The arising challenge is how to analyze such complex interacting data to reveal the principles of cellular organization, processes and functions. Many studies have shown that clustering protein interaction network is an effective approach for identifying protein complexes or functional modules, which has become a major research topic in systems biology. In this review, recent advances in clustering methods for protein interaction networks will be presented in detail. The predictions of protein functions and interactions based on modules will be covered. Finally, the performance of different clustering methods will be compared and the directions for future research will be discussed.

137 citations


Journal ArticleDOI
TL;DR: A novel mass spectrometry-cleavable cross-linking strategy embodied in Protein Interaction Reporter (PIR) technology was recently successfully applied for in vivo identification of protein-protein interactions as well as actual regions of the interacting proteins that share close proximity while present within cells.
Abstract: Chemical cross-linking coupled with mass spectrometry, an emerging approach for protein topology and interaction studies, has gained increasing interest in the past few years. A number of recent proof-of-principle studies on model proteins or protein complex systems with improved cross-linking strategies have shown great promise. However, the heterogeneity and low abundance of the cross-linked products as well as data complexity continue to pose enormous challenges for large-scale application of cross-linking approaches. A novel mass spectrometry-cleavable cross-linking strategy embodied in Protein Interaction Reporter (PIR) technology, first reported in 2005, was recently successfully applied for in vivo identification of protein-protein interactions as well as actual regions of the interacting proteins that share close proximity while present within cells. PIR technology holds great promise for achieving the ultimate goal of mapping protein interaction network at systems level using chemical cross-linking. In this review, we will briefly describe the recent progress in the field of chemical cross-linking development with an emphasis on the PIR concepts, its applications and future directions.

Journal ArticleDOI
29 Jun 2010-PLOS ONE
TL;DR: This study provides the first comprehensive review of the network and pathway characteristics of schizophrenia candidate genes and constructs the first schizophrenia molecular network (SMN), which revealed that schizophrenia is a dynamic process caused by dysregulation of multiple pathways.
Abstract: Background: Schizophrenia (SZ) is a heritable, complex mental disorder. We have seen limited success in finding causal genes for schizophrenia from numerous conventional studies. Protein interaction network and pathway-based analysis may provide us an alternative and effective approach to investigating the molecular mechanisms of schizophrenia. Methodology/Principal Findings: We selected a list of schizophrenia candidate genes (SZGenes) using a multi-dimensional evidence-based approach. The global network properties of proteins encoded by these SZGenes were explored in the context of the human protein interactome while local network properties were investigated by comparing SZ-specific and cancer-specific networks that were extracted from the human interactome. Relative to cancer genes, we observed that SZGenes tend to have an intermediate degree and an intermediate efficiency on a perturbation spreading throughout the human interactome. This suggested that schizophrenia might have different pathological mechanisms from cancer even though both are complex diseases. We conducted pathway analysis using Ingenuity System and constructed the first schizophrenia molecular network (SMN) based on protein interaction networks, pathways and literature survey. We identified 24 pathways overrepresented in SZGenes and examined their interactions and crosstalk. We observed that these pathways were related to neurodevelopment, immune system, and retinoic X receptor (RXR). Our examination of SMN revealed that schizophrenia is a dynamic process caused by dysregulation of multiple pathways. Finally, we applied the network/pathway approach to identify novel candidate genes, some of which could be verified by experiments. Conclusions/Significance: This study provides the first comprehensive review of the network and pathway characteristics of schizophrenia candidate genes. Our preliminary results suggest that this systems biology approach might prove promising for selection of candidate genes for complex diseases. Our findings have important implications for the molecular mechanisms for schizophrenia and, potentially, other psychiatric disorders.

Journal ArticleDOI
TL;DR: Torque is a topology-free querying algorithm: given a query, it seeks a matching set of proteins that are sequence-similar to the query proteins and span a connected region of the network, while allowing both insertions and deletions.
Abstract: In the network querying problem, one is given a protein complex or pathway of species A and a protein-protein interaction network of species B; the goal is to identify subnetworks of B that are similar to the query in terms of sequence, topology, or both. Existing approaches mostly depend on knowledge of the interaction topology of the query in the network of species A; however, in practice, this topology is often not known. To address this problem, we develop a topology-free querying algorithm, which we call Torque. Given a query, represented as a set of proteins, Torque seeks a matching set of proteins that are sequence-similar to the query proteins and span a connected region of the network, while allowing both insertions and deletions. The algorithm uses alternatively dynamic programming and integer linear programming for the search task. We test Torque with queries from yeast, fly, and human, where we compare it to the QNet topology-based approach, and with queries from less studied species, where only topology-free algorithms apply. Torque detects many more matches than QNet, while giving results that are highly functionally coherent.

Journal ArticleDOI
TL;DR: In this article, the authors investigate the link between biological modules and network communities in yeast and its relationship to the scale at which they probe the network, and demonstrate that the functional homogeneity of communities depends on the scale selected, and that almost all proteins lie in a functionally homogeneous community at some scale.
Abstract: If biology is modular then clusters, or communities, of proteins derived using only protein interaction network structure should define protein modules with similar biological roles. We investigate the link between biological modules and network communities in yeast and its relationship to the scale at which we probe the network. Our results demonstrate that the functional homogeneity of communities depends on the scale selected, and that almost all proteins lie in a functionally homogeneous community at some scale. We judge functional homogeneity using a novel test and three independent characterizations of protein function, and find a high degree of overlap between these measures. We show that a high mean clustering coefficient of a community can be used to identify those that are functionally homogeneous. By tracing the community membership of a protein through multiple scales we demonstrate how our approach could be useful to biologists focusing on a particular protein. We show that there is no one scale of interest in the community structure of the yeast protein interaction network, but we can identify the range of resolution parameters that yield the most functionally coherent communities, and predict which communities are most likely to be functionally homogeneous.
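
The mean clustering coefficient mentioned in the abstract above can be computed as follows. This is a minimal sketch under one stated assumption: the coefficient is evaluated on the subgraph induced by the community's own members, which the abstract does not specify. The function names are hypothetical.

```python
def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: undefined cases count as zero
    links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return 2 * links / (k * (k - 1))


def mean_clustering_of_community(adj, community):
    """Mean local clustering coefficient over the induced subgraph of a community."""
    members = set(community)
    induced = {n: adj[n] & members for n in members}
    if not induced:
        return 0.0
    return sum(local_clustering(induced, n) for n in induced) / len(induced)
```

For a triangle A, B, C with one extra node D attached to C, the coefficients are 1, 1, 1/3 and 0, giving a mean of 7/12 for the four-node community and exactly 1 for the triangle alone.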

Journal ArticleDOI
TL;DR: By applying a series of clustering methods to proteins' topological signature similarities, it is demonstrated that the obtained clusters are significantly enriched with cancer genes, and clear evidence is provided that PPI network structure around cancer genes is different from the structure around non-cancer genes.
Abstract: Many real-world phenomena have been described in terms of large networks. Networks have been invaluable models for the understanding of biological systems. Since proteins carry out most biological processes, we focus on analysing protein–protein interaction (PPI) networks. Proteins interact to perform a function. Thus, PPI networks reflect the interconnected nature of biological processes and analysing their structural properties could provide insights into biological function and disease. We have already demonstrated, by using a sensitive graph theoretic method for comparing topologies of node neighbourhoods called ‘graphlet degree signatures’, that proteins with similar surroundings in PPI networks tend to perform the same functions. Here, we explore whether the involvement of genes in cancer suggests the similarity of their topological ‘signatures’ as well. By applying a series of clustering methods to proteins' topological signature similarities, we demonstrate that the obtained clusters are significantly enriched with cancer genes. We apply this methodology to identify novel cancer gene candidates, validating 80 per cent of our predictions in the literature. We also validate predictions biologically by identifying cancer-related negative regulators of melanogenesis identified in our siRNA screen. This is encouraging, since we have done this solely from PPI network topology. We provide clear evidence that PPI network structure around cancer genes is different from the structure around non-cancer genes. Understanding the underlying principles of this phenomenon is an open question, with a potential for increasing our understanding of complex diseases.

Journal ArticleDOI
TL;DR: This study looks at the construction of a global protein-protein interaction (PPI) network for the human pathogen Mycobacterium tuberculosis H37Rv, based on a high-throughput bacterial two-hybrid method.
Abstract: Analysis of the protein-protein interaction network of a pathogen is a powerful approach for dissecting gene function, potential signal transduction, and virulence pathways. This study looks at the construction of a global protein-protein interaction (PPI) network for the human pathogen Mycobacterium tuberculosis H37Rv, based on a high-throughput bacterial two-hybrid method. Almost the entire ORFeome was cloned, and more than 8000 novel interactions were identified. The overall quality of the PPI network was validated through two independent methods, and a high success rate of more than 60% was obtained. The parameters of PPI networks were calculated. The average shortest path length was 4.31. The topological coefficient of the M. tuberculosis B2H network perfectly followed a power law distribution (correlation = 0.999; R-squared = 0.999) and represented the best fit in all currently available PPI networks. A cross-species PPI network comparison revealed 94 conserved subnetworks between M. tuberculosis and several prokaryotic organism PPI networks. The global network was linked to the protein secretion pathway. Two WhiB-like regulators were found to be highly connected proteins in the global network. This is the first systematic noncomputational PPI data for the human pathogen, and it provides a useful resource for studies of infection mechanisms, new signaling pathways, and novel antituberculosis drug development.
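
A network parameter such as the average shortest path length reported in the abstract above can be computed with breadth-first search. The sketch below assumes an unweighted, undirected graph and averages only over pairs that are actually connected. Whether the original study handled disconnected pairs this way is not stated, so treat that choice as an assumption.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_shortest_path_length(adj):
    """Mean hop distance over all ordered pairs of distinct, mutually reachable nodes."""
    total = 0
    pairs = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0
```

On a three-node path A-B-C the pairwise distances are 1, 1 and 2, so the function returns 4/3.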

Journal ArticleDOI
TL;DR: This review defines the concept of plant interactome and the protein-protein interaction network, and compares the pros and cons for different strategies for interactome mapping including yeast two-hybrid system (Y2H), affinity purification mass spectrometry, bimolecular fluorescence complementation, and in silico prediction.
Abstract: Protein-protein interaction network represents an important aspect of systems biology. The understanding of the plant protein-protein interaction network and interactome will provide crucial insights into the regulation of plant developmental, physiological, and pathological processes. In this review, we will first define the concept of plant interactome and the protein-protein interaction network. The significance of the plant interactome study will be discussed. We will then compare the pros and cons for different strategies for interactome mapping including yeast two-hybrid system (Y2H), affinity purification mass spectrometry (AP-MS), bimolecular fluorescence complementation (BiFC), and in silico prediction. The application of these platforms on specific plant biology questions will be further discussed. The recent advancements revealed the great potential for plant protein-protein interaction network and interactome to elucidate molecular mechanisms for signal transduction, stress responses, cell cycle control, pattern formation, and others. Mapping the plant interactome in model species will provide important guideline for the future study of plant biology.

Journal ArticleDOI
TL;DR: In this paper, the protein interaction network of Leishmania major is predicted using three validated methods (PSIMAP, PEIMAP and iPfam), and a high-confidence network (confidence score > 0.70) with 1,366 nodes and 33,861 interactions is calculated.
Abstract: Background: Leishmaniasis is a virulent parasitic infection that causes a worldwide disease burden. Most treatments have toxic side-effects and efficacy has decreased due to the emergence of resistant strains. The outlook is worsened by the absence of promising drug targets for this disease. We have taken a computational approach to the detection of new drug targets, which may become an effective strategy for the discovery of new drugs for this tropical disease. Results: We have predicted the protein interaction network of Leishmania major by using three validated methods: PSIMAP, PEIMAP, and iPfam. Combining the results from these methods, we calculated a high confidence network (confidence score > 0.70) with 1,366 nodes and 33,861 interactions. We were able to predict the biological process for 263 interacting proteins by doing enrichment analysis of the clusters detected. Analyzing the topology of the network with metrics such as connectivity and betweenness centrality, we detected 142 potential drug targets after homology filtering with the human proteome. Further experiments can be done to validate these targets. Conclusion: We have constructed the first protein interaction network of the Leishmania major parasite by using a computational approach. The topological analysis of the protein network enabled us to identify a set of candidate proteins that may be both (1) essential for parasite survival and (2) without human orthologs. These potential targets are promising for further experimental validation. This strategy, if validated, may augment established drug discovery methodologies, for this and possibly other tropical diseases, with a relatively low additional investment of time and resources.

Journal ArticleDOI
30 Jul 2010-PLOS ONE
TL;DR: Network features were found to be most important for accurate prediction and can significantly improve the prediction performance, and the results suggest that the protein interaction context could provide important clues to help better illustrate SAP's functional association.
Abstract: Non-synonymous SNPs (nsSNPs), also known as Single Amino acid Polymorphisms (SAPs) account for the majority of human inherited diseases. It is important to distinguish the deleterious SAPs from neutral ones. Most traditional computational methods to classify SAPs are based on sequential or structural features. However, these features cannot fully explain the association between a SAP and the observed pathophysiological phenotype. We believe the better rationale for deleterious SAP prediction should be: If a SAP lies in the protein with important functions and it can change the protein sequence and structure severely, it is more likely related to disease. So we established a method to predict deleterious SAPs based on both protein interaction network and traditional hybrid properties. Each SAP is represented by 472 features that include sequential features, structural features and network features. Maximum Relevance Minimum Redundancy (mRMR) method and Incremental Feature Selection (IFS) were applied to obtain the optimal feature set and the prediction model was Nearest Neighbor Algorithm (NNA). In jackknife cross-validation, 83.27% of SAPs were correctly predicted when the optimized 263 features were used. The optimized predictor with 263 features was also tested in an independent dataset and the accuracy was still 80.00%. In contrast, SIFT, a widely used predictor of deleterious SAPs based on sequential features, has a prediction accuracy of 71.05% on the same dataset. In our study, network features were found to be most important for accurate prediction and can significantly improve the prediction performance. Our results suggest that the protein interaction context could provide important clues to help better illustrate SAP's functional association. This research will facilitate the post genome-wide association studies.
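
The evaluation scheme named in the abstract above, a nearest-neighbour predictor scored by jackknife (leave-one-out) cross-validation, can be sketched in a few lines. The sketch assumes Euclidean distance and uses made-up two-dimensional feature vectors. The study's actual distance measure, its 472 features and the mRMR/IFS feature selection steps are not reproduced here.

```python
import math


def nearest_neighbor_label(train, query):
    """Return the label of the training sample closest to query.

    train : list of (feature_tuple, label)
    Euclidean distance is an assumption of this sketch.
    """
    closest = min(train, key=lambda item: math.dist(item[0], query))
    return closest[1]


def jackknife_accuracy(samples):
    """Leave each sample out in turn, predict it from the rest, and
    return the fraction predicted correctly."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_label(rest, features) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical toy data: two well-separated classes.
toy = [((0.0, 0.0), "neutral"), ((0.0, 1.0), "neutral"),
       ((5.0, 5.0), "deleterious"), ((5.0, 6.0), "deleterious")]
accuracy = jackknife_accuracy(toy)
```

With these four well-separated points every held-out sample is predicted correctly, so `accuracy` is 1.0. Real feature sets overlap far more, which is where feature selection of the kind the abstract describes becomes relevant.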

Journal ArticleDOI
TL;DR: This work extracts functional neighborhood features of a gene using Random Walks with Restarts and employs a KNN classifier to predict the function of uncharacterized genes based on the computed neighborhood features, providing a natural control of the trade-off between accuracy and coverage of prediction.
Abstract: The recent advent of high-throughput methods has generated large amounts of gene interaction data. This has allowed the construction of genomewide networks. A significant number of genes in such networks remain uncharacterized and predicting the molecular function of these genes remains a major challenge. A number of existing techniques assume that genes with similar functions are topologically close in the network. Our hypothesis is that genes with similar functions observe similar annotation patterns in their neighborhood, regardless of the distance between them in the interaction network. We thus predict molecular functions of uncharacterized genes by comparing their functional neighborhoods to genes of known function. We propose a two-phase approach. First, we extract functional neighborhood features of a gene using Random Walks with Restarts. We then employ a KNN classifier to predict the function of uncharacterized genes based on the computed neighborhood features. We perform leave-one-out validation experiments on two S. cerevisiae interaction networks and show significant improvements over previous techniques. Our technique provides a natural control of the trade-off between accuracy and coverage of prediction. We further propose and evaluate prediction in sparse genomes by exploiting features from well-annotated genomes.
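The first phase described here, Random Walks with Restarts (RWR), can be sketched as a simple power iteration. The sketch assumes an undirected, unweighted network stored as a dictionary of neighbour sets; the restart probability of 0.3 and the helper names are illustrative assumptions, not values or identifiers taken from the paper.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that restarts at `seed`.

    adj: dict mapping each node to the set of its neighbours.
    """
    nodes = sorted(adj)
    p = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to the seed.
        nxt = {n: (restart if n == seed else 0.0) for n in nodes}
        for u in nodes:
            if not adj[u]:
                # Isolated node: return its mass to the seed.
                nxt[seed] += (1 - restart) * p[u]
                continue
            share = (1 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def neighbourhood_profile(adj, annotations, gene, functions, restart=0.3):
    """One feature per function: RWR mass landing on genes with that function.

    annotations: dict gene -> set of known function labels (hypothetical input).
    """
    p = random_walk_with_restart(adj, gene, restart)
    return [sum(p[g] for g, fs in annotations.items() if f in fs and g != gene)
            for f in functions]


# Toy path network a - b - c - d.
toy_adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
probs = random_walk_with_restart(toy_adj, "a")
profile = neighbourhood_profile(
    toy_adj, {"b": {"kinase"}, "d": {"transport"}}, "a", ["kinase", "transport"])
```

The resulting profiles are the kind of feature vectors that a KNN classifier would then compare between uncharacterized genes and genes of known function.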

Journal ArticleDOI
TL;DR: The proposed method for extending cellular pathways helps to explain the functions of cancer mutated genes by exploiting the synergies of canonical knowledge and large-scale interaction data.
Abstract: Cellular processes and pathways, whose deregulation may contribute to the development of cancers, are often represented as cascades of proteins transmitting a signal from the cell surface to the nucleus. However, recent functional genomic experiments have identified thousands of interactions for the signalling canonical proteins, challenging the traditional view of pathways as independent functional entities. Combining information from pathway databases and interaction networks obtained from functional genomic experiments is therefore a promising strategy to obtain more robust pathway and process representations, facilitating the study of cancer-related pathways. We present a methodology for extending pre-defined protein sets representing cellular pathways and processes by mapping them onto a protein-protein interaction network, and extending them to include densely interconnected interaction partners. The added proteins display distinctive network topological features and molecular function annotations, and can be proposed as putative new components, and/or as regulators of the communication between the different cellular processes. Finally, these extended pathways and processes are used to analyse their enrichment in pancreatic mutated genes. Significant associations between mutated genes and certain processes are identified, enabling an analysis of the influence of previously non-annotated cancer mutated genes. The proposed method for extending cellular pathways helps to explain the functions of cancer mutated genes by exploiting the synergies of canonical knowledge and large-scale interaction data.
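The idea of extending a pre-defined protein set with densely interconnected interaction partners can be illustrated with a small sketch. The abstract does not state the density criterion, so the rule used here (a candidate needs at least `min_links` partners in the pathway, and at least `min_fraction` of all its partners must be pathway members) is purely an assumption for illustration, as are the function and parameter names.

```python
def extend_pathway(adj, pathway, min_fraction=0.5, min_links=2):
    """Return proteins outside `pathway` that are densely linked to it.

    adj: dict mapping each protein to the set of its interaction partners.
    pathway: iterable of proteins in the pre-defined set.
    """
    pathway = set(pathway)
    added = set()
    for protein, partners in adj.items():
        if protein in pathway or not partners:
            continue
        links = len(partners & pathway)
        # Hypothetical density rule: enough links, and mostly into the pathway.
        if links >= min_links and links / len(partners) >= min_fraction:
            added.add(protein)
    return added


# Toy network: X binds both pathway members, Y binds only one of them.
toy_adj = {
    "A": {"B", "X", "Y"},
    "B": {"A", "X"},
    "X": {"A", "B"},
    "Y": {"A", "Z"},
    "Z": {"Y"},
}
new_members = extend_pathway(toy_adj, {"A", "B"})
```

In the toy network only X is proposed as a putative new component, because Y has a single link into the pathway.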

Journal ArticleDOI
TL;DR: A novel methodology is developed to delimit the core set of host-cellular functions and their associated perturbation from the HHPID, which disentangles the complex set of HIV-1-host protein interactions, reconciles these with siRNA screens and provides an accessible and interpretable map of infection.
Abstract: Human immunodeficiency virus type 1 (HIV-1) exploits a diverse array of host cell functions in order to replicate. This is mediated through a network of virus-host interactions. A variety of recent studies have catalogued this information. In particular the HIV-1, Human Protein Interaction Database (HHPID) has provided a unique depth of protein interaction detail. However, as a map of HIV-1 infection, the HHPID is problematic, as it contains curation error and redundancy; in addition, it is based on a heterogeneous set of experimental methods. Based on identifying shared patterns of HIV-host interaction, we have developed a novel methodology to delimit the core set of host-cellular functions and their associated perturbation from the HHPID. Initially, using biclustering, we identify 279 significant sets of host proteins that undergo the same types of interaction. The functional cohesiveness of these protein sets was validated using a human protein-protein interaction network, gene ontology annotation and sequence similarity. Next, using a distance measure, we group host protein sets and identify 37 distinct higher-level subsystems. We further demonstrate the biological significance of these subsystems by cross-referencing with global siRNA screens that have been used to detect host factors necessary for HIV-1 replication, and investigate the seemingly small intersect between these data sets. Our results highlight significant host-cell subsystems that are perturbed during the course of HIV-1 infection. Moreover, we characterise the patterns of interaction that contribute to these perturbations. Thus, our work disentangles the complex set of HIV-1-host protein interactions in the HHPID, reconciles these with siRNA screens and provides an accessible and interpretable map of infection.

Patent
20 Aug 2010
TL;DR: Methods are provided for diagnosing and treating microbiome-associated disease or improving health using interaction network parameters, which identify "highly-connected" organisms or molecules that can serve as targets for modulation or as therapeutic agents.
Abstract: Methods of diagnosing and treating microbiome-associated disease or improving health using interaction network parameters are provided. Methods are provided to analyze interaction networks between microbes, and between microbes and the host, to determine important (e.g. "highly-connected") organisms or molecules as determined by various network parameters. Methods are provided including and beyond correlation to use these "highly-connected" organisms or molecules as targets for modulation or as therapeutic agents to improve health.

Journal ArticleDOI
TL;DR: The results indicate that the structural effects and demographic variables active in the real world influence the evolution of the players’ interaction network in MMOGs, but do not provide evidence that players’ structural embeddedness in the interaction network influences player performance.
Abstract: This article examines the co-evolution of players’ individual performance and their interaction network in a Massively Multiplayer Online Game (MMOG). The objective is to test whether the application of theories from the real world is valid in virtual worlds. While the results indicate that the structural effects and demographic variables active in the real world influence the evolution of the players’ interaction network in MMOGs (e.g. transitivity, reciprocity, and homophily), they do not provide evidence that players’ structural embeddedness in the interaction network influences player performance. These findings have important implications for researchers and practitioners who need to understand social processes in MMOGs (e.g., when launching marketing campaigns in MMOGs) or who study MMOGs and then use their findings to draw conclusions about the real world (e.g., when analyzing the relationship between employee performance and network structure).

Journal ArticleDOI
TL;DR: The evolutionary assumptions implicit in many of the protein interaction prediction methods are elucidated and the caution needed in deploying certain evolutionary assumptions is drawn, in particular cross-organism transfer of interactions by sequence homology.
Abstract: Here we review the methods for the prediction of protein interactions and the ideas in protein evolution that relate to them. The evolutionary assumptions implicit in many of the protein interaction prediction methods are elucidated. We draw attention to the caution needed in deploying certain evolutionary assumptions, in particular cross-organism transfer of interactions by sequence homology, and discuss the known issues in deriving interaction predictions from evidence of co-evolution. We also conjecture that there is evolutionary knowledge yet to be exploited in the prediction of interactions, in particular the heterogeneity of interactions, the increasing availability of interaction data from multiple species, and the models of protein interaction network growth.

Journal ArticleDOI
01 Jan 2010
TL;DR: This paper proposes a novel approach for function prediction by identifying frequent patterns of functional associations in a protein interaction network by matching the subgraphs of the unknown protein with the frequent patterns analogous to it.
Abstract: Predicting protein function from protein interaction networks has been challenging because of the complexity of functional relationships among proteins. Most previous function prediction methods depend on the neighborhood of or the connected paths to known proteins. However, their accuracy has been limited due to the functional inconsistency of interacting proteins. In this paper, we propose a novel approach for function prediction by identifying frequent patterns of functional associations in a protein interaction network. A set of functions that a protein performs is assigned into the corresponding node as a label. A functional association pattern is then represented as a labeled subgraph. Our frequent labeled subgraph mining algorithm efficiently searches the functional association patterns that occur frequently in the network. It iteratively increases the size of frequent patterns by one node at a time by selective joining, and simplifies the network by a priori pruning. Using the yeast protein interaction network, our algorithm found more than 1400 frequent functional association patterns. The function prediction is performed by matching the subgraph, including the unknown protein, with the frequent patterns analogous to it. By leave-one-out cross validation, we show that our approach has better performance than previous link-based methods in terms of prediction accuracy. The frequent functional association patterns generated in this study might become the foundations of advanced analysis for functional behaviors of proteins in a system level.

Journal ArticleDOI
TL;DR: This review aims to describe the basic concepts of protein domain evolution and illustrate recent developments in molecular evolution that have provided valuable new insights in the field of comparative genomics and protein interaction networks.
Abstract: The proteomes that make up the collection of proteins in contemporary organisms evolved through recombination and duplication of a limited set of domains. These protein domains are essentially the main components of globular proteins and are the most principal level at which protein function and protein interactions can be understood. An important aspect of domain evolution is their atomic structure and biochemical function, which are both specified by the information in the amino acid sequence. Changes in this information may bring about new folds, functions and protein architectures. With the present and still increasing wealth of sequences and annotation data brought about by genomics, new evolutionary relationships are constantly being revealed, unknown structures modeled and phylogenies inferred. Such investigations not only help predict the function of newly discovered proteins, but also assist in mapping unforeseen pathways of evolution and reveal crucial, co-evolving inter- and intra-molecular interactions. In turn this will help us describe how protein domains shaped cellular interaction networks and the dynamics with which they are regulated in the cell. Additionally, these studies can be used for the design of new and optimized protein domains for therapy. In this review, we aim to describe the basic concepts of protein domain evolution and illustrate recent developments in molecular evolution that have provided valuable new insights in the field of comparative genomics and protein interaction networks.

Journal ArticleDOI
Suk-Hoon Jung, Bora Hyun, Woo-Hyuk Jang, Hee-Young Hur, Dongsoo Han
TL;DR: The evaluation results show that the proposed method outperforms the simple PPIN-based method in terms of removing false positive proteins in the formation of complexes and shows that excluding competition between MEIs can be effective for improving prediction accuracy in general computational approaches involving protein interactions.
Abstract: Motivation: The increase in the amount of available protein–protein interaction (PPI) data enables us to develop computational methods for protein complex predictions. A protein complex is a group of proteins that interact with each other at the same time and place. The protein complex generally corresponds to a cluster in PPI network (PPIN). However, clusters correspond not only to protein complexes but also to sets of proteins that interact dynamically with each other. As a result, conventional graph-theoretic clustering methods that disregard interaction dynamics show high false positive rates in protein complex predictions. Results: In this article, a method of refining PPIN is proposed that uses the structural interface data of protein pairs for protein complex predictions. A simultaneous protein interaction network (SPIN) is introduced to specify mutually exclusive interactions (MEIs) as indicated from the overlapping interfaces and to exclude competition from MEIs that arise during the detection of protein complexes. After constructing SPINs, naive clustering algorithms are applied to the SPINs for protein complex predictions. The evaluation results show that the proposed method outperforms the simple PPIN-based method in terms of removing false positive proteins in the formation of complexes. This shows that excluding competition between MEIs can be effective for improving prediction accuracy in general computational approaches involving protein interactions. Availability: http://code.google.com/p/simultaneous-pin/ Contact: dshan@kaist.ac.kr Supplementary information: Supplementary data are available at Bioinformatics online.
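The notion of mutually exclusive interactions (MEIs) indicated by overlapping interfaces can be sketched as a set-intersection test. The data layout below (a mapping from a protein and its partner to the residue positions that the protein uses to bind that partner) and the function name are assumptions of this example, not the format used by the SPIN tool.

```python
from itertools import combinations


def mutually_exclusive_pairs(interfaces):
    """Find interaction pairs that compete for the same surface of a protein.

    interfaces: dict mapping (protein, partner) to the set of residue
    indices on `protein` that form the interface with `partner`.
    Returns a set of (protein, partner_1, partner_2) triples.
    """
    by_protein = {}
    for (protein, partner), residues in interfaces.items():
        by_protein.setdefault(protein, []).append((partner, residues))

    meis = set()
    for protein, items in by_protein.items():
        ordered = sorted(items, key=lambda item: item[0])
        for (p1, r1), (p2, r2) in combinations(ordered, 2):
            if r1 & r2:  # shared residues: both partners cannot bind at once
                meis.add((protein, p1, p2))
    return meis


# Toy data: B and C both need residue 3 of A; D binds elsewhere.
toy_interfaces = {
    ("A", "B"): {1, 2, 3},
    ("A", "C"): {3, 4},
    ("A", "D"): {10, 11},
}
meis = mutually_exclusive_pairs(toy_interfaces)
```

A clustering step could then avoid placing both members of such a pair in the same predicted complex.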

Journal ArticleDOI
26 Mar 2010 - PLOS ONE
TL;DR: Together the results provide good evidence for the existence of a large, robust ssRNA interaction network with distinct regulatory function, which could have a massive effect on the regulation of gene expression via mediation of transcript levels.
Abstract: Background: In plants and animals there are many classes of short RNAs that carry out a wide range of functions within the cell; short silencing RNAs (ssRNAs) of 21–25 nucleotides in length are produced from double-stranded RNA precursors by the protein Dicer and guide nucleases and other proteins to their RNA targets through base pairing interactions. The consequence of this process is degradation of the targeted RNA, suppression of its translation or initiation of secondary ssRNA production. The secondary ssRNAs in turn could then initiate further layers of ssRNA production to form extensive cascades and networks of interacting RNA [1]. Previous empirical analysis in plants established the existence of a small secondary ssRNA cascade [2], in which a single instance of this event occurred, but it was not known whether there are other more extensive networks of secondary sRNA production. Methodology/Principal Findings: We generated a network by predicting targets of ssRNA populations obtained from high-throughput sequencing experiments. The topology of the network shows it to have a power-law connectivity distribution, to be disassortative, highly clustered and composed of multiple components. We also identify protein families, PPR and ULP1, that act as hubs within the network. Comparison of the repetition of genomic sub-sequences of ssRNA length between Arabidopsis and E. coli suggests that the network structure is made possible by the underlying repetitiveness in the genome sequence. Conclusions/Significance: Together our results provide good evidence for the existence of a large, robust ssRNA interaction network with distinct regulatory function. Such a network could have a massive effect on the regulation of gene expression via mediation of transcript levels.

Journal ArticleDOI
TL;DR: This work suggests SNPrank to be a powerful method for identifying network effects in genetic association data and reveals a potential vitamin regulation network association with antibody response.
Abstract: The variation in antibody response to vaccination likely involves small contributions of numerous genetic variants, such as single-nucleotide polymorphisms (SNPs), which interact in gene networks and pathways. To accumulate the bits of genetic information relevant to the phenotype that are distributed throughout the interaction network, we develop a network eigenvector centrality algorithm (SNPrank) that is sensitive to the weak main effects, gene–gene interactions and small higher-order interactions through hub effects. Analogous to Google PageRank, we interpret the algorithm as the simulation of a random SNP surfer (RSS) that accumulates bits of information in the network through a dynamic probabilistic Markov chain. The transition matrix for the RSS is based on a data-driven genetic association interaction network (GAIN), the nodes of which are SNPs weighted by the main-effect strength and edges weighted by the gene–gene interaction strength. We apply SNPrank to a GAIN analysis of a candidate-gene association study on human immune response to smallpox vaccine. SNPrank implicates a SNP in the retinoid X receptor α (RXRA) gene through a network interaction effect on antibody response. This vitamin A- and D-signaling mediator has been previously implicated in human immune responses, although it would be neglected in a standard analysis because its significance is unremarkable outside the context of its network centrality. This work suggests SNPrank to be a powerful method for identifying network effects in genetic association data and reveals a potential vitamin regulation network association with antibody response.
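The PageRank analogy in this abstract can be illustrated with a generic power iteration over a GAIN-style matrix, where diagonal entries hold main-effect strengths and off-diagonal entries hold interaction strengths. This is a sketch in the spirit of the description, not the published SNPrank formula: the way main effects enter (as teleport weights), the damping value of 0.85, and the function name are all assumptions of this example.

```python
def snp_centrality(gain, damping=0.85, tol=1e-12, max_iter=10000):
    """PageRank-style centrality on a square GAIN-like matrix (list of lists).

    gain[i][i]: main-effect strength of SNP i (used as teleport weight).
    gain[i][j]: interaction strength between SNPs i and j (i != j).
    """
    n = len(gain)
    main = [gain[i][i] for i in range(n)]
    total_main = sum(main)
    teleport = [m / total_main for m in main] if total_main > 0 else [1.0 / n] * n
    # Total outgoing interaction weight of each SNP (column sums, no diagonal).
    out = [sum(gain[i][j] for i in range(n) if i != j) for j in range(n)]

    r = [1.0 / n] * n
    for _ in range(max_iter):
        # SNPs without interactions hand their score to the teleport step.
        dangling = sum(r[j] for j in range(n) if out[j] == 0)
        nxt = [(1 - damping + damping * dangling) * teleport[i] for i in range(n)]
        for j in range(n):
            if out[j] == 0:
                continue
            for i in range(n):
                if i != j:
                    nxt[i] += damping * r[j] * gain[i][j] / out[j]
        delta = sum(abs(a - b) for a, b in zip(nxt, r))
        r = nxt
        if delta < tol:
            break
    return r


# Toy GAIN: all main effects equal, but SNP 0 interacts with both others.
toy_gain = [
    [0.1, 1.0, 1.0],
    [1.0, 0.1, 0.0],
    [1.0, 0.0, 0.1],
]
scores = snp_centrality(toy_gain)
```

In the toy matrix all three SNPs have the same main effect, yet SNP 0 receives the highest score because it is the hub, which mirrors the abstract's point that a SNP can be unremarkable on its own and still rank highly through its network context.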