Journal Article

Coexpression network based on natural variation in human gene expression reveals gene interactions and functions

01 Nov 2009 - Genome Research (Cold Spring Harbor Laboratory Press) - Vol. 19, Iss. 11, pp. 1953-1962
TL;DR: This work took advantage of normal variation in human gene expression to infer gene networks, which were constructed using correlations in expression levels of more than 8.5 million gene pairs in immortalized B cells from three independent samples to identify biological processes and gene functions.
Abstract: Genes interact in networks to orchestrate cellular processes. Analysis of these networks provides insights into gene interactions and functions. Here, we took advantage of normal variation in human gene expression to infer gene networks, which we constructed using correlations in expression levels of more than 8.5 million gene pairs in immortalized B cells from three independent samples. The resulting networks allowed us to identify biological processes and gene functions. Among the biological pathways, we found processes such as translation and glycolysis that co-occur in the same subnetworks. We predicted the functions of poorly characterized genes, including CHCHD2 and TMEM111, and provided experimental evidence that TMEM111 is part of the endoplasmic reticulum-associated secretory pathway. We also found that IFIH1, a susceptibility gene of type 1 diabetes, interacts with YES1, which plays a role in glucose transport. Furthermore, genes that predispose to the same diseases are clustered nonrandomly in the coexpression network, suggesting that networks can provide candidate genes that influence disease susceptibility. Therefore, our analysis of gene coexpression networks offers information on the role of human genes in normal and disease processes.
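
The network construction described in the abstract (correlating expression levels across individuals for every gene pair) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy expression values, and the 0.5 cutoff on |r| are assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treated as no edge
    return sxy / math.sqrt(sxx * syy)

def coexpression_edges(expr, threshold=0.5):
    """expr maps gene -> expression levels across the same individuals.
    Returns {(gene_a, gene_b): r} for every pair with |r| > threshold.
    The default threshold is an illustrative assumption."""
    genes = sorted(expr)
    edges = {}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            r = pearson(expr[a], expr[b])
            if abs(r) > threshold:
                edges[(a, b)] = r
    return edges

# Toy data: four hypothetical genes measured in five individuals.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 3.0],  # essentially uncorrelated with the rest
}
edges = coexpression_edges(toy)
```

Both positive and negative correlations count as edges here because the cutoff is applied to the absolute value; with thousands of genes the pairwise loop would normally be replaced by a vectorized correlation matrix.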

Citations
Journal Article
TL;DR: The expression level of lncRNA‐HEIH in HBV‐related HCC is significantly associated with recurrence and is an independent prognostic factor for survival, and it is proposed that lncRNAs may serve as key regulatory hubs in HCC progression.

590 citations

Posted Content
TL;DR: The method has a clear interpretation: the authors use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling, which requires essentially no conditions.
Abstract: A challenging problem in estimating high-dimensional graphical models is to choose the regularization parameter in a data-dependent way. The standard techniques include $K$-fold cross-validation ($K$-CV), Akaike information criterion (AIC), and Bayesian information criterion (BIC). Though these methods work well for low-dimensional problems, they are not suitable in high dimensional settings. In this paper, we present StARS: a new stability-based method for choosing the regularization parameter in high dimensional inference for undirected graphs. The method has a clear interpretation: we use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling. This interpretation requires essentially no conditions. Under mild conditions, we show that StARS is partially sparsistent in terms of graph estimation: i.e. with high probability, all the true edges will be included in the selected model even when the graph size diverges with the sample size. Empirically, the performance of StARS is compared with the state-of-the-art model selection procedures, including $K$-CV, AIC, and BIC, on both synthetic data and a real microarray dataset. StARS outperforms all these competing procedures.

324 citations

Journal Article
TL;DR: A theoretic framework is developed where it is shown that under mild conditions, the SCORE stably yields successful community detection and is much more satisfactory than those by the classical spectral methods.
Abstract: Consider a network where the nodes split into $K$ different communities. The community labels for the nodes are unknown and it is of major interest to estimate them (i.e., community detection). Degree Corrected Block Model (DCBM) is a popular network model. How to detect communities with the DCBM is an interesting problem, where the main challenge lies in the degree heterogeneity. We propose a new approach to community detection which we call the Spectral Clustering On Ratios-of-Eigenvectors (SCORE). Compared to classical spectral methods, the main innovation is to use the entry-wise ratios between the first leading eigenvector and each of the other leading eigenvectors for clustering. Let $A$ be the adjacency matrix of the network. We first obtain the $K$ leading eigenvectors of $A$, say, $\hat{\eta}_{1},\ldots,\hat{\eta}_{K}$, and let $\hat{R}$ be the $n\times(K-1)$ matrix such that $\hat{R}(i,k)=\hat{\eta}_{k+1}(i)/\hat{\eta}_{1}(i)$, $1\leq i\leq n$, $1\leq k\leq K-1$. We then use $\hat{R}$ for clustering by applying the $k$-means method. The central surprise is, the effect of degree heterogeneity is largely ancillary, and can be effectively removed by taking entry-wise ratios between $\hat{\eta}_{k+1}$ and $\hat{\eta}_{1}$, $1\leq k\leq K-1$. The method is successfully applied to the web blogs data and the karate club data, with error rates of $58/1222$ and $1/34$, respectively. These results are more satisfactory than those by the classical spectral methods. Additionally, compared to modularity methods, SCORE is easier to implement, computationally faster, and also has smaller error rates. We develop a theoretic framework where we show that under mild conditions, the SCORE stably yields consistent community detection. In the core of the analysis is the recent development on Random Matrix Theory (RMT), where the matrix-form Bernstein inequality is especially helpful.

274 citations


Cites background from "Coexpression network based on natur..."

  • ..., coexpression genetic network [25, 26]), the situation is more complicated and we may not have a good idea on how large K is....

    [...]

Journal Article
TL;DR: The results increase the number of established susceptibility genes for lupus to ∼30 and validate the importance of using large datasets to confirm associations of loci which moderately increase the risk for disease.
Abstract: Systemic lupus erythematosus (SLE) is a complex trait characterised by the production of a range of auto-antibodies and a diverse set of clinical phenotypes. Currently, ~8% of the genetic contribution to SLE in Europeans is known, following publication of several moderate-sized genome-wide (GW) association studies, which identified loci with a strong effect (OR > 1.3). In order to identify additional genes contributing to SLE susceptibility, we conducted a replication study in a UK dataset (870 cases, 5,551 controls) of 23 variants that showed moderate-risk for lupus in previous studies. Association analysis in the UK dataset and subsequent meta-analysis with the published data identified five SLE susceptibility genes reaching genome-wide levels of significance ($P_{\mathrm{comb}} < 5\times10^{-8}$): NCF2 ($P_{\mathrm{comb}} = 2.87\times10^{-11}$), IKZF1 ($P_{\mathrm{comb}} = 2.33\times10^{-9}$), IRF8 ($P_{\mathrm{comb}} = 1.24\times10^{-8}$), IFIH1 ($P_{\mathrm{comb}} = 1.63\times10^{-8}$), and TYK2 ($P_{\mathrm{comb}} = 3.88\times10^{-8}$). Each of the five new loci identified here can be mapped into interferon signalling pathways, which are known to play a key role in the pathogenesis of SLE. These results increase the number of established susceptibility genes for lupus to ~30 and validate the importance of using large datasets to confirm associations of loci which moderately increase the risk for disease.

272 citations


Cites methods from "Coexpression network based on natur..."

  • ...To determine whether there was any allele-specific effect on the level of gene expression, we used publicly available genotype data on unrelated EBV-transformed B cells (CEU, YRI and CHB/JPT individuals which were part of the HapMap project) and expression data from the same individuals (GSE12526, GEO database) [41]....

    [...]

References
Journal Article
04 Jun 1998 - Nature
TL;DR: Simple models of networks that can be tuned through this middle ground: regular networks ‘rewired’ to introduce increasing amounts of disorder are explored, finding that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs.
Abstract: Networks of coupled dynamical systems have been used to model biological oscillators, Josephson junction arrays, excitable media, neural networks, spatial games, genetic control networks and many other self-organizing systems. Ordinarily, the connection topology is assumed to be either completely regular or completely random. But many biological, technological and social networks lie somewhere between these two extremes. Here we explore simple models of networks that can be tuned through this middle ground: regular networks 'rewired' to introduce increasing amounts of disorder. We find that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs. We call them 'small-world' networks, by analogy with the small-world phenomenon (popularly known as six degrees of separation). The neural network of the worm Caenorhabditis elegans, the power grid of the western United States, and the collaboration graph of film actors are shown to be small-world networks. Models of dynamical systems with small-world coupling display enhanced signal-propagation speed, computational power, and synchronizability. In particular, infectious diseases spread more easily in small-world networks than in regular lattices.
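
The "regular network rewired to introduce disorder" idea in this abstract can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published procedure in full detail: the function names, the ring-lattice starting point with `k` even and `n > k`, and the rule of skipping a rewire when no free target exists are choices made for the example.

```python
import random

def ring_lattice(n, k):
    """n nodes on a ring, each linked to its k nearest neighbours (k even, n > k)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def small_world(n, k, p, seed=0):
    """Rewire each lattice edge with probability p to a uniformly chosen
    new endpoint, avoiding self-loops and duplicate edges."""
    rng = random.Random(seed)
    adj = ring_lattice(n, k)
    for d in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + d) % n
            if j in adj[i] and rng.random() < p:
                candidates = [v for v in range(n) if v != i and v not in adj[i]]
                if candidates:  # skip the rewire if node i is already linked to everyone
                    new = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj
```

With `p = 0` the lattice is returned unchanged, and with `p = 1` every lattice edge is redirected to a random endpoint; the total number of edges stays at `n * k / 2` throughout, so intermediate values of `p` change only the wiring, not the density.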

39,297 citations

Journal Article
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

35,225 citations


"Coexpression network based on natur..." refers background in this paper

  • ...We found that gene pairs that are correlated at |r| > 0.50 shared Gene Ontology (GO) (Ashburner et al. 2000) annotations significantly ($P < 10^{-16}$) more than expected by chance....

    [...]


Journal Article
15 Oct 1999 - Science
TL;DR: A model based on these two ingredients reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.
Abstract: Systems as diverse as genetic networks or the World Wide Web are best described as networks with complex topology. A common property of many large networks is that the vertex connectivities follow a scale-free power-law distribution. This feature was found to be a consequence of two generic mechanisms: (i) networks expand continuously by the addition of new vertices, and (ii) new vertices attach preferentially to sites that are already well connected. A model based on these two ingredients reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.
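
The two mechanisms named in this abstract, growth and preferential attachment, can be sketched as a small simulation. This is an illustrative sketch only: the function names, the seed graph (a complete graph on `m + 1` vertices), and the parameters `n` and `m` are assumptions for the example.

```python
import random

def preferential_attachment(n, m, seed=0):
    """Grow a graph to n vertices. Each new vertex attaches to m distinct
    existing vertices, chosen with probability proportional to degree."""
    rng = random.Random(seed)
    # Assumed starting point: a complete graph on m + 1 vertices.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Every vertex appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` is a degree-proportional draw over vertices.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

def degrees(edges):
    """Map each vertex to its number of incident edges."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg
```

Calling `degrees(preferential_attachment(2000, 2))` and tabulating how many vertices have each degree gives a heavy-tailed distribution, with a few early vertices accumulating many links.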

33,771 citations


"Coexpression network based on natur..." refers background or methods in this paper

  • ...This measurement ranges from 0 to 1, with 1 representing networks that are most like other biological networks. In many networks, the probability that a gene is connected to k other genes is given by the power law distribution (Barabasi and Albert 1999; Barabasi and Oltvai 2004): $P(k) \sim k^{-\gamma}$....

    [...]

  • ...MATLAB functions for determining the clustering coefficient (Watts and Strogatz 1998), gamma (Barabasi and Albert 1999), and scale-free topology criteria (Zhang and Horvath 2005) were implemented as previously described....

    [...]


  • ...In many networks, the probability that a gene is connected to k other genes is given by the power law distribution (Barabasi and Albert 1999; Barabasi and Oltvai 2004): $P(k) \sim k^{-\gamma}$....

    [...]

  • ...In addition, the clustering coefficient (Watts and Strogatz 1998) and another network parameter, gamma (Barabasi and Albert 1999), are within the ranges expected for biological networks (Table 1; Jordan et al. 2004; Zhang and Horvath 2005)....

    [...]
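
The two network parameters mentioned in these excerpts, the clustering coefficient and the exponent gamma, can be computed along the following lines. The paper used MATLAB functions; the Python below is a separate, minimal sketch with hypothetical function names, and the log-log least-squares fit is a simple stand-in estimator for gamma, not necessarily the one the authors used.

```python
import math

def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph.
    adj maps each node to the set of its neighbours."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        nb = list(nbrs)
        # Count the links that exist among this node's neighbours.
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nb[j] in adj[nb[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def estimate_gamma(adj):
    """Fit log P(k) = c - gamma * log k by least squares and return gamma.
    Needs at least two distinct nonzero degrees."""
    counts = {}
    for nbrs in adj.values():
        k = len(nbrs)
        if k > 0:
            counts[k] = counts.get(k, 0) + 1
    if len(counts) < 2:
        raise ValueError("need at least two distinct nonzero degrees")
    n = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / n) for c in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```

A triangle has clustering coefficient 1 and a star has 0, which makes a quick sanity check. The least-squares fit is sensitive to noise in the sparse high-degree tail, so it should be read as a rough estimate.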

Journal Article
TL;DR: Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.
Abstract: Cytoscape is an open source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to layout and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features. Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.

32,980 citations


"Coexpression network based on natur..." refers methods in this paper

  • ...Figures of the resulting networks were drawn using Cytoscape 2.6.0 (Shannon et al. 2003) or GraphViz (Ellson et al. 2002)....

    [...]

Journal Article
TL;DR: By following this protocol, investigators are able to gain an in-depth understanding of the biological themes in lists of genes that are enriched in genome-scale studies.
Abstract: DAVID bioinformatics resources consists of an integrated biological knowledgebase and analytic tools aimed at systematically extracting biological meaning from large gene/protein lists. This protocol explains how to use DAVID, a high-throughput and integrated data-mining environment, to analyze gene lists derived from high-throughput genomic experiments. The procedure first requires uploading a gene list containing any number of common gene identifiers followed by analysis using one or more text and pathway-mining tools such as gene functional classification, functional annotation chart or clustering and functional annotation table. By following this protocol, investigators are able to gain an in-depth understanding of the biological themes in lists of genes that are enriched in genome-scale studies.

31,015 citations


"Coexpression network based on natur..." refers methods in this paper

  • ...Enrichment analysis for BIND protein interactions or KEGG Pathways was done using DAVID (Dennis et al. 2003; Huang da et al. 2009)....

    [...]
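
Enrichment analysis of the kind mentioned in this excerpt asks whether a gene list contains more members of an annotated category than chance would predict. A common way to express this is a hypergeometric upper-tail probability, sketched below. This is a generic illustration with hypothetical names and made-up counts; it is not DAVID's own scoring, which may differ in its details.

```python
from math import comb

def enrichment_pvalue(population, annotated, drawn, hits):
    """Probability of seeing at least `hits` annotated genes when `drawn`
    genes are sampled without replacement from `population` genes, of
    which `annotated` carry the category label."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(comb(annotated, x) * comb(population - annotated, drawn - x)
               for x in range(hits, upper + 1))
    return tail / total

# Made-up example: 40 of 10,000 genes are in a pathway, and a
# 100-gene subnetwork contains 5 of them.
p = enrichment_pvalue(10_000, 40, 100, 5)
```

Because many categories are usually tested at once, p-values computed this way would normally be corrected for multiple testing before being interpreted.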