
Showing papers by "Chen Wang" published in 2008


Journal Article
TL;DR: This article proposes motif-directed NCA (mNCA), which integrates motif information and gene expression data to infer regulatory networks in the many biological studies where standard NCA is not applicable because ChIP-on-chip data are lacking.
Abstract: Background: Network Component Analysis (NCA) has shown its effectiveness in discovering regulators and inferring transcription factor activities (TFAs) when both microarray data and ChIP-on-chip data are available. However, an NCA scheme is not applicable to many biological studies because the available topology information is limited, for example when ChIP-on-chip data are lacking. We propose a new approach, motif-directed NCA (mNCA), to integrate motif information and gene expression data to infer regulatory networks.
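
For orientation, the standard NCA model that mNCA builds on is usually written as a constrained matrix factorization. The notation below ($E$, $A$, $P$, $\Gamma$, $Z_0$) is an assumption that follows common usage in the NCA literature and is not taken from this abstract. How mNCA derives the zero pattern from motif information is not described here.

```latex
E = A P + \Gamma, \qquad
\min_{A,\,P} \; \lVert E - A P \rVert_F^2
\quad \text{subject to} \quad A \in Z_0
```

Here $E$ is the $N \times M$ gene expression matrix ($N$ genes, $M$ samples), $A$ is the $N \times L$ matrix of connectivity strengths between genes and $L$ transcription factors, $P$ is the $L \times M$ matrix of TFAs, $\Gamma$ is noise, and $Z_0$ is the set of matrices whose zero entries match the known network topology. In classical NCA that topology comes from ChIP-on-chip data; the abstract indicates that mNCA draws on motif information instead.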

32 citations


Journal Article
TL;DR: The results show that the knowledge-guided ICA approach can extract biologically meaningful regulatory modes and outperform several baseline methods for biomarker identification; it also shows promise for inferring novel biomarkers for ovarian cancer and extending current knowledge.
Abstract: Many statistical methods have been proposed to identify disease biomarkers from gene expression profiles. However, from gene expression profile data alone, statistical methods often fail to identify biologically meaningful biomarkers related to the specific disease under study. In this paper, we develop a novel strategy, namely knowledge-guided multi-scale independent component analysis (ICA), to first infer regulatory signals and then identify biologically relevant biomarkers from microarray data. Since gene expression levels reflect the joint effect of several underlying biological functions, disease-specific biomarkers may be involved in several distinct biological functions. To identify disease-specific biomarkers that provide unique mechanistic insights, a metadata "knowledge gene pool" (KGP) is first constructed from multiple data sources to provide important information on the likely functions (such as gene ontology information) and regulatory events (such as promoter responsive elements) associated with potential genes of interest. The gene expression data and biological metadata associated with the members of the KGP can then be used to guide subsequent analysis. ICA is then applied to multi-scale gene clusters to reveal regulatory modes reflecting the underlying biological mechanisms. Finally, disease-specific biomarkers are extracted by their weighted connectivity scores associated with the extracted regulatory modes. A statistical significance test is used to evaluate the significance of transcription factor enrichment for the extracted gene set based on motif information. We applied the proposed method to yeast cell cycle microarray data and Rsf-1-induced ovarian cancer microarray data. The results show that our knowledge-guided ICA approach can extract biologically meaningful regulatory modes and outperform several baseline methods for biomarker identification.
We have proposed a novel method, namely knowledge-guided multi-scale ICA, to identify disease-specific biomarkers. The goal is to infer knowledge-relevant regulatory signals and then identify corresponding biomarkers through a multi-scale strategy. The approach has been successfully applied to two expression profiling experiments to demonstrate its improved performance in extracting biologically meaningful and disease-related biomarkers. More importantly, the proposed approach shows promise for inferring novel biomarkers for ovarian cancer and extending current knowledge.
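
The abstract does not name the significance test used for transcription factor enrichment. One common choice for this kind of question is a one-sided hypergeometric test, sketched below as an illustration only. The function name and all counts are hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Return P(X >= hits) under a hypergeometric null.

    population: total number of genes considered
    annotated:  genes in the population carrying the motif of interest
    sample:     size of the extracted gene set
    hits:       genes in the extracted set that carry the motif
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail; comb(n, k) returns 0 when k > n, so
    # impossible configurations contribute nothing.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 6000 genes, 300 with the motif,
# 50 genes extracted, 10 of which carry the motif.
p_value = hypergeom_enrichment_p(6000, 300, 50, 10)
```

A small `p_value` would suggest that the motif is over-represented in the extracted gene set relative to chance. In practice the p-values for many transcription factors would also need a multiple-testing correction.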

26 citations


Journal Article
TL;DR: The increased fasting plasma glucose reflects progressive decompensation of beta cell functions, and could be used to guide the strategy of clinical treatments.
Abstract: Background: Type 2 diabetes is a chronic disease characterized by a progressive loss of beta cell function. However, evaluating beta cell function is either expensive or inconvenient in clinical practice. We aimed to elucidate the association between changes in insulin responsiveness and fasting plasma glucose (FPG) during the development of diabetes. Methods: A total of 1192 Chinese individuals with normal blood glucose or hyperglycemia were enrolled for the analysis. The early insulinogenic index (DeltaI30/DeltaG30), the area under the curve of insulin (AUC-I), and homeostasis model assessment were applied to evaluate early-phase secretion, total insulin secretion, and insulin resistance, respectively. Polynomial regression analysis was performed to estimate the fluctuation of beta cell function. Results: The DeltaI30/DeltaG30 decreased much more rapidly than the AUC-I as FPG rose. At an FPG of 110 mg/dl (a pre-diabetic stage), the DeltaI30/DeltaG30 had lost 50% of its maximum while the AUC-I was still at a compensated normal level. The AUC-I became abnormal and decreased gradually at an FPG of 130 mg/dl and above (overt diabetes), while the DeltaI30/DeltaG30 remained at almost 25% of its maximum value. When hyperglycemia persisted at > 180 mg/dl, both the DeltaI30/DeltaG30 and the AUC-I were totally lost. Conclusion: The increased fasting plasma glucose reflects progressive decompensation of beta cell function and could be used to guide the strategy of clinical treatment.
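
The three indices named in the Methods can be illustrated with their commonly used definitions. This is a minimal sketch and rests on several assumptions. The abstract does not state the exact formulas or sampling times, so the trapezoidal rule, the HOMA-IR constant of 405 for glucose in mg/dl and insulin in µU/ml, and every sample value below are illustrative and are not data from the study.

```python
def insulinogenic_index(ins0, ins30, glu0, glu30):
    """Early insulinogenic index: rise in insulin over rise in glucose
    between 0 and 30 minutes of an oral glucose tolerance test."""
    return (ins30 - ins0) / (glu30 - glu0)


def auc_trapezoid(times, values):
    """Area under the curve by the trapezoidal rule."""
    points = list(zip(times, values))
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    )


def homa_ir(fasting_insulin_uU_ml, fasting_glucose_mg_dl):
    """HOMA insulin resistance index (glucose in mg/dl, insulin in uU/ml)."""
    return fasting_insulin_uU_ml * fasting_glucose_mg_dl / 405


# Hypothetical measurements for one subject.
times_min = [0, 30, 60, 120]
insulin = [8, 50, 60, 30]        # uU/ml
glucose = [90, 150, 140, 110]    # mg/dl

early_index = insulinogenic_index(insulin[0], insulin[1], glucose[0], glucose[1])
auc_i = auc_trapezoid(times_min, insulin)
resistance = homa_ir(insulin[0], glucose[0])
```

Computed across subjects, values such as `early_index` and `auc_i` could then be regressed against FPG, as the abstract does with polynomial regression.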

15 citations


Book Chapter
06 May 2008
TL;DR: The experimental results show that iNCA can effectively integrate motif information, ChIP-on-chip data and microarray data to identify key regulators and their gene targets in muscle regeneration.
Abstract: Network Component Analysis (NCA) has shown its effectiveness in regulator identification by inferring the transcription factor activity (TFA) when both microarray data and ChIP-on-chip data are available. However, the NCA scheme is not applicable to many biological studies due to the lack of complete ChIP-on-chip data. In this paper, we propose an integrative NCA (iNCA) approach to combine motif information, limited ChIP-on-chip data, and gene expression data for regulatory network inference. Specifically, a Bayesian framework is adopted to develop a novel strategy, namely stability analysis with topological sampling, to infer key TFAs and their downstream gene targets. The iNCA approach with stability analysis reduces the computational cost by avoiding a direct estimation of the high-dimensional distribution in a traditional Bayesian approach. Stability indices are designed to measure the goodness of the estimated TFAs and their connectivity strengths. The approach can also be used to evaluate the confidence level of different data sources, considering the inevitable inconsistency among the data sources. The iNCA approach has been applied to a time course microarray data set of muscle regeneration. The experimental results show that iNCA can effectively integrate motif information, ChIP-on-chip data and microarray data to identify key regulators and their gene targets in muscle regeneration. In particular, several identified TFAs like those of MyoD, myogenin and YY1 are well supported by biological experiments.

1 citation


Proceedings Article
01 Dec 2008
TL;DR: A Consistency-based Masking Nonnegative Matrix Factorization (CMNMF) method is developed to incorporate existing biological constraints with simultaneous miRNA and mRNA profiling data for improved performance in module identification, and experimental results show that the condition-specific modeling framework improves the performance in predicting miRNA-gene relationships.
Abstract: Recently, a class of small RNA molecules, microRNAs or miRNAs, has attracted interest from researchers for their unique role in post-transcriptional regulation. Due to their distinct cell-type/tissue-specific expression patterns, it is of high importance to identify condition-specific miRNA-gene modules for a complete depiction of gene regulatory networks. In this paper, we propose a novel method to integrate miRNA and mRNA data to identify condition-specific miRNA-gene modules. Specifically, a Consistency-based Masking Nonnegative Matrix Factorization (CMNMF) method is developed to incorporate existing biological constraints (like the repression of miRNAs on potential target genes) with simultaneous miRNA and mRNA profiling data for improved performance in module identification. The experimental results on simulation data show that the condition-specific modeling framework improves the performance in predicting miRNA-gene relationships. More importantly, application of CMNMF to human colon cancer data revealed a biologically significant miRNA-gene module, which contains four up-regulated miRNAs (miR-182, miR-183, miR-221 and miR-222) and six down-regulated target genes annotated as cytotoxicity mediated by natural killer cells. The proposed method can also be applied to various biological conditions, even with a limited number of samples, to elucidate miRNA-involved gene networks.
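
The abstract does not spell out the CMNMF algorithm, so the sketch below is not the authors' method. It only illustrates the general idea of masking in nonnegative matrix factorization: a 0/1 mask encodes prior knowledge about which features may load on which module, and standard multiplicative updates (swapped in here as a generic technique) keep the masked entries at zero. The consistency criterion mentioned in the abstract is not modeled. All names and the toy data are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def masked_nmf(x, mask, iters=200, seed=0, eps=1e-9):
    """Approximate nonnegative x (n x m) as w (n x r) times h (r x m).

    mask is an n x r matrix of 0/1 values; mask[i][k] == 0 forbids
    feature i from loading on module k.
    """
    rng = random.Random(seed)
    n, m, r = len(x), len(x[0]), len(mask[0])
    # Masked entries start at zero; multiplicative updates keep them there.
    w = [[rng.random() * mask[i][k] for k in range(r)] for i in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(r)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(r)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(r)]
             for i in range(n)]
    return w, h


# Toy data: 4 features, 3 samples, 2 modules (hypothetical values).
x = [[1, 2, 0], [2, 4, 0], [0, 1, 3], [0, 2, 6]]
mask = [[1, 0], [1, 0], [0, 1], [1, 1]]
w, h = masked_nmf(x, mask)
```

Each column of `w` can be read as one module, with nonzero loadings only where the mask allows them. Any real application would need the actual constraint construction described in the paper.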