Journal ArticleDOI

Motif-directed network component analysis for regulatory network inference

13 Feb 2008-BMC Bioinformatics (BioMed Central)-Vol. 9, Iss: 1, pp 1-9
TL;DR: This article proposes motif-directed NCA (mNCA), which integrates motif information and gene expression data to infer regulatory networks in the many biological studies where standard NCA is not applicable because ChIP-on-chip data are lacking.
Abstract: Background: Network Component Analysis (NCA) has shown its effectiveness in discovering regulators and inferring transcription factor activities (TFAs) when both microarray data and ChIP-on-chip data are available. However, an NCA scheme is not applicable to many biological studies because only limited topology information is available, for example when ChIP-on-chip data are lacking. We propose a new approach, motif-directed NCA (mNCA), to integrate motif information and gene expression data to infer regulatory networks.
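The abstract describes NCA as inferring TFAs from expression data under a known regulatory topology. As a rough illustration only, a common way to write this is E ≈ A·P, where E is the (log) expression matrix, A holds regulator-to-gene strengths with zeros wherever the topology has no link, and P holds the TFAs. The sketch below fits that constrained factorization by alternating least squares. It is a minimal sketch under stated assumptions, not the mNCA algorithm of this paper: the function name `nca_als`, the boolean `mask` (which in mNCA would presumably come from motif information), and the iteration count are all hypothetical.

```python
import numpy as np


def nca_als(E, mask, n_iter=200, seed=0):
    """Fit E ~ A @ P with A forced to zero wherever mask is False.

    E    : (n_genes, n_samples) expression matrix
    mask : (n_genes, n_tfs) boolean topology (True = TF may regulate gene)
    Returns A (n_genes, n_tfs) and P (n_tfs, n_samples).
    """
    rng = np.random.default_rng(seed)
    n_genes, _ = E.shape
    n_tfs = mask.shape[1]
    # Random start that already respects the zero pattern.
    A = rng.normal(size=(n_genes, n_tfs)) * mask
    P = np.zeros((n_tfs, E.shape[1]))
    for _ in range(n_iter):
        # Step 1: with A fixed, P is an ordinary least-squares solution.
        P = np.linalg.lstsq(A, E, rcond=None)[0]
        # Step 2: with P fixed, refit each gene using only its allowed TFs.
        for g in range(n_genes):
            idx = np.flatnonzero(mask[g])
            if idx.size == 0:
                continue  # gene with no candidate regulators stays at zero
            A[g, idx] = np.linalg.lstsq(P[idx].T, E[g], rcond=None)[0]
    return A, P
```

Note that a factorization of this kind is at best determined up to a scaling of each regulator, so the recovered A and P are usually normalized before interpretation.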


Citations
Journal ArticleDOI
TL;DR: It is shown that autophagy, as supported by the endocrine regulation of monodansylcadaverine staining, increased LC3 cleavage, and reduced expression of p62/SQSTM1, plays an important role in breast cancer cells responding to endocrine therapy and that the cell fate machinery includes both apoptotic and autophagic functions.

78 citations

Journal ArticleDOI
13 Feb 2017-PLOS ONE
TL;DR: A method that integrates machine learning techniques, such as self-training support vector machine (SVM) and BLM, to develop a self-training bipartite local model (SELF-BLM) that facilitates the identification of potential interactions.
Abstract: Predicting drug-target interactions is important for the development of novel drugs and the repositioning of drugs. To predict such interactions, there are a number of methods based on drug and target protein similarity. Although these methods, such as the bipartite local model (BLM), show promise, they often categorize unknown interactions as negative interactions. Therefore, these methods are not ideal for finding potential drug-target interactions that have not yet been validated as positive interactions. Thus, here we propose a method that integrates machine learning techniques, such as self-training support vector machine (SVM) and BLM, to develop a self-training bipartite local model (SELF-BLM) that facilitates the identification of potential interactions. The method first categorizes unlabeled interactions and negative interactions among unknown interactions using a clustering method. Then, using the BLM method and self-training SVM, the unlabeled interactions are self-trained and final local classification models are constructed. When applied to four classes of proteins that include enzymes, G-protein coupled receptors (GPCRs), ion channels, and nuclear receptors, SELF-BLM showed the best performance for predicting not only known interactions but also potential interactions in three protein classes compared to other related studies. The implemented software and supporting data are available at https://github.com/GIST-CSBL/SELF-BLM.

61 citations

Journal ArticleDOI
TL;DR: The GTRNetwork algorithm introduces the hidden TFA layer into classic relevance score-based gene regulatory network reconstruction processes, which significantly improves detection of new links and reduces the rate of false positives.
Abstract: Gene regulatory networks play essential roles in living organisms to control growth, keep internal metabolism running and respond to external environmental changes. Understanding the connections and the activity levels of regulators is important for the research of gene regulatory networks. While relevance score based algorithms that reconstruct gene regulatory networks from transcriptome data can infer genome-wide gene regulatory networks, they are unfortunately prone to false positive results. Transcription factor activities (TFAs) quantitatively reflect the ability of the transcription factor to regulate target genes. However, classic relevance score based gene regulatory network reconstruction algorithms use models that do not include the TFA layer, thus missing a key regulatory element. This work integrates TFA prediction algorithms with relevance score based network reconstruction algorithms to reconstruct gene regulatory networks with improved accuracy over classic relevance score based algorithms. This method is called Gene expression and Transcription factor activity based Relevance Network (GTRNetwork). Different combinations of TFA prediction algorithms and relevance score functions have been applied to find the most efficient combination. When the integrated GTRNetwork method was applied to E. coli data, the reconstructed genome-wide gene regulatory network predicted 381 new regulatory links. This reconstructed gene regulatory network, including the predicted new regulatory links, shows promising biological significance. Many of the new links are verified by known TF binding site information, and many other links can be verified from the literature and databases such as EcoCyc. The reconstructed gene regulatory network is applied to a recent transcriptome analysis of E. coli during isobutanol stress. 
In addition to the 16 significantly changed TFAs detected in the original paper, another 7 significantly changed TFAs have been detected by using our reconstructed network. The GTRNetwork algorithm introduces the hidden TFA layer into classic relevance score-based gene regulatory network reconstruction processes. Integrating the TFA biological information with regulatory network reconstruction algorithms significantly improves detection of new links and reduces the rate of false positives. The application of GTRNetwork to E. coli gene transcriptome data gives a set of potential regulatory links with promising biological significance for isobutanol stress and other conditions.

56 citations

Journal ArticleDOI
TL;DR: It is observed that glycine, serine, threonine, galactose and pyrimidine metabolisms are the most perturbed pathways in both mono and co-infection conditions.
Abstract: Chikungunya and dengue are arboviral infections with overlapping clinical symptoms. A subset of chikungunya infections also occurs as co-infection with dengue, resulting in complications during diagnosis and patient management. The present study was undertaken to identify the global metabolome of patient sera infected with chikungunya as mono-infection and with dengue as co-infection. Using nuclear magnetic resonance (NMR) spectroscopy, the serum metabolomes of three disease conditions, namely chikungunya and dengue as mono-infections and as co-infection, were ascertained and compared with those of healthy individuals. Further, the cohorts were analyzed on the basis of age, onset of fever and joint involvement. Here we show that many metabolites in the serum are significantly differentially regulated during chikungunya mono-infection as well as during chikungunya co-infection with dengue. We observed that glycine, serine, threonine, galactose and pyrimidine metabolisms are the most perturbed pathways in both mono- and co-infection conditions. The affected pathways in our study correlate well with clinical manifestations such as fever, inflammation, energy deprivation and joint pain during the infections. These results may serve as a starting point for validation and identification of distinct biomolecules that could be exploited as biomarker candidates, thereby helping in better patient management.

29 citations

Journal ArticleDOI
TL;DR: The results show that the knowledge-guided ICA approach can extract biologically meaningful regulatory modes, outperforms several baseline methods for biomarker identification, and shows promising results for inferring novel biomarkers for ovarian cancer and extending current knowledge.
Abstract: Many statistical methods have been proposed to identify disease biomarkers from gene expression profiles. However, from gene expression profile data alone, statistical methods often fail to identify biologically meaningful biomarkers related to a specific disease under study. In this paper, we develop a novel strategy, namely knowledge-guided multi-scale independent component analysis (ICA), to first infer regulatory signals and then identify biologically relevant biomarkers from microarray data. Since gene expression levels reflect the joint effect of several underlying biological functions, disease-specific biomarkers may be involved in several distinct biological functions. To identify disease-specific biomarkers that provide unique mechanistic insights, a meta-data "knowledge gene pool" (KGP) is first constructed from multiple data sources to provide important information on the likely functions (such as gene ontology information) and regulatory events (such as promoter responsive elements) associated with potential genes of interest. The gene expression and biological meta data associated with the members of the KGP can then be used to guide subsequent analysis. ICA is then applied to multi-scale gene clusters to reveal regulatory modes reflecting the underlying biological mechanisms. Finally disease-specific biomarkers are extracted by their weighted connectivity scores associated with the extracted regulatory modes. A statistical significance test is used to evaluate the significance of transcription factor enrichment for the extracted gene set based on motif information. We applied the proposed method to yeast cell cycle microarray data and Rsf-1-induced ovarian cancer microarray data. The results show that our knowledge-guided ICA approach can extract biologically meaningful regulatory modes and outperform several baseline methods for biomarker identification. 
We have proposed a novel method, namely knowledge-guided multi-scale ICA, to identify disease-specific biomarkers. The goal is to infer knowledge-relevant regulatory signals and then identify corresponding biomarkers through a multi-scale strategy. The approach has been successfully applied to two expression profiling experiments to demonstrate its improved performance in extracting biologically meaningful and disease-related biomarkers. More importantly, the proposed approach shows promising results to infer novel biomarkers for ovarian cancer and extend current knowledge.

26 citations

References
Journal ArticleDOI
25 Oct 2002-Science
TL;DR: This work determines how most of the transcriptional regulators encoded in the eukaryote Saccharomyces cerevisiae associate with genes across the genome in living cells, and identifies network motifs, the simplest units of network architecture, and demonstrates that an automated process can use motifs to assemble a transcriptional regulatory network structure.
Abstract: We have determined how most of the transcriptional regulators encoded in the eukaryote Saccharomyces cerevisiae associate with genes across the genome in living cells. Just as maps of metabolic networks describe the potential pathways that may be used by a cell to accomplish metabolic processes, this network of regulator-gene interactions describes potential pathways yeast cells can use to regulate global gene expression programs. We use this information to identify network motifs, the simplest units of network architecture, and demonstrate that an automated process can use motifs to assemble a transcriptional regulatory network structure. Our results reveal that eukaryotic cellular functions are highly connected through networks of transcriptional regulators that regulate other transcriptional regulators.

3,127 citations

Journal ArticleDOI
TL;DR: The TRANSFAC® database on transcription factors, their binding sites, nucleotide distribution matrices and regulated genes as well as the complementing database TRANSCompel® on composite elements have been further enhanced on various levels.
Abstract: The TRANSFAC database on transcription factors, their binding sites, nucleotide distribution matrices and regulated genes as well as the complementing database TRANSCompel on composite elements have been further enhanced on various levels. A new web interface with different search options and integrated versions of Match and Patch provides increased functionality for TRANSFAC. The list of databases which are linked to the common GENE table of TRANSFAC and TRANSCompel has been extended by: Ensembl, UniGene, EntrezGene, HumanPSD and TRANSPRO. Standard gene names from HGNC, MGI and RGD, are included for human, mouse and rat genes, respectively. With the help of InterProScan, Pfam, SMART and PROSITE domains are assigned automatically to the protein sequences of the transcription factors. TRANSCompel contains now, in addition to the COMPEL table, a separate table for detailed information on the experimental EVIDENCE on which the composite elements are based. Finally, for TRANSFAC, in respect of data growth, in particular the gain of Drosophila transcription factor binding sites (by courtesy of the Drosophila DNase I footprint database) and of Arabidopsis factors (by courtesy of DATF, Database of Arabidopsis Transcription Factors) has to be stressed. The here described public releases, TRANSFAC 7.0 and TRANSCompel 7.0, are accessible under http://www.gene-regulation.com/pub/databases.html.

2,262 citations

Journal ArticleDOI
TL;DR: The procedure identifies modules of coregulated genes, their regulators and the conditions under which regulation occurs, generating testable hypotheses in the form 'regulator X regulates module Y under conditions W'.
Abstract: Much of a cell’s activity is organized as a network of interacting modules: sets of genes coregulated to respond to different conditions. We present a probabilistic method for identifying regulatory modules from gene expression data. Our procedure identifies modules of coregulated genes, their regulators and the conditions under which regulation occurs, generating testable hypotheses in the form ‘regulator X regulates module Y under conditions W’. We applied the method to a Saccharomyces cerevisiae expression data set, showing its ability to identify functionally coherent modules and their correct regulators. We present microarray experiments supporting three novel predictions, suggesting regulatory roles for previously uncharacterized proteins.

1,820 citations

Journal ArticleDOI
TL;DR: Match™ is a weight matrix-based tool for searching putative transcription factor binding sites in DNA sequences that uses the matrix library collected in TRANSFAC® and therefore provides the possibility to search for a great variety of different transcription factor binding sites.
Abstract: Match™ is a weight matrix-based tool for searching putative transcription factor binding sites in DNA sequences. Match™ is closely interconnected and distributed together with the TRANSFAC® database. In particular, Match™ uses the matrix library collected in TRANSFAC® and therefore provides the possibility to search for a great variety of different transcription factor binding sites. Several sets of optimised matrix cut-off values are built in the system to provide a variety of search modes of different stringency. The user may construct and save his/her specific user profiles which are selected subsets of matrices including default or user-defined cut-off values. Furthermore a number of tissue-specific profiles are provided that were compiled by the TRANSFAC® team. A public version of the Match™ tool is available at: http://www.gene-regulation.com/pub/programs.html#match. The same program with a different web interface can be found at http://compel.bionet.nsc.ru/Match/Match.html. An advanced version of the tool called Match™ Professional is available at http://www.biobase.de.

1,069 citations

Journal ArticleDOI
TL;DR: This review summarizes some of the common themes in microarray data analysis, including detection of differential expression, clustering, and predicting sample characteristics, and discusses the relative merits of several approaches to each problem.
Abstract: Many different biological questions are routinely studied using transcriptional profiling on microarrays. A wide range of approaches are available for gleaning insights from the data obtained from such experiments. The appropriate choice of data-analysis technique depends both on the data and on the goals of the experiment. This review summarizes some of the common themes in microarray data analysis, including detection of differential expression, clustering, and predicting sample characteristics. Several approaches to each problem, and their relative merits, are discussed and key areas for additional research highlighted.

627 citations