
Showing papers by "Tao Huang" published in 2012


Journal ArticleDOI
04 Apr 2012-PLOS ONE
TL;DR: This study developed a computational method to identify colorectal cancer-related genes based on the gene expression profiles, and the shortest path analysis of functional protein association networks, which indicated that the method may become a useful tool, or at least plays a complementary role to the existing method.
Abstract: One of the most important and challenging problems in biomedicine and genomics is how to identify the disease genes. In this study, we developed a computational method to identify colorectal cancer-related genes based on (i) the gene expression profiles, and (ii) the shortest path analysis of functional protein association networks. The former has been used to select differentially expressed genes as disease genes for quite a long time, while the latter has been widely used to study the mechanism of diseases. With the existing protein-protein interaction data from STRING (Search Tool for the Retrieval of Interacting Genes), a weighted functional protein association network was constructed. By means of the mRMR (Maximum Relevance Minimum Redundancy) approach, six genes were identified that can distinguish the colorectal tumors and normal adjacent colonic tissues from their gene expression profiles. Meanwhile, according to the shortest path approach, we further found an additional 35 genes, of which some have been reported to be relevant to colorectal cancer and some are very likely to be relevant to it. Interestingly, the genes we identified from both the gene expression profiles and the functional protein association network have more cancer genes than the genes identified from the gene expression profiles alone. Besides, these genes also had greater functional similarity with the reported colorectal cancer genes than the genes identified from the gene expression profiles alone. All these indicate that our method as presented in this paper is quite promising. The method may become a useful tool, or at least plays a complementary role to the existing method, for identifying colorectal cancer genes. It has not escaped our notice that the method can be applied to identify the genes of other diseases as well.
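The shortest-path step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy network, the gene names, the function names, and the assumption that edge weights are costs (so that stronger associations are cheaper to traverse) are all hypothetical.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the nodes on one cheapest path, or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nbr, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None
    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def genes_on_shortest_paths(graph, seeds):
    """Collect non-seed genes lying on shortest paths between every pair of seeds."""
    seeds = list(seeds)
    found = set()
    for i, a in enumerate(seeds):
        for b in seeds[i + 1:]:
            path = shortest_path(graph, a, b)
            if path:
                found.update(path[1:-1])
    return found - set(seeds)

# Hypothetical undirected toy network: G1..G3 play the role of seed genes.
TOY = {
    "G1": {"X": 1.0, "Y": 5.0},
    "X": {"G1": 1.0, "G2": 1.0},
    "Y": {"G1": 5.0, "G2": 5.0},
    "G2": {"X": 1.0, "Y": 5.0, "Z": 2.0},
    "Z": {"G2": 2.0, "G3": 2.0},
    "G3": {"Z": 2.0},
}

candidates = genes_on_shortest_paths(TOY, ["G1", "G2", "G3"])
```

In this toy case the expensive detour through Y is never used, so only X and Z are reported as candidate genes.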

169 citations


Journal ArticleDOI
TL;DR: A sequence-based predictor of ubiquitination sites was developed based on the nearest neighbor algorithm, using the maximum relevance and minimum redundancy principle to identify the key features and the incremental feature selection procedure to optimize the prediction engine.
Abstract: Ubiquitination, one of the most important post-translational modifications of proteins, occurs when ubiquitin (a small 76-amino acid protein) is attached to lysine on a target protein. It often commits the labeled protein to degradation and plays important roles in regulating many cellular processes implicated in a variety of diseases. Since ubiquitination is rapid and reversible, it is time-consuming and labor-intensive to identify ubiquitination sites using conventional experimental approaches. To efficiently discover lysine-ubiquitination sites, a sequence-based predictor of ubiquitination sites was developed based on the nearest neighbor algorithm. We used the maximum relevance and minimum redundancy principle to identify the key features and the incremental feature selection procedure to optimize the prediction engine. PSSM conservation scores, amino acid factors and disorder scores of the surrounding sequence formed the optimized 456 features. The Matthews correlation coefficient (MCC) of our ubiquitination site predictor reached 0.142 in the jackknife cross-validation test on a large benchmark dataset. In an independent test, the MCC of our method was 0.139, higher than those of the existing ubiquitination site predictors UbiPred and UbPred. The MCCs of UbiPred and UbPred on the same test set were 0.135 and 0.117, respectively. Our analysis shows that the conservation of amino acids at and around lysine plays an important role in ubiquitination site prediction. Moreover, disorder and ubiquitination are strongly related. These findings might provide useful insights for studying the mechanisms of ubiquitination and modulating the ubiquitination pathway, potentially leading to therapeutic strategies in the future.
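The incremental feature selection procedure named in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the toy samples, the function names, the use of plain accuracy rather than MCC as the selection score, and the pre-ranked feature list (standing in for an mRMR ranking) are all hypothetical.

```python
def nn_predict(train, query, feats):
    """1-nearest-neighbour label, squared Euclidean distance over the chosen features."""
    nearest = min(train, key=lambda s: sum((s[0][f] - query[f]) ** 2 for f in feats))
    return nearest[1]

def jackknife_accuracy(samples, feats):
    """Leave each sample out in turn and predict it from all the others."""
    correct = 0
    for i, (x, y) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nn_predict(rest, x, feats) == y:
            correct += 1
    return correct / len(samples)

def incremental_feature_selection(samples, ranked_feats):
    """Grow the feature set one ranked feature at a time; keep the best-scoring prefix."""
    best_k, best_score = 0, -1.0
    for k in range(1, len(ranked_feats) + 1):
        score = jackknife_accuracy(samples, ranked_feats[:k])
        if score > best_score:  # strict '>' keeps the smallest best prefix
            best_k, best_score = k, score
    return ranked_feats[:best_k], best_score

# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
SAMPLES = [
    ((0.0, 9.0), 0),
    ((0.1, 1.0), 0),
    ((1.0, 9.1), 1),
    ((1.1, 1.1), 1),
]

selected, score = incremental_feature_selection(SAMPLES, [0, 1])
```

On this toy data, adding the noisy second feature lowers the jackknife accuracy, so the procedure keeps only the first ranked feature.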

119 citations


Journal ArticleDOI
28 Aug 2012-PLOS ONE
TL;DR: A novel predictor based on Random Forest algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS) is developed that incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility.
Abstract: Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites and achieved an overall accuracy of 0.672997 and MCC of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. It was also shown via site-specific feature analysis that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction.

97 citations


Journal ArticleDOI
TL;DR: An integrative approach to conduct differential combinatorial regulatory network analysis in the specific context of venous metastasis of HBV-HCC is demonstrated, and possible transcriptional regulatory patterns underlying the different metastatic subgroups of HCC are proposed.
Abstract: Hepatocellular carcinoma (HCC) is one of the most fatal cancers in the world, and metastasis is a significant cause of the high mortality in patients with HCC. However, the molecular mechanism behind HCC metastasis is not fully understood. Study of regulatory networks may help investigate HCC metastasis by way of systems biology profiling. By utilizing both sequence information and parallel microRNA (miRNA) and mRNA expression data on the same cohort of HBV-related HCC patients without or with venous metastasis, we constructed combinatorial regulatory networks of non-metastatic and metastatic HCC which contain transcription factor (TF) regulation and miRNA regulation. Differential regulation patterns, classifying marker modules, and key regulatory miRNAs were analyzed by comparing non-metastatic and metastatic networks. Globally, TFs accounted for the main part of regulation while miRNAs accounted for the minor part. However, miRNAs displayed a more active role in the metastatic network than in the non-metastatic one. Seventeen differential regulatory modules discriminative of the metastatic status were identified as a cumulative-module classifier, which could also distinguish survival time. MiR-16, miR-30a, Let-7e and miR-204 were identified as key miRNA regulators contributing to HCC metastasis. In this work we demonstrated an integrative approach to conduct differential combinatorial regulatory network analysis in the specific context of venous metastasis of HBV-HCC. Our results proposed possible transcriptional regulatory patterns underlying the different metastatic subgroups of HCC. The workflow in this study can be applied in similar contexts of cancer research and could also be extended to other clinical topics.

81 citations


Journal ArticleDOI
13 Jun 2012-PLOS ONE
TL;DR: By combining advanced ionomics and mutual information, a quantifying measurement for mutual dependence between two random variables, associations of ion modules/networks with overweight/obesity, metabolic syndrome and type 2 diabetes in 976 middle-aged Chinese men and women are investigated.
Abstract: Background: Few studies assessed effects of individual and multiple ions simultaneously on metabolic outcomes, due to methodological limitation. Methodology/Principal Findings: By combining advanced ionomics and mutual information, a quantifying measurement for mutual dependence between two random variables, we investigated associations of ion modules/networks with overweight/obesity, metabolic syndrome (MetS) and type 2 diabetes (T2DM) in 976 middle-aged Chinese men and women. Fasting plasma ions were measured by inductively coupled plasma mass spectroscopy. Significant ion modules were selected by mutual information to construct disease related ion networks. Plasma copper and phosphorus always ranked the first two among three specific ion networks associated with overweight/obesity, MetS and T2DM. Comparing the ranking of ion individually and in networks, three patterns were observed: (1) "Individual ion," such as potassium and chrome, which tends to work alone; (2) "Module ion," such as iron in T2DM, which tends to act in modules/network; and (3) "Module-individual ion," such as copper in overweight/obesity, which seems to work equivalently in either way. Conclusions: In conclusion, by using the novel approach of the ionomics strategy and the information theory, we observed potential associations of ions individually or as modules/networks with metabolic disorders. Certainly, these findings need to be confirmed in future biological studies.
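Mutual information, the dependence measure used in this abstract, can be computed for discrete variables as in the sketch below. It is only an illustration: the function name and the toy data are hypothetical, and it assumes the values have already been discretized (continuous measurements such as plasma ion levels would first need binning, which the abstract does not describe).

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi

# Hypothetical binned ion level versus a binary disease status.
ion_level = ["low", "low", "high", "high"]
status_same = [0, 0, 1, 1]   # fully determined by the ion level
status_indep = [0, 1, 0, 1]  # unrelated to the ion level

mi_dependent = mutual_information(ion_level, status_same)
mi_independent = mutual_information(ion_level, status_indep)
```

A variable that fully determines a balanced binary outcome shares one bit of information with it, while an unrelated variable shares none.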

73 citations


Journal ArticleDOI
TL;DR: Based on the network features and the biochemical/physicochemical features of the deletion network and deletion genes, as well as their functional features, a two-layer model was developed for predicting the deletion effects on yeast longevity, and it is anticipated that it may become a useful tool for studying longevity from the angle of genes and networks.

60 citations


Journal ArticleDOI
06 Apr 2012-PLOS ONE
TL;DR: It was observed that the identification accuracy was higher with the tissue samples defined by extracting the features from the second biomarker pool than that with the samples defined based on the first biomarker pool, indicating that the novel approach holds a quite promising potential in helping find effective biomarkers for diagnosing the liver cirrhosis disease and the hepatocellular carcinoma disease.
Abstract: Hepatitis C virus (HCV) is a main risk factor for liver cirrhosis and hepatocellular carcinoma, particularly to those patients with chronic liver disease or injury. The similar etiology leads to a high correlation of the patients suffering from the disease of liver cirrhosis with those suffering from the disease of hepatocellular carcinoma. However, the biological mechanism for the relationship between these two kinds of diseases is not clear. The present study was initiated in an attempt to investigate into the HCV infection protein network, in hopes to find good biomarkers for diagnosing the two diseases as well as gain insights into their progression mechanisms. To realize this, two potential biomarker pools were defined: (i) the target genes of HCV, and (ii) the between genes on the shortest paths among the target genes of HCV. Meanwhile, a predictor was developed for identifying the liver tissue samples among the following three categories: (i) normal, (ii) cirrhosis, and (iii) hepatocellular carcinoma. Interestingly, it was observed that the identification accuracy was higher with the tissue samples defined by extracting the features from the second biomarker pool than that with the samples defined based on the first biomarker pool. The identification accuracy by the jackknife validation for the between-genes approach was 0.960, indicating that the novel approach holds a quite promising potential in helping find effective biomarkers for diagnosing the liver cirrhosis disease and the hepatocellular carcinoma disease. It may also provide useful insights for in-depth study of the biological mechanisms of HCV-induced cirrhosis and hepatocellular carcinoma.

54 citations


Journal ArticleDOI
21 Sep 2012-PLOS ONE
TL;DR: A novel method was proposed to allocate small molecules and enzymes to 11 major classes of metabolic pathways, utilizing the information provided by chemical-chemical interactions, chemical-protein interactions, and protein-protein interactions; it may become a useful vehicle for predicting the metabolic pathways of small molecules and enzymes.
Abstract: Metabolic pathway analysis, one of the most important fields in biochemistry, is pivotal to understanding the maintenance and modulation of the functions of an organism. Good comprehension of metabolic pathways is critical to understanding the mechanisms of some fundamental biological processes. Given a small molecule or an enzyme, how may one identify the metabolic pathways in which it may participate? Answering such a question is a first important step in understanding a metabolic pathway system. By utilizing the information provided by chemical-chemical interactions, chemical-protein interactions, and protein-protein interactions, a novel method was proposed by which to allocate small molecules and enzymes to 11 major classes of metabolic pathways. A benchmark dataset consisting of 3,348 small molecules and 654 enzymes of yeast was constructed to test the method. It was observed that the first order prediction accuracy evaluated by the jackknife test was 79.56% in identifying the small molecules and enzymes in a benchmark dataset. Our method may become a useful vehicle in predicting the metabolic pathways of small molecules and enzymes, providing a basis for some further analysis of the pathway systems.

41 citations


Journal ArticleDOI
TL;DR: Comparison with the results obtained from other methods, based on BLAST and on amino acid composition respectively, suggests that the prediction method is quite promising and may provide an opportunity to better understand this complicated pathway system.
Abstract: In systems biology, the regulatory pathway is one of the most important research areas. However, regulatory pathways are so complicated that this system is still poorly understood. On the other hand, with rapidly accumulating information on different organisms, it is becoming more and more feasible to investigate regulatory pathways in depth. To understand regulatory pathways well, figuring out the components of each pathway is the most important step. In this study, a network-based method was proposed to classify human genes into corresponding pathways. The information of protein-protein interactions retrieved from STRING was used to construct a network, and the jackknife test was employed to evaluate the method. As a result, the first order prediction accuracy was 87.91%, indicating that interacting proteins tend to have similar biological regulatory functions. Comparison with the results obtained from other methods, based on BLAST and on amino acid composition respectively, suggests that our prediction method is quite promising and may provide an opportunity to better understand this complicated pathway system.

28 citations


Journal ArticleDOI
TL;DR: The genes the authors identified from both the gene expression profiles and the functional protein association network included more cancer genes than did the genes identified from the gene expression profiles alone and had greater functional similarity to the reported cancer genes.

27 citations


Journal ArticleDOI
TL;DR: A computational method based on the nearest neighbor algorithm was developed for rapidly and effectively identifying protein oxidation sites and the 16 optimal features obtained may provide useful clues and insights for in-depth understanding the action mechanism of protein oxidation.
Abstract: Protein oxidation is a ubiquitous post-translational modification that plays important roles in various physiological and pathological processes. Owing to the fact that protein oxidation can also take place as an experimental artifact or caused by oxygen in the air during the process of sample collection and analysis, and that it is both time-consuming and expensive to determine the protein oxidation sites purely by biochemical experiments, it would be of great benefit to develop in silico methods for rapidly and effectively identifying protein oxidation sites. In this study, we developed a computational method to address this problem. Our method was based on the nearest neighbor algorithm in which, however, the maximum relevance minimum redundancy and incremental feature selection approaches were incorporated. From the initial 735 features, 16 features were selected as the optimal feature set. Of such 16 optimized features, 10 features were associated with the position-specific scoring matrix conservatio...

Journal ArticleDOI
Tao Huang1, Chuan Wang1, Guoqing Zhang1, Lu Xie, Yixue Li1 
TL;DR: SySAP (a System-level predictor of deleterious Single Amino acid Polymorphisms) is an easy-to-use and highly accurate web server that considers not only the sequence and structure information but also the network information, which can improve the performance of deleterious SAP prediction.
Abstract: Single amino acid polymorphisms (SAPs), also known as non-synonymous single nucleotide polymorphisms (nsSNPs), are responsible for most human genetic diseases. Discriminating the deleterious SAPs from neutral ones can help identify the disease genes and understand the mechanism of diseases. In this work, a method of deleterious SAP prediction at system level was established. Unlike most existing methods, our method not only considers the sequence and structure information, but also the network information. The integration of network information can improve the performance of deleterious SAP prediction. To make our method available to the public, we developed SySAP (a System-level predictor of deleterious Single Amino acid Polymorphisms), an easy-to-use and highly accurate web server. SySAP is freely available at http://www.biosino.org/SySAP/ and http://lifecenter.sgst.cn/SySAP/.

Journal ArticleDOI
17 Aug 2012-PLOS ONE
TL;DR: This study integrated the methylation, microRNA and mRNA data from lung cancer tissues and normal lung tissues using functional gene sets to provide a systematic view of the functional alterations during tumorigenesis that may help to elucidate the mechanisms of lung cancer and lead to improved treatments for patients.
Abstract: Integrating high-throughput data obtained from different molecular levels is essential for understanding the mechanisms of complex diseases such as cancer. In this study, we integrated the methylation, microRNA and mRNA data from lung cancer tissues and normal lung tissues using functional gene sets. For each Gene Ontology (GO) term, three sets were defined: the methylation set, the microRNA set and the mRNA set. The discriminating ability of each gene set was represented by the Matthews correlation coefficient (MCC), as evaluated by leave-one-out cross-validation (LOOCV). Next, the MCCs in the methylation sets, the microRNA sets and the mRNA sets were ranked. By comparing the MCC ranks of methylation, microRNA and mRNA for each GO term, we classified the GO sets into six groups and identified the dysfunctional methylation, microRNA and mRNA gene sets in lung cancer. Our results provide a systematic view of the functional alterations during tumorigenesis that may help to elucidate the mechanisms of lung cancer and lead to improved treatments for patients.
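For reference, the Matthews correlation coefficient used as the score in this abstract follows the standard definition from the four confusion-matrix counts. The sketch below is generic rather than taken from the paper; the function name and the convention of returning 0 when the denominator vanishes are assumptions.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: an undefined MCC is reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denom

perfect = mcc(tp=50, tn=50, fp=0, fn=0)
chance = mcc(tp=25, tn=25, fp=25, fn=25)
inverted = mcc(tp=0, tn=0, fp=50, fn=50)
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), which makes it usable for ranking gene sets even when the classes are unbalanced.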

Journal ArticleDOI
TL;DR: A fast computational framework was developed to optimize the reprogramming factors via the protein interaction network and gene functional profiles; it is anticipated to become a very useful tool for both basic research and drug development.
Abstract: Induced pluripotent stem cells have displayed great potential in disease investigation and drug development applications. However, selection of reprogramming factors in each cell type or disease state is both expensive and time consuming. To deal with this kind of situation, a fast computational framework was developed to optimize the reprogramming factors via the protein interaction network and gene functional profiles. It can be used to select reprogramming factors from millions of possibilities. It is anticipated that the novel approach will become a very useful tool for both basic research and drug development.

Journal ArticleDOI
TL;DR: It was found through Jackknife cross-validation that the overall success rate of identifying the positive pathways was 74.26%.
Abstract: Given a compounds-forming system, i.e., a system consisting of some compounds and their relationship, can it form a biologically meaningful pathway? It is a fundamental problem in systems biology. Nowadays, a lot of information on different organisms, at both genetic and metabolic levels, has been collected and stored in some specific databases. Based on these data, it is feasible to address such an essential problem. Metabolic pathway is one kind of compounds-forming systems and we analyzed them in yeast by extracting different (biological and graphic) features from each of the 13,736 compounds-forming systems, of which 136 are positive pathways, i.e., known metabolic pathway from KEGG; while 13,600 were negative. Each of these compounds-forming systems was represented by 144 features, of which 88 are graph features and 56 biological features. “Minimum Redundancy Maximum Relevance” and “Incremental Feature Selection” were utilized to analyze these features and 16 optimal features were selected as being able to predict a query compounds-forming system most successfully. It was found through Jackknife cross-validation that the overall success rate of identifying the positive pathways was 74.26%. It is anticipated that this novel approach and encouraging result may give meaningful illumination to investigate this important topic.
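The "Minimum Redundancy Maximum Relevance" ranking mentioned in this abstract can be sketched as a greedy loop. The version below uses the common difference form (relevance minus mean redundancy) on discrete features; the function names and toy data are hypothetical, and the paper's actual features and mRMR variant may differ.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (cx[x] * cy[y]))
               for (x, y), c in cxy.items())

def mrmr_rank(columns, labels):
    """Order feature indices by greedy mRMR: relevance minus mean redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    remaining = list(range(len(columns)))
    ranked = []

    def score(j):
        if not ranked:
            return relevance[j]
        redundancy = sum(mutual_information(columns[j], columns[s])
                         for s in ranked) / len(ranked)
        return relevance[j] - redundancy

    while remaining:
        best = max(remaining, key=score)  # ties go to the lowest index
        ranked.append(best)
        remaining.remove(best)
    return ranked

# Hypothetical data: feature 1 duplicates feature 0; feature 2 is unrelated to the labels.
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]
COLUMNS = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]

ranking = mrmr_rank(COLUMNS, LABELS)
```

In this toy case the duplicated feature is pushed to the end of the ranking even though it is individually more relevant than the unrelated one, which is the redundancy penalty at work. A ranking of this kind is what an incremental feature selection step would then consume.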

DOI
27 Sep 2012
TL;DR: If good evidence existed showing that physical activity interventions could improve BDNF and cognitive function in children, this would further strengthen the importance of children being physically active.
Abstract: Well designed school based physical activity interventions have proven successful in improving health (1). Many studies have found substantial changes in cardiovascular (CVD) risk factors (2), and it seems that greater improvements are found in the children who need it most (3). It is therefore surprising that most countries only have two compulsory physical education lessons per week (4). One reason for politicians to be reluctant in increasing physical education could be a fear of using more school time on PE and less on theoretical subjects. This is despite the fact that no interventions have shown decreased grades in theoretical subjects in the intervention group even if number of theoretical lessons decreased. It is therefore necessary to increase the knowledge of the association between physical activity and cognitive function (1). Recently, a study including 1.3 million Swedish military conscripts born 1950-76 analyzed the association between cardiorespiratory fitness (CRF) and four different types of cognitive function as well as the job and salary they got later in life (5). Among the conscripts 268,496 were siblings, 3,147 were twins, and 1,432 monozygotic twins. The authors found a positive cross sectional association between CRF and cognitive function at the age of 18 years, and an increase in CRF from age 15-18 years was associated with higher intelligence. The CRF as 18 year old was associated with better job and higher salary later in life. These associations were found even in monozygotic twins, and there was a close relationship between the difference in CRF and difference in cognitive function in monozygotic twins. Others have shown that increased physical activity has both an acute (6) and more lasting (7) effect on cognitive parameters. The mechanism behind the observed improvement in cognitive function is still unclear.
Results from a study of Coras et al indicate the ability to remember new impressions is related to the regenerative capacity in hippocampus and improves synaptic plasticity (8;9). A study of Erickson et al found an association between serum brain derived neurotrophic factor (BDNF), the size of hippocampus and memory (7), and exercise training increased the size of anterior hippocampus. BDNF is a member of the neurotrophic factor family that plays a key role in regulating synaptic plasticity, neuronal survival, differentiation, and learning and memory (10). The effect of physical activity on BDNF has also been shown in several human studies (11-13). Of interest is that BDNF is associated with inflammatory markers and insulin sensitivity, and lower levels of BDNF have been found in obese subjects compared to normal weight (14). Many studies have been conducted in children where metabolic health parameters have been measured in order to study health effects of physical activity. Cardiovascular risk factors are important and they are the primary health outcomes used in studies of children. It has been shown that CVD risk factors cluster in children with low physical activity level, low CRF or obesity (15;16). However, if good evidence existed showing that physical activity interventions could improve BDNF and cognitive function in children, this would further strengthen the importance of children being physically active. BDNF can be measured in serum with conventional methods and it is suggested that this protein is analyzed in future studies or in existing blood samples from better designed physical activity interventions as a biological marker of cognitive function. If evidence of improved cognitive function and this biological marker is gathered in school based interventions, politicians may eventually recognize the importance of increasing physical education and other physical activity in school.