Showing papers by "Tao Huang" published in 2013


Journal ArticleDOI
TL;DR: The gene expressions among the colorectal cancer patients in the aforementioned four stages were compared and the early and late stage biomarkers, respectively, were obtained and both kinds of biomarkers were mapped onto the protein interaction network.
Abstract: Colorectal cancer is generally categorized into four stages according to its progression or severity: Dukes A, B, C, and D. Since each stage of colorectal cancer corresponds to a different activated region of the network, transitions between network states may reflect its pathological changes. In view of this, we compared gene expression among colorectal cancer patients in the four stages and obtained early- and late-stage biomarkers. Subsequently, both kinds of biomarkers were mapped onto the protein interaction network. If an early biomarker and a late biomarker were close in the network and their expression levels were correlated in the Dukes B and C patients, a signal propagation path from the early-stage biomarker to the late one was identified. Many transition genes in the signal propagation paths were involved in signal transduction, cell communication, and cellular process regulation. Some transition hubs were known colorectal cancer genes. The findings reported here may provide useful insights for revealing the mechanism of colorectal cancer progression at the cellular systems biology level.
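
The path criterion described in this abstract (network proximity plus expression correlation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the unweighted breadth-first search, the `max_hops` and `min_abs_r` thresholds, the use of Pearson correlation, and the toy gene names are all assumptions made for illustration.

```python
import math
from collections import deque

def bfs_path(adjacency, start, goal):
    """Return a shortest unweighted path from start to goal, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def propagation_path(adjacency, expression, early, late, max_hops=3, min_abs_r=0.8):
    """Return the early-to-late path if both criteria hold, else None.

    max_hops and min_abs_r are hypothetical thresholds, not values from the paper.
    """
    path = bfs_path(adjacency, early, late)
    if path is None or len(path) - 1 > max_hops:
        return None
    if abs(pearson(expression[early], expression[late])) < min_abs_r:
        return None
    return path

# Hypothetical toy network: early biomarker E, transition gene T, late biomarker L.
toy_adjacency = {"E": ["T"], "T": ["E", "L"], "L": ["T"]}
toy_expression = {"E": [1.0, 2.0, 3.0, 4.0], "L": [2.0, 4.1, 5.9, 8.2]}
toy_path = propagation_path(toy_adjacency, toy_expression, "E", "L")
```

In this toy case the interior node `T` plays the role of a transition gene on the path from the early biomarker to the late one.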

56 citations


Journal ArticleDOI
TL;DR: A computational method is developed to identify hepatocellular carcinoma related genes based on k-th shortest paths in the protein-protein interaction (PPI) network and it is found that 33 genes whose p-values were less than 0.05 have been reported to be involved in HCC tumorigenesis and development.
Abstract: Hepatocellular carcinoma (HCC) is the most common type of liver cancer worldwide and one of the deadliest cancers in Asia. At present, however, effective targets for HCC clinical therapy are still limited. The "guilt by association" rule suggests that interacting proteins share the same or similar functions and hence may be involved in the same pathway. This assumption can be used to identify disease-related genes from protein association networks constructed from existing PPI data. Given the close association between hepatitis B virus and hepatitis B, which may lead to HCC, we developed a computational method to identify HCC-related genes based on k-th shortest paths in the protein-protein interaction (PPI) network (we set k = 1, 2 in this study). We found 33 genes whose p-values were less than 0.05, most of which have been reported to be involved in HCC tumorigenesis and development. The results also provide a new reference for research into HCC oncogenesis and for the development of new strategies for HCC clinical therapies.

42 citations


Journal ArticleDOI
TL;DR: This study proved the efficiency of the proposed method for identifying lung-cancer-related genes with a shortest path approach in a protein-protein interaction (PPI) network and showed promising results.
Abstract: Lung cancer is one of the leading causes of cancer mortality worldwide. The main types of lung cancer are small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). In this work, a computational method was proposed for identifying lung-cancer-related genes with a shortest path approach in a protein-protein interaction (PPI) network. Based on the PPI data from STRING, a weighted PPI network was constructed. 54 NSCLC- and 84 SCLC-related genes were retrieved from associated KEGG pathways. Then the shortest paths between each pair of these 54 NSCLC genes and 84 SCLC genes were obtained with Dijkstra's algorithm. Finally, all the genes on the shortest paths were extracted, and 25 and 38 shortest path genes with a permutation P value less than 0.05 for NSCLC and SCLC, respectively, were selected for further analysis. Some of the shortest path genes have been reported to be related to lung cancer. Intriguingly, the candidate genes we identified from the PPI network contained more cancer genes than those identified from the gene expression profiles. Furthermore, these genes possessed more functional similarity with the known cancer genes than those identified from the gene expression profiles. This study demonstrated the efficiency of the proposed method and showed promising results.
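
As a rough illustration of the shortest path step in this abstract, the sketch below runs Dijkstra's algorithm between every pair of seed genes and counts how often each non-seed gene appears in the interior of a path. The toy graph, its gene names, and its edge weights are hypothetical. How STRING scores were converted to edge weights is not stated here, so the sketch simply assumes that a lower weight means a stronger link. The permutation test is omitted.

```python
import heapq
from itertools import combinations

def dijkstra_path(graph, source, target):
    """Return (cost, path) of a cheapest path, or (inf, []) if target is unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            new_dist = d + weight
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                prev[neighbor] = node
                heapq.heappush(heap, (new_dist, neighbor))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

def shortest_path_genes(graph, seeds):
    """Count how often each non-seed gene lies inside a shortest path between two seeds."""
    seed_set = set(seeds)
    counts = {}
    for a, b in combinations(sorted(seed_set), 2):
        _, path = dijkstra_path(graph, a, b)
        for gene in path[1:-1]:  # interior nodes only
            if gene not in seed_set:
                counts[gene] = counts.get(gene, 0) + 1
    return counts

# Hypothetical toy network: S1..S3 stand in for known disease genes.
toy_graph = {
    "S1": {"X": 1.0, "Y": 5.0},
    "S2": {"X": 1.0, "Y": 5.0},
    "S3": {"X": 2.0},
    "X": {"S1": 1.0, "S2": 1.0, "S3": 2.0},
    "Y": {"S1": 5.0, "S2": 5.0},
}
candidates = shortest_path_genes(toy_graph, ["S1", "S2", "S3"])
```

Here gene `X` lies on all three seed-to-seed shortest paths, while the weakly connected `Y` is never used; in a fuller analysis such counts would then be compared against counts from randomly chosen seed sets.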

31 citations


Journal ArticleDOI
TL;DR: A novel computational method was developed to predict the side effects of drug compounds by hybridizing the chemical-chemical and protein-chemical interactions and can rank the potential side effects for any query drug according to their predicted level of risk.
Abstract: A drug side effect is an undesirable effect that occurs in addition to the intended therapeutic effect of the drug. The unexpected side effects that many patients suffer from are the major cause of large-scale drug withdrawal. To address this problem, the pharmaceutical industry has a strong demand for computational methods that predict the side effects of drugs. In this study, a novel computational method was developed to predict the side effects of drug compounds by hybridizing the chemical-chemical and protein-chemical interactions. Compared to most previous work, our method can rank the potential side effects for any query drug according to their predicted level of risk. A training dataset and test datasets were constructed from a benchmark dataset containing 835 drug compounds to evaluate the method. In a jackknife test on the training dataset, the first-order prediction accuracy was 86.30%, while it was 89.16% on the test dataset. It is expected that the new method may become a useful tool for drug design, and that the findings obtained by hybridizing various interactions in a network system may provide useful insights for conducting in-depth pharmacological research as well, particularly at the level of systems biomedicine.

30 citations


Journal ArticleDOI
25 Jun 2013-PLOS ONE
TL;DR: This study proposes a novel method by which to identify eQTL associations with information theory and machine learning approaches and provides a new way to identify the association between genetic markers and gene expression.
Abstract: Expression Quantitative Trait Locus (eQTL) analysis is a powerful tool to study the biological mechanisms linking the genotype with gene expression. Such analyses can identify genomic locations where genotypic variants influence the expression of genes, both in close proximity to the variant (cis-eQTL) and on other chromosomes (trans-eQTL). Many traditional eQTL methods are based on a linear regression model. In this study, we propose a novel method to identify eQTL associations with information theory and machine learning approaches. Mutual Information (MI) is used to describe the association between a genetic marker and gene expression. MI can detect both linear and non-linear associations. Moreover, it can capture the heterogeneity of the population. Advanced feature selection methods, Maximum Relevance Minimum Redundancy (mRMR) and Incremental Feature Selection (IFS), were applied to optimize the selection of the genes affected by the genetic marker. When we applied our method to a study of apoE-deficient mice, we found that the cis-acting eQTLs are stronger than the trans-acting eQTLs but that there are more trans-acting eQTLs than cis-acting eQTLs. We compared our results (mRMR.eQTL) with R/qtl and MatrixEQTL (modelLINEAR and modelANOVA). In female mice, 67.9% of mRMR.eQTL results can be confirmed by at least two other methods, while only 14.4% of R/qtl results can be confirmed by at least two other methods. In male mice, 74.1% of mRMR.eQTL results can be confirmed by at least two other methods, while only 18.2% of R/qtl results can be confirmed by at least two other methods. Our method provides a new way to identify the association between genetic markers and gene expression. Our software is available in the supporting information.
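
To make the mutual information idea concrete, here is a minimal Python sketch that estimates MI (in bits) between a genotype and an expression level from simple counts. It assumes expression has already been discretized into bins; the binning scheme, the labels, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (bits) between two discrete label sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi

# Hypothetical toy data: only the heterozygote (AB) shows high expression,
# a non-linear pattern that an additive linear model would fit poorly.
genotypes = ["AA", "AA", "AB", "AB", "BB", "BB"]
expression_bins = ["low", "low", "high", "high", "low", "low"]
mi_bits = mutual_information(genotypes, expression_bins)
```

Because expression is fully determined by genotype in this toy example, the MI equals the entropy of the expression labels (about 0.918 bits), even though the relationship is not monotonic in allele dosage.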

29 citations


Journal ArticleDOI
TL;DR: An optimal feature set consisting of 16 features, which were able to identify the valid pathways most successfully, was obtained and a benchmark dataset with 13,736 pathways consisting of both valid and invalid pathways was produced.
Abstract: In systems biology, it is a great challenge to determine whether a given set of organic compounds can combine to form a meaningful pathway. Fortunately, the rapidly accumulating information on various organisms makes this problem increasingly feasible to address. Based on the available information, a novel computational approach is proposed to investigate this problem, using the metabolic pathways of yeast as the subject of study. We produced a benchmark dataset of 13,736 pathways, consisting of both valid and invalid pathways, and identified the valid pathways among them. Each pathway was encoded as a numeric vector consisting of three parts: graph property, chemical functional group, and chemical structural set. Minimum Redundancy Maximum Relevance and Incremental Feature Selection were used to select an optimal feature set, the Nearest Neighbor Algorithm was adopted as the classification model, and the jackknife test was used to evaluate the model. As a result, an optimal set of 16 features, which identified the valid pathways most successfully, was obtained.
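
The combination of incremental feature selection, a nearest neighbor classifier, and jackknife (leave-one-out) evaluation can be sketched as follows. This is an illustrative outline only: it assumes the features have already been ranked (for example by mRMR, which is not implemented here), uses squared Euclidean distance as the similarity measure, and runs on hypothetical toy data.

```python
def nearest_neighbor_label(query, samples, labels, features):
    """Label of the training sample closest to query on the chosen feature indices."""
    best_dist, best_label = float("inf"), None
    for sample, label in zip(samples, labels):
        dist = sum((query[f] - sample[f]) ** 2 for f in features)
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label

def jackknife_accuracy(samples, labels, features):
    """Leave-one-out accuracy of the 1-nearest-neighbor classifier."""
    correct = 0
    for i in range(len(samples)):
        rest = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(samples[i], rest, rest_labels, features) == labels[i]:
            correct += 1
    return correct / len(samples)

def incremental_feature_selection(samples, labels, ranked_features):
    """Try the top-1, top-2, ... ranked features and keep the best-scoring prefix."""
    best_acc, best_k = -1.0, 0
    for k in range(1, len(ranked_features) + 1):
        acc = jackknife_accuracy(samples, labels, ranked_features[:k])
        if acc > best_acc:
            best_acc, best_k = acc, k
    return ranked_features[:best_k], best_acc

# Hypothetical toy data: feature 0 separates the classes, feature 1 is misleading noise.
toy_samples = [[0.0, 0.0], [0.1, 9.0], [1.0, 0.1], [1.1, 9.1]]
toy_labels = ["invalid", "invalid", "valid", "valid"]
selected, accuracy = incremental_feature_selection(toy_samples, toy_labels, [0, 1])
```

On the toy data the procedure keeps only feature 0, since adding the noisy second feature lowers the jackknife accuracy.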

27 citations


Journal ArticleDOI
TL;DR: A novel computational method is presented to predict virulence factors by integrating protein-protein interactions in a STRING database and biological pathways in the KEGG to provide insight and guidance for related research.
Abstract: Virulence factors are molecules that play very important roles in enhancing the pathogen’s capability in causing diseases. Many efforts were made to investigate the mechanism of virulence factors using in silico methods. In this study, we present a novel computational method to predict virulence factors by integrating protein–protein interactions in a STRING database and biological pathways in the KEGG. Three specific species were studied according to their records in the VFDB. They are Campylobacter jejuni NCTC 11168, Escherichia coli O6:K15:H31 536 (UPEC) and Pseudomonas aeruginosa PAO1. The prediction accuracies reached were 0.9467, 0.9575 and 0.9180, respectively. Metabolism pathways, flagellar assembly and chemotaxis may be of importance for virulence based on the analysis of the optimal feature sets we obtained. We hope this can provide some insight and guidance for related research.

21 citations


Journal ArticleDOI
TL;DR: This work proposes a sequence-based computational approach for predicting protein disordered regions by means of the Nearest Neighbor algorithm, in which conservation, amino acid factor and secondary structure status of each amino acid in a fixed-length sliding window are taken as the encoding features.
Abstract: Protein disordered regions are associated with some critical cellular functions such as transcriptional regulation, translation and cellular signal transduction, and they are responsible for various diseases. Although experimental methods have been developed to determine these regions, they are time-consuming and expensive. Therefore, it is highly desirable to develop computational methods that can provide this kind of information in a rapid and inexpensive manner. Here we propose a sequence-based computational approach for predicting protein disordered regions by means of the Nearest Neighbor algorithm, in which the conservation, amino acid factor and secondary structure status of each amino acid in a fixed-length sliding window are taken as the encoding features. In addition, feature selection based on mRMR (Maximum Relevancy Minimum Redundancy) is applied to obtain an optimal 51-feature set that includes 39 conservation features and 12 secondary structure features. With the optimal 51 features, our predictor yielded a quite promising MCC (Matthews correlation coefficient): 0.371 on a rigorous benchmark dataset tested by 5-fold cross-validation and 0.219 on an independent test dataset. Our results suggest that conservation and secondary structure play important roles in intrinsically disordered proteins.
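
Several abstracts on this page report the Matthews correlation coefficient (MCC). For reference, a minimal Python sketch of the standard formula from binary confusion-matrix counts is given below; the example counts are made up, and returning 0.0 when a marginal is empty is a common convention rather than something stated in the source.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; 0.0 by convention if any marginal total is zero."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator

# Hypothetical counts for illustration only.
example_mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
```

MCC ranges from -1 to 1 and stays informative when the two classes are very unequal in size, which is typical for disordered-residue and modification-site datasets.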

21 citations


Journal ArticleDOI
TL;DR: A novel approach was introduced to encode substrate/product and enzyme molecules with molecular descriptors and physicochemical properties, respectively, and KNN was adopted to build the substrate-enzyme-product interaction network.
Abstract: It is important to correctly and efficiently predict substrate-enzyme interactions and their products in metabolic pathways. In this work, a novel approach was introduced to encode substrate/product and enzyme molecules with molecular descriptors and physicochemical properties, respectively. Based on this encoding method, KNN was adopted to build the substrate-enzyme-product interaction network. After selecting the optimal features that represent the main factors of substrate-enzyme-product interaction in our prediction, a total of 160 out of 290 features were obtained, which can be clustered into ten categories: elemental analysis, geometry, chemistry, amino acid composition, predicted secondary structure, hydrophobicity, polarizability, solvent accessibility, normalized van der Waals volume, and polarity. As a result, our prediction model achieved an MCC of 0.423 and an overall prediction accuracy of 89.1% in a 10-fold cross-validation test.

20 citations


Journal ArticleDOI
06 Jun 2013-PLOS ONE
TL;DR: This study has identified 26 core human proteins involved in PPI between HIV-1 and host, that have great potential for HIV therapy and 280 chemicals that interact with three HIV drugs targeting human proteins can also interact with these 26 core proteins.
Abstract: Acquired immune deficiency syndrome (AIDS) is a severe infectious disease that causes a large number of deaths every year. Traditional anti-AIDS drugs directly targeting the HIV-1 encoded enzymes, including reverse transcriptase (RT), protease (PR) and integrase (IN), usually suffer from drug resistance after a period of treatment and from serious side effects. In recent years, the emergence of a large amount of useful information on protein-protein interactions (PPI) in the HIV life cycle and on related inhibitors has made PPI a new avenue for antiviral drug intervention. In this study, we identified 26 core human proteins involved in PPI between HIV-1 and the host that have great potential for HIV therapy. In addition, 280 chemicals that interact with three HIV drugs targeting human proteins can also interact with these 26 core proteins. All of this indicates that the method presented in this paper is quite promising. The method may become a useful tool, or at least play a complementary role to existing methods, for identifying novel anti-HIV drugs.

17 citations


Journal ArticleDOI
TL;DR: A computational method to predict RB related genes based on Dagging, with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS), showed that these features were closely related to RB.
Abstract: One of the most important and challenging problems in biomedicine is how to predict cancer-related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy, usually occurring in childhood. Early detection of RB could reduce morbidity and increase the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB-related genes based on Dagging, with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). 119 RB genes were compiled from two previous RB-related studies, while 5,500 non-RB genes were randomly selected from Ensembl genes. Ten datasets were constructed based on all these RB and non-RB genes. Each gene was encoded with a 13,126-dimensional vector including 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. Finally, an optimal feature set including 1,061 GO terms and 8 KEGG pathways was obtained. Analysis showed that these features were closely related to RB. It is anticipated that the method can be applied to predict other cancer-related genes as well.

Journal ArticleDOI
02 May 2013-PLOS ONE
TL;DR: The ensemble model can better distinguish the high risk and low risk patients than the stage prediction model and the recurrence prediction model alone and it could significantly improve the prediction performance by ensembling heterogeneous information.
Abstract: Colorectal cancer can be grouped into Dukes A, B, C, and D stages based on its development. Generally speaking, patients at more advanced stages have a poorer prognosis. To integrate progression stage prediction systems with recurrence prediction systems, we proposed an ensemble prognostic model for colorectal cancer. In this model, each patient was assigned a most probable stage and a most probable recurrence status. If a patient was predicted to be a recurrence patient in an advanced stage, the patient was classified into the high-risk group. The ensemble model considered both progression stage and recurrence status. High-risk and low-risk patients predicted by the ensemble model had significantly different disease-free survival (log-rank test p-value, 0.0016) and disease-specific survival (log-rank test p-value, 0.0041). The ensemble model can better distinguish high-risk from low-risk patients than the stage prediction model or the recurrence prediction model alone. This method could be applied to studies of other diseases, and it could significantly improve prediction performance by ensembling heterogeneous information.

Journal ArticleDOI
TL;DR: A new method to predict disordered regions in proteins using Random Forest, Maximum Relevancy Minimum Redundancy, and Incremental Feature Selection to build the optimal model may shed some light on the understanding of the formation mechanism of disordered structures, providing guidelines for experimental validation.
Abstract: With a large number of disordered proteins and their important functions discovered, it is highly desirable to develop effective methods to computationally predict protein disordered regions. In this study, based on Random Forest (RF), Maximum Relevancy Minimum Redundancy (mRMR), and Incremental Feature Selection (IFS), we developed a new method to predict disordered regions in proteins. The mRMR criterion was used to rank the importance of all candidate features. The top 128 features were then selected from the ranked feature list to build the optimal model, including 92 Position Specific Scoring Matrix (PSSM) conservation score features and 36 secondary structure features. As a result, a Matthews correlation coefficient (MCC) of 0.3895 was achieved on the training set by 10-fold cross-validation. On the basis of the prediction results for each query sequence, we used a scanning and modification strategy to improve the performance. The accuracy (ACC) and MCC were increased by 4% and almost 0.2%, respectively, compared with three other popular predictors: DISOPRED, DISOclust, and OnD-CRF. The selected features may shed some light on the formation mechanism of disordered structures, providing guidelines for experimental validation.

Journal ArticleDOI
TL;DR: A computational method was developed to predict protein amidation sites, by incorporating the maximum relevance minimum redundancy method and the incremental feature selection method based on the nearest neighbor algorithm, which could be used as an efficient tool to theoretically predict amidated peptides.
Abstract: Carboxy-terminal α-amidation is a post-translational modification of proteins found widely in vertebrates and invertebrates. The α-amide group is required for full biological activity, since it prevents ionization of the C-terminus and may thus render a peptide more hydrophobic and better able to bind to other proteins. However, C-terminal amidation is very difficult to detect because experimental methods are often labor-intensive, time-consuming and expensive. Therefore, in silico methods may complement them because of their high efficiency. In this study, a computational method was developed to predict protein amidation sites by incorporating the maximum relevance minimum redundancy method and the incremental feature selection method based on the nearest neighbor algorithm. From a total of 735 features, 41 optimal features were selected and used to construct the final predictor. As a result, the predictor achieved an overall Matthews correlation coefficient of 0.8308. Feature analysis showed that PSSM conservation scores and amino acid factors played the most important roles in α-amidation site prediction. Site-specific feature analyses showed that features derived from the amidation site itself and adjacent sites were most significant. The method presented could be used as an efficient tool to theoretically predict amidated peptides. The features selected in our study could also shed some light on the mechanisms of the amidation modification, providing guidelines for experimental validation.

Journal ArticleDOI
TL;DR: Comparison of top features between inter- and intra-chain disulfide bonds revealed the similarities and differences of the mechanisms of forming these two types of disulfide bonds, which might help understand more of the mechanisms and provide clues to further experimental studies in this research field.
Abstract: Protein disulfide bonds are formed during post-translational modification and have been implicated in various physiological and pathological processes. Proper localization of disulfide bonds also facilitates the prediction of protein three-dimensional (3D) structure. However, determining disulfide bonds with conventional experimental approaches is both time-consuming and labor-intensive, especially for large-scale data sets. Since disulfide bond prediction based on 3D structure features also has some limitations, it is necessary to develop convenient and fast sequence-based computational methods for both inter- and intra-chain disulfide bond prediction. In this study, we developed a computational method for both types of disulfide bond prediction based on the maximum relevance and minimum redundancy (mRMR) method followed by incremental feature selection (IFS), with the nearest neighbor algorithm as its prediction model. Features of sequence conservation, residual disorder, and amino acid factor are used for inter-chain disulfide bond prediction. In addition to these features, the sequential distance between a pair of cysteines is also used for intra-chain disulfide bond prediction. Our approach achieves a prediction accuracy of 0.8702 for inter-chain disulfide bond prediction using 128 features and 0.9219 for intra-chain disulfide bond prediction using 261 features. Analysis of the optimal feature set indicated key features and key sites for disulfide bond formation. Interestingly, comparison of top features between inter- and intra-chain disulfide bonds revealed the similarities and differences of the mechanisms of forming these two types of disulfide bonds, which might help understand more of the mechanisms and provide clues to further experimental studies in this research field.

Journal ArticleDOI
TL;DR: A new classifier called weighted passive nearest neighbor algorithm (WPNNA) is applied to predict the ubiquitination sites using a hybrid of features, including PSSM conservation scores, amino acid factors and disorder scores, which indicates that the predictor based on WPNNA is at least a good complement to the current state of the art in ubiquitination site prediction.
Abstract: Ubiquitination, a reversible protein post-translational modification (PTM), occurs when an amide bond is formed between ubiquitin (a small protein) and the targeted protein. It is involved in a wide variety of cellular processes and is associated with various diseases such as Alzheimer's disease. In order to understand ubiquitination at the molecular level, it is important to identify the ubiquitination site to which the ubiquitin binds. Since experimental methods to determine ubiquitination sites are both expensive and time-consuming, it is necessary to develop in silico methods to predict ubiquitination sites based merely on the sequence information of the target protein. In this paper, we apply a new classifier called weighted passive nearest neighbor algorithm (WPNNA) to predict the ubiquitination sites. WPNNA was demonstrated to be insensitive to the varied datum densities between different classes. A hybrid of features, including PSSM conservation scores, amino acid factors and disorder scores, is employed to encode the protein fragments centered on the possible ubiquitination sites. The Matthews correlation coefficient (MCC) of our predictor on a training dataset is 0.169 with a sensitivity of 31.6% and a specificity of 82.9%, and on an independent test dataset it is 0.403 with a sensitivity of 64.3% and a specificity of 75.7%. We compare our predictor with that of a recently published paper which also made predictions on the same datasets. Our predictor achieves much better sensitivities on both datasets than that paper and a much better MCC on the independent test dataset, indicating that the predictor based on WPNNA is at least a good complement to the current state of the art in ubiquitination site prediction.

Journal ArticleDOI
TL;DR: With the explosively increasing high-throughput omics data, it is highly desired to develop effective computational methods and tools that can mine useful information to support the development of biochemistry, biomedicine, and drug design
Abstract: With the explosively increasing high-throughput omics data, it is highly desirable to develop effective computational methods and tools that can mine useful information to support the development of biochemistry, biomedicine, and drug design. Furthermore, in order to understand protein-protein, protein-DNA/RNA, and other complex interactions, systems biology approaches are applied. This collection covers diverse topics and contains many novel methods and intriguing findings.

Y. Jiang et al. compared the gene expressions among the colorectal cancer patients in different stages and obtained the early and late stage biomarkers. Then, these two kinds of biomarkers were both mapped onto the protein interaction network, and the signal propagation path from the early stage biomarker to the late one was identified. Their findings may provide useful insights for revealing the mechanism of colorectal cancer progression at the cellular systems biology level. L. N. Lili et al. investigated the process of stroma activation in human ovarian cancer by molecular analysis of matched sets of cancer and surrounding stroma tissues. They found that functionally significant variability exists among ovarian cancer patients in the ability of the microenvironment to modulate cancer development. B. Yang et al. constructed a network-based inference framework for identifying cancer genes from gene expression data. Six identified genes (TSPYL5, CD55, CCNE2, DCK, BBC3, and MUC1) susceptible to breast cancer were verified through literature mining, GO analysis, and pathway functional enrichment analysis. Lung cancer is one of the most malignant cancers. B. Q. Li et al. identified 25 NSCLC and 38 SCLC genes with the shortest path approach in PPI networks. These candidate genes contained more cancer genes and had more functional similarity with cancer genes than those identified from the gene expression profiles. A. R. Iskandar et al. evaluated the perturbation of xenobiotic metabolism in response to cigarette smoke exposure in nasal and bronchial tissues. Their observation suggested that the effects of cigarette smoke exposure on the xenobiotic responses in the bronchial and nasal epithelium of smokers were similar to those observed in their respective organotypic models exposed to cigarette smoke, and that nasal tissue could be used as a reliable surrogate to measure the xenobiotic responses in the bronchial tissue. E. G. Maiorov et al. identified interconnected markers for T-cell acute lymphoblastic leukemia (T-ALL). Their identified genes may serve as biomarkers, alternative to the traditional ones used for the diagnosis of T-ALL, and help understand the pathogenesis of the disease. M. Kalita et al. used a multiplex gene expression profiling platform to investigate the perturbations of the innate pathways induced by TGF in a primary airway epithelial cell model of epithelial mesenchymal transition (EMT). Their results indicated that epigenetic changes produced by EMT induce dynamic state changes of the innate signaling pathway.

C. Lu et al. studied the functions of microRNAs related to the liver regeneration of the whitespotted bamboo shark, Chiloscyllium plagiosum. Their work deepened the understanding of mechanisms of liver regeneration and resulted in the addition of a significant number of novel miRNA sequences to GenBank. T. Alioto et al. presented a lightweight pipeline for first-pass gene prediction on newly sequenced genomes. The pipeline has two main components: ASPic, a program that derives highly accurate, albeit not necessarily complete, EST-based transcript annotations from EST alignments, and GeneID, a standard gene prediction program, which the authors modified to take intron annotations as evidence. The pipeline was successfully tested on the entire C. elegans genome and the 44 ENCODE human pilot regions.

J. Zou et al. reviewed advanced systems biology methods in drug discovery and translational biomedicine. Their review provided a framework for addressing disease mechanism and approaching drug discovery. L. Chen et al. proposed a computational method to predict the side effects of drugs, which integrated the information of chemical-chemical and protein-chemical interactions. Compared to most of the previous studies, the proposed method can provide the order information of the side effects for any query drug. K. Wang et al. proposed an accurate method for predicting protein-ligand binding sites on the protein surface using SVM and a statistical depth function. The accuracy, sensitivity, and specificity on the training set are 77.55%, 56.15%, and 87.96%, respectively, and on the independent test set the accuracy, sensitivity, and specificity are 80.36%, 53.53%, and 92.38%, respectively. K. K. Tseng et al. presented a new system and novel approaches to classify different kinds of sperm images in order to assess their health. In their evaluation, the method reached an accuracy of 87.5% and performed better than the existing approaches to sperm classification. A rapid method is required to mitigate complexity and computation challenges in high-throughput protein identification. In "Method for Rapid Protein Identification in a Large Database," W. Zhang et al. present an accelerated open method that satisfies this requirement to some extent. Q. Zou et al. proposed a novel method for distinguishing cytokines from other proteins. Identifying cytokines in silico is of vital importance. An ensemble classification strategy was employed to improve the prediction performance, and a friendly prediction web server was also developed. Du and Yu introduced a novel method, SubMito-PSPCP, which embeds the PSSM into the pseudo-amino acid compositions, to predict protein submitochondrial locations. T. Gu et al. applied Support Vector Regression and a two-stage feature selection to develop a computational model that maps DPP-IV inhibitors to their activity. They also developed an online server.

Based on nonlinear mapping and the Coulomb function, X. Liu et al. applied a 3D kernel approach to predict the four protein tertiary structural classes and five membrane protein types with satisfactory results. It has not escaped our notice that kernel approaches may hold a high potential for predicting other protein features. T. H. Zhao et al. proposed a new method to predict protein disordered regions based on sequence features. The accuracy and MCC (Matthews correlation coefficient) of their method are higher than those of three popular disordered region predictors: DISOPRED, DISOclust, and OnD-CRF. M. S. M. Ali et al. studied the structure and function of LipA8, which is able to adapt to extreme temperatures. Simulations show that it is most stable at 0°C and 5°C. At extreme temperatures, the catalytic domain (N-terminus) maintained its stability better than the noncatalytic domain (C-terminus), but the noncatalytic domain showed higher flexibility than the catalytic domain.

A Boolean network (BN) is widely used as a model of gene regulatory networks. K. Kobayashi et al. proposed a BN model with two types of control inputs and an optimal control method with duration of drug effectiveness. The optimal control problem is reduced to an integer programming problem. J. Zhang et al. studied microRNA-mediated regulation in biological systems with oscillatory behavior. They started with two specific microRNA-mediated regulatory circuits which show their fine-tuning roles in the modulation of periodic behavior and then applied these results to study the effects of miR369-3 regulation of the cell cycle. B. Yan et al. developed a mathematical model to study the mechanisms underlying the size checkpoint in fission yeast. They found that when the spatiotemporal regulation is coupled to the positive feedback loops, the mitosis-promoting factor (MPF) exhibits a bistable steady-state relationship with the cell size. The switch-like response from the positive feedback loops naturally generates the cell size checkpoint.

Detection of potential siRNA off-targets is crucial for High Content Screening (HCS) using small interfering RNAs (siRNAs). S. Das et al. performed a detailed off-target analysis of the three most commonly used kinome siRNA libraries based on the latest RefSeq version and created the SeedSeq database, a new unique format to store off-target information. L. Zhu et al. systematically investigated the characteristics and evolutionary pattern of the actin gene family in primates. Phylogenetic analysis of 233 actin genes in human, chimpanzee, gorilla, orangutan, gibbon, rhesus monkey, and marmoset genomes showed that actin genes in the seven species could be divided into two major types of clades: orthologous group versus complex group. Codon usages and gene expression patterns of actin gene copies were highly consistent among the groups because of basic functions needed by the organisms but much diverged within species due to functional diversification. J. Ping et al. performed long time-scale molecular dynamics simulations on both open and closed states of Escherichia coli adenylate kinase (ADK), on the basis of which a conformational selection mechanism was proposed to explain the large-scale domain motion of this enzyme.

Yudong Cai
Tao Huang
Lei Chen
Bin Niu