Author

Zhixia Teng

Bio: Zhixia Teng is an academic researcher from Northeast Forestry University. The author has contributed to research in topics: Enhancer & Feature selection. The author has an h-index of 3 and has co-authored 6 publications receiving 25 citations.

Papers
Journal ArticleDOI
TL;DR: The results show that the proposed CFRP-based prediction model achieves better performance than the others in terms of the evaluation metrics, and the complex features generated by CFRP are beneficial for building a powerful predictive model of ncRNA-protein interaction.
Abstract: Non-coding RNA (ncRNA) plays important roles in many critical regulation processes. Many ncRNAs perform their regulatory functions in the form of RNA-protein complexes. Therefore, identifying the interaction between ncRNA and protein is fundamental to understanding the functions of ncRNA. Given the high cost of experimental techniques, developing an accurate computational predictive model has become an indispensable way to identify ncRNA-protein interaction. A powerful predictive model of ncRNA-protein interaction needs a good feature set for characterizing the interaction. In this paper, a novel method (named CFRP) is put forward to generate complex features for characterizing ncRNA-protein interaction. To obtain a comprehensive description of ncRNA-protein interaction, complex features are generated by non-linear transformations from the traditional k-mer features of ncRNA and protein sequences. To further reduce the dimensions of the complex features, a group of discriminative features is selected by random forest. To validate the performance of the proposed method, a series of experiments is carried out on several widely used public datasets. Compared with the traditional k-mer features, the CFRP complex features can boost the performance of ncRNA-protein interaction prediction models. Meanwhile, the CFRP-based prediction model is compared with several state-of-the-art methods, and the results show that the proposed method achieves better performance than the others in terms of the evaluation metrics. In conclusion, the complex features generated by CFRP are beneficial for building a powerful predictive model of ncRNA-protein interaction.

20 citations
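
The "traditional k-mer features" that the CFRP abstract above starts from can be illustrated with a minimal sketch. The function name `kmer_frequencies`, the default RNA alphabet, and the normalization by the number of valid windows are assumptions made for illustration; the abstract does not specify these details, and the non-linear transformations and random-forest selection of CFRP are not reproduced here.

```python
from itertools import product


def kmer_frequencies(seq, k, alphabet="ACGU"):
    """Return normalized k-mer frequencies for seq.

    The vector is ordered like itertools.product(alphabet, repeat=k),
    so it always has len(alphabet) ** k entries.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k over the sequence, one position at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing unknown symbols
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]
```

For a protein sequence, the same function could be called with a 20-letter amino acid alphabet (or a reduced alphabet) passed as `alphabet`.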

Journal ArticleDOI
TL;DR: iEnhancer-EBLSTM identifies enhancers with an ensemble of bidirectional LSTMs (EBLSTM), using subsequences extracted by sliding a 3-mer window along the DNA sequence as features.
Abstract: Enhancers are regulatory DNA sequences that could be bound by specific proteins named transcription factors (TFs). The interactions between enhancers and TFs regulate specific genes by increasing the target gene expression. Therefore, enhancer identification and classification have been a critical issue in the enhancer field. Unfortunately, so far there has been a lack of suitable methods to identify enhancers. Previous research has mainly focused on the features of the enhancer's function and interactions, which ignores the sequence information. As we know, the recurrent neural network (RNN) and long short-term memory (LSTM) models are currently the most common methods for processing time series data. LSTM is better suited than a plain RNN to handling DNA sequences. In this paper, we take advantage of LSTM to build a method named iEnhancer-EBLSTM to identify enhancers. iEnhancer-ensembles of bidirectional LSTM (EBLSTM) consists of two steps. In the first step, we extract subsequences by sliding a 3-mer window along the DNA sequence as features. In the second step, the EBLSTM model is used to identify enhancers from the candidate input sequences. We use the dataset from the study of Quang H et al. as the benchmarks. The experimental results from the datasets demonstrate the efficiency of our proposed model.

14 citations
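
The first step described in the iEnhancer-EBLSTM abstract above, sliding a 3-mer window along the DNA sequence, can be sketched as follows. The stride of 1, the function names `sliding_kmers` and `encode_kmers`, and the mapping of each 3-mer to an integer ID (as one might do before feeding an embedding layer) are assumptions for illustration, not details confirmed by the abstract.

```python
from itertools import product


def sliding_kmers(seq, k=3, stride=1):
    """Split seq into overlapping k-mers using a sliding window."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def encode_kmers(tokens, k=3, alphabet="ACGT"):
    """Map each k-mer to an integer ID; 0 is reserved for unknown k-mers."""
    vocab = {
        "".join(p): idx + 1
        for idx, p in enumerate(product(alphabet, repeat=k))
    }
    return [vocab.get(tok, 0) for tok in tokens]


tokens = sliding_kmers("ACGTAC")
ids = encode_kmers(tokens)
```

A sequence of length n yields n - 2 overlapping 3-mers, so the resulting ID sequence is what a recurrent model would read in order.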

Journal ArticleDOI
TL;DR: The results suggest that the proposed support vector machine- (SVM-) based classifier can be used in human enzyme identification effectively and efficiently and can help to understand their functions and develop new drugs.
Abstract: Enzymes are proteins that can efficiently catalyze specific biochemical reactions, and they are widely present in the human body. Developing an efficient method to identify human enzymes is vital to select enzymes from the vast number of human proteins and to investigate their functions. Nevertheless, only a limited amount of research has been conducted on the classification of human enzymes and nonenzymes. In this work, we developed a support vector machine- (SVM-) based predictor to classify human enzymes using the amino acid composition (AAC), the composition of k-spaced amino acid pairs (CKSAAP), and selected informative amino acid pairs through the use of a feature selection technique. A training dataset including 1117 human enzymes and 2099 nonenzymes and a test dataset including 684 human enzymes and 1270 nonenzymes were constructed to train and test the proposed model. The results of jackknife cross-validation showed that the overall accuracy was 76.46% for the training set and 76.21% for the test set, which are higher than the 72.6% achieved in previous research. Furthermore, various feature extraction methods and mainstream classifiers were compared in this task, and informative feature parameters of k-spaced amino acid pairs were selected and compared. The results suggest that our classifier can be used in human enzyme identification effectively and efficiently and can help to understand their functions and develop new drugs.

9 citations
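
The two sequence encodings named in the enzyme-classification abstract above, amino acid composition (AAC) and the composition of k-spaced amino acid pairs (CKSAAP), can be sketched as below. This follows the commonly used definitions of these encodings; the function names and the choice to normalize by the number of valid pairs are assumptions, and the paper's feature selection and SVM training are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues."""
    n = len(seq)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def cksaap(seq, k):
    """Composition of amino acid pairs separated by exactly k residues.

    Returns 400 values, one per ordered pair, normalized by the number
    of valid pairs found in the sequence.
    """
    pairs = [a + b for a in AMINO_ACIDS for b in AMINO_ACIDS]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]  # residues with k others in between
        if pair in counts:  # ignore pairs with non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]
```

Concatenating `aac(seq)` with `cksaap(seq, k)` for several values of k gives a fixed-length vector regardless of protein length, which is what a classifier such as an SVM requires.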

Journal ArticleDOI
TL;DR: A novel optimization framework to detect complexes from protein-protein interaction (PPI) networks, named PLSMC, which can match known complexes with a higher accuracy than other methods and whose predicted complexes have high functional homogeneity.
Abstract: Protein complexes formed by groups of physically interacting proteins play a crucial role in cell activities. Great effort has been made to computationally identify protein complexes from protein-protein interaction (PPI) networks. However, the accuracy of the prediction is still far from satisfactory, because the topological structures of protein complexes in the PPI network are too complicated. This paper proposes a novel optimization framework to detect complexes from PPI networks, named PLSMC. The method is based on the fact that if two proteins are in a common complex, they are likely to be interacting. PLSMC employs this relation to determine complexes by a penalized least squares method. PLSMC is applied to several public yeast PPI networks and compared with several state-of-the-art methods. The results indicate that PLSMC outperforms other methods. In particular, complexes predicted by PLSMC can match known complexes with a higher accuracy than other methods. Furthermore, the predicted complexes have high functional homogeneity.

5 citations

Journal ArticleDOI
TL;DR: i4mC-EL identifies 4mC sites in the mouse genome using a multifeature encoding scheme consisting of Kmer and EIIP to describe the DNA sequences, together with a stacked ensemble model in which four machine learning algorithms (BayesNet, NaiveBayes, LibSVM, and Voted Perceptron) serve as base classifiers whose intermediate results are the input of the metaclassifier, Logistic.
Abstract: As one of the important epigenetic modifications, DNA N4-methylcytosine (4mC) plays a crucial role in controlling gene replication, expression, cell cycle, DNA replication, and differentiation. The accurate identification of 4mC sites is necessary to understand biological functions. In this paper, we use ensemble learning to develop a model named i4mC-EL to identify 4mC sites in the mouse genome. Firstly, a multifeature encoding scheme consisting of Kmer and EIIP was adopted to describe the DNA sequences. Secondly, on the basis of the multifeature encoding scheme, we developed a stacked ensemble model, in which four machine learning algorithms, namely, BayesNet, NaiveBayes, LibSVM, and Voted Perceptron, were utilized to implement an ensemble of base classifiers that produce intermediate results as input of the metaclassifier, Logistic. The experimental results on the independent test dataset demonstrate that the overall predictive accuracy of i4mC-EL is 82.19%, which is better than the existing methods. The user-friendly website implementing i4mC-EL can be accessed freely at the following.

4 citations
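
The EIIP part of the multifeature encoding mentioned in the i4mC-EL abstract above can be illustrated with a minimal sketch. EIIP (electron-ion interaction pseudopotential) assigns one fixed numeric value to each nucleotide; the per-position encoding shown here, the function name `eiip_encode`, and the handling of unknown symbols as 0.0 are assumptions, since the abstract does not say exactly how the values are combined with the Kmer features.

```python
# Widely used EIIP values for the four DNA nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(seq):
    """Replace each nucleotide by its EIIP value (0.0 for unknown symbols)."""
    return [EIIP.get(base, 0.0) for base in seq.upper()]
```

Because every sample window has the same length, this encoding gives one numeric feature per sequence position, which can be concatenated with k-mer frequencies before being passed to the base classifiers.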


Cited by
01 Jan 2014
TL;DR: Details of primers used for quantitative PCR and Reverse Transcriptase PCR: Gene symbol, Forward strand (5’→3’), Reverse strand (5’→3’), FLK-1.
Abstract: Name Sequence Exon 2 mG6pc S 5’-TCCCTGTCACCTGTGAG-3’ Exon 5 mG6pc AS 5’-CACAAGAAGTCTTTGTAA-3’ Exon1 mG6pcS 5’-TTACCAAGACTCCCAGGACTG-3’ Exon2 mG6pcAS 5’-GAGCTGTTGCTGTAGTAGTCG-3’ Pck1S 5’-AGCCTTTGGTCAACAACTGG-3’ Pck1AS 5’-TGCCTTCGGGGTTAGTTATG-3’ GcgR S 5’-ACCCAACTATTGCTGGTTGC-3’ GcgR AS 5’-CCATGTTGTCATTGCTGGTC-3’ Hmgcs2 S 5’-CCGTATGGGCTTCTGTTCAG-3’ Hmgcs2 AS 5’-AGCTTTGTGCGTTCCATCAG-3’ mL19S 5’-AGAAGATTGACCGCCATAT-3’ mL19AS 5’-TTCGTGCTTCCTTGGTCTTAGA-3’ CRU G6pc S 5’-TTTGCTATTTTACGTAAATCACCCT-3’ CRU G6pc AS 5’-GTACCTCAGGAAGCTGCCA-3’ CRU Pck1 S 5’-GGCCTCCCAACATTCATTAAC-3’ CRU Pck1 AS 5’-GTAGCTAGCCCTCCTCGCTTTAA-3’ GRU G6pc S 5’-CACCCCTTAGCACTGTAAGCCGTGTG-3’ GRU G6pc AS 5’-GGATTCAGTCTGTAGGTCAACCTAGCCC-3’ GRU Pck1 S 5’-TGCAGCCAGCAACATATGAA-3’

384 citations

Journal ArticleDOI
TL;DR: An overview of the successful implementation of various deep learning approaches for predicting RNA–protein interactions, mainly focusing on the prediction of RNA–protein interaction pairs and RBP-binding sites on RNAs, is provided.
Abstract: Interactions between RNAs and proteins play essential roles in many important biological processes. Benefitting from the advances of next generation sequencing technologies, hundreds of RNA-binding proteins (RBP) and their associated RNAs have been revealed, which enables the large-scale prediction of RNA-protein interactions using machine learning methods. Till now, a wide range of computational tools and pipelines have been developed, including deep learning models, which have achieved remarkable performance on the identification of RNA-protein binding affinities and sites. In this review, we provide an overview of the successful implementation of various deep learning approaches for predicting RNA-protein interactions, mainly focusing on the prediction of RNA-protein interaction pairs and RBP-binding sites on RNAs. Furthermore, we discuss the advantages and disadvantages of these approaches, and highlight future perspectives on how to design better deep learning models. Finally, we suggest some promising future directions of computational tasks in the study of RNA-protein interactions, especially the interactions between noncoding RNAs and proteins. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications RNA Evolution and Genomics > Computational Analyses of RNA RNA Interactions with Proteins and Other Molecules > Protein-RNA Recognition.

55 citations

04 Jan 2018
TL;DR: A database, named as MeDReaders, was constructed to collect information about methylated DNA binding activities of 731 TFs, which could bind to DNA motifs containing highly methylated CpGs both in vitro and in vivo.
Abstract: Understanding the molecular principles governing interactions between transcription factors (TFs) and DNA targets is one of the main subjects for transcriptional regulation. Recently, emerging evidence demonstrated that some TFs could bind to DNA motifs containing highly methylated CpGs both in vitro and in vivo. Identification of such TFs and elucidation of their physiological roles has now become an important stepping-stone toward understanding the mechanisms underlying the methylation-mediated biological processes, which have crucial implications for human disease and disease development. Hence, we constructed a database, named MeDReaders, to collect information about methylated DNA binding activities. A total of 731 TFs, which could bind to methylated DNA sequences, were manually curated in human and mouse studies reported in the literature. In silico approaches were applied to predict methylated and unmethylated motifs of 292 TFs by integrating whole genome bisulfite sequencing (WGBS) and ChIP-Seq datasets in six human cell lines and one mouse cell line extracted from the ENCODE and GEO databases. The MeDReaders database will provide a comprehensive resource for further studies and aid related experiment designs. The database implements unified access for users to most TFs involved in such methylation-associated binding activities. The website is available at http://medreader.org/.

52 citations

Journal ArticleDOI
TL;DR: An overview of genetic mutations associated with cardiomyopathy and the roles of some epigenetic mechanisms in HF is given.
Abstract: Heart failure (HF) is a complex pathophysiological syndrome that arises from a primary defect in the ability of the heart to take in and/or eject sufficient blood. Genetic mutations associated with familial dilated cardiomyopathy, hypertrophic cardiomyopathy, and arrhythmogenic right ventricular cardiomyopathy can contribute to the various pathologies of HF. Therefore, genetic screening could be an approach for guiding individualized therapies and surveillance. In addition, epigenetic regulation occurs via key mechanisms, including ATP-dependent chromatin remodeling, DNA methylation, histone modification, and RNA-based mechanisms. MicroRNA is also a hot spot in HF research. This review gives an overview of genetic mutations associated with cardiomyopathy and the roles of some epigenetic mechanisms in HF.

47 citations

Journal ArticleDOI
TL;DR: A novel method based on deep learning to identify cancer-specific circRNA–RBP binding sites (CSCRSites), using only the nucleotide sequences as the input; experimental results show that CSCRSites outperforms conventional machine learning classifiers and some representative deep learning methods on the benchmark data.
Abstract: Circular RNAs (circRNAs) are extensively expressed in cells and tissues, and play crucial roles in human diseases and biological processes. Recent studies have reported that circRNAs could function as RNA binding protein (RBP) sponges, while RBPs can also be involved in back-splicing. The interaction with RBPs is also considered an important factor for investigating the function of circRNAs. Hence, it is necessary to understand the interaction mechanisms of circRNAs and RBPs, especially in human cancers. Here, we present a novel method based on deep learning to identify cancer-specific circRNA–RBP binding sites (CSCRSites), using only the nucleotide sequences as the input. In CSCRSites, an architecture with multiple convolution layers is utilized to detect the features of the raw circRNA sequence fragments, and further identify the binding sites through a fully connected layer with the softmax output. The experimental results show that CSCRSites outperforms the conventional machine learning classifiers and some representative deep learning methods on the benchmark data. In addition, the features learnt by CSCRSites are converted to sequence motifs, some of which match known human RNA motifs involved in human diseases, especially cancer. Therefore, as a deep learning-based tool, CSCRSites could significantly contribute to the function analysis of cancer-associated circRNAs.

42 citations