Journal Article, DOI: 10.1093/BFGP/ELAA023

Prediction of bio-sequence modifications and the associations with diseases.

02 Mar 2021-Briefings in Functional Genomics (Oxford University Press (OUP))-Vol. 20, Iss: 1, pp 1-18
Abstract: Modifications of protein, RNA and DNA play an important role in many biological processes and are related to some diseases. Therefore, accurate identification and comprehensive understanding of protein, RNA and DNA modification sites can promote research on disease treatment and prevention. With the development of sequencing technology, the number of known sequences has continued to increase. In the past decade, many computational tools that can be used to predict protein, RNA and DNA modification sites have been developed. In this review, we comprehensively summarize the modification site predictors for three different biological sequences and their associations with diseases. The relevant web server is accessible at http://lab.malab.cn/~acy/PTM_data/, and some sample data on protein, RNA and DNA modification can be downloaded from that website.

Topics: RNA (51%)
Citations

9 results found


Open access, Journal Article, DOI: 10.3389/FGENE.2021.665498
Kun Niu, Ximei Luo, Shumei Zhang, Zhixia Teng (+2 more)
Abstract: Enhancers are regulatory DNA sequences that can be bound by specific proteins called transcription factors (TFs). The interactions between enhancers and TFs regulate specific genes by increasing target gene expression. Enhancer identification and classification have therefore been a critical issue in the enhancer field. Unfortunately, suitable methods for identifying enhancers are still lacking. Previous research has mainly focused on features of enhancer function and interactions, ignoring sequence information. Recurrent neural network (RNN) and long short-term memory (LSTM) models are currently the most common methods for processing time series data, and LSTM is more suitable than a plain RNN for handling DNA sequences. In this paper, we take advantage of LSTM to build a method named iEnhancer-EBLSTM to identify enhancers. iEnhancer-EBLSTM, which uses ensembles of bidirectional LSTM (EBLSTM), consists of two steps. First, we extract subsequences as features by sliding a 3-mer window along the DNA sequence. Second, the EBLSTM model is used to identify enhancers from the candidate input sequences. We use the dataset from the study of Quang H et al. as the benchmark. The experimental results on this dataset demonstrate the efficiency of our proposed model.
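
The 3-mer windowing step described in this abstract can be illustrated with a minimal Python sketch. It assumes a stride of one base and a simple integer vocabulary over the 64 possible 3-mers; neither detail is stated in the abstract, and the function names `sliding_kmers` and `encode` are hypothetical, not taken from iEnhancer-EBLSTM.

```python
from itertools import product

K = 3  # window width named in the abstract

# Assumed vocabulary: every 3-mer over A/C/G/T mapped to an integer id (AAA=0 ... TTT=63).
VOCAB = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}


def sliding_kmers(sequence: str, k: int = K) -> list[str]:
    """Return overlapping k-mers, moving the window one base at a time (assumed stride)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def encode(sequence: str) -> list[int]:
    """Map each 3-mer to its id, skipping windows with ambiguous bases such as N."""
    return [VOCAB[t] for t in sliding_kmers(sequence) if t in VOCAB]


tokens = sliding_kmers("ACGTAC")
ids = encode("ACGTAC")
print(tokens)  # ['ACG', 'CGT', 'GTA', 'TAC']
print(ids)     # [6, 27, 44, 49]
```

A sequence of length n yields n - 2 tokens, so the integer list could plausibly be fed to an embedding layer in front of a bidirectional LSTM.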

Topics: Enhancer (50%)

1 Citation


Open access, Journal Article, DOI: 10.1186/S12967-021-03084-X
Shihu Jiao, Quan Zou, Huannan Guo, Lei Shi
Abstract: Cancer is one of the most serious diseases threatening human health. Cancer immunotherapy represents the most promising treatment strategy due to its high efficacy and selectivity and lower side effects compared with traditional treatment. The identification of tumor T cell antigens is one of the most important tasks for antitumor vaccine development and molecular function investigation. Although several machine learning predictors have been developed to identify tumor T cell antigens, accurate identification with existing methodology is still challenging. In this study, we used a non-redundant dataset of 592 tumor T cell antigens (positive samples) and 393 non-tumor T cell antigens (negative samples). Four types of feature encoding methods have been studied to build an efficient predictor, including amino acid composition, global protein sequence descriptors and grouped amino acid and peptide composition. To improve the feature representation ability of the hybrid features, we further employed a two-step feature selection technique to search for the optimal feature subset. The final prediction model was constructed using the random forest algorithm. Finally, the top 263 informative features were selected to train the random forest classifier for detecting tumor T cell antigen peptides. iTTCA-RF provides satisfactory performance, with balanced accuracy, specificity and sensitivity values of 83.71%, 78.73% and 88.69% over tenfold cross-validation as well as 73.14%, 62.67% and 83.61% over independent tests, respectively. The online prediction server is freely accessible at http://lab.malab.cn/~acy/iTTCA. We have shown that the proposed predictor iTTCA-RF is superior to the other latest models, and it will hopefully become an effective and useful tool for identifying tumor T cell antigens presented in the context of major histocompatibility complex class I.
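
Two quantities in this abstract are easy to make concrete. The sketch below computes amino acid composition (the fraction of each of the 20 standard residues in a peptide) and checks that the reported balanced accuracies equal the mean of the reported sensitivity and specificity, which is the standard definition. The peptide `AAKL` and the function names are hypothetical illustrations, not part of iTTCA-RF.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(peptide: str) -> list[float]:
    """Fraction of each standard amino acid in the peptide (a 20-dimensional feature vector)."""
    peptide = peptide.upper()
    return [peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Balanced accuracy is the arithmetic mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


features = amino_acid_composition("AAKL")  # hypothetical peptide
print(features[AMINO_ACIDS.index("A")])    # 0.5

# Percentages quoted in the abstract.
cross_validation = balanced_accuracy(88.69, 78.73)
independent_test = balanced_accuracy(83.61, 62.67)
print(f"{cross_validation:.2f} {independent_test:.2f}")  # 83.71 73.14
```

Both computed values match the balanced accuracies quoted above, so the three reported metrics are internally consistent.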

Topics: T cell (52%), Antigen (52%), Major histocompatibility complex (51%)

1 Citation


Journal Article, DOI: 10.1093/BIB/BBAB477
Yu Sun, Haicheng Li, Lei Zheng, Jinzhao Li (+6 more)
Abstract: Lactic acid bacteria consortia are commonly present in food, and some of these bacteria possess probiotic properties. However, discovery and experimental validation of probiotics require extensive time and effort. Therefore, it is of great interest to develop effective screening methods for identifying probiotics. Advances in sequencing technology have generated massive genomic data, enabling us to create a machine learning-based platform for such purpose in this work. This study first selected a comprehensive probiotics genome dataset from the probiotic database (PROBIO) and literature surveys. Then, k-mer (from 2 to 8) compositional analysis was performed, revealing diverse oligonucleotide composition in strain genomes and apparently more probiotic (P-) features in probiotic genomes than non-probiotic genomes. To reduce noise and improve computational efficiency, 87 376 k-mers were refined by an incremental feature selection (IFS) method, and the model achieved the maximum accuracy level at 184 core features, with a high prediction accuracy (97.77%) and area under the curve (98.00%). Functional genomic analysis using annotations from gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Rapid Annotation using Subsystem Technology (RAST) databases, as well as analysis of genes associated with host gastrointestinal survival/settlement, carbohydrate utilization, drug resistance and virulence factors, revealed that the distribution of P-features was biased toward genes/pathways related to probiotic function. Our results suggest that the role of probiotics is not determined by a single gene, but by a combination of k-mer genomic components, providing new insights into the identification and underlying mechanisms of probiotics. This work created a novel and free online bioinformatic tool, iProbiotics, which would facilitate rapid screening for probiotics.
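
The k-mer compositional analysis can be sketched as follows. The figure of 87 376 k-mers matches the number of possible DNA k-mers for k from 2 to 8, which suggests (though the abstract does not say so explicitly) that every k-mer over A/C/G/T was counted as a separate feature. The function name `kmer_frequencies` and the toy sequence are hypothetical.

```python
from collections import Counter


def kmer_frequencies(genome: str, k: int) -> dict[str, float]:
    """Relative frequency of each overlapping k-mer observed in the sequence."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}


# Size of the full feature space: 4**2 + 4**3 + ... + 4**8.
n_features = sum(4 ** k for k in range(2, 9))
print(n_features)  # 87376

# Toy example: 7 overlapping 2-mers, of which "AC" occurs twice.
freqs = kmer_frequencies("ACGTACGT", 2)
print(freqs["AC"])
```

A feature selection step such as the IFS method mentioned above would then rank these features and keep the smallest subset that maximizes accuracy (184 in the reported model).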

Topics: KEGG (52%), Genome (50%)

Open access, Journal Article, DOI: 10.3389/FGENE.2021.811158
Abstract: DNA-binding protein (DBP) is a protein with a special DNA binding domain that is associated with many important molecular biological mechanisms. Rapid development of computational methods has made it possible to predict DBP on a large scale; however, existing methods do not fully integrate DBP-related features, resulting in rough prediction results. In this article, we develop a DNA-binding protein identification method called KK-DBP. To improve prediction accuracy, we propose a feature extraction method that fuses multiple PSSM features. The experimental results show a prediction accuracy on the independent test dataset PDB186 of 81.22%, which is the highest of all existing methods.


Journal Article, DOI: 10.1016/J.YMETH.2021.07.003
Lian Liu, Bowen Song, Kunqi Chen, Yuxin Zhang (+5 more)
07 Jul 2021-Methods
Abstract: The primary sequences of DNA, RNA and protein have been used as the dominant information source of existing machine learning tools, especially for contexts not fully explored by wet-experimental approaches. Since molecular markers are profoundly orchestrated in living organisms, those markers that cannot be unambiguously recovered from the primary sequence often help to predict other biological events. To the best of our knowledge, there is currently no tool to build and deploy machine learning models that consider genomic evidence. We therefore developed the WHISTLE server, the first machine learning platform based on genomic coordinates. It features convenient covariate extraction and model web deployment, with 46 distinct genomic features integrated along with the conventional sequence features. We showed that, when predicting m6A sites from the SRAMP project, the model integrating genomic features substantially outperformed those based on sequence features alone. The WHISTLE server should be a useful tool for studying biological attributes specifically associated with genomic coordinates, and is freely accessible at: www.xjtlu.edu.cn/biologicalsciences/whi2.


References

184 results found


Journal Article, DOI: 10.1038/NATURE01511
Ruedi Aebersold, Matthias Mann
13 Mar 2003-Nature
Abstract: Recent successes illustrate the role of mass spectrometry-based proteomics as an indispensable tool for molecular and cellular biology and for the emerging field of systems biology. These include the study of protein-protein interactions via affinity-based isolations on a small and proteome-wide scale, the mapping of numerous organelles, the concurrent description of the malaria parasite genome and proteome, and the generation of quantitative protein profiles from diverse species. The ability of mass spectrometry to identify and, increasingly, to precisely quantify thousands of proteins from complex samples can be expected to impact broadly on biology and medicine.

Topics: Proteomics (62%), Proteome (58%), Mass spectrometry data format (57%)

6,305 Citations


Open access, Journal Article, DOI: 10.1016/J.MOLCEL.2010.09.019
22 Oct 2010-Molecular Cell
Abstract: Damage to our genetic material is an ongoing threat to both our ability to faithfully transmit genetic information to our offspring as well as our own survival. To respond to these threats, eukaryotes have evolved the DNA damage response (DDR). The DDR is a complex signal transduction pathway that has the ability to sense DNA damage and transduce this information to the cell to influence cellular responses to DNA damage. Cells possess an arsenal of enzymatic tools capable of remodeling and repairing DNA; however, their activities must be tightly regulated in a temporal, spatial, and DNA lesion-appropriate fashion to optimize repair and prevent unnecessary and potentially deleterious alterations in the structure of DNA during normal cellular processes. This review will focus on how the DDR controls DNA repair and the phenotypic consequences of defects in these critical regulatory functions in mammals.

Topics: DNA damage (62%), DNA repair (61%), DNA re-replication (58%)

3,211 Citations


Open access, Journal Article, DOI: 10.1038/NMETH.1459
Benjamin Flusberg, Dale R. Webster, Jessica Lee, Kevin Travers (+4 more)
09 May 2010-Nature Methods
Abstract: We describe the direct detection of DNA methylation, without bisulfite conversion, through single-molecule, real-time (SMRT) sequencing. In SMRT sequencing, DNA polymerases catalyze the incorporation of fluorescently labeled nucleotides into complementary nucleic acid strands. The arrival times and durations of the resulting fluorescence pulses yield information about polymerase kinetics and allow direct detection of modified nucleotides in the DNA template, including N6-methyladenine, 5-methylcytosine and 5-hydroxymethylcytosine. Measurement of polymerase kinetics is an intrinsic part of SMRT sequencing and does not adversely affect determination of primary DNA sequence. The various modifications affect polymerase kinetics differently, allowing discrimination between them. We used these kinetic signatures to identify adenine methylation in genomic samples and found that, in combination with circular consensus sequencing, they can enable single-molecule identification of epigenetic modifications with base-pair resolution. This method is amenable to long read lengths and will likely enable mapping of methylation patterns in even highly repetitive genomic regions.

1,190 Citations


Open access, Journal Article, DOI: 10.1093/NAR/GKI901
Abstract: We describe a large-scale random approach termed reduced representation bisulfite sequencing (RRBS) for analyzing and comparing genomic methylation patterns. BglII restriction fragments were size-selected to 500-600 bp, equipped with adapters, treated with bisulfite, PCR amplified, cloned and sequenced. We constructed RRBS libraries from murine ES cells and from ES cells lacking DNA methyltransferases Dnmt3a and 3b and with knocked-down (kd) levels of Dnmt1 (Dnmt[1(kd),3a-/-,3b-/-]). Sequencing of 960 RRBS clones from Dnmt[1(kd),3a-/-,3b-/-] cells generated 343 kb of non-redundant bisulfite sequence covering 66212 cytosines in the genome. All but 38 cytosines had been converted to uracil indicating a conversion rate of >99.9%. Of the remaining cytosines 35 were found in CpG and 3 in CpT dinucleotides. Non-CpG methylation was >250-fold reduced compared with wild-type ES cells, consistent with a role for Dnmt3a and/or Dnmt3b in CpA and CpT methylation. Closer inspection revealed neither a consensus sequence around the methylated sites nor evidence for clustering of residual methylation in the genome. Our findings indicate random loss rather than specific maintenance of methylation in Dnmt[1(kd),3a-/-,3b-/-] cells. Near-complete bisulfite conversion and largely unbiased representation of RRBS libraries suggest that random shotgun bisulfite sequencing can be scaled to a genome-wide approach.
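
The conversion rate quoted in this abstract follows directly from the two counts it reports. A minimal arithmetic check in Python, using only numbers stated above (the variable names are illustrative):

```python
total_cytosines = 66212  # cytosines covered by the non-redundant bisulfite sequence
unconverted = 38         # cytosines not converted to uracil
in_cpg, in_cpt = 35, 3   # dinucleotide context of the unconverted cytosines

# The two contexts account for all unconverted cytosines.
assert in_cpg + in_cpt == unconverted

conversion_rate = 1 - unconverted / total_cytosines
print(f"{conversion_rate:.4%}")
```

The result is just above 99.94%, consistent with the reported conversion rate of >99.9%.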

958 Citations


Journal Article, DOI: 10.1126/SCIENCE.1261417
27 Feb 2015-Science
Abstract: Naive and primed pluripotent states retain distinct molecular properties, yet limited knowledge exists on how their state transitions are regulated. Here, we identify Mettl3, an N(6)-methyladenosine (m(6)A) transferase, as a regulator for terminating murine naive pluripotency. Mettl3 knockout preimplantation epiblasts and naive embryonic stem cells are depleted for m(6)A in mRNAs, yet are viable. However, they fail to adequately terminate their naive state and, subsequently, undergo aberrant and restricted lineage priming at the postimplantation stage, which leads to early embryonic lethality. m(6)A predominantly and directly reduces mRNA stability, including that of key naive pluripotency-promoting transcripts. This study highlights a critical role for an mRNA epigenetic modification in vivo and identifies regulatory modules that functionally influence naive and primed pluripotency in an opposing manner.

Topics: Rex1 (59%), MRNA modification (58%), MRNA methylation (57%)

864 Citations