Characterization and identification of protein O-GlcNAcylation sites with substrate specificity
Hsin Yi Wu, Cheng Tsung Lu, Hui Ju Kao, Yi-Ju Chen, Yu-Ju Chen, Tzong-Yi Lee
- Vol. 15, Iss: 16, pp 1-12
TLDR
This work proposes a computational method that identifies informative substrate motifs for O-GlcNAcylation sites while accounting for substrate site specificity; it may help unravel the mechanisms and roles of O-GlcNAcylation in signaling, transcription, chronic disease, and cancer.
Abstract:
Protein O-GlcNAcylation involves the attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues. Elucidating O-GlcNAcylation sites on proteins is required to decipher the crucial roles of this modification in regulating cellular processes and to aid drug design. With an increasing number of O-GlcNAcylation sites identified by mass spectrometry (MS)-based proteomics, several methods have been proposed for the computational identification of O-GlcNAcylation sites. However, none of them focuses on investigating O-GlcNAcylated substrate motifs. We were therefore motivated to design a new method that identifies protein O-GlcNAcylation sites while taking substrate site specificity into account. In this study, 375 experimentally verified O-GlcNAcylation sites were collected from dbOGAP, an integrated resource for protein O-GlcNAcylation. Because the substrate motifs are difficult to characterize by conventional sequence logo analysis, a recursive statistical method was applied to obtain significantly conserved motifs. Support Vector Machines (SVMs) were adopted to construct predictive models learned from the identified substrate motifs. Five-fold cross-validation was used to evaluate the predictive model, which achieved a sensitivity, specificity, and accuracy of 0.76, 0.80, and 0.78, respectively. Additionally, an independent testing set, fully blind to the training data of the predictive model, demonstrated that the proposed method provides a promising accuracy (0.94) and outperforms three other O-GlcNAcylation site prediction tools. This work proposes a computational method to identify informative substrate motifs for O-GlcNAcylation sites. The cross-validation and independent testing results indicate that the identified motifs are effective in identifying O-GlcNAcylation sites.
A case study demonstrated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. We also anticipate that the revealed substrate motifs may facilitate the study of the extensive crosstalk between O-GlcNAcylation and phosphorylation. This method may help unravel the mechanisms and roles of O-GlcNAcylation in signaling, transcription, chronic disease, and cancer.
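The abstract describes scoring candidate serine/threonine sites and reporting sensitivity, specificity, and accuracy. The following is a minimal sketch of those two steps, not the authors' implementation: the function names, the `flank` half-width, and the `-` padding character are hypothetical choices made for illustration, since the abstract does not state the window size or encoding used.

```python
from typing import List, Sequence, Tuple


def candidate_windows(sequence: str, flank: int = 3, pad: str = "-") -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every serine/threonine residue.

    The window spans `flank` residues on each side of the candidate site.
    `flank` and `pad` are illustrative assumptions, not values from the paper.
    """
    # Pad both ends so sites near the termini still yield full-length windows.
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue in "ST":
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


def sensitivity_specificity_accuracy(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Compute the three metrics from binary labels (1 = site, 0 = non-site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    sensitivity = tp / (tp + fn)  # fraction of true sites recovered
    specificity = tn / (tn + fp)  # fraction of non-sites correctly rejected
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Toy sequence and toy labels, made up purely to exercise the functions.
    print(candidate_windows("MSAT", flank=1))
    print(sensitivity_specificity_accuracy([1, 1, 1, 1, 0, 0, 0, 0],
                                           [1, 1, 1, 0, 0, 0, 0, 1]))
```

In a five-fold cross-validation, these metrics would be computed on each held-out fold and then averaged; the windows would first need to be encoded as numeric features before being passed to an SVM.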