Characterization and identification of protein O-GlcNAcylation sites with substrate specificity

doi:10.1186/1471-2105-15-S16-S1

- Vol. 15, Iss: 16, pp 1-12

TLDR

This work proposes a computational method that identifies informative substrate motifs for O-GlcNAcylation sites while taking substrate site specificity into account. The method may help unravel the mechanisms and roles of O-GlcNAcylation in signaling, transcription, chronic disease, and cancer.

Abstract:

Protein O-GlcNAcylation involves the attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues. Elucidating O-GlcNAcylation sites on proteins is required to decipher its crucial roles in regulating cellular processes and to aid drug design. With an increasing number of O-GlcNAcylation sites identified by mass spectrometry (MS)-based proteomics, several methods have been proposed for their computational identification. However, none of these methods focuses on investigating O-GlcNAcylated substrate motifs. We were therefore motivated to design a new method for identifying protein O-GlcNAcylation sites that takes substrate site specificity into account. In this study, 375 experimentally verified O-GlcNAcylation sites were collected from dbOGAP, an integrated resource for protein O-GlcNAcylation. Because substrate motifs are difficult to characterize by conventional sequence logo analysis, a recursive statistical method was applied to obtain significantly conserved motifs. Support Vector Machines (SVMs) were adopted to construct predictive models learned from the identified substrate motifs. Five-fold cross-validation was used to evaluate the predictive model, which achieved a sensitivity, specificity, and accuracy of 0.76, 0.80, and 0.78, respectively. Additionally, an independent testing set, fully blind to the training data of the predictive model, showed that the proposed method achieved promising accuracy (0.94) and outperformed three other O-GlcNAcylation site prediction tools. This work proposes a computational method to identify informative substrate motifs for O-GlcNAcylation sites. The cross-validation and independent testing results indicate that the identified motifs are effective for identifying O-GlcNAcylation sites.
A case study demonstrated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. We also anticipate that the revealed substrate motifs may facilitate the study of the extensive crosstalk between O-GlcNAcylation and phosphorylation. This method may help unravel the mechanisms and roles of O-GlcNAcylation in signaling, transcription, chronic disease, and cancer.
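
The evaluation measures named in the abstract (sensitivity, specificity, accuracy) and the five-fold partitioning can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `k_fold_indices` and `binary_metrics` are hypothetical, the labels in the example are made up for demonstration and are not data from the study, and the fold assignment is a simple unshuffled interleaving chosen only for reproducibility.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; each sample is tested exactly once.

    Hypothetical helper: folds are built by interleaving indices, no shuffling.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity, and accuracy for 0/1 labels.

    1 = modified site (positive), 0 = non-modified site (negative).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "sensitivity": tp / (tp + fn),       # fraction of true sites recovered
        "specificity": tn / (tn + fp),       # fraction of non-sites rejected
        "accuracy": (tp + tn) / len(pairs),  # fraction of all calls that are correct
    }


# Illustrative labels only (not results from the paper):
# 10 positives with 8 predicted correctly, 10 negatives with 9 predicted correctly.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 8 + [0] * 2 + [0] * 9 + [1] * 1
metrics = binary_metrics(y_true, y_pred)
splits = k_fold_indices(len(y_true), k=5)
```

In a full five-fold run, a classifier would be trained on each `train` index list and scored on the matching `test` list, with the three measures computed over the pooled or averaged test predictions; which of the two the study used is not stated in the abstract.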
