Showing papers by "Toby J. Gibson" published in 2008
••
TL;DR: This review summarises the current state of linear motif biology, in which low affinity interactions create cooperative, combinatorial and highly dynamic regulatory protein complexes, and argues that models for cell regulatory networks in systems biology should be overly dependent on neither stochastic nor smooth deterministic approximations.
Abstract: It is now clear that a detailed picture of cell regulation requires a comprehensive understanding of the abundant short protein motifs through which signaling is channeled. The current body of knowledge has slowly accumulated through piecemeal experimental investigation of individual motifs in signaling. Computational methods contributed little to this process. A new generation of bioinformatics tools will aid the future investigation of motifs in regulatory proteins, and the disordered polypeptide regions in which they frequently reside. Allied to high throughput methods such as phosphoproteomics, signaling networks are becoming amenable to experimental deconstruction. In this review, we summarise the current state of linear motif biology, which uses low affinity interactions to create cooperative, combinatorial and highly dynamic regulatory protein complexes. The discrete deterministic properties implicit to these assemblies suggest that models for cell regulatory networks in systems biology should neither be overly dependent on stochastic nor on smooth deterministic approximations.
317 citations
••
TL;DR: The localization, structure, and binding specificity of this protein, which is named malectin, open the way to studies of its role in the genesis, processing and secretion of N-glycosylated proteins.
Abstract: N-Glycosylation starts in the endoplasmic reticulum (ER) where a 14-sugar glycan composed of three glucoses, nine mannoses, and two N-acetylglucosamines (Glc(3)Man(9)GlcNAc(2)) is transferred to nascent proteins. The glucoses are sequentially trimmed by ER-resident glucosidases. The Glc(3)Man(9)GlcNAc(2) moiety is the substrate for oligosaccharyltransferase; the Glc(1)Man(9)GlcNAc(2) and Man(9)GlcNAc(2) intermediates are signals for glycoprotein folding and quality control in the calnexin/calreticulin cycle. Here, we report a novel membrane-anchored ER protein that is highly conserved in animals and that recognizes the Glc(2)-N-glycan. Structure determination by nuclear magnetic resonance showed that its luminal part is a carbohydrate binding domain that recognizes glucose oligomers. Carbohydrate microarray analyses revealed a uniquely selective binding to a Glc(2)-N-glycan probe. The localization, structure, and binding specificity of this protein, which we have named malectin, open the way to studies of its role in the genesis, processing and secretion of N-glycosylated proteins.
232 citations
••
TL;DR: The new, greater focus on proteins that are in some way normally unstructured promises to provide a greater understanding of protein function, particularly with respect to protein–protein interactions.
79 citations
••
TL;DR: This paper reports a novel membrane-anchored endoplasmic reticulum (ER) protein, named malectin, that is highly conserved in animals and recognizes the Glc(2)-N-glycan.
Abstract: N-Glycosylation starts in the endoplasmic reticulum (ER) where a 14-sugar glycan composed of three glucoses, nine mannoses, and two N-acetylglucosamines (Glc(3)Man(9)GlcNAc(2)) is transferred to nascent proteins. The glucoses are sequentially trimmed by ER-resident glucosidases. The Glc(3)Man(9)GlcNAc(2) moiety is the substrate for oligosaccharyltransferase; the Glc(1)Man(9)GlcNAc(2) and Man(9)GlcNAc(2) intermediates are signals for glycoprotein folding and quality control in the calnexin/calreticulin cycle. Here, we report a novel membrane-anchored ER protein that is highly conserved in animals and that recognizes the Glc(2)-N-glycan. Structure determination by nuclear magnetic resonance showed that its luminal part is a carbohydrate binding domain that recognizes glucose oligomers. Carbohydrate microarray analyses revealed a uniquely selective binding to a Glc(2)-N-glycan probe. The localization, structure, and binding specificity of this protein, which we have named malectin, open the way to studies of its role in the genesis, processing and secretion of N-glycosylated proteins.
51 citations
••
TL;DR: The conservation score improves the prediction of linear motifs, by discarding those matches that are unlikely to be functional because they have not been conserved during the evolution of the protein sequences.
Abstract: The structure of many eukaryotic cell regulatory proteins is highly modular. They are assembled from globular domains, segments of natively disordered polypeptides and short linear motifs. The latter are involved in protein interactions and formation of regulatory complexes. The function of such proteins, which may be difficult to define, is the aggregate of the subfunctions of the modules. It is therefore desirable to efficiently predict linear motifs with some degree of accuracy, yet sequence database searches return results that are not significant.
50 citations
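To illustrate the general idea of conservation filtering mentioned in the entry above, here is a minimal sketch. It is not the scoring scheme of the paper, which the abstract does not describe. It assumes a pre-computed alignment of homologues (equal-length strings with `-` for gaps) and counts the fraction of sequences that still match the motif pattern in the motif's alignment columns. The function name `motif_conservation`, its parameters, and the toy alignment are all hypothetical.

```python
import re

def motif_conservation(aligned_seqs, start, end, pattern):
    """Fraction of aligned homologues whose residues in columns
    [start, end) match the motif pattern (hypothetical helper)."""
    if not aligned_seqs:
        return 0.0
    regex = re.compile(pattern)
    hits = 0
    for seq in aligned_seqs:
        # Take the motif columns and drop gap characters before matching.
        segment = seq[start:end].replace("-", "")
        if regex.fullmatch(segment):
            hits += 1
    return hits / len(aligned_seqs)

# Toy alignment (invented sequences): the motif occupies columns 2..4.
homologues = ["AAKENLL", "ASKENLV", "AAKDNLL", "A--ENLL"]
score = motif_conservation(homologues, 2, 5, "KEN")
print(score)  # two of the four sequences keep the motif
```

A match with a low fraction would be a candidate for discarding under this simplified rule; a real score would also need to weight homologues by their evolutionary distance.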
••
TL;DR: KEN-box enrichment with cell cycle Gene Ontology terms suggests that collectively these motifs are functional but does not prove that any given instance is so, and suggests that KEN-boxes might be more common than reported.
Abstract: Motivation: KEN-box-mediated target selection is one of the mechanisms used in the proteasomal destruction of mitotic cell cycle proteins via the APC/C complex. While annotating the Eukaryotic Linear Motif resource (ELM, http://elm.eu.org/), we found that KEN motifs were significantly enriched in human protein entries with cell cycle keywords in the UniProt/Swiss-Prot database—implying that KEN-boxes might be more common than reported.
Results: Matches to short linear motifs in protein database searches are not, per se, significant. KEN-box enrichment with cell cycle Gene Ontology terms suggests that collectively these motifs are functional but does not prove that any given instance is so. Candidates were surveyed for native disorder prediction using GlobPlot and IUPred and for motif conservation in homologues. Among >25 strong new candidates, the most notable are human HIPK2, CHFR, CDC27, Dab2, Upf2, kinesin Eg5, DNA Topoisomerase 1 and yeast Cdc5 and Swi5. A similar number of weaker candidates were present. These proteins have yet to be tested for APC/C targeted destruction, providing potential new avenues of research.
Contact: toby.gibson@embl.de
Supplementary information: Tables of KEN-box candidates and keyword/conservation significance assessments are available as supplementary data at Bioinformatics online.
41 citations
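The point that short motif matches are not, per se, significant can be made concrete with a small sketch. It assumes the simplest form of the motif (the literal tripeptide K-E-N) and uniform amino acid frequencies of 1/20, both of which are simplifications. The function names and the toy sequence are hypothetical and are not taken from ELM or from the paper.

```python
import re

# Assumption: the motif is the literal tripeptide K-E-N.
KEN_PATTERN = re.compile(r"KEN")

def find_ken_boxes(sequence):
    """Return 1-based start positions of every KEN match in a protein sequence."""
    return [m.start() + 1 for m in KEN_PATTERN.finditer(sequence.upper())]

def expected_chance_matches(length, residue_freq=1 / 20):
    """Expected number of KEN matches in a random sequence of the given
    length, assuming independent, uniformly distributed residues."""
    windows = max(length - 2, 0)  # number of 3-residue windows
    return windows * residue_freq ** 3

toy = "MSTKENLLPAAKENQ"  # invented sequence for illustration
print(find_ken_boxes(toy))
print(expected_chance_matches(1000))
```

Under these assumptions a 1000-residue protein carries roughly 0.12 chance matches, so across a proteome many KEN hits arise by chance alone. This is why the abstract turns to disorder prediction and conservation in homologues to rank candidates.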
••
TL;DR: None of the programs currently available is capable of reliably aligning LMs in distantly related sequences and a number of specific problems are highlighted.
Abstract: Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking sites for regulatory complex assembly and protein processing and cleavage. Methods for LM detection are now being developed that are strongly dependent on scores for motif conservation in homologous proteins. However, most LMs are found in natively disordered polypeptide segments that evolve rapidly, unhindered by structural constraints on the sequence. These regions of modular proteins are difficult to align using classical multiple sequence alignment programs that are specifically optimised to align the globular domains. As a consequence, poor motif alignment quality is hindering efforts to detect new LMs. We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align LMs. The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs, extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs compared to 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. The ranking of the alignment programs based on LM alignment quality is similar to that observed when considering full-length protein alignments; however, little correlation was observed between LM and overall alignment quality for individual alignment test cases.
We have shown that none of the programs currently available is capable of reliably aligning LMs in distantly related sequences and we have highlighted a number of specific problems. The results of the tests suggest possible ways to improve program accuracy for difficult, divergent sequences.
32 citations
••
15 Mar 2008
TL;DR: This article presents a meta-modular model of protein architecture and its applications to nonglobular domains, and some of the methods for finding protein disorder and its implications are described.
Abstract: Originally published in: Modular Protein Domains. Edited by Giovanni Cesareni, Mario Gimona, Marius Sudol and Michael Yaffe. Copyright © 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30813-2
The sections in this article are:
Introduction
Protein Architecture: Sequence, Structure, and Function
The Modular Model of Protein Function
Partitioning of Protein Space
Analyzing Globular Domains
Globularity of Domains
Resources for Analysis of Globular Domains
SMART: Simple Modular Architecture Research Tool
The SMART Alignment Set
SMART Relational Database System
Web Interface
Application of SMART
Other Features and Resources
Globular Repeats
Domain Interaction Prediction
No Domains?
Analyzing Nonglobular Protein Segments
Unstructured Regions: Protein Disorder
What Role Does Protein Disorder Play in Biology?
What is Protein Disorder?
Methods for Finding Protein Disorder
GlobPlotting
Prediction of Multiple Types of Disorder with DisEMBL
Design of Protein Expression Vectors
Function Prediction for Nonglobular Protein Segments
Available Resources
The Eukaryotic Linear Motif Resource: ELM
ELM Annotation – ‘Site seeing’
ELM Resource Architecture
Knowledge-based Decision Support (KBDS): ELM Filtering
Using ELM
URLs
Conclusions
Acknowledgements
Keywords:
modular protein domains;
computational analysis;
protein architecture;
sequence;
structure;
function;
analyzing globular domains;
SMART: Simple Modular Architecture Research Tool;
analyzing nonglobular domains;
URLs
3 citations