Author

Toby J. Gibson

Bio: Toby J. Gibson is an academic researcher from the European Bioinformatics Institute. The author has contributed to research in topics: Short linear motif & Eukaryotic Linear Motif resource. The author has an h-index of 78 and has co-authored 171 publications receiving 167,371 citations. Previous affiliations of Toby J. Gibson include University of Rome Tor Vergata & University College Dublin.


Papers
Journal ArticleDOI
TL;DR: This study may be considered the first real-time analysis of Alu DNA element transcripts with regard to editing of the respective Alu transcripts in human blood cells.
Abstract: Editing of RNA molecules gained major interest when coding mRNA was analyzed. A small, noncoding, Alu DNA element transcript that may act as regulatory RNA in cells was examined in this study. Alu DNA element transcription was determined in buffy coat from healthy humans and human sporadic Creutzfeldt–Jakob disease (sCJD) cases. In addition, non-sCJD controls, mostly dementia cases and Alzheimer's disease (AD) cases, were included. The Alu cDNA sequences were aligned to genomic Alu DNA elements by database search. A comparison of best aligned Alu DNA sequences with our RNA/cDNA clones revealed editing by deamination by ADAR (adenosine deaminase acting on RNA) and APOBEC (apolipoprotein B editing complex). Nucleotide exchanges like a G instead of an A or a T instead of a C in our cDNA sequences versus genomic Alu DNA pointed to recent mutations. To confirm this, our Alu cDNA sequences were aligned not only to genomic human Alu DNA but also to the respective genomic DNA of the chimpanzee and rhesus. Enhance...

4 citations

Journal ArticleDOI
31 Jan 2017
TL;DR: A method that incorporates the most recent genome annotations into the annotation of the microarray probe sets, using tools from next-generation sequencing; it allows project-specific gene annotation models to be built quickly and microarray data to be compared with RNAseq data.
Abstract: Genome-wide expression profiling and genotyping are widely applied in functional genomics research, ranging from stem cell studies to cancer, in drug response studies, and in clinical diagnostics. The Affymetrix GeneChip microarrays represent the most popular platform for such assays. Nevertheless, due to rapid and continuous improvement of the knowledge about the genome, the definitions of many genes and transcripts change, and new genes are discovered. Thus the original probe information is outdated for a number of Affymetrix platforms and needs to be redefined. It has been demonstrated that accurate probe set definition improves both the coverage of the gene expression analysis and its statistical power. Therefore we developed a method that incorporates the most recent genome annotations into the annotation of the microarray probe sets, using tools from next-generation sequencing. Additionally, our method allows project-specific gene annotation models to be built quickly and microarray data to be compared with RNAseq data.

4 citations

Journal ArticleDOI
01 Jan 2020-Database
TL;DR: A set of tools and a web resource, ‘articles.ELM’, to rapidly identify the motif literature articles pertinent to a researcher’s interest, thereby improving the visibility of motif literature and simplifying the recovery of valuable biological insights sequestered within scientific articles.
Abstract: Modern biology produces data at a staggering rate. Yet much of this biological data is still isolated in the text, figures, tables and supplementary materials of articles. As a result, biological information created at great expense is significantly underutilised. The protein motif biology field does not have sufficient resources to curate the corpus of motif-related literature and, to date, only a fraction of the available articles have been curated. In this study, we develop a set of tools and a web resource, 'articles.ELM', to rapidly identify the motif literature articles pertinent to a researcher's interest. At the core of the resource is a manually curated set of about 8000 motif-related articles. These articles are automatically annotated with a range of relevant biological data, allowing in-depth search functionality. Machine-learning article classification is used to group articles based on their similarity to manually curated motif classes in the Eukaryotic Linear Motif resource. Articles can also be manually classified within the resource. The 'articles.ELM' resource permits the rapid and accurate discovery of relevant motif articles, thereby improving the visibility of motif literature and simplifying the recovery of valuable biological insights sequestered within scientific articles. Consequently, this web resource removes a critical bottleneck in scientific productivity for the motif biology field. Database URL: http://slim.icr.ac.uk/articles/.

3 citations

Journal ArticleDOI
TL;DR: The Eukaryotic Linear Motif Resource (ELM) is a bioinformatics facility for investigating candidate short functional motifs in eukaryotic proteins; it contains more than 140 motifs and their regular expression patterns.
Abstract: Linear motifs are short and evolutionarily variable sequence patterns associated with particular functions often involving post-translational modifications, such as phosphorylation, acetylation, glycosylation, targeting signals for cellular compartments, protein cleavage sites and protein–protein interaction. Experimentally they are often neglected because their short length (4-10 residues), and the fact that they often reside in disordered regions of proteins, makes them difficult to detect. For a similar reason, using the regular expression alone to detect linear motif matches in sequences has almost no predictive power because the matches are both statistically insignificant and prone to massive over-prediction. The Eukaryotic Linear Motif resource (ELM, http://elm.eu.org) is a bioinformatics facility for investigating candidate short functional motifs in eukaryotic proteins. The ELM database to date has collected more than 140 motifs and their regular expression patterns as well as information about their instances of occurrence, distribution, crystal structure, publications, etc. In order to reduce the over-prediction inherent to pattern matching against protein sequences and to discriminate true from false positive motif matches, context-based rules and logical filters are applied. The current version includes cell compartment, phylogeny and globular domain clash filters, and the more recent structural filter, which relies on known three-dimensional information such as residue solvent accessibility and secondary structure features. This implies that a candidate motif can be excluded from further consideration if the protein resides in the wrong cellular compartment or the motif is buried in the core of a globular domain. By considering additional types of context information, we expect that prediction of functional sites by ELM can be considerably improved.
In cases where the user cannot provide relevant context information, we consider providing predictions of contextual information in order to improve the ELM performance. For example, since the ELM motif database has been annotated with biological process GO terms, the system could be prepared for addition of a new context filter using biological process.

3 citations
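
The ELM abstract above describes matching regular expressions against protein sequences and then discarding hits that fail a context filter. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the ELM implementation: the `Motif` class, the motif name `EXAMPLE_PxxP`, the pattern `P..P`, and the compartment labels are all hypothetical and are not taken from the ELM database.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Motif:
    """A hypothetical motif record (not an actual ELM class)."""
    name: str                # illustrative identifier
    pattern: str             # regular expression over one-letter amino acid codes
    compartments: frozenset  # compartments in which the motif is considered plausible


def find_candidates(sequence, motif):
    """Return (start, end, matched_text) for every match, including overlapping ones."""
    # Wrapping the pattern in a zero-width lookahead reports overlapping matches,
    # which a plain finditer over the bare pattern would skip.
    regex = re.compile(f"(?=({motif.pattern}))")
    return [(m.start(1), m.end(1), m.group(1)) for m in regex.finditer(sequence)]


def filter_by_compartment(hits, motif, protein_compartment):
    """Keep the hits only if the protein resides in a compartment allowed for the motif."""
    return hits if protein_compartment in motif.compartments else []


if __name__ == "__main__":
    motif = Motif("EXAMPLE_PxxP", "P..P", frozenset({"cytosol"}))
    hits = find_candidates("APPLPPA", motif)
    print(hits)
    print(filter_by_compartment(hits, motif, "cytosol"))
    print(filter_by_compartment(hits, motif, "nucleus"))
```

The pattern match alone accepts every occurrence; the compartment check then rejects all hits for a protein in a disallowed compartment, which mirrors the kind of exclusion the abstract describes. A real filter would draw its rules from curated annotation rather than a hard-coded set.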

Reference EntryDOI
15 Mar 2008
TL;DR: This article presents a meta-modular model of protein architecture and its applications to nonglobular domains, and some of the methods for finding protein disorder and its implications are described.
Abstract: Originally published in: Modular Protein Domains. Edited by Giovanni Cesareni, Mario Gimona, Marius Sudol and Michael Yaffe. Copyright © 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30813-2. The sections in this article are: Introduction; Protein Architecture: Sequence, Structure, and Function; The Modular Model of Protein Function; Partitioning of Protein Space; Analyzing Globular Domains; Globularity of Domains; Resources for Analysis of Globular Domains; SMART: Simple Modular Architecture Research Tool; The SMART Alignment Set; SMART Relational Database System Web Interface; Application of SMART; Other Features and Resources; Globular Repeats; Domain Interaction Prediction; No Domains? Analyzing Nonglobular Protein Segments; Unstructured Regions: Protein Disorder; What Role Does Protein Disorder Play in Biology?; What is Protein Disorder?; Methods for Finding Protein Disorder; GlobPlotting; Prediction of Multiple Types of Disorder with DisEMBL; Design of Protein Expression Vectors; Function Prediction for Nonglobular Protein Segments; Available Resources; The Eukaryotic Linear Motif Resource: ELM; ELM Annotation – ‘Site seeing’; ELM Resource Architecture; Knowledge-based Decision Support (KBDS): ELM Filtering; Using ELM URLs; Conclusions; Acknowledgements. Keywords: modular protein domains; computational analysis; protein architecture; sequence; structure; function; analyzing globular domains; SMART: Simple Modular Architecture Research Tool; analyzing nonglobular domains; URLs

3 citations


Cited by
Journal ArticleDOI
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations

Journal ArticleDOI
TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.
Abstract: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W, which is freely available.

63,427 citations

Journal ArticleDOI
TL;DR: CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W, providing an integrated system for performing multiple sequence and profile alignments and analysing the results.
Abstract: CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W. The new system is easy to use, providing an integrated system for performing multiple sequence and profile alignments and analysing the results. CLUSTAL X displays the sequence alignment in a window on the screen. A versatile sequence colouring scheme allows the user to highlight conserved features in the alignment. Pull-down menus provide all the options required for traditional multiple sequence and profile alignment. New features include: the ability to cut-and-paste sequences to change the order of the alignment, selection of a subset of the sequences to be realigned, and selection of a sub-range of the alignment to be realigned and inserted back into the original alignment. Alignment quality analysis can be performed and low-scoring segments or exceptional residues can be highlighted. Quality analysis and realignment of selected residue ranges provide the user with a powerful tool to improve and refine difficult alignments and to trap errors in input sequences. CLUSTAL X has been compiled on SUN Solaris, IRIX5.3 on Silicon Graphics, Digital UNIX on DECstations, Microsoft Windows (32 bit) for PCs, Linux ELF for x86 PCs, and Macintosh PowerMac.

38,522 citations

Journal ArticleDOI
TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.
Abstract: We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.

37,524 citations
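
The MUSCLE abstract above mentions fast distance estimation using kmer counting. The general idea can be sketched in Python as follows. This is an assumption-laden illustration, not MUSCLE's actual distance measure: the function names, the default `k=3`, and the normalisation by the shorter sequence's kmer count are choices made here for clarity.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Return 1 minus the fraction of kmers shared between a and b.

    Shared kmers are counted with multiplicity (multiset intersection) and
    normalised by the number of kmers in the shorter sequence. This is an
    illustrative formula, not the one used by MUSCLE.
    """
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must contain at least k residues")
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / denom


if __name__ == "__main__":
    print(kmer_distance("ACDEFG", "ACDEFG"))
    print(kmer_distance("ACDEFG", "ACDEYG"))
    print(kmer_distance("AAAAAA", "CCCCCC"))
```

Because no alignment is computed, the cost grows roughly linearly with sequence length, which is why kmer-based estimates are attractive for building a first guide tree over many sequences.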

Journal ArticleDOI
TL;DR: Two unusual extensions are presented: Multiscale, which adds the ability to visualize large‐scale molecular assemblies such as viral coats, and Collaboratory, which allows researchers to share a Chimera session interactively despite being at separate locales.
Abstract: The design, implementation, and capabilities of an extensible visualization system, UCSF Chimera, are discussed. Chimera is segmented into a core that provides basic services and visualization, and extensions that provide most higher level functionality. This architecture ensures that the extension mechanism satisfies the demands of outside developers who wish to incorporate new features. Two unusual extensions are presented: Multiscale, which adds the ability to visualize large-scale molecular assemblies such as viral coats, and Collaboratory, which allows researchers to share a Chimera session interactively despite being at separate locales. Other extensions include Multalign Viewer, for showing multiple sequence alignments and associated structures; ViewDock, for screening docked ligand orientations; Movie, for replaying molecular dynamics trajectories; and Volume Viewer, for display and analysis of volumetric data. A discussion of the usage of Chimera in real-world situations is given, along with anticipated future directions. Chimera includes full user documentation, is free to academic and nonprofit users, and is available for Microsoft Windows, Linux, Apple Mac OS X, SGI IRIX, and HP Tru64 Unix from http://www.cgl.ucsf.edu/chimera/.

35,698 citations