Author

Dominic P. Tolle

Bio: Dominic P. Tolle is an academic researcher from the European Bioinformatics Institute. The author has contributed to research in the topics MEROPS and Modeling and simulation. The author has an h-index of 8 and has co-authored 8 publications that have received 3,284 citations. Previous affiliations of Dominic P. Tolle include the Wellcome Trust Sanger Institute and the Max Delbrück Center for Molecular Medicine.

Papers
Journal Article
TL;DR: The MEROPS database has added an analysis tool to the relevant species pages to show significant gains and losses of peptidase genes relative to related species, and has collected over 39 000 known cleavage sites in proteins, peptides and synthetic substrates.
Abstract: Peptidases (proteolytic enzymes) are of great relevance to biology, medicine and biotechnology. This practical importance creates a need for an integrated source of information about them, and also about their natural inhibitors. The MEROPS database (http://merops.sanger.ac.uk) aims to fill this need. The organizational principle of the database is a hierarchical classification in which homologous sets of the proteins of interest are grouped in families and the homologous families are grouped in clans. Each peptidase, family and clan has a unique identifier. The database has recently been expanded to include the protein inhibitors of peptidases, and these are classified in much the same way as the peptidases. Forms of information recently added include new links to other databases, summary alignments for peptidase clans, displays to show the distribution of peptidases and inhibitors among organisms, substrate cleavage sites and indexes for expressed sequence tag libraries containing peptidases. A new way of making hyperlinks to the database has been devised and a BlastP search of our library of peptidase and inhibitor sequences has been added.

2,406 citations

Journal Article
TL;DR: A system wherein the inhibitor units of the peptidase inhibitors are assigned to 48 families on the basis of similarities detectable at the level of amino acid sequence, and a simple system of nomenclature is introduced for reference to each clan, family and inhibitor.
Abstract: The proteins that inhibit peptidases are of great importance in medicine and biotechnology, but there has never been a comprehensive system of classification for them. Some of the terminology currently in use is potentially confusing. In the hope of facilitating the exchange, storage and retrieval of information about this important group of proteins, we now describe a system wherein the inhibitor units of the peptidase inhibitors are assigned to 48 families on the basis of similarities detectable at the level of amino acid sequence. Then, on the basis of three-dimensional structures, 31 of the families are assigned to 26 clans. A simple system of nomenclature is introduced for reference to each clan, family and inhibitor. We briefly discuss the specificities and mechanisms of the interactions of the inhibitors in the various families with their target enzymes. The system of families and clans of inhibitors described has been implemented in the MEROPS peptidase database (http://merops.sanger.ac.uk/), and this will provide a mechanism for updating it as new information becomes available.

582 citations

Journal Article
TL;DR: Following MIASE guidelines will improve the quality of scientific reporting, and will also allow collaborative, more distributed efforts in computational modeling and simulation of biological processes.
Abstract: Reproducibility of experiments is a basic requirement for science. Minimum Information (MI) guidelines have proved a helpful means of enabling reuse of existing work in modern biology. The Minimum Information Required in the Annotation of Models (MIRIAM) guidelines promote the exchange and reuse of biochemical computational models. However, information about a model alone is not sufficient to enable its efficient reuse in a computational setting. Advanced numerical algorithms and complex modeling workflows used in modern computational biology make reproduction of simulations difficult. It is therefore essential to define the core information necessary to perform simulations of those models. The Minimum Information About a Simulation Experiment (MIASE, Glossary in Box 1) describes the minimal set of information that must be provided to make the description of a simulation experiment available to others. It includes the list of models to use and their modifications, all the simulation procedures to apply and in which order, the processing of the raw numerical results, and the description of the final output. MIASE allows for the reproduction of any simulation experiment. The provision of this information, along with a set of required models, guarantees that the simulation experiment represents the intention of the original authors. Following MIASE guidelines will thus improve the quality of scientific reporting, and will also allow collaborative, more distributed efforts in computational modeling and simulation of biological processes.

149 citations

Journal Article
TL;DR: The catalog of transcripts assembled in this study dramatically expands and refines planarian gene annotation, demonstrated by validation of several previously unknown transcripts with stem cell-dependent expression patterns, and could be applied to other organisms without genome assembly.
Abstract: Freshwater planaria are a very attractive model system for stem cell biology, tissue homeostasis, and regeneration. The genome of the planarian Schmidtea mediterranea has recently been sequenced and is estimated to contain >20,000 protein-encoding genes. However, the characterization of its transcriptome is far from complete. Furthermore, not a single proteome of the entire phylum has been assayed on a genome-wide level. We devised an efficient sequencing strategy that allowed us to de novo assemble a major fraction of the S. mediterranea transcriptome. We then used independent assays and massive shotgun proteomics to validate the authenticity of transcripts. In total, our de novo assembly yielded 18,619 candidate transcripts with a mean length of 1118 nt after filtering. A total of 17,564 candidate transcripts could be mapped to 15,284 distinct loci on the current genome reference sequence. RACE confirmed complete or almost complete 5' and 3' ends for 22/24 transcripts. The frequencies of frame shifts, fusion, and fission events in the assembled transcripts were computationally estimated to be 4.2%-13%, 0%-3.7%, and 2.6%, respectively. Our shotgun proteomics produced 16,135 distinct peptides that validated 4200 transcripts (FDR ≤1%). The catalog of transcripts assembled in this study, together with the identified peptides, dramatically expands and refines planarian gene annotation, demonstrated by validation of several previously unknown transcripts with stem cell-dependent expression patterns. In addition, our robust transcriptome characterization pipeline could be applied to other organisms without genome assembly. All of our data, including homology annotation, are freely available at SmedGD, the S. mediterranea genome database.

118 citations

Journal Article
TL;DR: This article describes how the first comprehensive system for the classification of peptidases, which included a set of simple names for the families, has developed since then, and provides the structure around which the MEROPS protease database is built.
Abstract: The enzymes that hydrolyse peptide bonds, called peptidases or proteases, are very important to mankind and are also very numerous. The many scientists working on these enzymes are rapidly acquiring new data, and they need good methods to store it and retrieve it. The storage and retrieval require effective systems of classification and nomenclature, and it is the design and implementation of these that we mean by 'managing' peptidases. Ten years ago Rawlings and Barrett proposed the first comprehensive system for the classification of peptidases, which included a set of simple names for the families. In the present article we describe how the system has developed since then. The peptidase classification has now been adopted for use by many other databases, and provides the structure around which the MEROPS protease database (http://merops.sanger.ac.uk) is built.

50 citations


Cited by
Journal Article
TL;DR: Improvement in accuracy was generally observed for most methods, but remarkably large for the new options of MAFFT proposed here, which showed higher accuracy than currently available methods including TCoffee version 2 and CLUSTAL W in benchmark tests consisting of alignments of >50 sequences.
Abstract: The accuracy of the multiple sequence alignment program MAFFT has been improved. The new version (5.3) of MAFFT offers new iterative refinement options, H-INS-i, F-INS-i and G-INS-i, in which pairwise alignment information is incorporated into the objective function. These new options of MAFFT showed higher accuracy than currently available methods including TCoffee version 2 and CLUSTAL W in benchmark tests consisting of alignments of >50 sequences. Like the previously available options, the new options of MAFFT can handle hundreds of sequences on a standard desktop computer. We also examined the effect of the number of homologues included in an alignment. For a multiple alignment consisting of ∼8 sequences with low similarity, the accuracy was improved (2–10 percentage points) when the sequences were aligned together with dozens of their close homologues (E-value < 10^-5 to 10^-20) collected from a database. Such improvement was generally observed for most methods, but was remarkably large for the new options of MAFFT proposed here. Thus, we made a Ruby script, mafftE.rb, which aligns the input sequences together with their close homologues collected from SwissProt using NCBI-BLAST.

4,528 citations

Journal Article
TL;DR: Recent studies in mice and flies point to essential roles of MMPs as mediators of change and physical adaptation in tissues, whether developmentally regulated, environmentally induced or disease associated.
Abstract: Matrix metalloproteinases (MMPs) were discovered because of their role in amphibian metamorphosis, yet they have attracted more attention because of their roles in disease. Despite intensive scrutiny in vitro, in cell culture and in animal models, the normal physiological roles of these extracellular proteases have been elusive. Recent studies in mice and flies point to essential roles of MMPs as mediators of change and physical adaptation in tissues, whether developmentally regulated, environmentally induced or disease associated.

2,634 citations

Journal Article
TL;DR: The InterPro database integrates together predictive models or ‘signatures’ representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs.
Abstract: The InterPro database (http://www.ebi.ac.uk/interpro/) integrates together predictive models or 'signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately half of the total approximately 58,000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).

1,834 citations

Journal Article
20 May 2011 - Science
TL;DR: The value of characterizing vertebrate gut microbiomes to understand host evolutionary histories at a supraorganismal level is illustrated by shotgun sequencing of microbial community DNA and targeted sequencing of bacterial 16S ribosomal RNA genes.
Abstract: Coevolution of mammals and their gut microbiota has profoundly affected their radiation into myriad habitats. We used shotgun sequencing of microbial community DNA and targeted sequencing of bacterial 16S ribosomal RNA genes to gain an understanding of how microbial communities adapt to extremes of diet. We sampled fecal DNA from 33 mammalian species and 18 humans who kept detailed diet records, and we found that the adaptation of the microbiota to diet is similar across different mammalian lineages. Functional repertoires of microbiome genes, such as those encoding carbohydrate-active enzymes and proteases, can be predicted from bacterial species assemblages. These results illustrate the value of characterizing vertebrate gut microbiomes to understand host evolutionary histories at a supraorganismal level.

1,585 citations