Journal ArticleDOI

REBASE—a database for DNA restriction and modification: enzymes, genes and genomes

01 Jan 2010-Nucleic Acids Research (Oxford University Press)-Vol. 38, pp 234-236
TL;DR: REBASE is a comprehensive and fully curated database of information about the components of restriction-modification (RM) systems that contains fully referenced information about recognition and cleavage sites for both restriction enzymes and methyltransferases as well as commercial availability, methylation sensitivity, crystal and sequence data.
Abstract: REBASE is a comprehensive and fully curated database of information about the components of restriction-modification (RM) systems. It contains fully referenced information about recognition and cleavage sites for both restriction enzymes and methyltransferases as well as commercial availability, methylation sensitivity, crystal and sequence data. All genomes that are completely sequenced are analyzed for RM system components, and with the advent of PacBio sequencing, the recognition sequences of DNA methyltransferases (MTases) are appearing rapidly. Thus, Type I and Type III systems can now be characterized in terms of recognition specificity merely by DNA sequencing. The contents of REBASE may be browsed from the web http://rebase.neb.com and selected compilations can be downloaded by FTP (ftp.neb.com). Monthly updates are also available via email.
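As an illustration of how recognition and cleavage site data of this kind can be used, the following minimal Python sketch locates the sites of one enzyme in a DNA sequence. The dictionary layout and the function name `find_cut_positions` are hypothetical and do not reflect the actual REBASE file formats. The only biological input is the well-known EcoRI specificity, G^AATTC.

```python
import re

# Hypothetical, simplified record; real REBASE entries carry many more fields.
# EcoRI recognizes GAATTC and cuts the top strand after the first G (G^AATTC).
ECORI = {"name": "EcoRI", "site": "GAATTC", "cut_offset": 1}


def find_cut_positions(sequence, enzyme):
    """Return the 0-based top-strand cut index for every occurrence of the site."""
    seq = sequence.upper()
    # A zero-width lookahead reports overlapping occurrences as well.
    pattern = re.compile("(?=" + re.escape(enzyme["site"]) + ")")
    return [m.start() + enzyme["cut_offset"] for m in pattern.finditer(seq)]


positions = find_cut_positions("ttGAATTCaaagaattcgg", ECORI)
print(positions)
```

The sketch handles only unambiguous recognition sequences. Sites with degenerate bases (for example N or R) would first need to be expanded into a character class.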
Citations
Journal ArticleDOI
TL;DR: This article illustrates how Tablet, a high-performance graphical viewer for visualization of 2GS assemblies and read mappings, plays an important role in the analysis of these data and demonstrates its value in quality assurance and scientific discovery.
Abstract: The advent of second-generation sequencing (2GS) has provided a range of significant new challenges for the visualization of sequence assemblies. These include the large volume of data being generated, short-read lengths and different data types and data formats associated with the diversity of new sequencing technologies. This article illustrates how Tablet, a high-performance graphical viewer for visualization of 2GS assemblies and read mappings, plays an important role in the analysis of these data. We present Tablet, and through a selection of use cases, demonstrate its value in quality assurance and scientific discovery, through features such as whole-reference coverage overviews, variant highlighting, paired-end read mark-up, GFF3-based feature tracks and protein translations. We discuss the computing and visualization techniques utilized to provide a rich and responsive graphical environment that enables users to view a range of file formats with ease. Tablet installers can be freely downloaded from http://bioinf.hutton.ac.uk/tablet in 32- or 64-bit versions for Windows, OS X, Linux or Solaris. For further details on Tablet, contact tablet@hutton.ac.uk.

768 citations


Cites methods from "REBASE—a database for DNA restricti..."

  • ...Beyond GFF3, we intend to include support for automatic restriction site tagging and highlighting using data provided via REBASE [18]....


Journal ArticleDOI
TL;DR: This review surveys nuclease activities with known structures and catalytic machinery and classifies them by reaction mechanism and metal-ion dependence and by their biological function, ranging from DNA replication, recombination, repair, RNA maturation, processing and interference to defense, nutrient regeneration or cell death.
Abstract: Nucleases cleave the phosphodiester bonds of nucleic acids and may be endo or exo, DNase or RNase, topoisomerases, recombinases, ribozymes, or RNA splicing enzymes. In this review, I survey nuclease activities with known structures and catalytic machinery and classify them by reaction mechanism and metal-ion dependence and by their biological function ranging from DNA replication, recombination, repair, RNA maturation, processing, interference, to defense, nutrient regeneration or cell death. Several general principles emerge from this analysis. There is little correlation between catalytic mechanism and biological function. A single catalytic mechanism can be adapted in a variety of reactions and biological pathways. Conversely, a single biological process can often be accomplished by multiple tertiary and quaternary folds and by more than one catalytic mechanism. Two-metal-ion-dependent nucleases comprise the largest number of different tertiary folds and mediate the most diverse set of biological functions. Metal-ion-dependent cleavage is exclusively associated with exonucleases producing mononucleotides and endonucleases that cleave double- or single-stranded substrates in helical and base-stacked conformations. All metal-ion-independent RNases generate 2',3'-cyclic phosphate products, and all metal-ion-independent DNases form phospho-protein intermediates. I also find several previously unnoted relationships between different nucleases and shared catalytic configurations.

478 citations

Journal ArticleDOI
TL;DR: This review covers new developments that provide insights into the roles of restriction-modification enzymes in other aspects of cellular function, with emphasis on novel hypotheses and findings that have not yet been addressed in a critical review.
Abstract: Restriction-modification (R-M) systems are ubiquitous and are often considered primitive immune systems in bacteria. Their diversity and prevalence across the prokaryotic kingdom are an indication of their success as a defense mechanism against invading genomes. However, their cellular defense function does not adequately explain the basis for their immaculate specificity in sequence recognition and nonuniform distribution, ranging from none to too many, in diverse species. The present review deals with new developments which provide insights into the roles of these enzymes in other aspects of cellular function. In this review, emphasis is placed on novel hypotheses and various findings that have not yet been dealt with in a critical review. Emerging studies indicate their role in various cellular processes other than host defense, virulence, and even controlling the rate of evolution of the organism. We also discuss how R-M systems could have successfully evolved and be involved in additional cellular portfolios, thereby increasing the relative fitness of their hosts in the population.

464 citations


Cites background from "REBASE—a database for DNA restricti..."

  • ...large number of diverse R-M systems (21, 195)....


  • ...has only reaffirmed their vast diversity in the prokaryotic kingdom (21)....


  • ...To date, nearly 4,000 enzymes are known, with about 300 different specificities (21)....


Journal ArticleDOI
TL;DR: The unique and common features of phage resistance mechanisms and their role in global biodiversity are discussed and the commonalities between defense mechanisms suggest avenues for the discovery of novel forms of these mechanisms based on their evolutionary traits.
Abstract: Bacteria, the most abundant organisms on the planet, are outnumbered by a factor of 10 to 1 by phages that infect them. Faced with the rapid evolution and turnover of phage particles, bacteria have evolved various mechanisms to evade phage infection and killing, leading to an evolutionary arms-race. The extensive co-evolution of both phage and host has resulted in considerable diversity on the part of both bacterial and phage defensive and offensive strategies. Here, we discuss the unique and common features of phage resistance mechanisms and their role in global biodiversity. The commonalities between defense mechanisms suggest avenues for the discovery of novel such mechanisms based on their evolutionary traits.

432 citations

Journal ArticleDOI
TL;DR: Individual sample replicates are used, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome and optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci.
Abstract: Restriction site-associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, and these have received only limited attention in terms of the estimation and reporting of genotyping error rates. Here we use individual sample replicates, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single-nucleotide polymorphisms. As an empirical example, we use a double-digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high-altitude mountains in Mexico.
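The replicate-based idea described above can be sketched in a few lines of Python. This is a simplified illustration, not the error-rate definitions used in the paper: it assumes genotypes are stored as unordered allele pairs keyed by locus name, with `None` for missing calls, and the function name `replicate_mismatch_rate` is hypothetical.

```python
def replicate_mismatch_rate(rep_a, rep_b):
    """Fraction of loci called in both replicates whose genotypes differ."""
    # Only loci with a genotype call in both replicates are comparable.
    shared = [
        locus for locus in rep_a
        if rep_a[locus] is not None and rep_b.get(locus) is not None
    ]
    if not shared:
        raise ValueError("no loci genotyped in both replicates")
    # Allele order within a genotype is ignored: (A, G) equals (G, A).
    mismatches = sum(
        1 for locus in shared if sorted(rep_a[locus]) != sorted(rep_b[locus])
    )
    return mismatches / len(shared)


# Toy data: two replicates of the same individual.
rep_a = {"locus1": ("A", "A"), "locus2": ("A", "G"), "locus3": ("C", "T"), "locus4": None}
rep_b = {"locus1": ("A", "A"), "locus2": ("G", "A"), "locus3": ("C", "C"), "locus4": ("T", "T")}
rate = replicate_mismatch_rate(rep_a, rep_b)
print(rate)
```

Because replicates of one individual are expected to share identical genotypes, any mismatch among the shared loci is counted as an error. In the toy data, one of the three comparable loci disagrees.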

353 citations

References
Journal ArticleDOI
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

14,075 citations


"REBASE—a database for DNA restricti..." refers background in this paper

  • ...Links to other major databases such as UniProt (5), PDB (6) and Pfam (7) are also maintained....


Journal ArticleDOI
TL;DR: Pfam is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0; this update describes changes to Pfam content and features since the 2012 article.
Abstract: Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.

9,415 citations


"REBASE—a database for DNA restricti..." refers background in this paper

  • ...Links to other major databases such as UniProt (5), PDB (6) and Pfam (7) are also maintained....


Journal ArticleDOI
TL;DR: The National Center for Biotechnology Information Reference Sequence (RefSeq) database provides a non-redundant collection of sequences representing genomic data, transcripts and proteins that pragmatically includes sequence data that are currently publicly available in the archival databases.
Abstract: The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.

4,229 citations


"REBASE—a database for DNA restricti..." refers background in this paper

  • ...The previous description of REBASE in the 2007 NAR Database Issue (1) described 3805 biochemically or genetically characterized restriction–modification (R–M) systems and included an analysis of approximately 400 bacterial and archaeal genomes that had been deposited in the RefSeq Database of GenBank (2, 3)....


Journal ArticleDOI
02 Apr 2004-Science
TL;DR: Over 1.2 million previously unknown genes represented in these samples, including more than 782 new rhodopsin-like photoreceptors, are identified, and variation in the species present suggests substantial oceanic microbial diversity.
Abstract: We have applied “whole-genome shotgun sequencing” to microbial populations collected en masse on tangential flow and impact filters from seawater samples collected from the Sargasso Sea near Bermuda. A total of 1.045 billion base pairs of nonredundant sequence was generated, annotated, and analyzed to elucidate the gene content, diversity, and relative abundance of the organisms within these environmental samples. These data are estimated to derive from at least 1800 genomic species based on sequence relatedness, including 148 previously unknown bacterial phylotypes. We have identified over 1.2 million previously unknown genes represented in these samples, including more than 782 new rhodopsin-like photoreceptors. Variation in species present and stoichiometry suggests substantial oceanic microbial diversity. Microorganisms are responsible for most of the biogeochemical cycles that shape the environment of Earth and its oceans. Yet, these organisms are the least well understood on Earth, as the ability to study and understand the metabolic potential of microorganisms has been hampered by the inability to generate pure cultures. Recent studies have begun to explore environ

4,210 citations


"REBASE—a database for DNA restricti..." refers background in this paper

  • ...The surge in 2004 represents the addition of metagenomic sequences from the Sargasso Sea collecting expedition (9)....


Journal ArticleDOI
TL;DR: During 2004, tens of thousands of Knowledgebase records were manually annotated or updated, the UniProt keyword list was augmented with additional keywords, the documentation of the keywords was improved, and the annotation of post-translational modifications is being continuously overhauled and standardized.
Abstract: The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. Formed by uniting the Swiss-Prot, TrEMBL and PIR protein database activities, the UniProt consortium produces three layers of protein sequence databases: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProt) and the UniProt Reference (UniRef) databases. The UniProt Knowledgebase is a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase with extensive cross-references. This centrepiece consists of two sections: UniProt/Swiss-Prot, with fully, manually curated entries; and UniProt/TrEMBL, enriched with automated classification and annotation. During 2004, tens of thousands of Knowledgebase records got manually annotated or updated; we introduced a new comment line topic: TOXIC DOSE to store information on the acute toxicity of a toxin; the UniProt keyword list got augmented by additional keywords; we improved the documentation of the keywords and are continuously overhauling and standardizing the annotation of post-translational modifications. Furthermore, we introduced a new documentation file of the strains and their synonyms. Many new database cross-references were introduced and we started to make use of Digital Object Identifiers. We also achieved in collaboration with the Macromolecular Structure Database group at EBI an improved integration with structural databases by residue level mapping of sequences from the Protein Data Bank entries onto corresponding UniProt entries. For convenient sequence searches we provide the UniRef non-redundant sequence databases. The comprehensive UniParc database stores the complete body of publicly available protein sequence data. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). New releases are published every two weeks.

4,074 citations
