Showing papers in "Nucleic Acids Research in 2000"


Journal ArticleDOI
TL;DR: This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and plans for the future development of the resource.
Abstract: The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.

34,239 citations


Journal ArticleDOI
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

14,075 citations


Journal ArticleDOI
TL;DR: A novel method that amplifies DNA with high specificity, efficiency and rapidity under isothermal conditions that employs a DNA polymerase and a set of four specially designed primers that recognize a total of six distinct sequences on the target DNA.
Abstract: We have developed a novel method, termed loop-mediated isothermal amplification (LAMP), that amplifies DNA with high specificity, efficiency and rapidity under isothermal conditions. This method employs a DNA polymerase and a set of four specially designed primers that recognize a total of six distinct sequences on the target DNA. An inner primer containing sequences of the sense and antisense strands of the target DNA initiates LAMP. The following strand displacement DNA synthesis primed by an outer primer releases a single-stranded DNA. This serves as template for DNA synthesis primed by the second inner and outer primers that hybridize to the other end of the target, which produces a stem–loop DNA structure. In subsequent LAMP cycling one inner primer hybridizes to the loop on the product and initiates displacement DNA synthesis, yielding the original stem–loop DNA and a new stem–loop DNA with a stem twice as long. The cycling reaction continues with accumulation of 10^9 copies of target in less than an hour. The final products are stem–loop DNAs with several inverted repeats of the target and cauliflower-like structures with multiple loops formed by annealing between alternately inverted repeats of the target in the same strand. Because LAMP recognizes the target by six distinct sequences initially and by four distinct sequences afterwards, it is expected to amplify the target sequence with high selectivity.

6,765 citations


Journal ArticleDOI
TL;DR: The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes.
Abstract: Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.

3,656 citations


Journal ArticleDOI
TL;DR: The frequencies of each of the 257 468 complete protein coding sequences (CDSs) have been compiled from the taxonomical divisions of the GenBank DNA sequence database.
Abstract: The frequencies of each of the 257 468 complete protein coding sequences (CDSs) have been compiled from the taxonomical divisions of the GenBank DNA sequence database. The sum of the codons used by 8792 organisms has also been calculated. The data files can be obtained from the anonymous ftp sites of DDBJ, Kazusa and EBI. A list of the codon usage of genes and the sum of the codons used by each organism can be obtained through the web site http://www.kazusa.or.jp/codon/. The present study also reports recent developments on the WWW site. The new web interface provides data in the CodonFrequency-compatible format as well as in the traditional table format. The use of the database is facilitated by keyword based search analysis and the availability of codon usage tables for selected genes from each species. These new tools will provide users with the ability to further analyze for variations in codon usage among different genomes.

1,462 citations
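
The kind of per-sequence codon tally that underlies a codon usage table can be sketched in a few lines of Python. This is only an illustration: the function name `codon_usage` and the per-thousand normalization are assumptions made here, not the database's actual pipeline or its CodonFrequency-compatible output format.

```python
from collections import Counter

def codon_usage(cds: str) -> dict[str, float]:
    """Return codon frequencies per thousand codons for one coding sequence.

    Hypothetical helper for illustration; assumes an in-frame CDS.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA alphabets
    if len(seq) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    # Split the sequence into consecutive, non-overlapping triplets.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    counts = Counter(codons)
    total = len(codons)
    return {codon: 1000.0 * n / total for codon, n in counts.items()}

# Five codons: ATG GCT GCT AAA TAA
usage = codon_usage("ATGGCTGCTAAATAA")
```

Summing such counts over all CDSs of an organism before normalizing would give an organism-level table of the sort the abstract describes.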


Journal ArticleDOI
TL;DR: A high-throughput quantitative methylation assay that utilizes fluorescence-based real-time PCR (TaqMan) technology that requires no further manipulations after the PCR step is described, and MethyLight is a highly sensitive assay, capable of detecting methylated alleles in the presence of a 10,000-fold excess of unmethylated allele.
Abstract: Cytosine-5 DNA methylation occurs in the context of CpG dinucleotides in vertebrates. Aberrant methylation of CpG islands in human tumors has been shown to cause transcriptional silencing of tumor-suppressor genes. Most methods used to analyze cytosine-5 methylation patterns require cumbersome manual techniques that employ gel electrophoresis, restriction enzyme digestion, radiolabeled dNTPs or hybridization probes. The development of high-throughput technology for the analysis of DNA methylation would significantly expand our ability to derive molecular information from clinical specimens. This study describes a high-throughput quantitative methylation assay that utilizes fluorescence-based real-time PCR (TaqMan®) technology that requires no further manipulations after the PCR step. MethyLight is a highly sensitive assay, capable of detecting methylated alleles in the presence of a 10 000-fold excess of unmethylated alleles. The assay is also highly quantitative and can very accurately determine the relative prevalence of a particular pattern of DNA methylation. We show that MethyLight can distinguish between mono-allelic and bi-allelic methylation of the MLH1 mismatch repair gene in human colorectal tumor specimens. The development of this technique should considerably enhance our ability to rapidly and accurately generate epigenetic profiles of tumor samples.

1,451 citations


Journal ArticleDOI
TL;DR: The TRANSFAC content has been enhanced by information about training sequences used for the construction of nucleotide matrices as well as by data on plant sites and factors, and the database has been extended by two new modules.
Abstract: TRANSFAC is a database on transcription factors, their genomic binding sites and DNA-binding profiles (http://transfac.gbf.de/TRANSFAC/). Its content has been enhanced, in particular by information about training sequences used for the construction of nucleotide matrices as well as by data on plant sites and factors. Moreover, TRANSFAC has been extended by two new modules: PathoDB provides data on pathologically relevant mutations in regulatory regions and transcription factor genes, whereas S/MARt DB compiles features of scaffold/matrix attached regions (S/MARs) and the proteins binding to them. Additionally, the databases TRANSPATH, about signal transduction, and CYTOMER, about organs and cell types, have been extended and are increasingly integrated with the TRANSFAC data sources.

1,253 citations


Journal ArticleDOI
TL;DR: SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures.
Abstract: SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures (http://SMART.embl-heidelberg.de). More than 400 domain families found in signalling, extracellular and chromatin-associated proteins are detectable. These domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues. Each domain found in a non-redundant protein database as well as search parameters and taxonomic information are stored in a relational database system. User interfaces to this database allow searches for proteins containing specific combinations of domains in defined taxa.

1,227 citations


Journal ArticleDOI
TL;DR: The Database of Interacting Proteins (DIP; http://dip.doe-mbi.ucla.edu) is a database that documents experimentally determined protein-protein interactions to provide the scientific community with a comprehensive and integrated tool for browsing and efficiently extracting information about protein interactions and interaction networks in biological processes.
Abstract: The Database of Interacting Proteins (DIP; http://dip.doe-mbi.ucla.edu) is a database that documents experimentally determined protein‐protein interactions. This database is intended to provide the scientific community with a comprehensive and integrated tool for browsing and efficiently extracting information about protein interactions and interaction networks in biological processes. Beyond cataloging details of protein‐protein interactions, the DIP is useful for understanding protein function and protein‐protein relationships, studying the properties of networks of interacting proteins, benchmarking predictions of protein‐protein interactions, and studying the evolution of protein‐protein interactions.

1,117 citations


Journal ArticleDOI
TL;DR: The ENZYME database is a repository of information related to the nomenclature of enzymes that has become an indispensable resource for the development of metabolic databases in recent years.
Abstract: The ENZYME database is a repository of information related to the nomenclature of enzymes. In recent years it has become an indispensable resource for the development of metabolic databases. The current version contains information on 3705 enzymes. It is available through the ExPASy WWW server (http://www.expasy.ch/enzyme/).

1,033 citations


Journal ArticleDOI
TL;DR: STRING (search tool for recurring instances of neighbouring genes), a tool to retrieve and display the genes a query gene repeatedly occurs with in clusters on the genome, performs iterative searches and visualises the results in their genomic context.
Abstract: The repeated occurrence of genes in each other’s neighbourhood on genomes has been shown to indicate a functional association between the proteins they encode. Here we introduce STRING (search tool for recurring instances of neighbouring genes), a tool to retrieve and display the genes a query gene repeatedly occurs with in clusters on the genome. The tool performs iterative searches and visualises the results in their genomic context. By finding the genomically associated genes for a query, it delineates a set of potentially functionally associated genes. The usefulness of STRING is illustrated with an example that suggests a functional context for an RNA methylase with unknown specificity. STRING is available at http://www.bork.embl-heidelberg.de/STRING

Journal ArticleDOI
TL;DR: MGB probes were more sequence specific than standard DNA probes, especially for single base mismatches at elevated hybridization temperatures, and fluorescence quenching was more efficient, giving increased sensitivity.
Abstract: DNA probes with conjugated minor groove binder (MGB) groups form extremely stable duplexes with single-stranded DNA targets, allowing shorter probes to be used for hybridization based assays. In this paper, sequence specificity of 3′-MGB probes was explored. In comparison with unmodified DNA, MGB probes had higher melting temperature (Tm) and increased specificity, especially when a mismatch was in the MGB region of the duplex. To exploit these properties, fluorogenic MGB probes were prepared and investigated in the 5′-nuclease PCR assay (real-time PCR assay, TaqMan assay). A 12mer MGB probe had the same Tm (65°C) as a no-MGB 27mer probe. The fluorogenic MGB probes were more specific for single base mismatches and fluorescence quenching was more efficient, giving increased sensitivity. A/T rich duplexes were stabilized more than G/C rich duplexes, thereby leveling probe Tm and simplifying design. In summary, MGB probes were more sequence specific than standard DNA probes, especially for single base mismatches at elevated hybridization temperatures.

Journal ArticleDOI
TL;DR: The preparation, operation and applications of biosensors and gene chips, which provide fast, sensitive and selective detection of DNA hybridization, are described.
Abstract: Wide-scale DNA testing requires the development of small, fast and easy-to-use devices. This article describes the preparation, operation and applications of biosensors and gene chips, which provide fast, sensitive and selective detection of DNA hybridization. Various new strategies for DNA biosensors and gene chips are examined, along with recent trends and future directions. The integration of hybridization detection schemes with the sample preparation process in a ‘Lab-on-a-Chip’ format is also covered. While the use of DNA biosensors and gene chips is at an early stage, such devices are expected to have an enormous effect on future DNA diagnostics.

Journal ArticleDOI
TL;DR: The striking synteny of the Chlamydia genomes and prevalence of tandemly duplicated genes are evidence of minimal chromosome rearrangement and foreign gene uptake, presumably owing to the ecological isolation of the obligate intracellular parasites.
Abstract: The genome sequences of Chlamydia trachomatis mouse pneumonitis (MoPn) strain Nigg (1 069 412 nt) and Chlamydia pneumoniae strain AR39 (1 229 853 nt) were determined using a random shotgun strategy. The MoPn genome exhibited a general conservation of gene order and content with the previously sequenced C.trachomatis serovar D. Differences between C.trachomatis strains were focused on an ~50 kb ‘plasticity zone’ near the termination origins. In this region MoPn contained three copies of a novel gene encoding a >3000 amino acid toxin homologous to a predicted toxin from Escherichia coli O157:H7 but had apparently lost the tryptophan biosynthesis genes found in serovar D in this region. The C.pneumoniae AR39 chromosome was >99.9% identical to the previously sequenced C.pneumoniae CWL029 genome; however, comparative analysis identified an invertible DNA segment upstream of the uridine kinase gene which was in different orientations in the two genomes. AR39 also contained a novel 4524 nt circular single-stranded (ss)DNA bacteriophage, the first time a virus has been reported infecting C.pneumoniae. Although the chlamydial genomes were highly conserved, there were intriguing differences in key nucleotide salvage pathways: C.pneumoniae has a uridine kinase gene for dUTP production, MoPn has a uracil phosphoribosyl transferase, while C.trachomatis serovar D contains neither gene. Chromosomal comparison revealed that there had been multiple large inversion events since the species divergence of C.trachomatis and C.pneumoniae, apparently oriented around the axis of the origin of replication and the termination region. The striking synteny of the Chlamydia genomes and prevalence of tandemly duplicated genes are evidence of minimal chromosome rearrangement and foreign gene uptake, presumably owing to the ecological isolation of the obligate intracellular parasites. In the absence of genetic analysis, comparative genomics will continue to provide insight into the virulence mechanisms of these important human pathogens.

Journal ArticleDOI
TL;DR: Knowing about the target sequence, as well as its similarity to other mRNAs in the target tissue or RNA sample, is required to design successful oligonucleotide probes for quality microarray results.
Abstract: To examine the utility and performance of 50mer oligonucleotide (oligonucleotide probe) microarrays, gene-specific oligonucleotide probes were spotted along with PCR probes onto glass microarrays and the performance of each probe type was evaluated. The specificity of oligonucleotide probes was studied using target RNAs that shared various degrees of sequence similarity. Sensitivity was defined as the ability to detect a 3-fold change in mRNA. No significant difference in sensitivity between oligonucleotide probes and PCR probes was observed and both had a minimum reproducible detection limit of ∼10 mRNA copies/cell. Specificity studies showed that for a given oligonucleotide probe any ‘non-target’ transcripts (cDNAs) >75% similar over the 50 base target may show crosshybridization. Thus non-target sequences which have >75–80% sequence similarity with target sequences (within the oligonucleotide probe 50 base target region) will contribute to the overall signal intensity. In addition, if the 50 base target region is marginally similar, it must not include a stretch of complementary sequence >15 contiguous bases. Therefore, knowledge about the target sequence, as well as its similarity to other mRNAs in the target tissue or RNA sample, is required to design successful oligonucleotide probes for quality microarray results. Together these results validate the utility of oligonucleotide probe (50mer) glass microarrays.
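
The two specificity criteria reported in this abstract (more than 75% similarity over the 50-base target, or a contiguous matching stretch longer than 15 bases) can be expressed as a simple screening check. The sketch below is hypothetical: the function name `cross_hybridization_risk` is invented here, and it assumes the two regions are already aligned without gaps, in the same orientation and of equal length, which a real probe-design workflow would have to establish first.

```python
def cross_hybridization_risk(target: str, non_target: str,
                             max_identity: float = 0.75,
                             max_run: int = 15) -> bool:
    """Flag a non-target region that may cross-hybridize with a probe.

    Assumes both strings are pre-aligned, ungapped and equal in length,
    e.g. the 50-base probe target region and the corresponding stretch
    of another transcript. Thresholds follow the abstract's figures.
    """
    if len(target) != len(non_target) or not target:
        raise ValueError("regions must be non-empty and equal in length")
    matches = 0
    run = longest = 0
    for a, b in zip(target.upper(), non_target.upper()):
        if a == b:
            matches += 1
            run += 1
            longest = max(longest, run)  # track longest contiguous match
        else:
            run = 0
    identity = matches / len(target)
    return identity > max_identity or longest > max_run

probe_region = "A" * 50
low_similarity = "AAC" * 16 + "AA"   # 68% identical, longest run of 2
long_stretch = "A" * 20 + "C" * 30   # 40% identical, run of 20
```

Here `cross_hybridization_risk(probe_region, low_similarity)` is false, while `long_stretch` is flagged because of its 20-base contiguous match even though its overall similarity is low.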

Journal ArticleDOI
TL;DR: The Ribosomal Database Project (RDP-II), previously described by Maidak et al., continued during the past year to add new rRNA sequences to the aligned data and to improve the analysis commands.
Abstract: The Ribosomal Database Project (RDP-II), previously described by Maidak et al., continued during the past year to add new rRNA sequences to the aligned data and to improve the analysis commands. Release 7.1 (September 17, 1999) included more than 10 700 small subunit rRNA sequences. More than 850 type strain sequences were identified and added to the prokaryotic alignment, bringing the total number of type sequences to 3324 representing 2460 different species. Availability of an RDP-II mirror site in Japan is also near completion. RDP-II provides aligned and annotated rRNA sequences, derived phylogenetic trees and taxonomic hierarchies, and analysis services through its WWW server (http://rdp.cme.msu.edu/). Analysis services include rRNA probe checking, approximate phylogenetic placement of user sequences, screening user sequences for possible chimeric rRNA sequences, automated alignment, production of similarity matrices and services to plan and analyze terminal restriction fragment length polymorphism (T-RFLP) experiments.

Journal ArticleDOI
TL;DR: The complete sequence of the Bombyx mori fibroin gene has been determined by means of combining a shotgun sequencing strategy with physical map-based sequencing procedures, showing a spectacular organization, with a highly repetitive and G-rich core flanked by non-repetitive 5' and 3' ends.
Abstract: The complete sequence of the Bombyx mori fibroin gene has been determined by means of combining a shotgun sequencing strategy with physical map-based sequencing procedures. It consists of two exons (67 and 15 750 bp, respectively) and one intron (971 bp). The fibroin coding sequence presents a spectacular organization, with a highly repetitive and G-rich (~45%) core flanked by non-repetitive 5′ and 3′ ends. This repetitive core is composed of alternate arrays of 12 repetitive and 11 amorphous domains. The sequences of the amorphous domains are evolutionarily conserved and the repetitive domains differ from each other in length by a variety of tandem repeats of subdomains of ~208 bp which are reminiscent of the repetitive nucleosome organization. A typical composition of a subdomain is a cluster of repetitive units, Ua, followed by a cluster of units, Ub, (with a Ua:Ub ratio of 2:1) flanked by conserved boundary elements at the 3′ end. Moreover some repeats are also perfectly conserved at the peptide level indicating that the evolutionary pressure is not identical along the sequence. A tentative model for the constitution and evolution of this unusual gene is discussed.

Journal ArticleDOI
TL;DR: Several characteristics of EST-verified splice sites are analyzed and weight matrices for the major groups are built, which can be incorporated into gene prediction programs and should be significant for future investigations of the splicing mechanism.
Abstract: A set of 43 337 splice junction pairs was extracted from mammalian GenBank annotated genes. Expressed sequence tag (EST) sequences support 22 489 of them. Of these, 98.71% contain canonical dinucleotides GT and AG for donor and acceptor sites, respectively; 0.56% hold non-canonical GC-AG splice site pairs; and the remaining 0.73% occur in many small groups (with a maximum size of 0.05%). Studying these groups we observe that many of them contain splicing dinucleotides shifted from the annotated splice junction by one position. After close examination of such cases we present a new classification consisting of only eight observed types of splice site pairs (out of 256 a priori possible combinations). EST alignments allow us to verify the exonic part of the splice sites, but many non-canonical cases may be due to intron sequencing errors. This idea is given substantial support when we compare the sequences of human genes having non-canonical splice sites deposited in GenBank by high throughput genome sequencing projects (HTG). A high proportion (156 out of 171) of the human non-canonical and EST-supported splice site sequences had a clear match in the human HTG. They can be classified after corrections as: 79 GC-AG pairs (of which one was an error that corrected to GC-AG), 61 errors that were corrected to GT-AG canonical pairs, six AT-AC pairs (of which two were errors that corrected to AT-AC), one case was produced from a non-existent intron, seven cases were found in HTG that were deposited to GenBank and finally there were only two cases left of supported non-canonical splice sites. If we assume that approximately the same situation is true for the whole set of annotated mammalian non-canonical splice sites, then 99.24% of splice site pairs should be GT-AG, 0.69% GC-AG, 0.05% AT-AC and finally only 0.02% could consist of other types of non-canonical splice sites.
We analyze several characteristics of EST-verified splice sites and build weight matrices for the major groups, which can be incorporated into gene prediction programs. We also present a set of EST-verified canonical splice sites larger by two orders of magnitude than the current one (22 199 entries versus approximately 600) and finally, a set of 290 EST-supported non-canonical splice sites. Both sets should be significant for future investigations of the splicing mechanism.
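
Grouping introns by their terminal dinucleotides, as the abstract does for GT-AG, GC-AG and AT-AC pairs, can be illustrated with a minimal sketch. The function names `splice_pair` and `classify` are hypothetical, the example introns are made up, and the sketch ignores the one-position shifts and sequencing-error corrections that the study itself examines.

```python
from collections import Counter

# Donor-acceptor pairs singled out in the abstract; everything else is "other".
MAJOR_PAIRS = {"GT-AG", "GC-AG", "AT-AC"}

def splice_pair(intron: str) -> str:
    """Name an intron by its first two and last two nucleotides, e.g. 'GT-AG'."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short")
    return f"{seq[:2]}-{seq[-2:]}"

def classify(introns) -> Counter:
    """Count introns per major splice site pair, pooling the rest as 'other'."""
    counts = Counter()
    for intron in introns:
        pair = splice_pair(intron)
        counts[pair if pair in MAJOR_PAIRS else "other"] += 1
    return counts

# Toy intron sequences (invented for illustration only).
counts = classify(["GTAAGTTTTCAG", "gcaagtccag", "ATATCCTTAC", "GGAAAATT"])
```

With four nucleotides at each of four positions there are 16 × 16 = 256 possible dinucleotide pairs, which is the "a priori" figure quoted in the abstract.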

Journal ArticleDOI
TL;DR: Out of 11 sigma factors which belong to the extracytoplasmic function family, 10 are unique to B. halodurans, suggesting that they may have a role in the special mechanism of adaptation to an alkaline environment.
Abstract: The 4 202 353 bp genome of the alkaliphilic bacterium Bacillus halodurans C-125 contains 4066 predicted protein coding sequences (CDSs), 2141 (52.7%) of which have functional assignments, 1182 (29%) of which are conserved CDSs with unknown function and 743 (18.3%) of which have no match to any protein database. Among the total CDSs, 8.8% match sequences of proteins found only in Bacillus subtilis and 66.7% are widely conserved in comparison with the proteins of various organisms, including B.subtilis. The B. halodurans genome contains 112 transposase genes, indicating that transposases have played an important evolutionary role in horizontal gene transfer and also in internal genetic rearrangement in the genome. Strain C-125 lacks some of the necessary genes for competence, such as comS, srfA and rapC, supporting the fact that competence has not been demonstrated experimentally in C-125. There is no paralog of tupA, encoding teichuronopeptide, which contributes to alkaliphily, in the C-125 genome and an ortholog of tupA cannot be found in the B.subtilis genome. Out of 11 sigma factors which belong to the extracytoplasmic function family, 10 are unique to B. halodurans, suggesting that they may have a role in the special mechanism of adaptation to an alkaline environment.

Journal ArticleDOI
TL;DR: This survey will summarise what is known about the process of transcription by pol I and pol III, how it happens and the proteins involved and attention will be drawn to the similarities between the three nuclear RNA polymerase systems and also to their differences.
Abstract: The task of transcribing nuclear genes is shared between three RNA polymerases in eukaryotes: RNA polymerase (pol) I synthesises the large rRNA, pol II synthesises mRNA and pol III synthesises tRNA and 5S rRNA. Although pol II has received most attention, pol I and pol III are together responsible for the bulk of transcriptional activity. This survey will summarise what is known about the process of transcription by pol I and pol III, how it happens and the proteins involved. Attention will be drawn to the similarities between the three nuclear RNA polymerase systems and also to their differences.

Journal ArticleDOI
TL;DR: The ASTRAL compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences, including SPACI scores that summarize the overall characteristics of a protein structure.
Abstract: The ASTRAL compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. The SPACI scores included in the system summarize the overall characteristics of a protein structure. A structural alignments database indicates residue equivalencies in superimposed protein domain structures. The PDB sequence-map files provide a linkage between the amino acid sequence of the molecule studied (SEQRES records in a database entry) and the sequence of the atoms experimentally observed in the structure (ATOM records). These maps are combined with information in the SCOP database to provide sequences of protein domains. Selected subsets of the domain database, with varying degrees of similarity measured in several different ways, are also available. ASTRAL may be accessed at http://astral.stanford.edu/

Journal ArticleDOI
TL;DR: Using fibroblast lineages in different stages of transformation, it is found that c-Myc and Sp1 were induced to a dramatic extent when cells overcame replicative senescence and obtained immortal characteristics, in association with telomerase activation.
Abstract: Telomerase activation is thought to be a critical step in cellular immortalization and carcinogenesis. The human telomerase catalytic subunit (hTERT) is a rate limiting determinant of the enzymatic activity of human telomerase. In the previous study, we identified the proximal 181 bp core promoter responsible for transcriptional activity of the hTERT gene. To identify the regulatory factors of transcription, transient expression assays were performed using hTERT promoter reporter plasmids. Serial deletion assays of the core promoter revealed that the 5′-region containing the E-box, which binds Myc/Max, as well as the 3′-region containing the GC-box, which binds Sp1, are essential for transactivation. The mutations introduced in the E-box or GC-box significantly decreased transcriptional activity of the promoter. Overexpression of Myc/Max or Sp1 led to significant activation of transcription in a cell type-specific manner, while Mad/Max introduction repressed it. However, the effects of Myc/Max on transactivation were marginal when Sp1 sites were mutated. Western blot analysis using various cell lines revealed a positive correlation between c-Myc and Sp1 expression and transcriptional activity of hTERT. Using fibroblast lineages in different stages of transformation, we found that c-Myc and Sp1 were induced to a dramatic extent when cells overcame replicative senescence and obtained immortal characteristics, in association with telomerase activation. These findings suggest that c-Myc and Sp1 cooperatively function as the major determinants of hTERT expression, and that the switching functions of Myc/Max and Mad/Max might also play roles in telomerase regulation.

Journal ArticleDOI
TL;DR: The National Center for Biotechnology Information (NCBI) has established the dbSNP database as a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology.
Abstract: In response to a need for a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, the National Center for Biotechnology Information (NCBI) has established the dbSNP database. Submissions to dbSNP will be integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data. The complete contents of dbSNP are available to the public at website: http://www.ncbi.nlm.nih.gov/SNP. Submitted SNPs can also be downloaded via anonymous FTP at ftp://ncbi.nlm.nih.gov/snp/

Journal ArticleDOI
TL;DR: These new surface amplification processes are seen as an interesting approach for attachment of DNA molecules by their 5'-end on a solid support and can be used as an alternative route for producing DNA chips for genomic studies.
Abstract: Different chemical methods used to attach oligonucleotides by their 5'-end on a glass surface were tested in the framework of solid phase PCR where surface-bound instead of freely-diffusing primers are used to amplify DNA. Each method was first evaluated for its capacity to provide a high surface coverage of oligonucleotides essentially attached via a 5'-specific linkage that satisfyingly withstands PCR conditions and leaves the 3'-ends available for DNA polymerase activity. The best results were obtained with 5'-thiol-modified oligonucleotides attached to amino-silanised glass slides using a heterobifunctional cross-linker reagent. It was then demonstrated that the primers bound to the glass surface using the optimal chemistry can be involved in attaching and amplifying DNA molecules present in the reaction mix in the absence of freely-diffusing primers. Two distinct amplification processes called interfacial and surface amplification have been observed and characterised. The newly synthesised DNA can be detected and quantified by radioactive and fluorescent hybridisation assays. These new surface amplification processes are seen as an interesting approach for attachment of DNA molecules by their 5'-end on a solid support and can be used as an alternative route for producing DNA chips for genomic studies.

Journal ArticleDOI
TL;DR: The results define requirements for effective targets of chimeric nucleases and will guide the design of novel specificities for directed DNA cleavage in vitro and in vivo.
Abstract: This study concerns chimeric restriction enzymes that are hybrids between a zinc finger DNA-binding domain and the non-specific DNA-cleavage domain from the natural restriction enzyme FokI. Because of the flexibility of DNA recognition by zinc fingers, these enzymes are potential tools for cleaving DNA at arbitrarily selected sequences. Efficient double-strand cleavage by the chimeric nucleases requires two binding sites in close proximity. When cuts were mapped on the DNA strands, it was found that they occur in pairs separated by approximately 4 bp with a 5' overhang, as for native FokI. Furthermore, amino acid changes in the dimer interface of the cleavage domain abolished activity. These results reflect a requirement for dimerization of the cleavage domain. The dependence of cleavage efficiency on the distance between two inverted binding sites was determined and both upper and lower limits were defined. Two different zinc finger combinations binding to non-identical sites also supported specific cleavage. Molecular modeling was employed to gain insight into the precise location of the cut sites. These results define requirements for effective targets of chimeric nucleases and will guide the design of novel specificities for directed DNA cleavage in vitro and in vivo.
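
The two-site requirement described above can be illustrated with a short search routine. This is a minimal sketch in Python, not the authors' method: the function names, the example binding site and the spacer limits are all hypothetical, since the abstract does not give the actual sequences or the upper and lower distance limits. It looks for a binding site followed, within a chosen spacer range, by the same site on the opposite strand (an inverted pair).

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_all(seq, motif):
    """Return every start index of motif in seq, including overlapping hits."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def inverted_site_pairs(seq, site, min_spacer, max_spacer):
    """List (forward_start, inverted_start, spacer) for inverted site pairs.

    min_spacer and max_spacer are hypothetical limits supplied by the caller;
    the abstract reports that such limits exist but does not state them.
    """
    seq = seq.upper()
    site = site.upper()
    forward_hits = find_all(seq, site)
    inverted_hits = find_all(seq, reverse_complement(site))
    pairs = []
    for fwd in forward_hits:
        fwd_end = fwd + len(site)
        for inv in inverted_hits:
            spacer = inv - fwd_end  # bases between the two sites
            if min_spacer <= spacer <= max_spacer:
                pairs.append((fwd, inv, spacer))
    return pairs


# Hypothetical 9 bp site and a 6 bp spacer, for illustration only.
example_site = "GGGGAAGAA"
example_seq = "AT" + example_site + "ACGTAC" + reverse_complement(example_site) + "GC"
example_pairs = inverted_site_pairs(example_seq, example_site, 4, 8)
```

The orientation searched here (site on the top strand, then its reverse complement downstream) is one of the two possible inverted arrangements; the search can be repeated with the roles swapped if the other arrangement is of interest.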

Journal ArticleDOI
TL;DR: InBase, the Intein Database and Registry, is a curated compilation of published and unpublished information about protein splicing, which presents general information as well as detailed data for each intein, including tabulated comparisons and a comprehensive bibliography.
Abstract: Inteins are self-catalytic protein splicing elements. InBase (http://www.neb.com/neb/inteins.html), the Intein Database and Registry, is a curated compilation of published and unpublished information about protein splicing. It presents general information as well as detailed data for each intein, including tabulated comparisons and a comprehensive bibliography. An intein-specific BLAST server is now available to assist in identifying new inteins.

Journal ArticleDOI
TL;DR: A two-step technology takes advantage of an Escherichia coli strain expressing the phage lambda Red functions and enables the rapid establishment of mutant strains carrying gene knock-outs with efficiencies >50%.
Abstract: The construction of mutant fungal strains is often limited by the poor efficiency of homologous recombination in these organisms. Higher recombination efficiencies can be obtained by increasing the length of homologous DNA flanking the transformation marker, although this is a tedious process when standard molecular biology techniques are used for the construction of gene replacement cassettes. Here, we present a two-step technology which takes advantage of an Escherichia coli strain expressing the phage λ Red(gam, bet, exo) functions and involves (i) the construction in this strain of a recombinant cosmid by in vivo recombination between a cosmid carrying a genomic region of interest and a PCR-generated transformation marker flanked by 50 bp regions of homology with the target DNA and (ii) genetic exchange in the fungus itself between the chromosomal locus and the circular or linearized recombinant cosmid. This strategy enables the rapid establishment of mutant strains carrying gene knock-outs with efficiencies >50%. It should also be appropriate for the construction of fungal strains with gene fusions or promoter replacements.
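
Step (i) above relies on a PCR-generated marker flanked by 50 bp regions of homology with the target DNA. A minimal sketch of how such primers could be assembled is given below, in Python. All names, coordinates and priming sequences are hypothetical; the abstract does not specify the primer layout, so this follows the common convention of placing the homology arm at the 5' end of each primer and the marker-priming sequence at the 3' end.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def design_homology_primers(target, start, end, marker_fwd, marker_rev, arm=50):
    """Build a primer pair carrying homology arms for replacing target[start:end].

    target      -- sequence of the cloned genomic region (top strand, 5' to 3')
    start, end  -- 0-based, end-exclusive coordinates of the region to replace
    marker_fwd  -- hypothetical 3' priming sequence for the marker (forward)
    marker_rev  -- hypothetical 3' priming sequence for the marker (reverse)
    arm         -- homology length in bp (50, as stated in the abstract)
    """
    if start < arm or end + arm > len(target):
        raise ValueError("not enough flanking sequence for the homology arms")
    upstream_arm = target[start - arm:start].upper()
    downstream_arm = target[end:end + arm].upper()
    forward_primer = upstream_arm + marker_fwd.upper()
    # The reverse primer anneals to the top strand, so its arm is the
    # reverse complement of the downstream flank.
    reverse_primer = reverse_complement(downstream_arm) + marker_rev.upper()
    return forward_primer, reverse_primer


# Toy input: 50 bp flanks around a 4 bp region, with made-up priming sequences.
toy_target = "A" * 50 + "CCCC" + "G" * 50
toy_fwd, toy_rev = design_homology_primers(toy_target, 50, 54, "TTTT", "CACA")
```

In practice the priming sequences would be chosen from the ends of the actual transformation marker, and the arm coordinates from the cosmid insert being modified.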

Journal ArticleDOI
TL;DR: Development of a cystic fibrosis mutation detection assay shows that Scorpion primers are selective enough to detect single base mutations and give good sensitivity in all cases.
Abstract: Scorpion primers can be used to detect PCR products in homogeneous solution. Their structure promotes a unimolecular probing mechanism. We compare their performance with that of the same probe sequence forced to act in a bimolecular manner. The data suggest that Scorpions indeed probe by a unimolecular mechanism which is faster and more efficient than the bimolecular mechanism. This mechanism is not dependent on enzymatic cleavage of the probe. A direct comparison between Scorpions, TaqMan and Molecular Beacons on a Roche LightCycler indicates that Scorpions perform better, particularly under fast cycling conditions. Development of a cystic fibrosis mutation detection assay shows that Scorpion primers are selective enough to detect single base mutations and give good sensitivity in all cases. Simultaneous detection of both normal and mutant alleles in a single reaction is possible by combining two Scorpions in a multiplex reaction. Such favourable properties of Scorpion primers should make the technology ideal in numerous applications.

Journal ArticleDOI
TL;DR: The results demonstrate that nucleic acid enzymes are capable of binding transition metal ions such as Zn(2+) with high affinity, and the resulting enzymes are more efficient at RNA cleavage than most Mg(2+)-dependent nucleic acid enzymes under similar conditions.
Abstract: A group of highly efficient Zn(II)-dependent RNA-cleaving deoxyribozymes has been obtained through in vitro selection. They share a common motif with the '8-17' deoxyribozyme isolated under different conditions, including different design of the random pool and metal ion cofactor. We found that this commonly selected motif can efficiently cleave both RNA and DNA/RNA chimeric substrates. It can cleave any substrate containing rNG (where rN is any ribo-nucleotide base and G can be either ribo- or deoxy-ribo-G). The pH profile and reaction products of this deoxyribozyme are similar to those reported for the hammerhead ribozyme. This deoxyribozyme has higher activity in the presence of transition metal ions compared to alkaline earth metal ions. At saturating concentrations of Zn(2+), the cleavage rate is 1.35 min(-1) at pH 6.0; based on the pH profile this rate is estimated to be at least approximately 30 times faster at pH 7.5, where most assays of Mg(2+)-dependent DNA and RNA enzymes are carried out. This work represents a comprehensive characterization of a nucleic acid-based endonuclease that prefers transition metal ions to alkaline earth metal ions. The results demonstrate that nucleic acid enzymes are capable of binding transition metal ions such as Zn(2+) with high affinity, and the resulting enzymes are more efficient at RNA cleavage than most Mg(2+)-dependent nucleic acid enzymes under similar conditions.
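
The rate estimate quoted above can be reproduced with simple arithmetic. The following is a minimal sketch in Python that uses only the two figures given in the abstract (1.35 min(-1) at pH 6.0 and a roughly 30-fold increase at pH 7.5). The conversion to a half-life assumes simple first-order kinetics, which the abstract does not state, and the function and constant names are illustrative.

```python
import math

# Cleavage rate at saturating Zn(2+), pH 6.0, as reported in the abstract.
RATE_PH6_PER_MIN = 1.35
# Lower-bound fold increase at pH 7.5, as estimated in the abstract.
FOLD_INCREASE_PH75 = 30


def estimated_rate_ph75(rate_ph6=RATE_PH6_PER_MIN, fold=FOLD_INCREASE_PH75):
    """Scale the pH 6.0 rate by the reported fold increase (result in min^-1)."""
    return rate_ph6 * fold


def half_life_seconds(rate_per_min):
    """Half-life in seconds, ASSUMING first-order decay of the substrate."""
    return math.log(2) / rate_per_min * 60


rate_ph75 = estimated_rate_ph75()             # about 40.5 min^-1
t_half_ph6 = half_life_seconds(RATE_PH6_PER_MIN)
t_half_ph75 = half_life_seconds(rate_ph75)
```

Under these assumptions the substrate half-life drops from roughly half a minute at pH 6.0 to about one second at pH 7.5, which gives a sense of scale for the comparison with Mg(2+)-dependent enzymes.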

Journal ArticleDOI
TL;DR: The WIT (What Is There) system has been designed to support comparative analysis of sequenced genomes and to generate metabolic reconstructions based on chromosomal sequences and metabolic modules from the EMP/MPW family of databases.
Abstract: The WIT (What Is There) (http://wit.mcs.anl.gov/WIT2/) system has been designed to support comparative analysis of sequenced genomes and to generate metabolic reconstructions based on chromosomal sequences and metabolic modules from the EMP/MPW family of databases. This system contains data derived from about 40 completed or nearly completed genomes. Sequence homologies, various ORF-clustering algorithms, relative gene positions on the chromosome and placement of gene products in metabolic pathways (metabolic reconstruction) can be used for the assignment of gene functions and for development of overviews of genomes within WIT. The integration of a large number of phylogenetically diverse genomes in WIT facilitates the understanding of the physiology of different organisms.