Showing papers in "Nucleic Acids Research in 1999"


Journal ArticleDOI
TL;DR: The Kyoto Encyclopedia of Genes and Genomes (KEGG) as discussed by the authors is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules.
Abstract: Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules. The major component of KEGG is the PATHWAY database that consists of graphical diagrams of biochemical pathways including most of the known metabolic pathways and some of the known regulatory pathways. The pathway information is also represented by the ortholog group tables summarizing orthologous and paralogous gene groups among different organisms. KEGG maintains the GENES database for the gene catalogs of all organisms with complete genomes and selected organisms with partial genomes, which are continuously re-annotated, as well as the LIGAND database for chemical compounds and enzymes. Each gene catalog is associated with the graphical genome map for chromosomal locations that is represented by a Java applet. In addition to the data collection efforts, KEGG develops and provides various computational tools, such as for reconstructing biochemical pathways from the complete genome sequence and for predicting gene regulatory networks from the gene expression profiles. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).

24,024 citations


Journal ArticleDOI
TL;DR: A new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size is presented and its ability to detect tandem repeats that have undergone extensive mutational change is demonstrated.
Abstract: A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size. We model tandem repeats by percent identity and frequency of indels between adjacent pattern copies and use statistically based recognition criteria. We demonstrate the algorithm’s speed and its ability to detect tandem repeats that have undergone extensive mutational change by analyzing four sequences: the human frataxin gene, the human β T cell receptor locus sequence and two yeast chromosomes. These sequences range in size from 3 kb up to 700 kb. A World Wide Web server interface at c3.biomath.mssm.edu/trf.html has been established for automated use of the program.

6,577 citations
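
The idea of modelling a tandem repeat by the percent identity between adjacent pattern copies can be illustrated with a deliberately naive scan. This is only a sketch under simplifying assumptions, not the algorithm from the paper: it ignores indels, uses no statistical recognition criteria, and needs an upper bound on the period. The function names and thresholds (`find_tandem_repeats`, `max_period`, `min_copies`, `min_identity`) are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length copies agree (substitutions only)."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def find_tandem_repeats(seq, max_period=6, min_copies=3, min_identity=80.0):
    """Naive scan for runs of adjacent, approximately identical copies.

    Returns (start, period, copies) tuples. Hypothetical helper for
    illustration only; it does not model indels.
    """
    hits = []
    n = len(seq)
    for period in range(1, max_period + 1):
        i = 0
        while i + 2 * period <= n:
            copies = 1
            j = i
            # Extend while the next copy is similar enough to the current one.
            while (j + 2 * period <= n and
                   percent_identity(seq[j:j + period],
                                    seq[j + period:j + 2 * period]) >= min_identity):
                copies += 1
                j += period
            if copies >= min_copies:
                hits.append((i, period, copies))
                i = j + period  # skip past the reported run
            else:
                i += 1
    return hits


# Made-up sequence with four exact copies of CAG starting at offset 2.
print(find_tandem_repeats("ATCAGCAGCAGCAGTA", max_period=3, min_identity=100.0))
```

Lowering `min_identity` lets the scan tolerate substitutions between adjacent copies, at the cost of more spurious hits in this simplified form.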


Journal ArticleDOI
TL;DR: This report summarizes the present status of this database of nucleotide sequence motifs found in plant cis-acting regulatory DNA elements and available tools.
Abstract: PLACE (http://www.dna.affrc.go.jp/htdocs/PLACE/) is a database of nucleotide sequence motifs found in plant cis-acting regulatory DNA elements. Motifs were extracted from previously published reports on genes in vascular plants. In addition to the motifs originally reported, their variations in other genes or in other plant species in later reports are also compiled. Documents for each motif in the PLACE database contain, in addition to a motif sequence, a brief definition and description of each motif, and relevant literature with PubMed ID numbers and GenBank accession numbers where available. Users can search their query sequences for cis-elements using the Signal Scan program at our web site. The results will be reported in one of the three forms. Clicking the PLACE accession numbers in the result report will open the pertinent motif document. Clicking the PubMed or GenBank accession number in the document will allow users to access these databases, and to read the abstract of the literature or the annotation in the DNA database. This report summarizes the present status of this database and available tools.

3,140 citations


Journal ArticleDOI
TL;DR: The comparison of animal mitochondrial gene arrangements has become a very powerful means for inferring ancient evolutionary relationships, since rearrangements appear to be unique, generally rare events that are unlikely to arise independently in separate evolutionary lineages.
Abstract: Animal mitochondrial DNA is a small, extrachromosomal genome, typically ~16 kb in size. With few exceptions, all animal mitochondrial genomes contain the same 37 genes: two for rRNAs, 13 for proteins and 22 for tRNAs. The products of these genes, along with RNAs and proteins imported from the cytoplasm, endow mitochondria with their own systems for DNA replication, transcription, mRNA processing and translation of proteins. The study of these genomes as they function in mitochondrial systems, ‘mitochondrial genomics’, serves as a model for genome evolution. Furthermore, the comparison of animal mitochondrial gene arrangements has become a very powerful means for inferring ancient evolutionary relationships, since rearrangements appear to be unique, generally rare events that are unlikely to arise independently in separate evolutionary lineages. Complete mitochondrial gene arrangements have been published for 58 chordate species and 29 non-chordate species, and partial arrangements for hundreds of other taxa. This review compares and summarizes these gene arrangements and points out some of the questions that may be addressed by comparing mitochondrial systems.

2,923 citations


Journal ArticleDOI
TL;DR: The MEROPS database has added an analysis tool to the relevant species pages to show significant gains and losses of peptidase genes relative to related species, and has collected over 39 000 known cleavage sites in proteins, peptides and synthetic substrates.
Abstract: Peptidases (proteolytic enzymes) are of great relevance to biology, medicine and biotechnology. This practical importance creates a need for an integrated source of information about them, and also about their natural inhibitors. The MEROPS database (http://merops.sanger.ac.uk) aims to fill this need. The organizational principle of the database is a hierarchical classification in which homologous sets of the proteins of interest are grouped in families and the homologous families are grouped in clans. Each peptidase, family and clan has a unique identifier. The database has recently been expanded to include the protein inhibitors of peptidases, and these are classified in much the same way as the peptidases. Forms of information recently added include new links to other databases, summary alignments for peptidase clans, displays to show the distribution of peptidases and inhibitors among organisms, substrate cleavage sites and indexes for expressed sequence tag libraries containing peptidases. A new way of making hyperlinks to the database has been devised and a BlastP search of our library of peptidase and inhibitor sequences has been added.

2,406 citations


Journal ArticleDOI
TL;DR: Significant technical improvements to GLIMMER are reported that improve its accuracy still further, and a comprehensive evaluation demonstrates that the accuracy of the system is likely to be higher than previously recognized.
Abstract: The GLIMMER system for microbial gene identification finds approximately 97-98% of all genes in a genome when compared with published annotation. This paper reports on two new results: (i) significant technical improvements to GLIMMER that improve its accuracy still further, and (ii) a comprehensive evaluation that demonstrates that the accuracy of the system is likely to be higher than previously recognized. A significant proportion of the genes missed by the system appear to be hypothetical proteins whose existence is only supported by the predictions of other programs. When the analysis is restricted to genes that have significant homology to genes in other organisms, GLIMMER misses <1% of known genes.

2,369 citations


Journal ArticleDOI
TL;DR: This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume).
Abstract: The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

1,314 citations


Journal ArticleDOI
TL;DR: The PROSITE database consists of biologically significant patterns and profiles formulated in such a way that with appropriate computational tools it can help to determine to which known family of protein (if any) a new sequence belongs, or which known domain(s) it contains.
Abstract: The PROSITE database (http://www.expasy.ch/sprot/prosite.html) consists of biologically significant patterns and profiles formulated in such a way that with appropriate computational tools it can help to determine to which known family of protein (if any) a new sequence belongs, or which known domain(s) it contains.

1,108 citations
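
As an illustration of what "appropriate computational tools" can mean for patterns, the sketch below translates a pattern written in PROSITE pattern syntax into a Python regular expression and scans a sequence with it. It is a minimal sketch, not the PROSITE scanning software: it handles only the basic elements (`x`, `[...]`, `{...}`, `(n)` and `(n,m)` repetition, and `<` / `>` anchors at the ends of the pattern), and it does not handle profiles at all. The function name `prosite_to_regex` and the test sequence are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE-style pattern to a Python regex string."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):       # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):         # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split the element into its core and an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":                   # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# Example pattern in PROSITE syntax, applied to a made-up sequence.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
sequence = "MKCAAC" + "AAA" + "L" + "A" * 8 + "HAAAHK"
print(regex, bool(re.search(regex, sequence)))
```

Mapping patterns onto an existing regex engine keeps the sketch short; a production scanner would also need to report overlapping matches and handle the full pattern grammar.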


Journal ArticleDOI
TL;DR: The HIV RT and Protease Sequence Database is an on-line relational database that catalogues evolutionary and drug-related human immunodeficiency virus reverse transcriptase (RT) and protease sequence variation.
Abstract: The HIV RT and Protease Sequence Database is an online relational database that catalogs evolutionary and drug-related human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease sequence variation (http://hivdb.stanford.edu). The database contains a compilation of nearly all published HIV RT and protease sequences including International Collaboration database submissions (e.g., GenBank) and sequences published in journal articles. Sequences are linked to data about the source of the sequence sample and the antiretroviral drug treatment history of the individual from whom the isolate was obtained. The database is curated and sequences are annotated with data from >230 literature references. Users can retrieve additional data and view alignments of sequence sets meeting specific criteria (e.g., treatment history, subtype, presence of a particular mutation). A gene-specific sequence analysis program, new user-defined queries and nearly 2000 additional sequences were added in 1999.

980 citations


Journal ArticleDOI
TL;DR: Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides and should facilitate analysis of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary comparisons and genomic duplications.
Abstract: A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides. Its use is demonstrated on two strains of Mycobacterium tuberculosis, on two less similar species of Mycoplasma bacteria and on two syntenic sequences from human chromosome 12 and mouse chromosome 6. In each case it found an alignment of the input sequences, using between 30 s and 2 min of computation time. From the system output, information on single nucleotide changes, translocations and homologous genes can easily be extracted. Use of the algorithm should facilitate analysis of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary comparisons and genomic duplications.

962 citations
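
Whole-genome aligners of this kind typically start from exact matches that are unique in both sequences and use them as anchors. The sketch below shows that anchoring idea in a much simplified form: it finds fixed-length substrings (k-mers) that occur exactly once in each sequence, using a hash table rather than a suffix tree. It is a stand-in for illustration only, not the system described in the abstract, and the name `unique_shared_kmers` and the parameter `k` are hypothetical.

```python
from collections import Counter


def unique_shared_kmers(a: str, b: str, k: int):
    """Return (pos_in_a, pos_in_b, kmer) for k-mers occurring exactly once in each sequence."""

    def unique_positions(s):
        # Count every k-mer, then keep the position of those seen exactly once.
        counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
        return {s[i:i + k]: i
                for i in range(len(s) - k + 1)
                if counts[s[i:i + k]] == 1}

    ua, ub = unique_positions(a), unique_positions(b)
    return sorted((ua[kmer], ub[kmer], kmer) for kmer in ua.keys() & ub.keys())


# Two made-up sequences that share a single unique 4-mer.
print(unique_shared_kmers("ACGTTGCA", "TTACGTAA", 4))
```

A suffix tree avoids fixing `k` in advance and finds matches of maximal length directly, which is one reason it suits sequences with millions of nucleotides.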


Journal ArticleDOI
TL;DR: The definition, the reference information, a list of related entries in terms of the correlation coefficient, and the actual data of AAindex are presented.
Abstract: AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids. It consists of two sections: AAindex1 for the amino acid index of 20 numerical values and AAindex2 for the amino acid mutation matrix of 210 numerical values. Each entry of either AAindex1 or AAindex2 consists of the definition, the reference information, a list of related entries in terms of the correlation coefficient, and the actual data. The database may be accessed through the DBGET/LinkDB system at GenomeNet (http://www.genome.ad.jp/dbget/) or may be downloaded by anonymous FTP (ftp://ftp.genome.ad.jp/db/genomenet/aaindex/).
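
The two sizes quoted above are consistent with one value per amino acid (20) and one value per unordered pair of amino acids, self-pairs included (20 × 21 / 2 = 210), which is the size of the lower triangle of a symmetric 20 × 20 matrix. The short check below enumerates those pairs; the residue ordering and the function name `unordered_pairs` are assumptions made for illustration.

```python
from itertools import combinations_with_replacement

# One-letter codes for the 20 standard amino acids (ordering is an assumption).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def unordered_pairs(residues: str = AMINO_ACIDS):
    """All unordered residue pairs, including each residue paired with itself."""
    return list(combinations_with_replacement(residues, 2))


pairs = unordered_pairs()
print(len(AMINO_ACIDS), len(pairs))
```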

Journal ArticleDOI
TL;DR: The Ribosomal Database Project (RDP-II), previously described by Maidak et al. (1997), is now hosted by the Center for Microbial Ecology at Michigan State University and will provide more rapid updating of data, better data accuracy and increased user access.
Abstract: The Ribosomal Database Project (RDP-II), previously described by Maidak et al. [Nucleic Acids Res. (1997), 25, 109-111], is now hosted by the Center for Microbial Ecology at Michigan State University. RDP-II is a curated database that offers ribosomal RNA (rRNA) nucleotide sequence data in aligned and unaligned forms, analysis services, and associated computer programs. During the past two years, data alignments have been updated and now include >9700 small subunit rRNA sequences. The recent development of an ObjectStore database will provide more rapid updating of data, better data accuracy and increased user access. RDP-II includes phylogenetically ordered alignments of rRNA sequences, derived phylogenetic trees, rRNA secondary structure diagrams, and various software programs for handling, analyzing and displaying alignments and trees. The data are available via anonymous ftp (ftp.cme.msu.edu) and WWW (http://www.cme.msu.edu/RDP). The WWW server provides ribosomal probe checking, approximate phylogenetic placement of user-submitted sequences, screening for possible chimeric rRNA sequences, automated alignment, and a suggested placement of an unknown sequence on an existing phylogenetic tree. Additional utilities also exist at RDP-II, including distance matrix, T-RFLP, and a Java-based viewer of the phylogenetic trees that can be used to create subtrees.

Journal ArticleDOI
TL;DR: Sox proteins perform their function in a complex interplay with other transcription factors in a manner highly dependent on cell type and promoter context, and exhibit a remarkable crosstalk and functional redundancy among each other.
Abstract: Sox proteins belong to the HMG box superfamily of DNA-binding proteins and are found throughout the animal kingdom. They are involved in the regulation of such diverse developmental processes as germ layer formation, organ development and cell type specification. Hence, deletion or mutation of Sox proteins often results in developmental defects and congenital disease in humans. Sox proteins perform their function in a complex interplay with other transcription factors in a manner highly dependent on cell type and promoter context. They exhibit a remarkable crosstalk and functional redundancy among each other.

Journal ArticleDOI
TL;DR: Investigation of the expression of human DNMT1, 3a and 3b found widespread, coordinate expression of all three transcripts in most normal tissues, and several novel alternatively spliced forms of DNMT3b, which may have altered enzymatic activity, were found to be expressed in a tissue-specific manner.
Abstract: DNA methylation in mammals is required for embryonic development, X chromosome inactivation and imprinting. Previous studies have shown that methylation patterns become abnormal in malignant cells and may contribute to tumorigenesis by improper de novo methylation and silencing of the promoters for growth-regulatory genes. RNA and protein levels of the DNA methyltransferase DNMT1 have been shown to be elevated in tumors; however, murine stem cells lacking Dnmt1 are still able to de novo methylate viral DNA. The recent cloning in mouse of a new family of DNA methyltransferases (Dnmt3a and Dnmt3b), which methylate hemimethylated and unmethylated templates with equal efficiencies, makes them candidates for the long-sought de novo methyltransferases. We have investigated the expression of human DNMT1, 3a and 3b and found widespread, coordinate expression of all three transcripts in most normal tissues. Chromosomal mapping placed DNMT3a on chromosome 2p23 and DNMT3b on chromosome 20q11.2. Significant overexpression of DNMT3b was seen in tumors while DNMT1 and DNMT3a were only modestly over-expressed and with lower frequency. Lastly, several novel alternatively spliced forms of DNMT3b, which may have altered enzymatic activity, were found to be expressed in a tissue-specific manner.

Journal ArticleDOI
TL;DR: This paper presents the first systematic study of the most commonly used alignment programs using BAliBASE benchmark alignments as test cases, and proposes appropriate alignment strategies, depending on the nature of a particular set of sequences.
Abstract: In recent years improvements to existing programs and the introduction of new iterative algorithms have changed the state-of-the-art in protein sequence alignment. This paper presents the first systematic study of the most commonly used alignment programs using BAliBASE benchmark alignments as test cases. Even below the ‘twilight zone’ at 10‐20% residue identity, the best programs were capable of correctly aligning on average 47% of the residues. We show that iterative algorithms often offer improved alignment accuracy though at the expense of computation time. A notable exception was the effect of introducing a single divergent sequence into a set of closely related sequences, causing the iteration to diverge away from the best alignment. Global alignment programs generally performed better than local methods, except in the presence of large N/C-terminal extensions and internal insertions. In these cases, a local algorithm was more successful in identifying the most conserved motifs. This study enables us to propose appropriate alignment strategies, depending on the nature of a particular set of sequences. The employment of more than one program based on different alignment techniques should significantly improve the quality of automatic protein sequence alignment methods. The results also indicate guidelines for improvement of alignment algorithms.

Journal ArticleDOI
TL;DR: A new ligand-dependent recombinase, Cre-ER(T2), was recently characterized, which was approximately 4-fold more efficiently induced by OHT than Cre-ER(T) in cultured cells, and a dose-response study showed that Cre-ER(T2) was approximately 10-fold more sensitive to OHT induction than Cre-ER(T).
Abstract: Conditional DNA excision between two LoxP sites can be achieved in the mouse using Cre-ER(T), a fusion protein between a mutated ligand binding domain of the human estrogen receptor (ER) and the Cre recombinase, the activity of which can be induced by 4-hydroxy-tamoxifen (OHT), but not natural ER ligands. We have recently characterized a new ligand-dependent recombinase, Cre-ER(T2), which was approximately 4-fold more efficiently induced by OHT than Cre-ER(T) in cultured cells. In order to compare the in vivo efficiency of these two ligand-inducible recombinases to generate temporally-controlled somatic mutations, we have engineered transgenic mice expressing a LoxP-flanked (floxed) transgene reporter and either Cre-ER(T) or Cre-ER(T2) under the control of the bovine keratin 5 promoter that is specifically active in the epidermis basal cell layer. No background recombinase activity could be detected, while recombination was induced in basal keratinocytes upon OHT administration. Interestingly, a dose-response study showed that Cre-ER(T2) was approximately 10-fold more sensitive to OHT induction than Cre-ER(T).

Journal ArticleDOI
TL;DR: It was found that formalin-fixed tissue was resistant to solubilization by chaotropic agents; however, proteinase K completely solubilized the fixed tissue and enabled the extraction of almost the same amount of RNA as from a fresh sample.
Abstract: Formalin-fixed archival samples are known to be poor materials for molecular biological applications. We conducted a series of experiments to understand the alterations in RNA in fixed tissue. We found that formalin-fixed tissue was resistant to solubilization by chaotropic agents. However, proteinase K completely solubilized the fixed tissue and enabled the extraction of almost the same amount of RNA as from a fresh sample. The extracted RNA did not show apparent degradation. However, as reported, successful PCR amplification was limited to short targets. The nature of such 'fixed' RNA was analyzed using synthetic homo-oligo RNAs. The heterogeneous increase in molecular weight of the RNAs, measured by MALDI-TOF mass spectrometry, showed that all four bases showed addition of mono-methylol (-CH2OH) groups at various rates. The modification rate varied from 40% for adenine to 4% for uracil. In addition, some adenines underwent dimerization through methylene bridging. The majority of the methylol groups, however, could be removed from bases by simply elevating the temperature in formalin-free buffer. This demodification proved effective in restoring the template activity of RNA from fixed tissue. The improvement in PCR results suggested that more than half of the modification was removed by this demodification.

Journal ArticleDOI
TL;DR: Recent advances that have been directed towards understanding the biological role of transcription factors are summarized and discussed.
Abstract: One of the most common regulatory elements is the GC box and the related GT/CACC box, which are widely distributed in promoters, enhancers and locus control regions of housekeeping as well as tissue-specific genes. It was long thought that Sp1 was the major factor acting through these motifs. Recent discoveries have shown that Sp1 is only one of many transcription factors binding and acting through these elements. Sp1 simply represents the first identified and cloned protein of a family of transcription factors characterised by a highly conserved DNA-binding domain consisting of three zinc fingers. Currently this new family of transcription factors has at least 16 different mammalian members. Here, we will summarise and discuss recent advances that have been directed towards understanding the biological role of these proteins.

Journal ArticleDOI
TL;DR: The method is based on homologous recombination by ET-cloning and was found to work with high efficiency and should be applicable to any BAC modification desired.
Abstract: We present a method to modify bacterial artificial chromosomes (BACs) resident in their host strain. The method is based on homologous recombination by ET-cloning. We have successfully modified BACs at two distinct loci by recombination with a PCR product containing homology arms of 50 nt. The procedure we describe here is rapid, was found to work with high efficiency and should be applicable to any BAC modification desired.

Journal ArticleDOI
TL;DR: Analysis of spot intensities from hybridization to replicate arrays identified sets of genes with signals consistently above background suggesting that at least 25% of genes were expressed at detectable levels during growth in rich media.
Abstract: We have established high resolution methods for global monitoring of gene expression in Escherichia coli. Hybridization of radiolabeled cDNA to spot blots on nylon membranes was compared to hybridization of fluorescently-labeled cDNA to glass microarrays for efficiency and reproducibility. A complete set of PCR primers was created for all 4290 annotated open reading frames (ORFs) from the complete genome sequence of E.coli K-12 (MG1655). Glass- and nylon-based arrays of PCR products were prepared and used to assess global changes in gene expression. Full-length coding sequences for array printing were generated by two-step PCR amplification. In this study we measured changes in RNA levels after exposure to heat shock and following treatment with isopropyl-β-D-thiogalactopyranoside (IPTG). Both radioactive and fluorescence-based methods showed comparable results. Treatment with IPTG resulted in high level induction of the lacZYA and melAB operons. Following heat shock treatment 119 genes were shown to have significantly altered expression levels, including 35 previously uncharacterized ORFs and most genes of the heat shock stimulon. Analysis of spot intensities from hybridization to replicate arrays identified sets of genes with signals consistently above background suggesting that at least 25% of genes were expressed at detectable levels during growth in rich media.

Journal ArticleDOI
TL;DR: Pfam is a collection of multiple alignments and profile hidden Markov models of protein domain families that contains 1313 families and over 54% of proteins in SWISS-PROT-35 and SP-TrEMBL-5 match a Pfam family.
Abstract: Pfam is a collection of multiple alignments and profile hidden Markov models of protein domain families. Release 3.1 is a major update of the Pfam database and contains 1313 families which are available on the World Wide Web in Europe at http://www.sanger.ac.uk/Software/Pfam/ and http://www.cgr.ki.se/Pfam/, and in the US at http://pfam.wustl.edu/. Over 54% of proteins in SWISS-PROT-35 and SP-TrEMBL-5 match a Pfam family. The primary changes of Pfam since release 2.1 are that we now use the more advanced version 2 of the HMMER software, which is more sensitive and provides expectation values for matches, and that it now includes proteins from both SP-TrEMBL and SWISS-PROT.

Journal ArticleDOI
TL;DR: The possible structural basis and functional consequences of the observed alterations in chromatin associated with transcriptional activation and repression are discussed.
Abstract: Chromatin disruption and modification are associated with transcriptional regulation by diverse coactivators and corepressors. Here we discuss the possible structural basis and functional consequences of the observed alterations in chromatin associated with transcriptional activation and repression. Recent advances in defining the roles of individual histones and their domains in the assembly and maintenance of regulatory architectures provide a framework for understanding how chromatin remodelling machines, histone acetyltransferases and deacetylases function.

Journal ArticleDOI
TL;DR: Several additional general trends in the evolution of repair proteins were noticed; in particular, multiple, independent fusions of helicase and nuclease domains, and independent inactivation of enzymatic domains that apparently retain adaptor or regulatory functions.
Abstract: A detailed analysis of protein domains involved in DNA repair was performed by comparing the sequences of the repair proteins from two well-studied model organisms, the bacterium Escherichia coli and yeast Saccharomyces cerevisiae, to the entire sets of protein sequences encoded in completely sequenced genomes of bacteria, archaea and eukaryotes. Previously uncharacterized conserved domains involved in repair were identified, namely four families of nucleases and a family of eukaryotic repair proteins related to the proliferating cell nuclear antigen. In addition, a number of previously undetected occurrences of known conserved domains were detected; for example, a modified helix-hairpin-helix nucleic acid-binding domain in archaeal and eukaryotic RecA homologs. There is a limited repertoire of conserved domains, primarily ATPases and nucleases, nucleic acid-binding domains and adaptor (protein-protein interaction) domains that comprise the repair machinery in all cells, but very few of the repair proteins are represented by orthologs with conserved domain architecture across the three superkingdoms of life. Both the external environment of an organism and the internal environment of the cell, such as the chromatin superstructure in eukaryotes, seem to have a profound effect on the layout of the repair systems. Another factor that apparently has made a major contribution to the composition of the repair machinery is horizontal gene transfer, particularly the invasion of eukaryotic genomes by organellar genes, but also a number of likely transfer events between bacteria and archaea. Several additional general trends in the evolution of repair proteins were noticed; in particular, multiple, independent fusions of helicase and nuclease domains, and independent inactivation of enzymatic domains that apparently retain adaptor or regulatory functions.

Journal ArticleDOI
TL;DR: Differential mismatch detection was accomplished irrespective of DNA sequence composition and mismatch identity, and single-base changes in sequences hybridized at the electrode surface are also detected accurately.
Abstract: High-throughput DNA sensors capable of detecting single-base mismatches are required for the routine screening of genetic mutations and disease. A new strategy for the electrochemical detection of single-base mismatches in DNA has been developed based upon charge transport through DNA films. Double-helical DNA films on gold surfaces have been prepared and used to detect DNA mismatches electrochemically. The signals obtained from redox-active intercalators bound to DNA-modified gold surfaces display a marked sensitivity to the presence of base mismatches within the immobilized duplexes. Differential mismatch detection was accomplished irrespective of DNA sequence composition and mismatch identity. Single-base changes in sequences hybridized at the electrode surface are also detected accurately. Coupling the redox reactions of intercalated species to electrocatalytic processes in solution considerably increases the sensitivity of this assay. Reporting on the electronic structure of DNA, as opposed to the hybridization energetics of single-stranded oligonucleotides, electrochemical sensors based on charge transport may offer fundamental advantages in both scope and sensitivity.

Journal ArticleDOI
TL;DR: PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors that offers a link to the EMBL entry that contains the full gene sequence as well as a description of the conditions in which a motif becomes functional.
Abstract: PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Besides the transcription motifs found on a sequence, it also offers a link to the EMBL entry that contains the full gene sequence as well as a description of the conditions in which a motif becomes functional. The information on these sites is given by matrices, consensus and individual site sequences on particular genes, depending on the available information. PlantCARE is a relational database available via the web at the URL: http://sphinx.rug.ac.be:8080/PlantCARE/

Journal ArticleDOI
TL;DR: The variable intron-exon structures of the 10 model organisms reveal two interesting statistical phenomena, which cast light on some previous speculations about genome size and intron size.
Abstract: To investigate the distribution of intron-exon structures of eukaryotic genes, we have constructed a general exon database comprising all available intron-containing genes and exon databases from 10 eukaryotic model organisms: Homo sapiens, Mus musculus, Gallus gallus, Rattus norvegicus, Arabidopsis thaliana, Zea mays, Schizosaccharomyces pombe, Aspergillus, Caenorhabditis elegans and Drosophila. We purged redundant genes to avoid the possible bias brought about by redundancy in the databases. After discarding those questionable introns that do not contain correct splice sites, the final database contained 17 102 introns, 21 019 exons and 2903 independent or quasi-independent genes. On average, a eukaryotic gene contains 3.7 introns per kb protein coding region. The exon distribution peaks around 30-40 residues and most introns are 40-125 nt long. The variable intron-exon structures of the 10 model organisms reveal two interesting statistical phenomena, which cast light on some previous speculations. (i) Genome size seems to be correlated with total intron length per gene. For example, invertebrate introns are smaller than those of human genes, while yeast introns are shorter than invertebrate introns. However, this correlation is weak, suggesting that other factors besides genome size may also affect intron size. (ii) Introns smaller than 50 nt are significantly less frequent than longer introns, possibly resulting from a minimum intron size requirement for intron splicing.
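
The filtering and density figures in the abstract above can be illustrated with a minimal sketch. It assumes 0-based, half-open exon coordinates and treats the canonical GT...AG rule as the test for "correct splice sites"; the abstract does not state the exact criterion, and all function names here are hypothetical.

```python
def introns_from_exons(exons):
    """Return the gaps between consecutive exons as (start, end) introns."""
    exons = sorted(exons)
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def has_canonical_splice_sites(seq, intron):
    """Assumed criterion: intron starts with GT and ends with AG."""
    start, end = intron
    return seq[start:start + 2] == "GT" and seq[end - 2:end] == "AG"


def introns_per_kb(exons):
    """Number of introns per 1000 nt of exon sequence for one gene."""
    coding_length = sum(end - start for start, end in exons)
    return len(introns_from_exons(exons)) / (coding_length / 1000.0)


# Toy gene: two 6-nt exons separated by one 8-nt intron.
gene = "ATGGCC" + "GTAAAAAG" + "GCCTAA"
exons = [(0, 6), (14, 20)]
kept = [i for i in introns_from_exons(exons)
        if has_canonical_splice_sites(gene, i)]
```

Averaging `introns_per_kb` over a non-redundant gene set would give a figure comparable in kind to the per-kb density quoted in the abstract, although the toy gene here is far too short to be realistic.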

Journal ArticleDOI
TL;DR: It is suggested that the AP2 and B3-like domains of RAV1 are connected by a highly flexible structure enabling the two domains to bind to the CAACA and CACCTG motifs in various spacings and orientations.
Abstract: We have cloned and characterized two novel DNA binding proteins designated RAV1 and RAV2 from Arabidopsis thaliana. RAV1 and RAV2 contain two distinct amino acid sequence domains found only in higher plant species. The N-terminal regions of RAV1 and RAV2 are homologous to the AP2 DNA-binding domain present in a family of transcription factors represented by the Arabidopsis APETALA2 and tobacco EREBP proteins, while the C-terminal region exhibits homology to the highly conserved C-terminal domain, designated B3, of VP1/ABI3 transcription factors. Binding site selection assays using a recombinant glutathione S-transferase fusion protein have revealed that RAV1 binds specifically to bipartite recognition sequences composed of two unrelated motifs, 5'-CAACA-3' and 5'-CACCTG-3', separated by various spacings in two different relative orientations. Analyses using various deletion derivatives of the RAV1 fusion protein show that the AP2 and B3-like domains of RAV1 bind autonomously to the CAACA and CACCTG motifs, respectively, and together achieve a high affinity and specificity of binding. From these results, we suggest that the AP2 and B3-like domains of RAV1 are connected by a highly flexible structure enabling the two domains to bind to the CAACA and CACCTG motifs in various spacings and orientations.
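
A bipartite site of the kind described above can be searched for with a short scanner. This is only a sketch: the two motifs come from the abstract, but the `max_gap` limit and the function names are assumptions made for illustration, since the abstract says only that the spacings vary.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def motif_hits(seq, motif):
    """Return {(start, strand)} for every occurrence on either strand."""
    hits = set()
    for strand, pattern in (("+", motif), ("-", revcomp(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.add((start, strand))
            start = seq.find(pattern, start + 1)
    return hits


def find_bipartite_sites(seq, motif_a="CAACA", motif_b="CACCTG", max_gap=10):
    """List non-overlapping motif pairs separated by at most max_gap nt.

    max_gap is a hypothetical cutoff, not a value taken from the paper.
    Each result is (a_start, a_strand, b_start, b_strand, gap).
    """
    seq = seq.upper()
    b_hits = motif_hits(seq, motif_b)
    sites = []
    for a_start, a_strand in motif_hits(seq, motif_a):
        a_end = a_start + len(motif_a)
        for b_start, b_strand in b_hits:
            b_end = b_start + len(motif_b)
            # Distance between the two motifs; negative means overlap.
            gap = b_start - a_end if a_start < b_start else a_start - b_end
            if 0 <= gap <= max_gap:
                sites.append((a_start, a_strand, b_start, b_strand, gap))
    return sorted(sites)


forward_site = find_bipartite_sites("GGCAACATTTCACCTGGG")
reverse_site = find_bipartite_sites("CAGGTGAATGTTG")
```

Reporting the strand of each motif separately keeps both relative orientations visible, which matters because the abstract notes that RAV1 binds its two motifs in two different relative orientations.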

Journal ArticleDOI
TL;DR: New features include capability to search database files by name or substructural features, modifications in tmRNA, and links to related data and sites.
Abstract: The RNA Modification Database (http://medlib.med.utah.edu/RNAmods/) provides a comprehensive listing of naturally modified nucleosides in RNA. Each file includes: chemical structure; common name and symbol; type(s) of RNA in which found and corresponding phylogenetic distribution; Chemical Abstracts registry number and index name; and initial literature citations for structure characterization and chemical synthesis. New features include capability to search database files by name or substructural features, modifications in tmRNA, and links to related data and sites.

Journal ArticleDOI
TL;DR: A new method for amplifying cDNA ends is described which requires only first-strand cDNA synthesis and a single PCR to generate a correct product with very low or no background.
Abstract: A new method for amplifying cDNA ends is described which requires only first-strand cDNA synthesis and a single PCR to generate a correct product with very low or no background. The method can be successfully applied to total RNA as well as poly A+ RNA. The same first-strand cDNA can be used to amplify flanking sequences of any cDNA species present in the sample.

Journal ArticleDOI
TL;DR: A new, heuristic method producing fairly accurate inhomogeneous Markov models of protein coding regions that can be built 'on the fly' by a web server for any DNA sequence >400 nt and gives an insight into the mechanism of codon usage pattern evolution.
Abstract: Computer methods of accurate gene finding in DNA sequences require models of protein coding and non-coding regions derived either from experimentally validated training sets or from large amounts of anonymous DNA sequence. Here we propose a new, heuristic method producing fairly accurate inhomogeneous Markov models of protein coding regions. The new method needs such a small amount of DNA sequence data that the model can be built 'on the fly' by a web server for any DNA sequence >400 nt. Tests on 10 complete bacterial genomes performed with the GeneMark.hmm program demonstrated the ability of the new models to detect 93.1% of annotated genes on average, while models built by traditional training predict an average of 93.9% of genes. Models built by the heuristic approach could be used to find genes in small fragments of anonymous prokaryotic genomes and in genomes of organelles, viruses, phages and plasmids, as well as in highly inhomogeneous genomes where adjustment of models to local DNA composition is needed. The heuristic method also gives an insight into the mechanism of codon usage pattern evolution.
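
To make the term "inhomogeneous Markov model" concrete, here is a minimal sketch of a three-periodic, zeroth-order model in which each codon position has its own nucleotide distribution. It is a generic textbook-style illustration, not the heuristic procedure of the paper or the model used by GeneMark.hmm; the pseudocount, the uniform background and all names are assumptions.

```python
import math
from collections import Counter

BASES = "ACGT"


def train_periodic_model(coding_seqs, pseudocount=1.0):
    """Estimate one nucleotide distribution per codon position (0, 1, 2)."""
    counts = [Counter() for _ in range(3)]
    for seq in coding_seqs:
        for i, base in enumerate(seq.upper()):
            if base in BASES:
                counts[i % 3][base] += 1
    model = []
    for position_counts in counts:
        total = sum(position_counts[b] for b in BASES) + 4 * pseudocount
        model.append({b: (position_counts[b] + pseudocount) / total
                      for b in BASES})
    return model


def log_odds(seq, model, frame=0):
    """Log-odds of seq under the model versus a uniform background.

    frame is the index in seq at which a codon starts.
    """
    score = 0.0
    for i, base in enumerate(seq.upper()):
        if base in BASES:
            score += math.log(model[(i - frame) % 3][base] / 0.25)
    return score


def best_frame(seq, model):
    """Return the frame (0, 1 or 2) with the highest log-odds score."""
    return max(range(3), key=lambda f: log_odds(seq, model, f))


# Toy training set with a strongly periodic composition.
toy_model = train_periodic_model(["GCA" * 20])
```

A practical gene finder would use higher-order, position-specific transition probabilities and a non-coding model in place of the uniform background; the sketch only shows why position-dependent statistics allow the reading frame to be recovered.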