Showing papers in "Nucleic Acids Research in 2002"


Journal ArticleDOI
TL;DR: A simplified scoring system is proposed that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length.
Abstract: A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE, when the number of input sequences exceeds 60, without sacrificing the accuracy.

12,003 citations
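
The FFT step described in the MAFFT abstract above can be illustrated with a minimal sketch. It assumes a single numeric property per residue and uses a made-up `TOY_VOLUME` scale (hypothetical values, not the volume and polarity scales used by MAFFT). The function names are also illustrative. The idea shown is only that the correlation between two property sequences, computed for all lags at once through the FFT, peaks at the offset where similar regions line up.

```python
import cmath

# Hypothetical toy property values for a few residues.
# These are NOT the volume/polarity values used by MAFFT.
TOY_VOLUME = {"A": -1.0, "G": -1.5, "L": 1.0, "K": 1.2, "W": 2.0, "S": -0.8}


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def correlation_by_lag(seq1, seq2, scale=TOY_VOLUME):
    """Return {lag k: c(k)} with c(k) = sum over n of v1[n] * v2[n + k]."""
    v1 = [scale[r] for r in seq1]
    v2 = [scale[r] for r in seq2]
    # Zero-pad to a power of two so circular correlation equals linear correlation.
    size = 1
    while size < len(v1) + len(v2):
        size *= 2
    f1 = fft([complex(x) for x in v1] + [0j] * (size - len(v1)))
    f2 = fft([complex(x) for x in v2] + [0j] * (size - len(v2)))
    # Correlation theorem: c = IFFT(conj(F1) * F2), divided by size.
    product = [a.conjugate() * b for a, b in zip(f1, f2)]
    corr = [x.real / size for x in fft(product, inverse=True)]
    # Negative lags wrap around to the end of the padded array.
    return {k: corr[k % size] for k in range(-(len(v1) - 1), len(v2))}


lags = correlation_by_lag("AGLKW", "SSAGLKWS")
best_lag = max(lags, key=lags.get)  # offset of seq1 within seq2 with the highest correlation
```

A peak in the correlation only suggests a candidate offset; a real aligner would still have to confirm and extend the region with a proper scoring scheme.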


Journal ArticleDOI
TL;DR: The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data and provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments.
Abstract: The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data. GEO provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments. GEO is not intended to replace in-house gene expression databases that benefit from coherent data sets, and which are constructed to facilitate a particular analytic method, but rather to complement these by acting as a tertiary, central data distribution hub. The three central data entities of GEO are platforms, samples and series, and were designed with gene expression and genomic hybridization experiments in mind. A platform is, essentially, a list of probes that define what set of molecules may be detected. A sample describes the set of molecules that are being probed and references a single platform used to generate its molecular abundance data. A series organizes samples into the meaningful data sets which make up an experiment. The GEO repository is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.

10,968 citations
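
The three data entities named in the GEO abstract above (platform, sample, series) can be sketched as plain data classes. This is only an illustration of the relationships the abstract describes: all class fields, accession strings and the validation rule are assumptions and do not reflect GEO's actual schema or submission format.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """A list of probes that defines which molecules may be detected."""
    accession: str  # hypothetical identifier
    probes: list


@dataclass
class Sample:
    """Abundance data for one probed set of molecules, tied to a single platform."""
    accession: str  # hypothetical identifier
    platform: Platform
    abundance: dict  # probe id -> measured value

    def __post_init__(self):
        # Assumed rule: a sample may only report probes its platform defines.
        unknown = set(self.abundance) - set(self.platform.probes)
        if unknown:
            raise ValueError(f"probes not on platform: {sorted(unknown)}")


@dataclass
class Series:
    """A group of samples that together make up an experiment."""
    accession: str  # hypothetical identifier
    samples: list = field(default_factory=list)

    def platforms(self):
        """Accessions of every platform used by the samples in this series."""
        return {s.platform.accession for s in self.samples}


array = Platform("PLATFORM_1", ["probe_a", "probe_b"])
treated = Sample("SAMPLE_1", array, {"probe_a": 1.5, "probe_b": 0.2})
control = Sample("SAMPLE_2", array, {"probe_a": 0.9, "probe_b": 0.3})
experiment = Series("SERIES_1", [treated, control])
```

Keeping the platform as a separate object that each sample references mirrors the abstract's statement that a sample "references a single platform", so probe definitions are stored once rather than per sample.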


Journal ArticleDOI
TL;DR: The development and application of REST (relative expression software tool) are explained and the usefulness of relative expression in real-time PCR is discussed; the mathematical model used is based on the PCR efficiencies and the mean crossing point deviation between the sample and control group.
Abstract: Real-time reverse transcription followed by polymerase chain reaction (RT–PCR) is the most suitable method for the detection and quantification of mRNA. It offers high sensitivity, good reproducibility and a wide quantification range. Today, relative expression is increasingly used, where the expression of a target gene is standardised by a non-regulated reference gene. Several mathematical algorithms have been developed to compute an expression ratio, based on real-time PCR efficiency and the crossing point deviation of an unknown sample versus a control. But all published equations and available models for the calculation of relative expression ratio allow only for the determination of a single transcription difference between one control and one sample. Therefore a new software tool was established, named REST© (relative expression software tool), which compares two groups, with up to 16 data points in a sample and 16 in a control group, for reference and up to four target genes. The mathematical model used is based on the PCR efficiencies and the mean crossing point deviation between the sample and control group. Subsequently, the expression ratio results of the four investigated transcripts are tested for significance by a randomisation test. Herein, development and application of REST© is explained and the usefulness of relative expression in real-time PCR using REST© is discussed. The latest software version of REST© and examples for the correct use can be downloaded at http://www.wzw.tum.de/gene-quantification/.

7,196 citations
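
The REST abstract above states that the model rests on PCR efficiencies and the mean crossing point (CP) deviation between control and sample groups, but it does not spell out the equation. A minimal sketch, assuming the commonly used efficiency-corrected form, ratio = E_target^(ΔCP_target) / E_ref^(ΔCP_ref) with ΔCP = mean CP of control minus mean CP of sample, is shown here. The function and parameter names are hypothetical, and the randomisation test mentioned in the abstract is not implemented.

```python
from statistics import mean


def expression_ratio(e_target, cp_target_control, cp_target_sample,
                     e_ref, cp_ref_control, cp_ref_sample):
    """Efficiency-corrected relative expression ratio (assumed form, see lead-in).

    e_target, e_ref: amplification efficiencies (2.0 means perfect doubling per cycle).
    cp_*: lists of crossing points for each group.
    """
    delta_target = mean(cp_target_control) - mean(cp_target_sample)
    delta_ref = mean(cp_ref_control) - mean(cp_ref_sample)
    return (e_target ** delta_target) / (e_ref ** delta_ref)


# Illustrative numbers only: the target crosses 3 cycles earlier in the sample
# group, the reference gene is unchanged, and both efficiencies are 2.0.
ratio = expression_ratio(
    2.0, [25.0, 25.0], [22.0, 22.0],
    2.0, [20.0, 20.0], [20.0, 20.0],
)
```

With these inputs the target ΔCP is 3 and the reference ΔCP is 0, so the ratio is 2³ / 2⁰ = 8, that is, an eight-fold higher target expression in the sample group.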


Journal ArticleDOI
TL;DR: New features have been implemented to search for plant cis-acting regulatory elements in a query sequence and links are now provided to a new clustering and motif search method to investigate clusters of co-expressed genes.
Abstract: PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Regulatory elements are represented by positional matrices, consensus sequences and individual sites on particular promoter sequences. Links to the EMBL, TRANSFAC and MEDLINE databases are provided when available. Data about the transcription sites are extracted mainly from the literature, supplemented with an increasing number of in silico predicted data. Apart from a general description for specific transcription factor sites, levels of confidence for the experimental evidence, functional information and the position on the promoter are given as well. New features have been implemented to search for plant cis-acting regulatory elements in a query sequence. Furthermore, links are now provided to a new clustering and motif search method to investigate clusters of co-expressed genes. New regulatory elements can be sent automatically and will be added to the database after curation. The PlantCARE relational database is available via the World Wide Web at http://sphinx.rug.ac.be:8080/PlantCARE/.

4,184 citations


Journal ArticleDOI
TL;DR: This article proposes normalization methods that are based on robust local regression and account for intensity and spatial dependence in dye biases for different types of cDNA microarray experiments.
Abstract: There are many sources of systematic variation in cDNA microarray experiments which affect the measured gene expression levels (e.g. differences in labeling efficiency between the two fluorescent dyes). The term normalization refers to the process of removing such variation. A constant adjustment is often used to force the distribution of the intensity log ratios to have a median of zero for each slide. However, such global normalization approaches are not adequate in situations where dye biases can depend on spot overall intensity and/or spatial location within the array. This article proposes normalization methods that are based on robust local regression and account for intensity and spatial dependence in dye biases for different types of cDNA microarray experiments. The selection of appropriate controls for normalization is discussed and a novel set of controls (microarray sample pool, MSP) is introduced to aid in intensity-dependent normalization. Lastly, to allow for comparisons of expression levels across slides, a robust method based on maximum likelihood estimation is proposed to adjust for scale differences among slides.

3,605 citations
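
The normalization abstract above contrasts a constant (global median) adjustment of log ratios with intensity-dependent correction. The sketch below shows the global adjustment and, as a deliberately crude stand-in for the intensity-dependent idea, a per-bin median correction over intensity. The binned version is an assumption made for illustration; it is not the robust local regression the article proposes, and all function names are hypothetical.

```python
from math import log2
from statistics import median


def ma_values(red, green):
    """Log ratio M = log2(R/G) and mean log intensity A = 0.5 * log2(R*G) per spot."""
    m = [log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * log2(r * g) for r, g in zip(red, green)]
    return m, a


def global_median_normalize(m):
    """Shift all log ratios so that their median is zero (constant adjustment)."""
    shift = median(m)
    return [x - shift for x in m]


def binned_median_normalize(m, a, n_bins=2):
    """Crude intensity-dependent correction: centre M within quantile bins of A.

    Illustrative stand-in only; a smooth fitted curve c(A) would replace
    the per-bin medians in a local-regression approach.
    """
    order = sorted(range(len(m)), key=lambda i: a[i])
    out = [0.0] * len(m)
    for b in range(n_bins):
        idx = order[b * len(order) // n_bins:(b + 1) * len(order) // n_bins]
        if not idx:
            continue
        shift = median(m[i] for i in idx)
        for i in idx:
            out[i] = m[i] - shift
    return out


# Illustrative intensities: the red channel is uniformly twice the green channel.
m, a = ma_values([200.0, 400.0, 800.0], [100.0, 200.0, 400.0])
centred = global_median_normalize(m)
```

When the dye bias is the same at every intensity, as in this toy input, the global shift is enough; the binned variant only matters when the bias changes with A.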


Journal ArticleDOI
TL;DR: This work presents a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families based on precomputed sequence similarity information that has been rigorously tested and validated on a number of very large databases.
Abstract: Detection of protein families in large databases is one of the principal research objectives in structural and functional genomics. Protein family classification can significantly contribute to the delineation of functional diversity of homologous proteins, the prediction of function based on domain architecture or the presence of sequence motifs as well as comparative genomics, providing valuable evolutionary insights. We present a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families. The method relies on the Markov cluster (MCL) algorithm for the assignment of proteins into families based on precomputed sequence similarity information. This novel approach does not suffer from the problems that normally hinder other protein sequence clustering algorithms, such as the presence of multi-domain proteins, promiscuous domains and fragmented proteins. The method has been rigorously tested and validated on a number of very large databases, including SwissProt, InterPro, SCOP and the draft human genome. Our results indicate that the method is ideally suited to the rapid and accurate detection of protein families on a large scale. The method has been used to detect and categorise protein families within the draft human genome and the resulting families have been used to annotate a large proportion of human proteins.

3,468 citations
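
The Markov cluster (MCL) algorithm that the TRIBE-MCL abstract above relies on alternates two matrix operations, expansion and inflation, on a column-stochastic similarity matrix. The following is a minimal pure-Python sketch of that loop, assuming a small symmetric matrix of made-up similarity scores. How TRIBE-MCL derives its weights from sequence comparisons is not shown, and the parameter defaults here are assumptions.

```python
def normalize_columns(matrix):
    """Scale each column to sum to 1 (in place) and return the matrix."""
    n = len(matrix)
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                matrix[i][j] /= total
    return matrix


def expand(matrix):
    """Expansion: square the matrix (random-walk flow spreads out)."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def inflate(matrix, power):
    """Inflation: raise entries to a power and renormalise (strong flow is favoured)."""
    return normalize_columns([[x ** power for x in row] for row in matrix])


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster nodes of a weighted graph; returns sorted lists of node indices."""
    n = len(similarity)
    matrix = [[float(x) for x in row] for row in similarity]
    for i in range(n):
        matrix[i][i] = max(matrix[i][i], 1.0)  # add self-loops
    normalize_columns(matrix)
    for _ in range(max_iter):
        new = inflate(expand(matrix), inflation)
        change = max(abs(new[i][j] - matrix[i][j])
                     for i in range(n) for j in range(n))
        matrix = new
        if change < tol:
            break
    # Rows with a non-zero diagonal are attractors; their non-zero columns form a cluster.
    clusters = set()
    for i in range(n):
        if matrix[i][i] > tol:
            clusters.add(frozenset(j for j in range(n) if matrix[i][j] > tol))
    return sorted(sorted(c) for c in clusters)


# Hypothetical similarity scores: proteins 0-2 resemble each other, as do 3-5.
scores = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
families = mcl(scores)
```

The inflation parameter controls granularity: larger values tend to produce more, smaller clusters. Dense lists are used for readability; a sparse representation would be needed for databases of the size the abstract mentions.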


Journal ArticleDOI
TL;DR: A primer pair is presented that makes it possible to measure telomeres in vertebrate DNA by PCR amplification with oligonucleotide primers designed to hybridize to the TTAGGG and CCCTAA repeats, which had long been presumed impossible, allowing simple and rapid measurement of telomeres in a closed-tube, fluorescence-based assay.
Abstract: It has long been presumed impossible to measure telomeres in vertebrate DNA by PCR amplification with oligonucleotide primers designed to hybridize to the TTAGGG and CCCTAA repeats, because only primer dimer-derived products are expected. Here we present a primer pair that eliminates this problem, allowing simple and rapid measurement of telomeres in a closed tube, fluorescence-based assay. This assay will facilitate investigations of the biology of telomeres and the roles they play in the molecular pathophysiology of diseases and aging.

3,014 citations


Journal ArticleDOI
TL;DR: Online Mendelian Inheritance in Man (OMIM) is a comprehensive, authoritative and timely knowledgebase of human genes and genetic disorders compiled to support research and education in human genomics and the practice of clinical genetics.
Abstract: Online Mendelian Inheritance in Man (OMIM) is a comprehensive, authoritative and timely knowledgebase of human genes and genetic disorders compiled to support human genetics research and education and the practice of clinical genetics. Started by Dr Victor A. McKusick as the definitive reference Mendelian Inheritance in Man, OMIM (http://www.ncbi.nlm.nih.gov/omim/) is now distributed electronically by the National Center for Biotechnology Information, where it is integrated with the Entrez suite of databases. Derived from the biomedical literature, OMIM is written and edited at Johns Hopkins University with input from scientists and physicians around the world. Each OMIM entry has a full-text summary of a genetically determined phenotype and/or gene and has numerous links to other genetic databases such as DNA and protein sequence, PubMed references, general and locus-specific mutation databases, HUGO nomenclature, MapViewer, GeneTests, patient support groups and many others. OMIM is an easy and straightforward portal to the burgeoning information in human genetics.

2,715 citations


Journal ArticleDOI
TL;DR: A new method for relative quantification of 40 different DNA sequences in an easy to perform reaction requiring only 20 ng of human DNA is described.
Abstract: We describe a new method for relative quantification of 40 different DNA sequences in an easy to perform reaction requiring only 20 ng of human DNA. Applications shown of this multiplex ligation-dependent probe amplification (MLPA) technique include the detection of exon deletions and duplications in the human BRCA1, MSH2 and MLH1 genes, detection of trisomies such as Down’s syndrome, characterisation of chromosomal aberrations in cell lines and tumour samples and SNP/mutation detection. Relative quantification of mRNAs by MLPA will be described elsewhere. In MLPA, not sample nucleic acids but probes added to the samples are amplified and quantified. Amplification of probes by PCR depends on the presence of probe target sequences in the sample. Each probe consists of two oligonucleotides, one synthetic and one M13 derived, that hybridise to adjacent sites of the target sequence. Such hybridised probe oligonucleotides are ligated, permitting subsequent amplification. All ligated probes have identical end sequences, permitting simultaneous PCR amplification using only one primer pair. Each probe gives rise to an amplification product of unique size between 130 and 480 bp. Probe target sequences are small (50–70 nt). The prerequisite of a ligation reaction provides the opportunity to discriminate single nucleotide differences.

2,675 citations


Journal ArticleDOI
TL;DR: A World Wide Web server is presented to predict the effect of an nsSNP on protein structure and function and the dependence of selective pressure on the structural and functional properties of proteins is studied.
Abstract: Human single nucleotide polymorphisms (SNPs) represent the most frequent type of human population DNA variation. One of the main goals of SNP research is to understand the genetics of the human phenotype variation and especially the genetic basis of human complex diseases. Non-synonymous coding SNPs (nsSNPs) comprise a group of SNPs that, together with SNPs in regulatory regions, are believed to have the highest impact on phenotype. Here we present a World Wide Web server to predict the effect of an nsSNP on protein structure and function. The prediction method enabled analysis of the publicly available SNP database HGVbase, which gave rise to a dataset of nsSNPs with predicted functionality. The dataset was further used to compare the effect of various structural and functional characteristics of amino acid substitutions responsible for phenotypic display of nsSNPs. We also studied the dependence of selective pressure on the structural and functional properties of proteins. We found that in our dataset the selection pressure against deleterious SNPs depends on the molecular function of the protein, although it is insensitive to several other protein features considered. The strongest selective pressure was detected for proteins involved in transcription regulation.

2,276 citations


Journal ArticleDOI
TL;DR: The Database of Interacting Proteins (DIP) is a database that documents experimentally determined protein-protein interactions and provides the scientific community with an integrated set of tools for browsing and extracting information about protein interaction networks.
Abstract: The Database of Interacting Proteins (DIP: http://dip.doe-mbi.ucla.edu) is a database that documents experimentally determined protein-protein interactions. It provides the scientific community with an integrated set of tools for browsing and extracting information about protein interaction networks. As of September 2001, the DIP catalogs approximately 11 000 unique interactions among 5900 proteins from >80 organisms; the vast majority from yeast, Helicobacter pylori and human. Tools have been developed that allow users to analyze, visualize and integrate their own experimental data with the information about protein-protein interactions available in the DIP database.

Journal ArticleDOI
TL;DR: The Ensembl database project provides a bioinformatics framework to organise biology around the sequences of large genomes and is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources.
Abstract: The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.

Journal ArticleDOI
TL;DR: The PROSITE database consists of biologically significant patterns and profiles designed in such a way that with appropriate computational tools it can rapidly and reliably help to determine to which known family of proteins (if any) a new sequence belongs, or which known domain(s) it contains.
Abstract: PROSITE [Bairoch and Bucher (1994) Nucleic Acids Res., 22, 3583-3589; Hofmann et al. (1999) Nucleic Acids Res., 27, 215-219] is a method of identifying the functions of uncharacterized proteins translated from genomic or cDNA sequences. The PROSITE database (http://www.expasy.org/prosite/) consists of biologically significant patterns and profiles designed in such a way that with appropriate computational tools it can rapidly and reliably help to determine to which known family of proteins (if any) a new sequence belongs, or which known domain(s) it contains.

Journal ArticleDOI
TL;DR: The Kyoto Encyclopedia of Genes and Genomes (KEGG) is the primary database resource of the Japanese GenomeNet service for understanding higher order functional meanings and utilities of the cell or the organism from its genome information.
Abstract: The Kyoto Encyclopedia of Genes and Genomes (KEGG) is the primary database resource of the Japanese GenomeNet service (http://www.genome.ad.jp/) for understanding higher order functional meanings and utilities of the cell or the organism from its genome information. KEGG consists of the PATHWAY database for the computerized knowledge on molecular interaction networks such as pathways and complexes, the GENES database for the information about genes and proteins generated by genome sequencing projects, and the LIGAND database for the information about chemical compounds and chemical reactions that are relevant to cellular processes. In addition to these three main databases, limited amounts of experimental data for microarray gene expression profiles and yeast two-hybrid systems are stored in the EXPRESSION and BRITE databases, respectively. Furthermore, a new database, named SSDB, is available for exploring the universe of all protein coding genes in the complete genomes and for identifying functional links and ortholog groups. The data objects in the KEGG databases are all represented as graphs and various computational methods are developed to detect graph features that can be related to biological functions. For example, the correlated clusters are graph similarities which can be used to predict a set of genes coding for a pathway or a complex, as summarized in the ortholog group tables, and the cliques in the SSDB graph are used to annotate genes. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).

Journal ArticleDOI
TL;DR: The background, advantages and limitations of real-time PCR are described, the literature as it applies to virus detection in the routine and research laboratory is reviewed and the technology discussed has been applied to other areas of microbiology as well as studies of gene expression and genetic disease.
Abstract: The use of the polymerase chain reaction (PCR) in molecular diagnostics has increased to the point where it is now accepted as the gold standard for detecting nucleic acids from a number of origins and it has become an essential tool in the research laboratory. Real-time PCR has engendered wider acceptance of the PCR due to its improved rapidity, sensitivity, reproducibility and the reduced risk of carry-over contamination. There are currently five main chemistries used for the detection of PCR product during real-time PCR. These are the DNA binding fluorophores, the 5' endonuclease, adjacent linear and hairpin oligoprobes and the self-fluorescing amplicons, which are described in detail. We also discuss factors that have restricted the development of multiplex real-time PCR as well as the role of real-time PCR in quantitating nucleic acids. Both amplification hardware and the fluorogenic detection chemistries have evolved rapidly as the understanding of real-time PCR has developed and this review aims to update the scientist on the current state of the art. We describe the background, advantages and limitations of real-time PCR and we review the literature as it applies to virus detection in the routine and research laboratory in order to focus on one of the many areas in which the application of real-time PCR has provided significant methodological benefits and improved patient outcomes. However, the technology discussed has been applied to other areas of microbiology as well as studies of gene expression and genetic disease.

Journal ArticleDOI
TL;DR: A scalable transfection procedure using polyethylenimine (PEI) is described for the human embryonic kidney 293 cell line grown in suspension; with the pTT vector, 10- and 3-fold increases in SEAP expression were obtained in 293E cells compared with the pcDNA3.1 and pCEP4 vectors, respectively.
Abstract: A scalable transfection procedure using polyethylenimine (PEI) is described for the human embryonic kidney 293 cell line grown in suspension. Green fluorescent protein (GFP) and human placental secreted alkaline phosphatase (SEAP) were used as reporter genes to monitor transfection efficiency and productivity. Up to 75% of GFP-positive cells were obtained using linear or branched 25 kDa PEI. The 293 cell line and two genetic variants, either expressing the SV40 large T-antigen (293T) or the Epstein–Barr virus (EBV) EBNA1 protein (293E), were tested for protein expression. The highest expression level was obtained with 293E cells using the EBV oriP-containing plasmid pCEP4. We designed the pTT vector, an oriP-based vector having an improved cytomegalovirus expression cassette. Using this vector, 10- and 3-fold increases in SEAP expression were obtained in 293E cells compared with pcDNA3.1 and pCEP4 vectors, respectively. The presence of serum had a positive effect on gene transfer and expression. Transfection of suspension-growing cells was more efficient with linear PEI and was not affected by the presence of medium conditioned for 24 h. Using the pTT vector, >20 mg/l of purified His-tagged SEAP was recovered from a 3.5 l bioreactor. Intracellular proteins were also produced at levels as high as 50 mg/l, representing up to 20% of total cell proteins.

Journal ArticleDOI
TL;DR: An additional set of four completely heterologous loxP-flanked marker cassettes carrying the genes URA3 and LEU2 from Kluyveromyces lactis, his5(+) from Schizosaccharomyces pombe and the dominant resistance marker ble(r) from the bacterial transposon Tn5, which confers resistance to the antibiotic phleomycin are described.
Abstract: Heterologous markers are important tools required for the molecular dissection of gene function in many organisms, including Saccharomyces cerevisiae. Moreover, the presence of gene families and isoenzymes often makes it necessary to delete more than one gene. We recently introduced a new and efficient gene disruption cassette for repeated use in budding yeast, which combines the heterologous dominant kan(r) resistance marker with a Cre/loxP-mediated marker removal procedure. Here we describe an additional set of four completely heterologous loxP-flanked marker cassettes carrying the genes URA3 and LEU2 from Kluyveromyces lactis, his5(+) from Schizosaccharomyces pombe and the dominant resistance marker ble(r) from the bacterial transposon Tn5, which confers resistance to the antibiotic phleomycin. All five loxP–marker gene–loxP gene disruption cassettes can be generated using the same pair of oligonucleotides and all can be used for gene disruption with high efficiency. For marker rescue we have created three additional Cre expression vectors carrying HIS3, TRP1 or ble(r) as the yeast selection marker. The set of disruption cassettes and Cre expression plasmids described here represents a significant further development of the marker rescue system, which is ideally suited to functional analysis of the yeast genome.

Journal ArticleDOI
TL;DR: The silencing effect was transient, with the level of mRNA recovering fully within 4-5 days, suggesting absence of a propagative system for RNAi in humans and the depletion rate-dependent appearance of 3' mRNA cleavage fragments argues for the existence of a two-step mRNA degradation mechanism.
Abstract: Chemically synthesised 21-23 bp double-stranded short interfering RNAs (siRNA) can induce sequence-specific post-transcriptional gene silencing, in a process termed RNA interference (RNAi). In the present study, several siRNAs synthesised against different sites on the same target mRNA (human Tissue Factor) demonstrated striking differences in silencing efficiency. Only a few of the siRNAs resulted in a significant reduction in expression, suggesting that accessible siRNA target sites may be rare in some human mRNAs. Blocking of the 3'-OH with FITC did not reduce the effect on target mRNA. Mutations in the siRNAs relative to target mRNA sequence gradually reduced, but did not abolish mRNA depletion. Inactive siRNAs competed reversibly with active siRNAs in a sequence-independent manner. Several lines of evidence suggest the existence of a near equilibrium kinetic balance between mRNA production and siRNA-mediated mRNA depletion. The silencing effect was transient, with the level of mRNA recovering fully within 4-5 days, suggesting absence of a propagative system for RNAi in humans. Finally, we observed 3' mRNA cleavage fragments resulting from the action of the most effective siRNAs. The depletion rate-dependent appearance of these fragments argues for the existence of a two-step mRNA degradation mechanism.

Journal ArticleDOI
TL;DR: MUMmer 2 is a suffix-tree-based system that aligns the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory, running three times faster than the original MUMmer while using one-third as much memory.
Abstract: We describe a suffix-tree algorithm that can align the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory. The new system, MUMmer 2, runs three times faster while using one-third as much memory as the original MUMmer system. It has been used successfully to align the entire human and mouse genomes to each other, and to align numerous smaller eukaryotic and prokaryotic genomes. A new module permits the alignment of multiple DNA sequence fragments, which has proven valuable in the comparison of incomplete genome sequences. We also describe a method to align more distantly related genomes by detecting protein sequence homology. This extension to MUMmer aligns two genomes after translating the sequence in all six reading frames, extracts all matching protein sequences and then clusters together matches. This method has been applied to both incomplete and complete genome sequences in order to detect regions of conserved synteny, in which multiple proteins from one organism are found in the same order and orientation in another. The system code is being made freely available by the authors.
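
The protein-level extension described in the MUMmer abstract above starts by translating each genome in all six reading frames. A minimal sketch of that first step is given here, assuming the standard genetic code and plain A/C/G/T input. The function names and frame labels are hypothetical, and the subsequent matching and clustering of protein matches is not shown.

```python
# Standard genetic code, laid out in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Map frame labels '+1'..'+3' and '-1'..'-3' to translated protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translations("ATGGCC")
# frames["+1"] is "MA" (ATG GCC); frames["-1"] is "GH" (GGC CAT on the reverse strand)
```

Stop codons are kept as `*` in the output so that a later step can decide whether to split the translation into open reading frames or match across them.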

Journal ArticleDOI
TL;DR: This paper presents the 4 x 4 'isostericity matrices' summarizing the geometric relationships between the 16 pairwise combinations of the four standard bases, A, C, G and U, and helps identify isosteric pairs that co-vary or interchange in sequences of homologous molecules while maintaining conserved three-dimensional motifs.
Abstract: RNA molecules exhibit complex structures in which a large fraction of the bases engage in non-Watson-Crick base pairing, forming motifs that mediate long-range RNA-RNA interactions and create binding sites for proteins and small molecule ligands. The rapidly growing number of three-dimensional RNA structures at atomic resolution requires that databases contain the annotation of such base pairs. An unambiguous and descriptive nomenclature was proposed recently in which RNA base pairs were classified by the base edges participating in the interaction (Watson-Crick, Hoogsteen/CH or sugar edge) and the orientation of the glycosidic bonds relative to the hydrogen bonds (cis or trans). Twelve basic geometric families were identified and all 12 have been observed in crystal structures. For each base pairing family, we present here the 4 x 4 'isostericity matrices' summarizing the geometric relationships between the 16 pairwise combinations of the four standard bases, A, C, G and U. Whenever available, a representative example of each observed base pair from X-ray crystal structures (3.0 Å resolution or better) is provided or, otherwise, theoretically plausible models. This format makes apparent the recurrent geometric patterns that are observed and helps identify isosteric pairs that co-vary or interchange in sequences of homologous molecules while maintaining conserved three-dimensional motifs.

Journal ArticleDOI
TL;DR: The SMART database now contains information on intrinsic sequence features such as transmembrane regions, coiled-coils, signal peptides and internal repeats and new advanced queries provide direct access to the SMART relational database using SQL.
Abstract: SMART (Simple Modular Architecture Research Tool, http://smart.embl-heidelberg.de) is a web-based resource used for the annotation of protein domains and the analysis of domain architectures, with particular emphasis on mobile eukaryotic domains. Extensive annotation for each domain family is available, providing information relating to function, subcellular localization, phyletic distribution and tertiary structure. The January 2002 release has added more than 200 hand-curated domain models. This brings the total to over 600 domain families that are widely represented among nuclear, signalling and extracellular proteins. Annotation now includes links to the Online Mendelian Inheritance in Man (OMIM) database in cases where a human disease is associated with one or more mutations in a particular domain. We have implemented new analysis methods and updated others. New advanced queries provide direct access to the SMART relational database using SQL. This database now contains information on intrinsic sequence features such as transmembrane regions, coiled-coils, signal peptides and internal repeats. SMART output can now be easily included in users' documents. A SMART mirror has been created at http://smart.ox.ac.uk.

Journal ArticleDOI
TL;DR: The substantial diversification of HATs and HDACs that has occurred since the divergence of plants, animals and fungi suggests a surprising degree of evolutionary plasticity and functional diversification in these core chromatin components.
Abstract: Sequence similarity and profile searching tools were used to analyze the genome sequences of Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans and Drosophila melanogaster for genes encoding three families of histone deacetylase (HDAC) proteins and three families of histone acetyltransferase (HAT) proteins. Plants, animals and fungi were found to have a single member of each of three subfamilies of the GNAT family of HATs, suggesting conservation of these functions. However, major differences were found with respect to sizes of gene families and multi-domain protein structures within other families of HATs and HDACs, indicating substantial evolutionary diversification. Phylogenetic analysis identified a new class of HDACs within the RPD3/HDA1 family that is represented only in plants and animals. A similar analysis of the plant-specific HD2 family of HDACs suggests a duplication event early in dicot evolution, followed by further diversification in the lineage leading to Arabidopsis. Of three major classes of SIR2-type HDACs that are found in animals, fungi have representatives only in one class, whereas plants have representatives only in the other two. Plants possess five CREB-binding protein (CBP)-type HATs compared with one to two in animals and none in fungi. Domain and phylogenetic analyses of the CBP family proteins showed that this family has evolved three distinct types of CBPs in plants. The domain architectures of the CBP and TAFII250 families of HATs show significant differences between plants and animals, most notably with respect to bromodomain occurrence and number. Bromodomain-containing proteins in Arabidopsis differ strikingly from animal bromodomain proteins with respect to the numbers of bromodomains and the other types of domains that are present. The substantial diversification of HATs and HDACs that has occurred since the divergence of plants, animals and fungi suggests a surprising degree of evolutionary plasticity and functional diversification in these core chromatin components.

Journal ArticleDOI
TL;DR: The Conserved Domain Database (CDD), a compilation of multiple sequence alignments representing protein domains conserved in molecular evolution, has been populated with alignment data from the public collections Pfam and SMART, as well as with contributions from colleagues at NCBI.
Abstract: The Conserved Domain Database (CDD) is a compilation of multiple sequence alignments representing protein domains conserved in molecular evolution. It has been populated with alignment data from the public collections Pfam and SMART, as well as with contributions from colleagues at NCBI. The current version of CDD (v.1.54) contains 3693 such models. CDD alignments are linked to protein sequence and structure data in Entrez. The molecular structure viewer Cn3D serves as a tool to interactively visualize alignments and three-dimensional structure, and to link three-dimensional residue coordinates to descriptions of evolutionary conservation. CDD can be accessed on the World Wide Web at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml. Protein query sequences may be compared against databases of position-specific score matrices derived from alignments in CDD, using a service named CD-Search, which can be found at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. CD-Search runs reverse-position-specific BLAST (RPS-BLAST), a variant of the widely used PSI-BLAST algorithm. CD-Search is run by default for protein–protein queries submitted to NCBI’s BLAST service at http://www.ncbi.nlm.nih.gov/BLAST.

Journal ArticleDOI
TL;DR: The tendency of the fluorophore and the quencher to bind to each other has a strong influence on quenching efficiency, and the availability of these measurements should facilitate the design of oligonucleotide probes that contain interactive fluorophores and quenchers.
Abstract: An important consideration in the design of oligonucleotide probes for homogeneous hybridization assays is the efficiency of energy transfer between the fluorophore and quencher used to label the probes. We have determined the efficiency of energy transfer for a large number of combinations of commonly used fluorophores and quenchers. We have also measured the quenching effect of nucleotides on the fluorescence of each fluorophore. Quenching efficiencies were measured for both the resonance energy transfer and the static modes of quenching. We found that, in addition to their photochemical characteristics, the tendency of the fluorophore and the quencher to bind to each other has a strong influence on quenching efficiency. The availability of these measurements should facilitate the design of oligonucleotide probes that contain interactive fluorophores and quenchers, including competitive hybridization probes, adjacent probes, TaqMan probes and molecular beacons.

PatentDOI
TL;DR: A method is presented for the purification and synthesis of RNA molecules and enzymatic RNA molecules in enzymatically active form.
Abstract: Method for purification and synthesis of RNA molecules and enzymatic RNA molecules in enzymatically active form.

Journal ArticleDOI
TL;DR: These chimeric LNA/DNA oligonucleotides are more stable than isosequential phosphorothioates and 2'-O-methyl gapmers, which have half-lives of 10 and 12 h, respectively.
Abstract: The design of antisense oligonucleotides containing locked nucleic acids (LNA) was optimized and compared to intensively studied DNA oligonucleotides, phosphorothioates and 2'-O-methyl gapmers. In contradiction to the literature, a stretch of seven or eight DNA monomers in the center of a chimeric DNA/LNA oligonucleotide is necessary for full activation of RNase H to cleave the target RNA. For 2'-O-methyl gapmers a stretch of six DNA monomers is sufficient to recruit RNase H. Compared to the 18mer DNA, the oligonucleotides containing LNA have an increased melting temperature of 1.5-4 degrees C per LNA depending on the positions of the modified residues. 2'-O-methyl nucleotides increase the T(m) by only <1 degrees C per modification and the T(m) of the phosphorothioate is reduced. The efficiency of an oligonucleotide in supporting RNase H cleavage correlates with its affinity for the target RNA, i.e. LNA > 2'-O-methyl > DNA > phosphorothioate. Three LNAs at each end of the oligonucleotide are sufficient to stabilize the oligonucleotide in human serum 10-fold compared to an unmodified oligodeoxynucleotide (from t(1/2) = approximately 1.5 h to t(1/2) = approximately 15 h). These chimeric LNA/DNA oligonucleotides are more stable than isosequential phosphorothioates and 2'-O-methyl gapmers, which have half-lives of 10 and 12 h, respectively.

Journal ArticleDOI
TL;DR: The approach presented here simplifies the production of proteins from a wide variety of organisms for genomics-based studies and automates the design of oligonucleotides for gene synthesis.
Abstract: The availability of sequences of entire genomes has dramatically increased the number of protein targets, many of which will need to be overexpressed in cells other than the original source of DNA. Gene synthesis often provides a fast and economically efficient approach. The synthetic gene can be optimized for expression and constructed for easy mutational manipulation without regard to the parent genome. Yet design and construction of synthetic genes, especially those coding for large proteins, can be a slow, difficult and confusing process. We have written a computer program that automates the design of oligonucleotides for gene synthesis. Our program requires simple input information, i.e. amino acid sequence of the target protein and melting temperature (needed for the gene assembly) of synthetic oligonucleotides. The program outputs a series of oligonucleotide sequences with codons optimized for expression in an organism of choice. Those oligonucleotides are characterized by highly homogeneous melting temperatures and a minimized tendency for hairpin formation. With the help of this program and a two-step PCR method, we have successfully constructed numerous synthetic genes, ranging from 139 to 1042 bp. The approach presented here simplifies the production of proteins from a wide variety of organisms for genomics-based studies.

Journal ArticleDOI
TL;DR: The Therapeutic Target Database (TTD) is designed to provide information about the known therapeutic protein and nucleic acid targets described in the literature, the targeted disease conditions, the pathway information and the corresponding drugs/ligands directed at each of these targets.
Abstract: A number of proteins and nucleic acids have been explored as therapeutic targets. These targets are subjects of interest in different areas of biomedical and pharmaceutical research and in the development and evaluation of bioinformatics, molecular modeling, computer-aided drug design and analytical tools. A publicly accessible database that provides comprehensive information about these targets is therefore helpful to the relevant communities. The Therapeutic Target Database (TTD) is designed to provide information about the known therapeutic protein and nucleic acid targets described in the literature, the targeted disease conditions, the pathway information and the corresponding drugs/ligands directed at each of these targets. Cross-links to other databases are also introduced to facilitate the access of information about the sequence, 3D structure, function, nomenclature, drug/ligand binding properties, drug usage and effects, and related literature for each target. This database can be accessed at http://xin.cz3.nus.edu.sg/group/ttd/ttd.asp and it currently contains entries for 433 targets covering 125 disease conditions along with 809 drugs/ligands directed at each of these targets. Each entry can be retrieved through multiple methods including target name, disease name, drug/ligand name, drug/ligand function and drug therapeutic classification.

Journal ArticleDOI
TL;DR: Following calibration, fold-change measurements generated by custom cDNA arrays were more accurate than those obtained by commercial oligonucleotide arrays.
Abstract: We compared the accuracy of microarray measurements obtained with oligonucleotide arrays (GeneChip, Affymetrix) with a laboratory-developed cDNA array by assaying test RNA samples from an experiment using a paradigm known to regulate many genes measured on both arrays. We selected 47 genes represented on both arrays, including both known regulated and unregulated transcripts, and established reference relative expression measurements for these genes in the test RNA samples using quantitative reverse transcriptase real-time PCR (QRTPCR) assays. The validity of the reproducible (average coefficient of variation = 11.8%) QRTPCR measurements was established through application of a new mathematical model. The performance of both array platforms in identifying regulated and non-regulated genes was identical. With either platform, 16 of 17 definitely regulated genes were correctly identified, and no definitely unregulated transcript was falsely identified as regulated. Accuracy of the fold-change measurements obtained with each platform was assessed by determining measurement bias. Both platforms consistently underestimate the relative changes in mRNA expression between experimental and control samples. The bias observed with cDNA arrays was predictable for fold-changes <250-fold by QRTPCR and could be corrected by the calibration function F_c = (F_a(cDNA))^q, where F_a(cDNA) is the microarray-determined fold-change comparing experimental with control samples, q is the correction factor and F_c is the calibrated value. The bias observed with the commercial oligonucleotide arrays was less predictable and calibration was unfeasible. Following calibration, fold-change measurements generated by custom cDNA arrays were more accurate than those obtained by commercial oligonucleotide arrays. Our study demonstrates systematic bias of microarray measurements and identifies a calibration function that improves the accuracy of cDNA array data.
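
The calibration step described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes the calibration function is a power law, with the calibrated fold-change equal to the array-measured fold-change raised to the power q; that is a reading of the flattened notation in the listing. The function name `calibrate_fold_change` and the value of `Q_EXAMPLE` are hypothetical, and the abstract does not report a value for q.

```python
def calibrate_fold_change(observed_fold_change: float, q: float) -> float:
    """Return the calibrated fold-change, assuming a power-law correction.

    observed_fold_change: fold-change measured on the cDNA array (must be > 0).
    q: correction factor; values above 1 raise fold-changes greater than 1,
       compensating for the underestimation the abstract describes.
    """
    if observed_fold_change <= 0:
        raise ValueError("fold-change must be positive")
    return observed_fold_change ** q


# Hypothetical correction factor, chosen only for illustration.
Q_EXAMPLE = 1.4

observed = [1.0, 2.0, 10.0]
calibrated = [calibrate_fold_change(f, Q_EXAMPLE) for f in observed]

for f_a, f_c in zip(observed, calibrated):
    print(f"observed {f_a:.1f}-fold -> calibrated {f_c:.2f}-fold")
```

Under this reading, a fold-change of exactly 1 (no regulation) is left unchanged for any q, so calibration does not turn unregulated transcripts into regulated ones.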

Journal ArticleDOI
TL;DR: EcoCyc is an organism-specific pathway/genome database that describes the metabolic and signal-transduction pathways of Escherichia coli, its enzymes, its transport proteins and its mechanisms of transcriptional control of gene expression.
Abstract: EcoCyc is an organism-specific pathway/genome database that describes the metabolic and signal-transduction pathways of Escherichia coli, its enzymes, its transport proteins and its mechanisms of transcriptional control of gene expression. EcoCyc is queried using the Pathway Tools graphical user interface, which provides a wide variety of query operations and visualization tools. EcoCyc is available at http://ecocyc.org/.