Showing papers in "Nucleic Acids Research in 2002"

Journal Article

MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform

Kazutaka Katoh¹, Kazuharu Misawa, Kei-ichi Kuma¹, Takashi Miyata¹

15 Jul 2002-Nucleic Acids Research

TL;DR: A simplified scoring system is proposed that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length.

Abstract: A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE, when the number of input sequences exceeds 60, without sacrificing the accuracy.

12,003 citations
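The FFT step described in item (i) of the MAFFT abstract can be sketched as follows. Each residue is mapped to a (volume, polarity) pair, the correlation c(k) between the two encoded sequences is computed for every lag k through a Fourier transform, and a peak at lag k suggests homologous regions offset by k positions. The values in `TOY_PROPERTIES` are placeholders invented for this illustration and are not the values MAFFT uses; all function names are hypothetical, and this is a minimal sketch rather than MAFFT's implementation.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def cross_correlation(a, b):
    """Return {lag: c(lag)} with c(lag) = sum_n a[n] * b[n + lag]."""
    size = 1
    while size < len(a) + len(b):  # zero-pad so lags do not wrap around
        size *= 2
    fa = fft([complex(v) for v in a] + [0j] * (size - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (size - len(b)))
    prod = [x.conjugate() * y for x, y in zip(fa, fb)]
    c = [v.real / size for v in fft(prod, inverse=True)]
    # Negative lags are stored at the end of the circular result.
    return {k: c[k % size] for k in range(-(len(a) - 1), len(b))}

# Placeholder (volume, polarity) values, invented for illustration only.
TOY_PROPERTIES = {
    "A": (-1.0, 0.2), "G": (-1.5, 0.4), "L": (0.8, -1.2),
    "K": (1.0, 1.3), "W": (2.0, -1.0), "D": (-0.4, 1.5),
}

def best_lag(seq1, seq2, table=TOY_PROPERTIES):
    """Lag with the highest summed volume + polarity correlation."""
    total = {}
    for component in (0, 1):  # 0 = volume, 1 = polarity
        a = [table[r][component] for r in seq1]
        b = [table[r][component] for r in seq2]
        for lag, value in cross_correlation(a, b).items():
            total[lag] = total.get(lag, 0.0) + value
    return max(total, key=total.get)
```

For example, `best_lag("KLAWD", "GGGKLAWD")` reports the lag at which the shared segment lines up. A real aligner would then align only around such peaks instead of filling the full dynamic-programming matrix.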

Journal Article

Gene Expression Omnibus: NCBI gene expression and hybridization array data repository

Ron Edgar¹, Michael Domrachev¹, Alex E. Lash¹

National Institutes of Health¹

01 Jan 2002-Nucleic Acids Research

TL;DR: The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data; it provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments.

Abstract: The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data. GEO provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments. GEO is not intended to replace in house gene expression databases that benefit from coherent data sets, and which are constructed to facilitate a particular analytic method, but rather complement these by acting as a tertiary, central data distribution hub. The three central data entities of GEO are platforms, samples and series, and were designed with gene expression and genomic hybridization experiments in mind. A platform is, essentially, a list of probes that define what set of molecules may be detected. A sample describes the set of molecules that are being probed and references a single platform used to generate its molecular abundance data. A series organizes samples into the meaningful data sets which make up an experiment. The GEO repository is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.

10,968 citations
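The three GEO data entities described in the abstract (platform, sample, series) can be pictured as a small data model. The sketch below is an assumption-laden illustration: the class and field names are hypothetical and do not reflect GEO's actual schema or submission format. It only encodes the stated relationships, namely that a platform lists probes, a sample references a single platform, and a series groups samples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Platform:
    """A list of probes defining which molecules may be detected."""
    platform_id: str
    probes: List[str]

@dataclass
class Sample:
    """Molecules being probed; references exactly one platform."""
    sample_id: str
    platform: Platform
    abundance: Dict[str, float]  # probe name -> measured abundance

    def __post_init__(self):
        # Every measured probe must exist on the referenced platform.
        unknown = set(self.abundance) - set(self.platform.probes)
        if unknown:
            raise ValueError(
                f"probes not on platform {self.platform.platform_id}: "
                f"{sorted(unknown)}"
            )

@dataclass
class Series:
    """Organizes samples into the data set that makes up an experiment."""
    series_id: str
    samples: List[Sample] = field(default_factory=list)

    def platforms(self) -> Set[str]:
        """Identifiers of all platforms used by the samples in this series."""
        return {s.platform.platform_id for s in self.samples}
```

Keeping the platform as a separate object means a probe list is stored once and shared by every sample measured on it, which matches the abstract's description of a sample referencing a single platform.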

Journal Article

Relative expression software tool (REST©) for group-wise comparison and statistical analysis of relative expression results in real-time PCR

Michael W. Pfaffl¹, Graham W. Horgan, Leo Dempfle

Technische Universität München¹

01 May 2002-Nucleic Acids Research

TL;DR: REST, a software tool for group-wise comparison of relative expression in real-time PCR, is presented; its mathematical model is based on the PCR efficiencies and the mean crossing point deviation between the sample and control group, and the resulting expression ratios are tested for significance by a randomisation test.

Abstract: Real-time reverse transcription followed by polymerase chain reaction (RT–PCR) is the most suitable method for the detection and quantification of mRNA. It offers high sensitivity, good reproducibility and a wide quantification range. Today, relative expression is increasingly used, where the expression of a target gene is standardised by a non-regulated reference gene. Several mathematical algorithms have been developed to compute an expression ratio, based on real-time PCR efficiency and the crossing point deviation of an unknown sample versus a control. But all published equations and available models for the calculation of relative expression ratio allow only for the determination of a single transcription difference between one control and one sample. Therefore a new software tool was established, named REST© (relative expression software tool), which compares two groups, with up to 16 data points in a sample and 16 in a control group, for reference and up to four target genes. The mathematical model used is based on the PCR efficiencies and the mean crossing point deviation between the sample and control group. Subsequently, the expression ratio results of the four investigated transcripts are tested for significance by a randomisation test. Herein, development and application of REST© is explained and the usefulness of relative expression in real-time PCR using REST© is discussed. The latest software version of REST© and examples for the correct use can be downloaded at http://www.wzw.tum.de/gene-quantification/.

7,196 citations
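The model named in the REST abstract, an expression ratio built from PCR efficiencies and mean crossing point (CP) deviations, can be sketched as below. The ratio is taken as E_target^ΔCP_target / E_ref^ΔCP_ref, where each ΔCP is the mean CP of the control group minus the mean CP of the sample group. The randomisation step shown here, a plain reshuffling of (target, reference) CP pairs between the two groups, is a simplifying assumption and may differ from the scheme REST itself applies; function names and defaults are hypothetical.

```python
import math
import random
from statistics import mean

def expression_ratio(e_target, e_ref, target_ctrl, target_smp, ref_ctrl, ref_smp):
    """Ratio from efficiencies and mean CP deviations (control minus sample)."""
    delta_target = mean(target_ctrl) - mean(target_smp)
    delta_ref = mean(ref_ctrl) - mean(ref_smp)
    return (e_target ** delta_target) / (e_ref ** delta_ref)

def randomisation_p_value(e_target, e_ref, control, sample, n_perm=2000, seed=0):
    """control / sample: lists of (cp_target, cp_ref) pairs, one per data point.

    Returns the fraction of random reallocations whose |log ratio| is at
    least as large as the observed one (two-sided).
    """
    def abs_log_ratio(ctrl, smp):
        ratio = expression_ratio(
            e_target, e_ref,
            [t for t, _ in ctrl], [t for t, _ in smp],
            [r for _, r in ctrl], [r for _, r in smp],
        )
        return abs(math.log(ratio))

    observed = abs_log_ratio(control, sample)
    pooled = list(control) + list(sample)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled = abs_log_ratio(pooled[:len(control)], pooled[len(control):])
        if shuffled >= observed - 1e-12:  # tolerance for float round-off
            hits += 1
    return hits / n_perm
```

With an efficiency of 2 (perfect doubling per cycle) for both genes, a target that crosses the threshold two cycles earlier in the sample group and an unchanged reference give a ratio of 4.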

Journal Article

PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences

Magali Lescot¹, Patrice Dehais, Gert Thijs, Kathleen Marchal, Yves Moreau, Yves Van de Peer, Pierre Rouzé, Stephane Rombauts

Ghent University¹

01 Jan 2002-Nucleic Acids Research

TL;DR: New features have been implemented to search for plant cis-acting regulatory elements in a query sequence and links are now provided to a new clustering and motif search method to investigate clusters of co-expressed genes.

Abstract: PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Regulatory elements are represented by positional matrices, consensus sequences and individual sites on particular promoter sequences. Links to the EMBL, TRANSFAC and MEDLINE databases are provided when available. Data about the transcription sites are extracted mainly from the literature, supplemented with an increasing number of in silico predicted data. Apart from a general description for specific transcription factor sites, levels of confidence for the experimental evidence, functional information and the position on the promoter are given as well. New features have been implemented to search for plant cis-acting regulatory elements in a query sequence. Furthermore, links are now provided to a new clustering and motif search method to investigate clusters of co-expressed genes. New regulatory elements can be sent automatically and will be added to the database after curation. The PlantCARE relational database is available via the World Wide Web at http://sphinx.rug.ac.be:8080/PlantCARE/.

4,184 citations

Journal Article

Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation

Yee Hwa Yang¹, Sandrine Dudoit, Percy Luu, David M. Lin, Vivian Peng, John Ngai, Terence P. Speed

Helen Wills Neuroscience Institute¹

15 Feb 2002-Nucleic Acids Research

TL;DR: This article proposes normalization methods that are based on robust local regression and account for intensity and spatial dependence in dye biases for different types of cDNA microarray experiments.

Abstract: There are many sources of systematic variation in cDNA microarray experiments which affect the measured gene expression levels (e.g. differences in labeling efficiency between the two fluorescent dyes). The term normalization refers to the process of removing such variation. A constant adjustment is often used to force the distribution of the intensity log ratios to have a median of zero for each slide. However, such global normalization approaches are not adequate in situations where dye biases can depend on spot overall intensity and/or spatial location within the array. This article proposes normalization methods that are based on robust local regression and account for intensity and spatial dependence in dye biases for different types of cDNA microarray experiments. The selection of appropriate controls for normalization is discussed and a novel set of controls (microarray sample pool, MSP) is introduced to aid in intensity-dependent normalization. Lastly, to allow for comparisons of expression levels across slides, a robust method based on maximum likelihood estimation is proposed to adjust for scale differences among slides.

3,605 citations
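The contrast drawn in the normalization abstract, a constant adjustment versus an intensity-dependent one, can be illustrated with a short sketch. It assumes the usual two-colour quantities M = log2(R) - log2(G) and A = (log2(R) + log2(G)) / 2. The intensity-dependent function below uses a running median over neighbours in A as a crude stand-in for the robust local regression the article proposes; it is not the authors' method, and the function names and the `half_window` parameter are hypothetical.

```python
import math
from statistics import median

def log_ratio_and_intensity(red, green):
    """Per-spot log ratio M and average log intensity A."""
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a = [(math.log2(r) + math.log2(g)) / 2 for r, g in zip(red, green)]
    return m, a

def global_median_normalise(m):
    """Constant adjustment: force the slide's median log ratio to zero."""
    centre = median(m)
    return [v - centre for v in m]

def intensity_dependent_normalise(m, a, half_window=2):
    """Subtract a local median of M taken over spots with similar A.

    A running median is used here only as a simple substitute for a
    robust local regression fit of M on A.
    """
    order = sorted(range(len(m)), key=lambda i: a[i])
    out = [0.0] * len(m)
    for rank, i in enumerate(order):
        lo = max(0, rank - half_window)
        hi = min(len(order), rank + half_window + 1)
        local = median([m[j] for j in order[lo:hi]])
        out[i] = m[i] - local
    return out
```

If low-intensity spots are biased towards one dye and high-intensity spots towards the other, the global median can already be zero while the bias remains; the intensity-dependent version removes it because each spot is centred against spots of similar intensity.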

Journal Article

An efficient algorithm for large-scale detection of protein families

Anton J. Enright¹, S. Van Dongen, Christos A. Ouzounis

European Bioinformatics Institute¹

01 Apr 2002-Nucleic Acids Research

TL;DR: This work presents a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families based on precomputed sequence similarity information that has been rigorously tested and validated on a number of very large databases.

Abstract: Detection of protein families in large databases is one of the principal research objectives in structural and functional genomics. Protein family classification can significantly contribute to the delineation of functional diversity of homologous proteins, the prediction of function based on domain architecture or the presence of sequence motifs as well as comparative genomics, providing valuable evolutionary insights. We present a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families. The method relies on the Markov cluster (MCL) algorithm for the assignment of proteins into families based on precomputed sequence similarity information. This novel approach does not suffer from the problems that normally hinder other protein sequence clustering algorithms, such as the presence of multi-domain proteins, promiscuous domains and fragmented proteins. The method has been rigorously tested and validated on a number of very large databases, including SwissProt, InterPro, SCOP and the draft human genome. Our results indicate that the method is ideally suited to the rapid and accurate detection of protein families on a large scale. The method has been used to detect and categorise protein families within the draft human genome and the resulting families have been used to annotate a large proportion of human proteins.

3,468 citations
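The Markov cluster (MCL) algorithm that TRIBE-MCL relies on alternates two operations on a column-stochastic matrix built from pairwise similarities: expansion (matrix squaring, which spreads flow along paths) and inflation (element-wise powering followed by column renormalisation, which strengthens strong flows and weakens weak ones). The sketch below is a minimal pure-Python version under stated assumptions: the input is a symmetric matrix of precomputed similarity scores, self-loops of weight 1 are added, and the inflation value, thresholds and function names are illustrative choices, not the settings used in the paper.

```python
def normalise_columns(matrix):
    """Scale each column so that it sums to 1."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]

def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster nodes from a symmetric similarity matrix; returns sorted lists."""
    n = len(similarity)
    # Add self-loops so every column has a non-zero sum.
    m = normalise_columns(
        [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iter):
        # Expansion: one step of matrix squaring.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: element-wise power, then renormalise columns.
        inflated = normalise_columns(
            [[value ** inflation for value in row] for row in expanded]
        )
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass on their own diagonal act as attractors; the
    # columns they still reach form one cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > 1e-6:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > 1e-6))
    return sorted(sorted(cluster) for cluster in clusters)
```

A dense matrix is used only to keep the example short; clustering a large protein database would need sparse matrices and pruning of small entries.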

Journal Article

Telomere measurement by quantitative PCR

Richard M. Cawthon¹

University of Utah¹

15 May 2002-Nucleic Acids Research

TL;DR: A primer pair is presented that makes it possible to measure telomeres in vertebrate DNA by PCR with primers that hybridize to the TTAGGG and CCCTAA repeats, an approach long presumed impossible because only primer dimer-derived products were expected; it allows simple and rapid measurement of telomeres in a closed-tube, fluorescence-based assay.

Abstract: It has long been presumed impossible to measure telomeres in vertebrate DNA by PCR amplification with oligonucleotide primers designed to hybridize to the TTAGGG and CCCTAA repeats, because only primer dimer-derived products are expected. Here we present a primer pair that eliminates this problem, allowing simple and rapid measurement of telomeres in a closed tube, fluorescence-based assay. This assay will facilitate investigations of the biology of telomeres and the roles they play in the molecular pathophysiology of diseases and aging.

3,014 citations

Journal Article

Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders

Ada Hamosh, Alan F. Scott¹, Joanna S. Amberger¹, Carol A. Bocchini¹, Victor A. McKusick¹

Johns Hopkins University School of Medicine¹

01 Jan 2002-Nucleic Acids Research

TL;DR: Online Mendelian Inheritance in Man (OMIM) is a comprehensive, authoritative and timely knowledgebase of human genes and genetic disorders compiled to support research and education in human genomics and the practice of clinical genetics.

Abstract: Online Mendelian Inheritance in Man (OMIM) is a comprehensive, authoritative and timely knowledgebase of human genes and genetic disorders compiled to support human genetics research and education and the practice of clinical genetics. Started by Dr Victor A. McKusick as the definitive reference Mendelian Inheritance in Man, OMIM (http://www.ncbi.nlm.nih.gov/omim/) is now distributed electronically by the National Center for Biotechnology Information, where it is integrated with the Entrez suite of databases. Derived from the biomedical literature, OMIM is written and edited at Johns Hopkins University with input from scientists and physicians around the world. Each OMIM entry has a full-text summary of a genetically determined phenotype and/or gene and has numerous links to other genetic databases such as DNA and protein sequence, PubMed references, general and locus-specific mutation databases, HUGO nomenclature, MapViewer, GeneTests, patient support groups and many others. OMIM is an easy and straightforward portal to the burgeoning information in human genetics.

2,715 citations

Journal Article

Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification

Jan P. Schouten, Cathal J. McElgunn, Raymond Waaijer, Danny A. Zwijnenburg, Filip Diepvens, Gerard Pals

15 Jun 2002-Nucleic Acids Research

TL;DR: A new method for relative quantification of 40 different DNA sequences in an easy to perform reaction requiring only 20 ng of human DNA is described.

Abstract: We describe a new method for relative quantification of 40 different DNA sequences in an easy to perform reaction requiring only 20 ng of human DNA. Applications shown of this multiplex ligation-dependent probe amplification (MLPA) technique include the detection of exon deletions and duplications in the human BRCA1, MSH2 and MLH1 genes, detection of trisomies such as Down’s syndrome, characterisation of chromosomal aberrations in cell lines and tumour samples and SNP/mutation detection. Relative quantification of mRNAs by MLPA will be described elsewhere. In MLPA, not sample nucleic acids but probes added to the samples are amplified and quantified. Amplification of probes by PCR depends on the presence of probe target sequences in the sample. Each probe consists of two oligonucleotides, one synthetic and one M13 derived, that hybridise to adjacent sites of the target sequence. Such hybridised probe oligonucleotides are ligated, permitting subsequent amplification. All ligated probes have identical end sequences, permitting simultaneous PCR amplification using only one primer pair. Each probe gives rise to an amplification product of unique size between 130 and 480 bp. Probe target sequences are small (50–70 nt). The prerequisite of a ligation reaction provides the opportunity to discriminate single nucleotide differences.

2,675 citations

Journal Article

Human non‐synonymous SNPs: server and survey

Vasily Ramensky, Peer Bork, Shamil R. Sunyaev

01 Sep 2002-Nucleic Acids Research

TL;DR: A World Wide Web server is presented to predict the effect of an nsSNP on protein structure and function and the dependence of selective pressure on the structural and functional properties of proteins is studied.

Abstract: Human single nucleotide polymorphisms (SNPs) represent the most frequent type of human population DNA variation. One of the main goals of SNP research is to understand the genetics of the human phenotype variation and especially the genetic basis of human complex diseases. Non-synonymous coding SNPs (nsSNPs) comprise a group of SNPs that, together with SNPs in regulatory regions, are believed to have the highest impact on phenotype. Here we present a World Wide Web server to predict the effect of an nsSNP on protein structure and function. The prediction method enabled analysis of the publicly available SNP database HGVbase, which gave rise to a dataset of nsSNPs with predicted functionality. The dataset was further used to compare the effect of various structural and functional characteristics of amino acid substitutions responsible for phenotypic display of nsSNPs. We also studied the dependence of selective pressure on the structural and functional properties of proteins. We found that in our dataset the selection pressure against deleterious SNPs depends on the molecular function of the protein, although it is insensitive to several other protein features considered. The strongest selective pressure was detected for proteins involved in transcription regulation.

2,276 citations

Journal Article

DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions

Ioannis Xenarios¹, Lukasz Salwinski¹, Xiaoqun Joyce Duan¹, Patrick Higney¹, Sul-Min Kim¹, David Eisenberg¹

University of California, Los Angeles¹

01 Jan 2002-Nucleic Acids Research

TL;DR: The Database of Interacting Proteins (DIP) is a database that documents experimentally determined protein-protein interactions and provides the scientific community with an integrated set of tools for browsing and extracting information about protein interaction networks.

Abstract: The Database of Interacting Proteins (DIP: http://dip.doe-mbi.ucla.edu) is a database that documents experimentally determined protein-protein interactions. It provides the scientific community with an integrated set of tools for browsing and extracting information about protein interaction networks. As of September 2001, the DIP catalogs approximately 11 000 unique interactions among 5900 proteins from >80 organisms; the vast majority from yeast, Helicobacter pylori and human. Tools have been developed that allow users to analyze, visualize and integrate their own experimental data with the information about protein-protein interactions available in the DIP database.

Journal Article

The Ensembl genome database project

Tim Hubbard¹, Daniel Barker, Ewan Birney, Graham Cameron, Yuan Chen, Louise Clark¹, Tony Cox¹, James Cuff¹, Val Curwen¹, Thomas A. Down¹, Richard Durbin¹, Eduardo Eyras¹, James G. R. Gilbert¹, Martin Hammond, Lukasz Huminiecki, Arek Kasprzyk, Heikki Lehväslaiho, Philip Lijnzaad, Craig Melsopp, Emmanuel Mongin, Roger Pettett¹, Matthew Pocock¹, Simon C. Potter¹, Alistair G. Rust, Esther Schmidt, Stephen M. J. Searle¹, Guy Slater, James Smith¹, William Spooner¹, Arne Stabenau, Jim Stalker¹, Elia Stupka², Abel Ureta-Vidal, Imre Vastrik, Michele Clamp¹

Wellcome Trust Sanger Institute¹, Vita-Salute San Raffaele University²

01 Jan 2002-Nucleic Acids Research

TL;DR: The Ensembl database project provides a bioinformatics framework to organise biology around the sequences of large genomes and is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources.

Abstract: The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.

Journal Article

The PROSITE database, its status in 2002.

Laurent Falquet¹, Marco Pagni, Philipp Bucher, Nicolas Hulo, Christian J. A. Sigrist, Kay Hofmann, Amos Marc Bairoch

Swiss Institute of Bioinformatics¹

01 Jan 2002-Nucleic Acids Research

TL;DR: The PROSITE database consists of biologically significant patterns and profiles designed in such a way that with appropriate computational tools it can rapidly and reliably help to determine to which known family of proteins (if any) a new sequence belongs, or which known domain(s) it contains.

Abstract: PROSITE [Bairoch and Bucher (1994) Nucleic Acids Res., 22, 3583-3589; Hofmann et al. (1999) Nucleic Acids Res., 27, 215-219] is a method of identifying the functions of uncharacterized proteins translated from genomic or cDNA sequences. The PROSITE database (http://www.expasy.org/prosite/) consists of biologically significant patterns and profiles designed in such a way that with appropriate computational tools it can rapidly and reliably help to determine to which known family of proteins (if any) a new sequence belongs, or which known domain(s) it contains.

Journal Article

The KEGG databases at GenomeNet

Minoru Kanehisa¹, Susumu Goto¹, Shuichi Kawashima¹, Akihiro Nakaya¹

Kyoto University¹

01 Jan 2002-Nucleic Acids Research

TL;DR: The Kyoto Encyclopedia of Genes and Genomes (KEGG) is the primary database resource of the Japanese GenomeNet service for understanding higher order functional meanings and utilities of the cell or the organism from its genome information.

Abstract: The Kyoto Encyclopedia of Genes and Genomes (KEGG) is the primary database resource of the Japanese GenomeNet service (http://www.genome.ad.jp/) for understanding higher order functional meanings and utilities of the cell or the organism from its genome information. KEGG consists of the PATHWAY database for the computerized knowledge on molecular interaction networks such as pathways and complexes, the GENES database for the information about genes and proteins generated by genome sequencing projects, and the LIGAND database for the information about chemical compounds and chemical reactions that are relevant to cellular processes. In addition to these three main databases, limited amounts of experimental data for microarray gene expression profiles and yeast two-hybrid systems are stored in the EXPRESSION and BRITE databases, respectively. Furthermore, a new database, named SSDB, is available for exploring the universe of all protein coding genes in the complete genomes and for identifying functional links and ortholog groups. The data objects in the KEGG databases are all represented as graphs and various computational methods are developed to detect graph features that can be related to biological functions. For example, the correlated clusters are graph similarities which can be used to predict a set of genes coding for a pathway or a complex, as summarized in the ortholog group tables, and the cliques in the SSDB graph are used to annotate genes. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).

Journal Article

Real-time PCR in virology

Ian M. Mackay¹, Katherine E. Arden, Andreas Nitsche

Royal Children's Hospital¹

15 Mar 2002-Nucleic Acids Research

TL;DR: The background, advantages and limitations of real-time PCR are described, the literature as it applies to virus detection in the routine and research laboratory is reviewed and the technology discussed has been applied to other areas of microbiology as well as studies of gene expression and genetic disease.

Abstract: The use of the polymerase chain reaction (PCR) in molecular diagnostics has increased to the point where it is now accepted as the gold standard for detecting nucleic acids from a number of origins and it has become an essential tool in the research laboratory. Real-time PCR has engendered wider acceptance of the PCR due to its improved rapidity, sensitivity, reproducibility and the reduced risk of carry-over contamination. There are currently five main chemistries used for the detection of PCR product during real-time PCR. These are the DNA binding fluorophores, the 5' endonuclease, adjacent linear and hairpin oligoprobes and the self-fluorescing amplicons, which are described in detail. We also discuss factors that have restricted the development of multiplex real-time PCR as well as the role of real-time PCR in quantitating nucleic acids. Both amplification hardware and the fluorogenic detection chemistries have evolved rapidly as the understanding of real-time PCR has developed and this review aims to update the scientist on the current state of the art. We describe the background, advantages and limitations of real-time PCR and we review the literature as it applies to virus detection in the routine and research laboratory in order to focus on one of the many areas in which the application of real-time PCR has provided significant methodological benefits and improved patient outcomes. However, the technology discussed has been applied to other areas of microbiology as well as studies of gene expression and genetic disease.

Journal Article

High-level and high-throughput recombinant protein production by transient transfection of suspension-growing human 293-EBNA1 cells

Yves Durocher¹, Sylvie Perret¹, Amine Kamen¹

National Research Council¹

15 Jan 2002-Nucleic Acids Research

TL;DR: A scalable transfection procedure using polyethylenimine (PEI) is described for the human embryonic kidney 293 cell line grown in suspension; with the pTT vector, 10- and 3-fold increases in SEAP expression were obtained in 293E cells compared with the pcDNA3.1 and pCEP4 vectors, respectively.

Abstract: A scalable transfection procedure using polyethylenimine (PEI) is described for the human embryonic kidney 293 cell line grown in suspension. Green fluorescent protein (GFP) and human placental secreted alkaline phosphatase (SEAP) were used as reporter genes to monitor transfection efficiency and productivity. Up to 75% of GFP-positive cells were obtained using linear or branched 25 kDa PEI. The 293 cell line and two genetic variants, either expressing the SV40 large T-antigen (293T) or the Epstein–Barr virus (EBV) EBNA1 protein (293E), were tested for protein expression. The highest expression level was obtained with 293E cells using the EBV oriP-containing plasmid pCEP4. We designed the pTT vector, an oriP-based vector having an improved cytomegalovirus expression cassette. Using this vector, 10- and 3-fold increases in SEAP expression were obtained in 293E cells compared with pcDNA3.1 and pCEP4 vectors, respectively. The presence of serum had a positive effect on gene transfer and expression. Transfection of suspension-growing cells was more efficient with linear PEI and was not affected by the presence of medium conditioned for 24 h. Using the pTT vector, >20 mg/l of purified His-tagged SEAP was recovered from a 3.5 l bioreactor. Intracellular proteins were also produced at levels as high as 50 mg/l, representing up to 20% of total cell proteins.

Journal Article

A second set of loxP marker cassettes for Cre-mediated multiple gene knockouts in budding yeast.

U. Gueldener¹, J. Heinisch¹, Gabriele J. Koehler¹, D. Voss¹, Johannes H. Hegemann¹

University of Düsseldorf¹

15 Mar 2002-Nucleic Acids Research

TL;DR: An additional set of four completely heterologous loxP-flanked marker cassettes is described, carrying the genes URA3 and LEU2 from Kluyveromyces lactis, his5(+) from Schizosaccharomyces pombe and the dominant resistance marker ble(r) from the bacterial transposon Tn5, which confers resistance to the antibiotic phleomycin.

Abstract: Heterologous markers are important tools required for the molecular dissection of gene function in many organisms, including Saccharomyces cerevisiae. Moreover, the presence of gene families and isoenzymes often makes it necessary to delete more than one gene. We recently introduced a new and efficient gene disruption cassette for repeated use in budding yeast, which combines the heterologous dominant kan(r) resistance marker with a Cre/loxP-mediated marker removal procedure. Here we describe an additional set of four completely heterologous loxP-flanked marker cassettes carrying the genes URA3 and LEU2 from Kluyveromyces lactis, his5(+) from Schizosaccharomyces pombe and the dominant resistance marker ble(r) from the bacterial transposon Tn5, which confers resistance to the antibiotic phleomycin. All five loxP–marker gene–loxP gene disruption cassettes can be generated using the same pair of oligonucleotides and all can be used for gene disruption with high efficiency. For marker rescue we have created three additional Cre expression vectors carrying HIS3, TRP1 or ble(r) as the yeast selection marker. The set of disruption cassettes and Cre expression plasmids described here represents a significant further development of the marker rescue system, which is ideally suited to functional analysis of the yeast genome.

Journal Article

Positional effects of short interfering RNAs targeting the human coagulation trigger Tissue Factor

Torgeir Holen¹, Mohammed Amarzguioui¹, Merete Thune Wiiger¹, Eshrat Babaie¹, Hans Prydz¹

University of Oslo¹

15 Apr 2002-Nucleic Acids Research

TL;DR: The silencing effect was transient, with the level of mRNA recovering fully within 4-5 days, suggesting absence of a propagative system for RNAi in humans and the depletion rate-dependent appearance of 3' mRNA cleavage fragments argues for the existence of a two-step mRNA degradation mechanism.

Abstract: Chemically synthesised 21-23 bp double-stranded short interfering RNAs (siRNA) can induce sequence-specific post-transcriptional gene silencing, in a process termed RNA interference (RNAi). In the present study, several siRNAs synthesised against different sites on the same target mRNA (human Tissue Factor) demonstrated striking differences in silencing efficiency. Only a few of the siRNAs resulted in a significant reduction in expression, suggesting that accessible siRNA target sites may be rare in some human mRNAs. Blocking of the 3'-OH with FITC did not reduce the effect on target mRNA. Mutations in the siRNAs relative to target mRNA sequence gradually reduced, but did not abolish mRNA depletion. Inactive siRNAs competed reversibly with active siRNAs in a sequence-independent manner. Several lines of evidence suggest the existence of a near equilibrium kinetic balance between mRNA production and siRNA-mediated mRNA depletion. The silencing effect was transient, with the level of mRNA recovering fully within 4-5 days, suggesting absence of a propagative system for RNAi in humans. Finally, we observed 3' mRNA cleavage fragments resulting from the action of the most effective siRNAs. The depletion rate-dependent appearance of these fragments argues for the existence of a two-step mRNA degradation mechanism.

Journal Article•DOI•

Fast algorithms for large-scale genome alignment and comparison.

Arthur L. Delcher¹, Adam M. Phillippy¹, Jane M. Carlton, Steven L. Salzberg²•Institutions (2)

Loyola University Maryland¹, J. Craig Venter Institute²

01 Jun 2002-Nucleic Acids Research

TL;DR: MUMmer 2 is a suffix-tree-based system that can align the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory.

Abstract: We describe a suffix-tree algorithm that can align the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory. The new system, MUMmer 2, runs three times faster while using one-third as much memory as the original MUMmer system. It has been used successfully to align the entire human and mouse genomes to each other, and to align numerous smaller eukaryotic and prokaryotic genomes. A new module permits the alignment of multiple DNA sequence fragments, which has proven valuable in the comparison of incomplete genome sequences. We also describe a method to align more distantly related genomes by detecting protein sequence homology. This extension to MUMmer aligns two genomes after translating the sequence in all six reading frames, extracts all matching protein sequences and then clusters together matches. This method has been applied to both incomplete and complete genome sequences in order to detect regions of conserved synteny, in which multiple proteins from one organism are found in the same order and orientation in another. The system code is being made freely available by the authors.
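The six-frame translation step mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the general technique, not MUMmer's own code: the function names are hypothetical, the standard genetic code is assumed, and codons containing characters other than A, C, G or T are rendered as `X`.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from the start of dna; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Translate both strands at offsets 0, 1 and 2 (hypothetical helper)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translations("ATGGCCTAA").items():
        print(name, protein)
```

The translated frames of two genomes could then be searched for matching protein substrings, which is the kind of comparison the abstract describes at a high level.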

Journal Article•DOI•

The non‐Watson–Crick base pairs and their associated isostericity matrices

Neocles B. Leontis¹, Jesse Stombaugh, Eric Westhof•Institutions (1)

Bowling Green State University¹

15 Aug 2002-Nucleic Acids Research

TL;DR: This paper presents the 4 x 4 'isostericity matrices' summarizing the geometric relationships between the 16 pairwise combinations of the four standard bases, A, C, G and U, and helps identify isosteric pairs that co-vary or interchange in sequences of homologous molecules while maintaining conserved three-dimensional motifs.

Abstract: RNA molecules exhibit complex structures in which a large fraction of the bases engage in non-Watson-Crick base pairing, forming motifs that mediate long-range RNA-RNA interactions and create binding sites for proteins and small molecule ligands. The rapidly growing number of three-dimensional RNA structures at atomic resolution requires that databases contain the annotation of such base pairs. An unambiguous and descriptive nomenclature was proposed recently in which RNA base pairs were classified by the base edges participating in the interaction (Watson-Crick, Hoogsteen/CH or sugar edge) and the orientation of the glycosidic bonds relative to the hydrogen bonds (cis or trans). Twelve basic geometric families were identified and all 12 have been observed in crystal structures. For each base pairing family, we present here the 4 x 4 'isostericity matrices' summarizing the geometric relationships between the 16 pairwise combinations of the four standard bases, A, C, G and U. Whenever available, a representative example of each observed base pair from X-ray crystal structures (3.0 Å resolution or better) is provided or, otherwise, theoretically plausible models. This format makes apparent the recurrent geometric patterns that are observed and helps identify isosteric pairs that co-vary or interchange in sequences of homologous molecules while maintaining conserved three-dimensional motifs.

Journal Article•DOI•

Recent improvements to the SMART domain-based sequence annotation resource

Ivica Letunic¹, Leo Goodstadt², Nicholas J. Dickens², Tobias Doerks, Jörg Schultz, Richard Mott², Francesca D. Ciccarelli, Richard R. Copley, Chris P. Ponting², Peer Bork•Institutions (2)

European Bioinformatics Institute¹, University of Oxford²

01 Jan 2002-Nucleic Acids Research

TL;DR: The SMART database now contains information on intrinsic sequence features such as transmembrane regions, coiled-coils, signal peptides and internal repeats and new advanced queries provide direct access to the SMART relational database using SQL.

Abstract: SMART (Simple Modular Architecture Research Tool, http://smart.embl-heidelberg.de) is a web-based resource used for the annotation of protein domains and the analysis of domain architectures, with particular emphasis on mobile eukaryotic domains. Extensive annotation for each domain family is available, providing information relating to function, subcellular localization, phyletic distribution and tertiary structure. The January 2002 release has added more than 200 hand-curated domain models. This brings the total to over 600 domain families that are widely represented among nuclear, signalling and extracellular proteins. Annotation now includes links to the Online Mendelian Inheritance in Man (OMIM) database in cases where a human disease is associated with one or more mutations in a particular domain. We have implemented new analysis methods and updated others. New advanced queries provide direct access to the SMART relational database using SQL. This database now contains information on intrinsic sequence features such as transmembrane regions, coiled-coils, signal peptides and internal repeats. SMART output can now be easily included in users' documents. A SMART mirror has been created at http://smart.ox.ac.uk.

Journal Article•DOI•

Analysis of histone acetyltransferase and histone deacetylase families of Arabidopsis thaliana suggests functional diversification of chromatin modification among multicellular eukaryotes

Ritu Pandey¹, Andreas E. Müller, Carolyn A. Napoli, David Selinger, Craig S. Pikaard, Eric J. Richards, Judith Bender, David W. Mount, Richard A. Jorgensen•Institutions (1)

University of Arizona¹

01 Dec 2002-Nucleic Acids Research

TL;DR: The substantial diversification of HATs and HDACs that has occurred since the divergence of plants, animals and fungi suggests a surprising degree of evolutionary plasticity and functional diversification in these core chromatin components.

Abstract: Sequence similarity and profile searching tools were used to analyze the genome sequences of Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans and Drosophila melanogaster for genes encoding three families of histone deacetylase (HDAC) proteins and three families of histone acetyltransferase (HAT) proteins. Plants, animals and fungi were found to have a single member of each of three subfamilies of the GNAT family of HATs, suggesting conservation of these functions. However, major differences were found with respect to sizes of gene families and multi-domain protein structures within other families of HATs and HDACs, indicating substantial evolutionary diversification. Phylogenetic analysis identified a new class of HDACs within the RPD3/HDA1 family that is represented only in plants and animals. A similar analysis of the plant-specific HD2 family of HDACs suggests a duplication event early in dicot evolution, followed by further diversification in the lineage leading to Arabidopsis. Of three major classes of SIR2-type HDACs that are found in animals, fungi have representatives only in one class, whereas plants have representatives only in the other two. Plants possess five CREB-binding protein (CBP)-type HATs compared with one to two in animals and none in fungi. Domain and phylogenetic analyses of the CBP family proteins showed that this family has evolved three distinct types of CBPs in plants. The domain architecture of CBP and TAFII250 families of HATs show significant differences between plants and animals, most notably with respect to bromodomain occurrence and their number. Bromodomain-containing proteins in Arabidopsis differ strikingly from animal bromodomain proteins with respect to the numbers of bromodomains and the other types of domains that are present. The substantial diversification of HATs and HDACs that has occurred since the divergence of plants, animals and fungi suggests a surprising degree of evolutionary plasticity and functional diversification in these core chromatin components.

Journal Article•DOI•

CDD: a database of conserved domain alignments with links to domain three-dimensional structure

Aron Marchler-Bauer¹, Anna R. Panchenko¹, Benjamin A. Shoemaker¹, Paul A. Thiessen¹, Lewis Y. Geer¹, Stephen H. Bryant¹•Institutions (1)

National Institutes of Health¹

01 Jan 2002-Nucleic Acids Research

TL;DR: The Conserved Domain Database (CDD), a compilation of multiple sequence alignments representing protein domains conserved in molecular evolution, has been populated with alignment data from the public collections Pfam and SMART, as well as with contributions from colleagues at NCBI.

Abstract: The Conserved Domain Database (CDD) is a compilation of multiple sequence alignments representing protein domains conserved in molecular evolution. It has been populated with alignment data from the public collections Pfam and SMART, as well as with contributions from colleagues at NCBI. The current version of CDD (v.1.54) contains 3693 such models. CDD alignments are linked to protein sequence and structure data in Entrez. The molecular structure viewer Cn3D serves as a tool to interactively visualize alignments and three-dimensional structure, and to link three-dimensional residue coordinates to descriptions of evolutionary conservation. CDD can be accessed on the World Wide Web at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml. Protein query sequences may be compared against databases of position-specific score matrices derived from alignments in CDD, using a service named CD-Search, which can be found at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. CD-Search runs reverse-position-specific BLAST (RPS-BLAST), a variant of the widely used PSI-BLAST algorithm. CD-Search is run by default for protein–protein queries submitted to NCBI’s BLAST service at http://www.ncbi.nlm.nih.gov/BLAST.

Journal Article•DOI•

Efficiencies of fluorescence resonance energy transfer and contact-mediated quenching in oligonucleotide probes

Salvatore A. E. Marras¹, Fred Russell Kramer¹, Sanjay Tyagi¹•Institutions (1)

Public Health Research Institute¹

01 Nov 2002-Nucleic Acids Research

TL;DR: The tendency of the fluorophore and the quencher to bind to each other has a strong influence on quenching efficiency, and the availability of these measurements should facilitate the design of oligonucleotide probes that contain interactive fluorophores and quenchers.

Abstract: An important consideration in the design of oligonucleotide probes for homogeneous hybridization assays is the efficiency of energy transfer between the fluorophore and quencher used to label the probes. We have determined the efficiency of energy transfer for a large number of combinations of commonly used fluorophores and quenchers. We have also measured the quenching effect of nucleotides on the fluorescence of each fluorophore. Quenching efficiencies were measured for both the resonance energy transfer and the static modes of quenching. We found that, in addition to their photochemical characteristics, the tendency of the fluorophore and the quencher to bind to each other has a strong influence on quenching efficiency. The availability of these measurements should facilitate the design of oligonucleotide probes that contain interactive fluorophores and quenchers, including competitive hybridization probes, adjacent probes, TaqMan probes and molecular beacons.

Patent•DOI•

Synthesis, deprotection, analysis and purification of RNA and ribozymes

Nassim Usman¹, Francine E. Wincott, David Sweedler, Leonid Beigelman, Lech W. Dudycz, Susan Grimm, Anthony Direnzo, Danuta Tracz•Institutions (1)

Sirna Therapeutics¹

28 May 2002-Nucleic Acids Research

TL;DR: A method for the purification and synthesis of RNA molecules and enzymatic RNA molecules in enzymatically active form.

Abstract: Method for purification and synthesis of RNA molecules and enzymatic RNA molecules in enzymatically active form.

Journal Article•DOI•

Design of antisense oligonucleotides stabilized by locked nucleic acids

Jens Kurreck¹, Eliza Wyszko, Clemens Gillen, Volker A. Erdmann•Institutions (1)

Free University of Berlin¹

01 May 2002-Nucleic Acids Research

TL;DR: These chimeric LNA/DNA oligonucleotides are more stable than isosequential phosphorothioates and 2'-O-methyl gapmers, which have half-lives of 10 and 12 h, respectively.

Abstract: The design of antisense oligonucleotides containing locked nucleic acids (LNA) was optimized and compared to intensively studied DNA oligonucleotides, phosphorothioates and 2'-O-methyl gapmers. In contradiction to the literature, a stretch of seven or eight DNA monomers in the center of a chimeric DNA/LNA oligonucleotide is necessary for full activation of RNase H to cleave the target RNA. For 2'-O-methyl gapmers a stretch of six DNA monomers is sufficient to recruit RNase H. Compared to the 18mer DNA the oligonucleotides containing LNA have an increased melting temperature of 1.5-4 degrees C per LNA depending on the positions of the modified residues. 2'-O-methyl nucleotides increase the T(m) by only 2'-O-methyl > DNA > phosphorothioate. Three LNAs at each end of the oligonucleotide are sufficient to stabilize the oligonucleotide in human serum 10-fold compared to an unmodified oligodeoxynucleotide (from t(1/2) = approximately 1.5 h to t(1/2) = approximately 15 h). These chimeric LNA/DNA oligonucleotides are more stable than isosequential phosphorothioates and 2'-O-methyl gapmers, which have half-lives of 10 and 12 h, respectively.

Journal Article•DOI•

DNAWorks: an automated method for designing oligonucleotides for PCR-based gene synthesis

David M. Hoover¹, Jacek Lubkowski¹•Institutions (1)

National Institutes of Health¹

15 May 2002-Nucleic Acids Research

TL;DR: The approach presented here simplifies the production of proteins from a wide variety of organisms for genomics-based studies and automates the design of oligonucleotides for gene synthesis.

Abstract: The availability of sequences of entire genomes has dramatically increased the number of protein targets, many of which will need to be overexpressed in cells other than the original source of DNA. Gene synthesis often provides a fast and economically efficient approach. The synthetic gene can be optimized for expression and constructed for easy mutational manipulation without regard to the parent genome. Yet design and construction of synthetic genes, especially those coding for large proteins, can be a slow, difficult and confusing process. We have written a computer program that automates the design of oligonucleotides for gene synthesis. Our program requires simple input information, i.e. amino acid sequence of the target protein and melting temperature (needed for the gene assembly) of synthetic oligonucleotides. The program outputs a series of oligonucleotide sequences with codons optimized for expression in an organism of choice. Those oligonucleotides are characterized by highly homogeneous melting temperatures and a minimized tendency for hairpin formation. With the help of this program and a two-step PCR method, we have successfully constructed numerous synthetic genes, ranging from 139 to 1042 bp. The approach presented here simplifies the production of proteins from a wide variety of organisms for genomics-based studies.

Journal Article•DOI•

TTD: Therapeutic Target Database

Xiang Chen¹, Zhiliang Ji¹, Yu Zong Chen¹•Institutions (1)

National University of Singapore¹

01 Jan 2002-Nucleic Acids Research

TL;DR: The Therapeutic Target Database (TTD) is designed to provide information about the known therapeutic protein and nucleic acid targets described in the literature, the targeted disease conditions, the pathway information and the corresponding drugs/ligands directed at each of these targets.

Abstract: A number of proteins and nucleic acids have been explored as therapeutic targets. These targets are subjects of interest in different areas of biomedical and pharmaceutical research and in the development and evaluation of bioinformatics, molecular modeling, computer-aided drug design and analytical tools. A publicly accessible database that provides comprehensive information about these targets is therefore helpful to the relevant communities. The Therapeutic Target Database (TTD) is designed to provide information about the known therapeutic protein and nucleic acid targets described in the literature, the targeted disease conditions, the pathway information and the corresponding drugs/ligands directed at each of these targets. Cross-links to other databases are also introduced to facilitate the access of information about the sequence, 3D structure, function, nomenclature, drug/ligand binding properties, drug usage and effects, and related literature for each target. This database can be accessed at http://xin.cz3.nus.edu.sg/group/ttd/ttd.asp and it currently contains entries for 433 targets covering 125 disease conditions along with 809 drugs/ligands directed at each of these targets. Each entry can be retrieved through multiple methods including target name, disease name, drug/ligand name, drug/ligand function and drug therapeutic classification.

Journal Article•DOI•

Accuracy and calibration of commercial oligonucleotide and custom cDNA microarrays

Tony Yuen¹, Elisa Wurmbach, Robert L. Pfeffer, Barbara J. Ebersole, Stuart C. Sealfon•Institutions (1)

Icahn School of Medicine at Mount Sinai¹

15 May 2002-Nucleic Acids Research

TL;DR: Following calibration, fold-change measurements generated by custom cDNA arrays were more accurate than those obtained by commercial oligonucleotide arrays.

Abstract: We compared the accuracy of microarray measurements obtained with oligonucleotide arrays (GeneChip, Affymetrix) with a laboratory-developed cDNA array by assaying test RNA samples from an experiment using a paradigm known to regulate many genes measured on both arrays. We selected 47 genes represented on both arrays, including both known regulated and unregulated transcripts, and established reference relative expression measurements for these genes in the test RNA samples using quantitative reverse transcriptase real-time PCR (QRTPCR) assays. The validity of the reproducible (average coefficient of variation = 11.8%) QRTPCR measurements was established through application of a new mathematical model. The performance of both array platforms in identifying regulated and non-regulated genes was identical. With either platform, 16 of 17 definitely regulated genes were correctly identified, and no definitely unregulated transcript was falsely identified as regulated. Accuracy of the fold-change measurements obtained with each platform was assessed by determining measurement bias. Both platforms consistently underestimate the relative changes in mRNA expression between experimental and control samples. The bias observed with cDNA arrays was predictable for fold-changes <250-fold by QRTPCR and could be corrected by the calibration function $F_c = F_{a(\mathrm{cDNA})}^{q}$, where $F_{a(\mathrm{cDNA})}$ is the microarray-determined fold-change comparing experimental with control samples, $q$ is the correction factor and $F_c$ is the calibrated value. The bias observed with the commercial oligonucleotide arrays was less predictable and calibration was unfeasible. Following calibration, fold-change measurements generated by custom cDNA arrays were more accurate than those obtained by commercial oligonucleotide arrays. Our study demonstrates systematic bias of microarray measurements and identifies a calibration function that improves the accuracy of cDNA array data.
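As a small worked illustration of the calibration function in the abstract above, the sketch below reads it as a power law in which the correction factor q is an exponent applied to the array-measured fold-change. That reading is an assumption, the function name is hypothetical, and the value of q used here is purely illustrative and not taken from the paper.

```python
def calibrate_fold_change(array_fold_change: float, q: float) -> float:
    """Return the calibrated fold-change, assuming F_c = F_a ** q.

    array_fold_change: microarray-determined fold-change (must be positive).
    q: correction factor; any value passed here is illustrative only.
    """
    if array_fold_change <= 0:
        raise ValueError("fold-change must be positive")
    return array_fold_change ** q


if __name__ == "__main__":
    # Illustrative q only; a real value would come from fitting array
    # measurements against reference QRTPCR measurements.
    print(calibrate_fold_change(4.0, 1.5))
```

With q greater than 1, the calibrated value exceeds the measured one for fold-changes above 1, which matches the abstract's observation that the arrays underestimate relative changes.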

Journal Article•DOI•

The EcoCyc Database

Peter D. Karp¹, Monica Riley², Milton H. Saier³, Ian T. Paulsen⁴, Julio Collado-Vides⁵, Suzanne M. Paley¹, Alida Pellegrini-Toole², César Bonavides⁵, Socorro Gama-Castro⁵•Institutions (5)

SRI International¹, Marine Biological Laboratory², University of California, San Diego³, J. Craig Venter Institute⁴, National Autonomous University of Mexico⁵

01 Jan 2002-Nucleic Acids Research

TL;DR: EcoCyc is an organism-specific pathway/genome database that describes the metabolic and signal-transduction pathways of Escherichia coli, its enzymes, its transport proteins and its mechanisms of transcriptional control of gene expression.

Abstract: EcoCyc is an organism-specific pathway/genome database that describes the metabolic and signal-transduction pathways of Escherichia coli, its enzymes, its transport proteins and its mechanisms of transcriptional control of gene expression. EcoCyc is queried using the Pathway Tools graphical user interface, which provides a wide variety of query operations and visualization tools. EcoCyc is available at http://ecocyc.org/.
