Showing papers in "Nucleic Acids Research in 2003"

Open Access

Journal Article•DOI•

Mfold web server for nucleic acid folding and hybridization prediction

Michael Zuker¹•Institutions (1)

01 Jul 2003-Nucleic Acids Research

TL;DR: The objective of this web server is to provide easy access to RNA and DNA folding and hybridization software to the scientific community at large by making use of universally available web GUIs (Graphical User Interfaces).

Abstract: The abbreviated name, ‘mfold web server’, describes a number of closely related software applications available on the World Wide Web (WWW) for the prediction of the secondary structure of single stranded nucleic acids. The objective of this web server is to provide easy access to RNA and DNA folding and hybridization software to the scientific community at large. By making use of universally available web GUIs (Graphical User Interfaces), the server circumvents the problem of portability of this software. Detailed output, in the form of structure plots with or without reliability information, single strand frequency plots and ‘energy dot plots’, is available for the folding of single sequences. A variety of ‘bulk’ servers give less information, but in a shorter time and for up to hundreds of sequences at once. The portal for the mfold web server is http://www.bioinfo.rpi.edu/applications/mfold. This URL will be referred to as ‘MFOLDROOT’.
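
To give a feel for what secondary structure prediction involves, here is a minimal sketch of a base-pair maximization recursion (Nussinov-style). This is a deliberately simplified stand-in and an assumption for illustration only: the abstract does not describe mfold's algorithm, and energy-based folding programs use thermodynamic parameters rather than a simple pair count. The function name `max_base_pairs` and the `min_loop` parameter are hypothetical.

```python
# Toy base-pair maximization (Nussinov-style dynamic programming).
# NOT the mfold algorithm: it counts pairs instead of minimizing free energy.

# Watson-Crick pairs plus the G-U wobble pair (RNA alphabet).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Return the maximum number of nested base pairs in seq.

    min_loop is the minimum number of unpaired bases enclosed by any pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some k, leaving >= min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


hairpin_pairs = max_base_pairs("GGGAAAUCC")
```

For the toy hairpin above, the three G residues at the 5' end can pair with the U and two C residues at the 3' end around an AAA loop.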

12,535 citations

Journal Article•DOI•

SIFT: predicting amino acid changes that affect protein function

Pauline C. Ng¹, Steven Henikoff•Institutions (1)

Fred Hutchinson Cancer Research Center¹

01 Jul 2003-Nucleic Acids Research

TL;DR: SIFT is a program that predicts whether an amino acid substitution affects protein function so that users can prioritize substitutions for further study and can distinguish between functionally neutral and deleterious amino acid changes in mutagenesis studies and on human polymorphisms.

Abstract: Single nucleotide polymorphism (SNP) studies and random mutagenesis projects identify amino acid substitutions in protein-coding regions. Each substitution has the potential to affect protein function. SIFT (Sorting Intolerant From Tolerant) is a program that predicts whether an amino acid substitution affects protein function so that users can prioritize substitutions for further study. We have shown that SIFT can distinguish between functionally neutral and deleterious amino acid changes in mutagenesis studies and on human polymorphisms. SIFT is available at http://blocks.fhcrc.org/sift/SIFT.html.

5,318 citations

Journal Article•DOI•

Multiple sequence alignment with the Clustal series of programs

Ramu Chenna, Hideaki Sugawara, Tadashi Koike, Rodrigo Lopez, Toby J. Gibson, Desmond G. Higgins, Julie D. Thompson

01 Jul 2003-Nucleic Acids Research

TL;DR: The Clustal series of programs, widely used in molecular biology for the multiple alignment of both nucleic acid and protein sequences and for preparing phylogenetic trees, are extended.

Abstract: The Clustal series of programs are widely used in molecular biology for the multiple alignment of both nucleic acid and protein sequences and for preparing phylogenetic trees. The popularity of the programs depends on a number of factors, including not only the accuracy of the results, but also the robustness, portability and user-friendliness of the programs. New features include NEXUS and FASTA format output, printing range numbers and faster tree calculation. Although Clustal was originally developed to run on a local computer, numerous Web servers have been set up, notably at the EBI (European Bioinformatics Institute) (http://www.ebi.

5,300 citations

Journal Article•DOI•

SWISS-MODEL: an automated protein homology-modeling server

Torsten Schwede¹, Jürgen Kopp, Nicolas Guex, Manuel C. Peitsch•Institutions (1)

University of Basel¹

01 Jul 2003-Nucleic Acids Research

TL;DR: The SWISS-MODEL server is under constant development to improve the successful implementation of expert knowledge into an easy-to-use server.

Abstract: SWISS-MODEL (http://swissmodel.expasy.org) is a server for automated comparative modeling of three-dimensional (3D) protein structures. It pioneered the field of automated modeling starting in 1993 and is the most widely-used free web-based automated modeling facility today. In 2002 the server processed 120 000 user requests for 3D protein models. SWISS-MODEL provides several levels of user interaction through its World Wide Web interface: in the 'first approach mode' only an amino acid sequence of a protein is submitted to build a 3D model. Template selection, alignment and model building are performed fully automatically by the server. In the 'alignment mode', the modeling process is based on a user-defined target-template alignment. Complex modeling tasks can be handled with the 'project mode' using DeepView (Swiss-PdbViewer), an integrated sequence-to-structure workbench. All models are sent back via email with a detailed modeling report. WhatCheck analyses and ANOLEA evaluations are provided optionally. The reliability of SWISS-MODEL is continuously evaluated in the EVA-CM project. The SWISS-MODEL server is under constant development to improve the successful implementation of expert knowledge into an easy-to-use server.

5,208 citations

Journal Article•DOI•

Summaries of Affymetrix GeneChip probe level data

Rafael A. Irizarry¹, Benjamin M. Bolstad², Francois Collin, Leslie Cope, Bridget G. Hobbs³, Terence P. Speed²,³•Institutions (3)

Johns Hopkins University¹, University of California, Berkeley², Walter and Eliza Hall Institute of Medical Research³

15 Feb 2003-Nucleic Acids Research

TL;DR: It is found that the performance of the current version of the default expression measure provided by Affymetrix Microarray Suite can be significantly improved by the use of probe level summaries derived from empirically motivated statistical models.

Abstract: High density oligonucleotide array technology is widely used in many areas of biomedical research for quantitative and highly parallel measurements of gene expression. Affymetrix GeneChip arrays are the most popular. In this technology each gene is typically represented by a set of 11–20 pairs of probes. In order to obtain expression measures it is necessary to summarize the probe level data. Using two extensive spike-in studies and a dilution study, we developed a set of tools for assessing the effectiveness of expression measures. We found that the performance of the current version of the default expression measure provided by Affymetrix Microarray Suite can be significantly improved by the use of probe level summaries derived from empirically motivated statistical models. In particular, improvements in the ability to detect differentially expressed genes are demonstrated.

5,119 citations

Journal Article•DOI•

ExPASy: The proteomics server for in-depth protein knowledge and analysis.

Elisabeth Gasteiger¹, Alexandre Gattiker, Christine Hoogland, Ivan Ivanyi, Ron D. Appel, Amos Marc Bairoch•Institutions (1)

Swiss Institute of Bioinformatics¹

01 Jul 2003-Nucleic Acids Research

TL;DR: The ExPASy (the Expert Protein Analysis System) World Wide Web server, provided as a service to the life science community by a multidisciplinary team at the Swiss Institute of Bioinformatics, provides access to a variety of databases and analytical tools dedicated to proteins and proteomics.

Abstract: The ExPASy (the Expert Protein Analysis System) World Wide Web server (http://www.expasy.org), is provided as a service to the life science community by a multidisciplinary team at the Swiss Institute of Bioinformatics (SIB). It provides access to a variety of databases and analytical tools dedicated to proteins and proteomics. ExPASy databases include SWISS-PROT and TrEMBL, SWISS-2DPAGE, PROSITE, ENZYME and the SWISS-MODEL repository. Analysis tools are available for specific tasks relevant to proteomics, similarity searches, pattern and profile searches, post-translational modification prediction, topology prediction, primary, secondary and tertiary structure analysis and sequence alignment. These databases and tools are tightly interlinked: a special emphasis is placed on integration of database entries with related resources developed at the SIB and elsewhere, and the proteomics tools have been designed to read the annotations in SWISS-PROT in order to enhance their predictions. ExPASy started to operate in 1993, as the first WWW server in the field of life sciences. In addition to the main site in Switzerland, seven mirror sites in different continents currently serve the user community.

4,428 citations

Journal Article•DOI•

The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003

Brigitte Boeckmann¹, Amos Marc Bairoch, Rolf Apweiler, Marie-Claude Blatter, Anne Estreicher, Elisabeth Gasteiger, Maria Jesus Martin, Karine Michoud, Claire O'Donovan, Isabelle Phan, Sandrine Pilbout, Michel Schneider•Institutions (1)

Swiss Institute of Bioinformatics¹

01 Jan 2003-Nucleic Acids Research

TL;DR: The SWISS-PROT protein knowledgebase connects amino acid sequences with the current knowledge in the Life Sciences by providing an interdisciplinary overview of relevant information by bringing together experimental results, computed features and sometimes even contradictory conclusions.

Abstract: The SWISS-PROT protein knowledgebase (http://www.expasy.org/sprot/ and http://www.ebi.ac.uk/swissprot/) connects amino acid sequences with the current knowledge in the Life Sciences. Each protein entry provides an interdisciplinary overview of relevant information by bringing together experimental results, computed features and sometimes even contradictory conclusions. Detailed expertise that goes beyond the scope of SWISS-PROT is made available via direct links to specialised databases. SWISS-PROT provides annotated entries for all species, but concentrates on the annotation of entries from human (the HPI project) and other model organisms to ensure the presence of high quality annotation for representative members of all protein families. Part of the annotation can be transferred to other family members, as is already done for microbes by the High-quality Automated and Manual Annotation of microbial Proteomes (HAMAP) project. Protein families and groups of proteins are regularly reviewed to keep up with current scientific findings. Complementarily, TrEMBL strives to comprise all protein sequences that are not yet represented in SWISS-PROT, by incorporating a perpetually increasing level of mostly automated annotation. Researchers are welcome to contribute their knowledge to the scientific community by submitting relevant findings to SWISS-PROT at swiss-prot@expasy.org.

3,440 citations

Journal Article•DOI•

Vienna RNA secondary structure server

Ivo L. Hofacker¹•Institutions (1)

University of Vienna¹

01 Jul 2003-Nucleic Acids Research

TL;DR: The Vienna RNA secondary structure server provides a web interface to the most frequently used functions of the Vienna RNA software package for the analysis of RNA secondary structures.

Abstract: The Vienna RNA secondary structure server provides a web interface to the most frequently used functions of the Vienna RNA software package for the analysis of RNA secondary structures. It currently offers prediction of secondary structure from a single sequence, prediction of the consensus secondary structure for a set of aligned sequences and the design of sequences that will fold into a predefined structure. All three services can be accessed via the Vienna RNA web server at http://rna.tbi.univie.ac.at/.

2,236 citations

Journal Article•DOI•

TRANSFAC®: transcriptional regulation, from patterns to profiles

01 Jan 2003-Nucleic Acids Research

TL;DR: The TRANSFAC database on eukaryotic transcriptional regulation, comprising data on transcription factors, their target genes and regulatory binding sites, has been extended and further developed, both in number of entries and in the scope and structure of the collected data.

Abstract: The TRANSFAC database on eukaryotic transcriptional regulation, comprising data on transcription factors, their target genes and regulatory binding sites, has been extended and further developed, both in number of entries and in the scope and structure of the collected data. Structured fields for expression patterns have been introduced for transcription factors from human and mouse, using the CYTOMER database on anatomical structures and developmental stages. The functionality of Match, a tool for matrix-based search of transcription factor binding sites, has been enhanced. For instance, the program now comes with a number of tissue- (or state-)specific profiles, and new profiles can be created and modified with Match Profiler. The GENE table was extended and gained in importance; it now contains, among others, links to LocusLink, RefSeq and OMIM. Further, (direct) links between factor and target gene on one hand and between gene and encoded factor on the other hand were introduced. The TRANSFAC public release is available at http://www.gene-regulation.com. For yeast an additional release including the latest data was made available separately as TRANSFAC Saccharomyces Module (TSM) at http://transfac.gbf.de. For CYTOMER free download versions are available at http://www.biobase.de:8080/index.html.

2,143 citations

Journal Article•DOI•

The UCSC Genome Browser Database

Donna Karolchik¹, Robert Baertsch¹, Mark Diekhans¹, Terrence S. Furey¹, Angie S. Hinrichs¹, Yontao Lu¹, Krishna M. Roskin¹, Michael L. Schwartz¹, Charles W. Sugnet¹, Daryl J. Thomas¹, R. J. Weber¹, David Haussler¹, W. J. Kent¹•Institutions (1)

University of California, Santa Cruz¹

01 Jan 2003-Nucleic Acids Research

TL;DR: The University of California Santa Cruz (UCSC) Genome Browser Database is an up to date source for genome sequence data integrated with a large collection of related annotations that is optimized to support fast interactive performance with the web-based UCSC Genome browser.

Abstract: The University of California Santa Cruz (UCSC) Genome Browser Database is an up to date source for genome sequence data integrated with a large collection of related annotations. The database is optimized to support fast interactive performance with the web-based UCSC Genome Browser, a tool built on top of the database for rapid visualization and querying of the data at many levels. The annotations for a given genome are displayed in the browser as a series of tracks aligned with the genomic sequence. Sequence data and annotations may also be viewed in a text-based tabular format or downloaded as tab-delimited flat files. The Genome Browser Database, browsing tools and downloadable data files can all be found on the UCSC Genome Bioinformatics website (http://genome.ucsc.edu), which also contains links to documentation and related technical information.

2,103 citations

Journal Article•DOI•

STRING: a database of predicted functional associations between proteins.

Christian von Mering, Martijn A. Huynen, Daniel Jaeggi, Steffen Schmidt, Peer Bork, Berend Snel

01 Jan 2003-Nucleic Acids Research

TL;DR: STRING contains a unique scoring-framework based on benchmarks of the different types of associations against a common reference set, integrated in a single confidence score per prediction, facilitating the analysis of modularity in biological processes.

Abstract: Functional links between proteins can often be inferred from genomic associations between the genes that encode them: groups of genes that are required for the same function tend to show similar species coverage, are often located in close proximity on the genome (in prokaryotes), and tend to be involved in gene-fusion events. The database STRING is a precomputed global resource for the exploration and analysis of these associations. Since the three types of evidence differ conceptually, and the number of predicted interactions is very large, it is essential to be able to assess and compare the significance of individual predictions. Thus, STRING contains a unique scoring-framework based on benchmarks of the different types of associations against a common reference set, integrated in a single confidence score per prediction. The graphical representation of the network of inferred, weighted protein interactions provides a high-level view of functional linkage, facilitating the analysis of modularity in biological processes. STRING is updated continuously, and currently contains 261 033 orthologs in 89 fully sequenced genomes. The database predicts functional interactions at an expected level of accuracy of at least 80% for more than half of the genes; it is online at http://www.bork.embl-heidelberg.de/STRING/.
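
The abstract says the separate evidence types are integrated into a single confidence score per prediction but does not give the formula. As a hedged illustration, the sketch below combines per-evidence confidences under the assumption that each evidence type is independent, so the combined score is one minus the probability that every evidence type is wrong. The function name `combine_scores` and the example values are hypothetical, and this is not claimed to be the exact scheme used in the database release described here.

```python
from math import prod  # available in Python 3.8+
from typing import Iterable


def combine_scores(scores: Iterable[float]) -> float:
    """Combine independent confidence scores: 1 - prod(1 - s_i).

    Assumption (not from the abstract): evidence types are independent
    and each score is a probability in [0, 1].
    """
    values = list(scores)
    for s in values:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {s}")
    return 1.0 - prod(1.0 - s for s in values)


# Made-up confidences for three evidence types:
# conserved neighbourhood, gene fusion, species co-occurrence.
combined = combine_scores([0.6, 0.5, 0.2])
```

With this rule, adding a further piece of supporting evidence can only raise the combined score, and a single high-confidence evidence type dominates the result.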

Journal Article•DOI•

Scansite 2.0: proteome-wide prediction of cell signaling interactions using short sequence motifs

John C. Obenauer¹, Lewis C. Cantley, Michael B. Yaffe•Institutions (1)

Massachusetts Institute of Technology¹

01 Jul 2003-Nucleic Acids Research

TL;DR: Scansite identifies short protein sequence motifs that are recognized by modular signaling domains, phosphorylated by protein Ser/Thr- or Tyr-kinases or mediate specific interactions with protein or phospholipid ligands, allowing segments of biological pathways to be constructed in silico.

Abstract: Scansite identifies short protein sequence motifs that are recognized by modular signaling domains, phosphorylated by protein Ser/Thr- or Tyr-kinases or mediate specific interactions with protein or phospholipid ligands. Each sequence motif is represented as a position-specific scoring matrix (PSSM) based on results from oriented peptide library and phage display experiments. Predicted domain-motif interactions from Scansite can be sequentially combined, allowing segments of biological pathways to be constructed in silico. The current release of Scansite, version 2.0, includes 62 motifs characterizing the binding and/or substrate specificities of many families of Ser/Thr- or Tyr-kinases, SH2, SH3, PDZ, 14-3-3 and PTB domains, together with signature motifs for PtdIns(3,4,5)P(3)-specific PH domains. Scansite 2.0 contains significant improvements to its original interface, including a number of new generalized user features and significantly enhanced performance. Searches of all SWISS-PROT, TrEMBL, Genpept and Ensembl protein database entries are now possible with run times reduced by approximately 60% when compared with Scansite version 1.0. Scansite 2.0 allows restricted searching of species-specific proteins, as well as isoelectric point and molecular weight sorting to facilitate comparison of predictions with results from two-dimensional gel electrophoresis experiments. Support for user-defined motifs has been increased, allowing easier input of user-defined matrices and permitting user-defined motifs to be combined with pre-compiled Scansite motifs for dual motif searching. In addition, a new series of Sequence Match programs for non-quantitative user-defined motifs has been implemented. Scansite is available via the World Wide Web at http://scansite.mit.edu.
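
The idea of scoring a sequence against a position-specific scoring matrix can be sketched as follows. This is a minimal illustration, not Scansite's implementation: the three-position matrix `TOY_PSSM`, its values, the function names and the example sequence are all hypothetical, and real motif matrices cover all 20 amino acids at each position.

```python
from typing import Dict, List, Tuple

# Hypothetical 3-position PSSM: one {residue: score} dict per motif position.
# The values are invented for illustration and are not Scansite matrices.
TOY_PSSM: List[Dict[str, float]] = [
    {"R": 2.0, "K": 1.0},   # position 1 favours basic residues
    {"S": 3.0, "T": 2.5},   # position 2 favours Ser/Thr
    {"P": 1.5},             # position 3 favours Pro
]


def score_window(window: str, pssm: List[Dict[str, float]], default: float = 0.0) -> float:
    """Sum per-position scores for a window as long as the PSSM."""
    return sum(column.get(residue, default) for residue, column in zip(window, pssm))


def scan_sequence(seq: str, pssm: List[Dict[str, float]]) -> List[Tuple[int, str, float]]:
    """Slide the PSSM along seq; return (start, window, score), best first."""
    width = len(pssm)
    hits = [
        (i, seq[i:i + width], score_window(seq[i:i + width], pssm))
        for i in range(len(seq) - width + 1)
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


ranked = scan_sequence("MARSPKTPA", TOY_PSSM)
best = ranked[0]
```

Here the highest-scoring window is the one whose residues match the favoured residue at every motif position; windows with no favoured residues score zero.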

Journal Article•DOI•

3DNA: a software package for the analysis, rebuilding and visualization of three‐dimensional nucleic acid structures

Xiang-Jun Lu¹, Wilma K. Olson¹•Institutions (1)

Rutgers University¹

01 Sep 2003-Nucleic Acids Research

TL;DR: A comprehensive software package for the analysis, reconstruction and visualization of three-dimensional nucleic acid structures that can handle antiparallel and parallel double helices, single-stranded structures, triplexes, quadruplexes and other complex tertiary folding motifs found in both DNA and RNA structures is presented.

Abstract: We present a comprehensive software package, 3DNA, for the analysis, reconstruction and visualization of three-dimensional nucleic acid structures. Starting from a coordinate file in Protein Data Bank (PDB) format, 3DNA can handle antiparallel and parallel double helices, single-stranded structures, triplexes, quadruplexes and other complex tertiary folding motifs found in both DNA and RNA structures. The analysis routines identify and categorize all base interactions and classify the double helical character of appropriate base pair steps. The program makes use of a recently recommended reference frame for the description of nucleic acid base pair geometry and a rigorous matrix-based scheme to calculate local conformational parameters and rebuild the structure from these parameters. The rebuilding routines produce rectangular block representations of nucleic acids as well as full atomic models with the sugar-phosphate backbone and publication quality 'standardized' base stacking diagrams. Utilities are provided to locate the base pairs and helical regions in a structure and to reorient structures for effective visualization. Regular helical models based on X-ray diffraction measurements of various repeating sequences can also be generated within the program.

Journal Article•DOI•

ESEfinder: A web resource to identify exonic splicing enhancers.

Luca Cartegni¹, Jinhua Wang¹, Zhengwei Zhu¹, Michael Q. Zhang¹, Adrian R. Krainer¹•Institutions (1)

Cold Spring Harbor Laboratory¹

01 Jul 2003-Nucleic Acids Research

TL;DR: ESEfinder (http://exon.cshl.edu/ESE/) is a web-based resource that facilitates rapid analysis of exon sequences to identify putative ESEs responsive to the human SR proteins SF2/ASF, SC35, SRp40 and SRp55, and to predict whether exonic mutations disrupt such elements.

Abstract: Point mutations frequently cause genetic diseases by disrupting the correct pattern of pre-mRNA splicing. The effect of a point mutation within a coding sequence is traditionally attributed to the deduced change in the corresponding amino acid. However, some point mutations can have much more severe effects on the structure of the encoded protein, for example when they inactivate an exonic splicing enhancer (ESE), thereby resulting in exon skipping. ESEs also appear to be especially important in exons that normally undergo alternative splicing. Different classes of ESE consensus motifs have been described, but they are not always easily identified. ESEfinder (http://exon.cshl.edu/ESE/) is a web-based resource that facilitates rapid analysis of exon sequences to identify putative ESEs responsive to the human SR proteins SF2/ASF, SC35, SRp40 and SRp55, and to predict whether exonic mutations disrupt such elements.

Journal Article•DOI•

The Ribosomal Database Project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy

James R. Cole, Benli Chai¹, Terry L. Marsh, Ryan J. Farris¹, Qiong Wang¹, S. A. Kulam¹, Sadanandavalli Retnaswami Chandra¹, Donna M. McGarrell¹, Thomas M. Schmidt¹, George M. Garrity, James M. Tiedje•Institutions (1)

Michigan State University¹

01 Jan 2003-Nucleic Acids Research

TL;DR: The Ribosomal Database Project-II (RDP-II) provides data, tools and services related to ribosomal RNA sequences to the research community and debuts a new regularly updated alignment of over 50 000 annotated (eu)bacterial sequences.

Abstract: The Ribosomal Database Project-II (RDP-II) provides data, tools and services related to ribosomal RNA sequences to the research community. Through its website (http://rdp.cme.msu.edu), RDP-II offers aligned and annotated rRNA sequence data, analysis services, and phylogenetic inferences (trees) derived from these data. RDP-II release 8.1 contains 16 277 prokaryotic, 5201 eukaryotic, and 1503 mitochondrial small subunit rRNA sequences in aligned and annotated format. The current public beta release of 9.0 debuts a new regularly updated alignment of over 50 000 annotated (eu)bacterial sequences. New analysis services include a sequence search and selection tool (Hierarchy Browser) and a phylogenetic tree building and visualization tool (Phylip Interface). A new interactive tutorial guides users through the basics of rRNA sequence analysis. Other services include probe checking, phylogenetic placement of user sequences, screening of users' sequences for chimeric rRNA sequences, automated alignment, production of similarity matrices, and services to plan and analyze terminal restriction fragment polymorphism (T-RFLP) experiments. The RDP-II email address for questions or comments is rdpstaff@msu.edu.

Journal Article•DOI•

Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies

Brian J. Haas¹, Arthur L. Delcher², Stephen M. Mount³, Jennifer R. Wortman², Roger Smith², Linda Hannick², Rama Maiti², Catherine M. Ronning², Douglas B. Rusch, Christopher D. Town², Steven L. Salzberg², Owen White²•Institutions (3)

TigerLogic¹, J. Craig Venter Institute², University of Maryland, College Park³

01 Oct 2003-Nucleic Acids Research

TL;DR: The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.

Abstract: The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full-length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the ~27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.
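
The first step mentioned above, grouping overlapping transcript alignments into clusters, can be sketched with a simple interval sweep. This is only an illustration under stated assumptions: each alignment is reduced to a hypothetical `(start, end)` span on one strand of one contig, and the sketch ignores intron structure and splice compatibility, which the actual assembly algorithm must take into account. The function name `cluster_overlapping` is invented for this example.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) genomic coordinates, inclusive


def cluster_overlapping(spans: List[Span]) -> List[Dict]:
    """Group alignment spans that overlap, directly or transitively, into clusters."""
    clusters: List[Dict] = []
    for start, end in sorted(spans):
        if clusters and start <= clusters[-1]["end"]:
            # Overlaps the current cluster: extend it.
            clusters[-1]["members"].append((start, end))
            clusters[-1]["end"] = max(clusters[-1]["end"], end)
        else:
            # No overlap: open a new cluster.
            clusters.append({"start": start, "end": end, "members": [(start, end)]})
    return clusters


# Made-up alignment spans for five transcripts.
toy_spans = [(1200, 1500), (100, 500), (2500, 2600), (450, 900), (1400, 1800)]
toy_clusters = cluster_overlapping(toy_spans)
```

Sorting by start coordinate lets a single pass decide cluster membership, since any span that overlaps an earlier cluster must start before that cluster's current end.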

Journal Article•DOI•

The PredictProtein server

Burkhard Rost¹, Guy Yachdav, Jinfeng Liu•Institutions (1)

Columbia University¹

01 Jul 2003-Nucleic Acids Research

TL;DR: PredictProtein is an Internet service for sequence analysis and the prediction of protein structure and function that returns multiple sequence alignments, PROSITE sequence motifs, low-complexity regions (SEG), nuclear localization signals, regions lacking regular structure (NORS) and predictions of secondary structure, solvent accessibility, globular regions, transmembrane helices and functional annotations.

Abstract: PredictProtein (http://www.predictprotein.org) is an Internet service for sequence analysis and the prediction of protein structure and function. Users submit protein sequences or alignments; PredictProtein returns multiple sequence alignments, PROSITE sequence motifs, low-complexity regions (SEG), nuclear localization signals, regions lacking regular structure (NORS) and predictions of secondary structure, solvent accessibility, globular regions, transmembrane helices, coiled-coil regions, structural switch regions, disulfide-bonds, sub-cellular localization and functional annotations. Upon request fold recognition by prediction-based threading, CHOP domain assignments, predictions of transmembrane strands and inter-residue contacts are also available. For all services, users can submit their query either by electronic mail or interactively via the World Wide Web.

Journal Article•DOI•

Rfam: an RNA family database

Sam Griffiths-Jones¹, Alex Bateman, Mhairi Marshall, Ajay Khanna, Sean R. Eddy•Institutions (1)

Wellcome Trust Sanger Institute¹

01 Jan 2003-Nucleic Acids Research

TL;DR: The first release of Rfam (1.0) contains 25 families, which annotate over 50 000 non-coding RNA genes in the taxonomic divisions of the EMBL nucleotide database.

Abstract: Rfam is a collection of multiple sequence alignments and covariance models representing non-coding RNA families. Rfam is available on the web in the UK at http://www.sanger.ac.uk/Software/Rfam/ and in the US at http://rfam.wustl.edu/. These websites allow the user to search a query sequence against a library of covariance models, and view multiple sequence alignments and family annotation. The database can also be downloaded in flatfile form and searched locally using the INFERNAL package (http://infernal.wustl.edu/). The first release of Rfam (1.0) contains 25 families, which annotate over 50 000 non-coding RNA genes in the taxonomic divisions of the EMBL nucleotide database.

Journal Article•DOI•

CASTp: Computed Atlas of Surface Topography of proteins

T. Andrew Binkowski¹, Shapor Edmund Naghibzadeh¹, Jie Liang•Institutions (1)

University of Illinois at Chicago¹

01 Jul 2003-Nucleic Acids Research

TL;DR: Computed Atlas of Surface Topography of proteins (CASTp) provides an online resource for locating, delineating and measuring concave surface regions on three-dimensional structures of proteins, including pockets located on protein surfaces and voids buried in the interior of proteins.

Abstract: Computed Atlas of Surface Topography of proteins (CASTp) provides an online resource for locating, delineating and measuring concave surface regions on three-dimensional structures of proteins. These include pockets located on protein surfaces and voids buried in the interior of proteins. The measurement includes the area and volume of pocket or void by solvent accessible surface model (Richards' surface) and by molecular surface model (Connolly's surface), all calculated analytically. CASTp can be used to study surface features and functional regions of proteins. CASTp includes a graphical user interface, flexible interactive visualization, as well as on-the-fly calculation for user uploaded structures. CASTp is updated daily and can be accessed at http://cast.engr.uic.edu.

Journal Article•DOI•

ESPript/ENDscript: extracting and rendering sequence and 3D information from atomic structures of proteins

Patrice Gouet¹, Xavier Robert, Emmanuel Courcelle•Institutions (1)

Independent Bank¹

01 Jul 2003-Nucleic Acids Research

TL;DR: The Fortran program ESPript was created in 1993, to display on a PostScript figure multiple sequence alignments adorned with secondary structure elements of each sequence of known 3D structure.

Abstract: The Fortran program ESPript was created in 1993, to display on a PostScript figure multiple sequence alignments adorned with secondary structure elements. A web server was made available in 1999 and ESPript has been linked to three major web tools: ProDom which identifies protein domains, PredictProtein which predicts secondary structure elements and NPS@ which runs sequence alignment programs. A web server named ENDscript was created in 2002 to facilitate the generation of ESPript figures containing a large amount of information. ENDscript uses programs such as BLAST, Clustal and PHYLODENDRON to work on protein sequences and programs such as DSSP, CNS and MOLSCRIPT to work on protein coordinates. It enables the creation, from a single Protein Data Bank identifier, of a multiple sequence alignment figure adorned with secondary structure elements of each sequence of known 3D structure. Similar 3D structures are superimposed in turn with the program PROFIT and a final figure is drawn with BOBSCRIPT, which shows sequence and structure conservation along the Cα trace of the query. ESPript and ENDscript are available at http://genopole.toulouse.inra.fr/ESPript.

Journal Article•DOI•

Database resources of the National Center for Biotechnology

David L. Wheeler¹, Deanna M. Church¹, Scott Federhen¹, Alex E. Lash¹, Thomas L. Madden¹, Joan Pontius¹, Gregory D. Schuler¹, Lynn M. Schriml¹, Edwin Sequeira¹, Tatiana Tatusova¹, Lukas Wagner¹•Institutions (1)

National Institutes of Health¹

01 Jan 2003-Nucleic Acids Research

TL;DR: In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site.

Abstract: In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, PubMed, PubMed Central (PMC), LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR (e-PCR), Open Reading Frame (ORF) Finder, Reference Sequence (RefSeq), UniGene, HomoloGene, ProtEST, Database of Single Nucleotide Polymorphisms (dbSNP), Human/Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker (MM), Evidence Viewer (EV), Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.

Journal Article•DOI•

ArrayExpress—a public repository for microarray gene expression data at the EBI

Helen Parkinson¹, Ugis Sarkans¹, Mohammadreza Shojatalab¹, Niran Abeygunawardena¹, Sergio Contrino¹, Richard M.R. Coulson¹, Anna Farne¹, Gonzalo Garcia Lara¹, Ele Holloway¹, Misha Kapushesky¹, P. Lilja¹, Gaurab Mukherjee¹, Ahmet Oezcimen¹, Tim F. Rayner¹, Philippe Rocca-Serra¹, Anjan Sharma¹, Susanna-Assunta Sansone¹, Alvis Brazma¹•Institutions (1)

European Bioinformatics Institute¹

01 Jan 2003-Nucleic Acids Research

TL;DR: ArrayExpress is a public repository for microarray data that supports the MIAME (Minimum Information About a Microarray Experiment) requirements and stores well-annotated raw and normalized data.

Abstract: ArrayExpress is a new public database of microarray gene expression data at the EBI. It is a generic gene expression database designed to hold data from all microarray platforms. ArrayExpress uses the annotation standard Minimum Information About a Microarray Experiment (MIAME) and the associated XML data exchange format Microarray Gene Expression Markup Language (MAGE-ML), and it is designed to store well-annotated data in a structured way. The ArrayExpress infrastructure consists of the database itself, data submissions in MAGE-ML format or via the online submission tool MIAMExpress, an online database query interface, and the Expression Profiler online analysis tool. ArrayExpress accepts three types of submission: arrays, experiments and protocols, each of which is assigned an accession number. Help on data submission and annotation is provided by the curation team. The database can be queried on parameters such as author, laboratory, organism, experiment or array types. With an increasing number of organisations adopting the MAGE-ML standard, the volume of submissions to ArrayExpress is increasing rapidly. The database can be accessed at http://www.ebi.ac.uk/arrayexpress.

Journal Article•DOI•

MATCH™: a tool for searching transcription factor binding sites in DNA sequences

Alexander E. Kel, Ellen Gößling, Ingmar Reuter, Evgeny Cheremushkin, Olga V. Kel-Margoulis, Edgar Wingender

01 Jul 2003-Nucleic Acids Research

TL;DR: Match™ is a weight matrix-based tool for searching putative transcription factor binding sites in DNA sequences that uses the matrix library collected in TRANSFAC® and therefore provides the possibility to search for a great variety of different transcription factor binding sites.

Abstract: Match™ is a weight matrix-based tool for searching putative transcription factor binding sites in DNA sequences. Match™ is closely interconnected and distributed together with the TRANSFAC® database. In particular, Match™ uses the matrix library collected in TRANSFAC® and therefore provides the possibility to search for a great variety of different transcription factor binding sites. Several sets of optimised matrix cut-off values are built into the system to provide a variety of search modes of different stringency. The user may construct and save his/her specific user profiles, which are selected subsets of matrices including default or user-defined cut-off values. Furthermore, a number of tissue-specific profiles are provided that were compiled by the TRANSFAC® team. A public version of the Match™ tool is available at: http://www.gene-regulation.com/pub/programs.html#match. The same program with a different web interface can be found at http://compel.bionet.nsc.ru/Match/Match.html. An advanced version of the tool called Match™ Professional is available at http://www.biobase.de.
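
As a rough illustration of what a weight matrix search with a cut-off looks like, the sketch below slides a matrix along a sequence and reports windows whose normalised score reaches a threshold. It is a simplified stand-in rather than the scoring used by the tool itself: the matrix `TOY_MATRIX`, the function names and the min/max normalisation are assumptions made for this example, and real TRANSFAC® matrices and cut-off sets are not reproduced here.

```python
def normalized_score(matrix, window):
    """Score a window against a weight matrix, rescaled to the range 0..1.

    `matrix` is a list of per-position dicts mapping base -> weight.
    Bases outside A/C/G/T would raise KeyError in this toy version.
    """
    current = sum(column[base] for column, base in zip(matrix, window))
    lowest = sum(min(column.values()) for column in matrix)
    highest = sum(max(column.values()) for column in matrix)
    return (current - lowest) / (highest - lowest)


def scan(sequence, matrix, cutoff):
    """Return (start, window, score) for every window scoring >= cutoff."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = normalized_score(matrix, window)
        if score >= cutoff:
            hits.append((start, window, score))
    return hits


# Hypothetical 4-column matrix that favours the word TATA.
TOY_MATRIX = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 9, "C": 0, "G": 0, "T": 1},
]

hits = scan("GGTATACC", TOY_MATRIX, cutoff=0.9)
print(hits)
```

Raising or lowering `cutoff` trades sensitivity against specificity, which is the role the abstract assigns to the built-in sets of cut-off values.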

Journal Article•DOI•

GlobPlot: exploring protein sequences for globularity and disorder

Rune Linding¹, Robert B. Russell, Victor Neduva, Toby J. Gibson•Institutions (1)

European Bioinformatics Institute¹

01 Jul 2003-Nucleic Acids Research

TL;DR: A new tool for discovery of unstructured, or disordered regions within proteins, and examples with known proteins where it successfully identifies inter-domain segments containing linear motifs, and also apparently ordered regions that do not contain any recognised domain are presented.

Abstract: A major challenge in the proteomics and structural genomics era is to predict protein structure and function, including identification of those proteins that are partially or wholly unstructured. Non-globular sequence segments often contain short linear peptide motifs (e.g. SH3-binding sites) which are important for protein function. We present here a new tool for discovery of such unstructured, or disordered, regions within proteins. GlobPlot (http://globplot.embl.de) is a web service that allows the user to plot the tendency within the query protein for order/globularity and disorder. We show examples with known proteins where it successfully identifies inter-domain segments containing linear motifs, and also apparently ordered regions that do not contain any recognised domain. GlobPlot may be useful in domain hunting efforts. The plots indicate that instances of known domains may often contain additional N- or C-terminal segments that appear ordered. Thus GlobPlot may be of use in the design of constructs corresponding to globular proteins, as needed for many biochemical studies, particularly structural biology. GlobPlot has a pipeline interface, GlobPipe, for the advanced user to do whole proteome analysis. GlobPlot can also be used as a generic infrastructure package for graphical display of any possible propensity.
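
The abstract does not spell out how the plotted tendency is computed, so the following is only one plausible way to turn a per-residue propensity into a curve: a running sum along the sequence, where rising stretches suggest disorder and falling stretches suggest order. The propensity values in `TOY_PROPENSITY` and the function name `running_sum` are invented for illustration and are not the scale used by the service.

```python
from itertools import accumulate

# Hypothetical propensities: positive = disorder-prone, negative = order-prone.
TOY_PROPENSITY = {"P": 1.0, "S": 0.5, "G": 0.5, "L": -1.0, "V": -1.0, "I": -1.0}


def running_sum(sequence, propensity, default=0.0):
    """Cumulative propensity along the sequence; unknown residues add `default`."""
    return list(accumulate(propensity.get(residue, default) for residue in sequence))


curve = running_sum("LVIPSGP", TOY_PROPENSITY)
print(curve)
```

Because the table is passed in as an argument, any other per-residue scale could be plotted the same way, in the spirit of the abstract's remark about displaying any possible propensity.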

Journal Article•DOI•

LGA: a method for finding 3D similarities in protein structures

Adam Zemla¹•Institutions (1)

Lawrence Livermore National Laboratory¹

01 Jul 2003-Nucleic Acids Research

TL;DR: Data generated by LGA can be successfully used in a scoring function to rank the level of similarity between two structures and to allow structure classification when many proteins are being analyzed.

Abstract: We present the LGA (Local-Global Alignment) method, designed to facilitate the comparison of protein structures or fragments of protein structures in sequence dependent and sequence independent modes. The LGA structure alignment program is available as an online service at http://PredictionCenter.llnl.gov/local/lga. Data generated by LGA can be successfully used in a scoring function to rank the level of similarity between two structures and to allow structure classification when many proteins are being analyzed. LGA also allows the clustering of similar fragments of protein structures.

Journal Article•DOI•

Identification of patterns in biological sequences at the ALGGEN server: PROMO and MALGEN

Domènec Farré, Roman Roset, Mario Huerta, José-Enrique Adsuara, Llorenç Roselló, M. Mar Albà, Xavier Messeguer

01 Jul 2003-Nucleic Acids Research

TL;DR: Details are presented on the functionality of PROMO version 2.0, a program for the prediction of transcription factor binding sites in a single sequence or in a group of related sequences, and of MALGEN, a tool to visualize sequence correspondences among long DNA sequences.

Abstract: In this paper we present several web-based tools to identify conserved patterns in sequences. In particular, we present details on the functionality of PROMO version 2.0, a program for the prediction of transcription factor binding sites in a single sequence or in a group of related sequences, and of MALGEN, a tool to visualize sequence correspondences among long DNA sequences. The web tools and associated documentation can be accessed at http://www.lsi.upc.es/~alggen (RESEARCH link).

Journal Article•DOI•

Experimental validation of novel and conventional approaches to quantitative real‐time PCR data analysis

Stuart N. Peirson¹, Jason N. Butler¹, Russell G. Foster¹•Institutions (1)

Imperial College London¹

15 Jul 2003-Nucleic Acids Research

TL;DR: This study uses a variety of absolute and relative approaches of data analysis to investigate nocturnal c-fos expression in wild-type and retinally degenerate mice, and applies a simple algorithm to calculate the amplification efficiency of every sample from its amplification profile.

Abstract: Real-time PCR is being used increasingly as the method of choice for mRNA quantification, allowing rapid analysis of gene expression from low quantities of starting template. Despite a wide range of approaches, the same principles underlie all data analysis, with standard approaches broadly classified as either absolute or relative. In this study we use a variety of absolute and relative approaches of data analysis to investigate nocturnal c-fos expression in wild-type and retinally degenerate mice. In addition, we apply a simple algorithm to calculate the amplification efficiency of every sample from its amplification profile. We confirm that nocturnal c-fos expression in the rodent eye originates from the photoreceptor layer, with around a 5-fold reduction in nocturnal c-fos expression in mice lacking rods and cones. Furthermore, we illustrate that differences in the results obtained from absolute and relative approaches are underpinned by differences in the calculated PCR efficiency. By calculating the amplification efficiency from the samples under analysis, comparable results may be obtained without the need for standard curves. We have automated this method to provide a means of streamlining the real-time PCR process, enabling analysis of experimental samples based upon their own reaction kinetics rather than those of artificial standards.
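
The abstract does not give the algorithm used to obtain per-sample efficiency, so the sketch below shows a common log-linear approach as an assumption: within the exponential phase, log10(fluorescence) rises linearly with cycle number, and the fold increase per cycle is 10 raised to the fitted slope (2.0 would be perfect doubling). The function name, the choice of cycles and the synthetic signal are all hypothetical.

```python
import math


def amplification_efficiency(cycles, fluorescence):
    """Estimate fold increase per cycle from exponential-phase readings.

    Fits a least-squares line to log10(fluorescence) against cycle number
    and returns 10 ** slope. Readings must be positive and background-corrected.
    """
    logs = [math.log10(value) for value in fluorescence]
    n = len(cycles)
    mean_x = sum(cycles) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, logs))
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    return 10 ** (sxy / sxx)


# Synthetic exponential-phase data growing 1.9-fold per cycle.
cycles = [18, 19, 20, 21, 22]
signal = [0.01 * 1.9 ** (cycle - 18) for cycle in cycles]

efficiency = amplification_efficiency(cycles, signal)
print(round(efficiency, 3))
```

In practice the result depends heavily on which cycles are treated as exponential phase; that selection step is not shown here.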

Journal Article•DOI•

The TIGRFAMs database of protein families

Daniel H. Haft, Jeremy D. Selengut, Owen White

01 Jan 2003-Nucleic Acids Research

TL;DR: TIGRFAMs is a collection of manually curated protein families consisting of hidden Markov models (HMMs), multiple sequence alignments, commentary, Gene Ontology (GO) assignments, literature references and pointers to related TIGRFAMs, Pfam and InterPro models designed to support both automated and manually curated annotation of genomes.

Abstract: TIGRFAMs is a collection of manually curated protein families consisting of hidden Markov models (HMMs), multiple sequence alignments, commentary, Gene Ontology (GO) assignments, literature references and pointers to related TIGRFAMs, Pfam and InterPro models. These models are designed to support both automated and manually curated annotation of genomes. TIGRFAMs contains models of full-length proteins and shorter regions at the levels of superfamilies, subfamilies and equivalogs, where equivalogs are sets of homologous proteins conserved with respect to function since their last common ancestor. The scope of each model is set by raising or lowering cutoff scores and choosing members of the seed alignment to group proteins sharing specific function (equivalog) or more general properties. The overall goal is to provide information with maximum utility for the annotation process. TIGRFAMs is thus complementary to Pfam, whose models typically achieve broad coverage across distant homologs but end at the boundaries of conserved structural domains. The database currently contains over 1600 protein families. TIGRFAMs is available for searching or downloading at www.tigr.org/TIGRFAMs.

Journal Article•DOI•

The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community

Seung Y. Rhee¹, William D. Beavis, Tanya Z. Berardini, Guanghong Chen, David A. Dixon, Aisling Doyle, Margarita Garcia-Hernandez, Eva Huala, Gabriel C. Lander, Mary Montoya, Neil A. Miller, Lukas A. Mueller, Suparna Mundodi, Leonore Reiser, Julie Tacklind, Dan C. Weems, Yihe Wu, Iris Xu, Daniel Yoo, Jungwon Yoon, Peifen Zhang•Institutions (1)

Carnegie Institution for Science¹

01 Jan 2003-Nucleic Acids Research

TL;DR: New information includes sequence polymorphisms (with alleles, germplasms and phenotypes), Gene Ontology annotations, gene families, protein information, metabolic pathways, gene expression data from microarray experiments, and seed and DNA stocks.

Abstract: Arabidopsis thaliana is the most widely studied plant today. The concerted efforts of over 11 000 researchers and 4000 organizations around the world are generating a rich diversity and quantity of information and materials. This information is made available through a comprehensive on-line resource called the Arabidopsis Information Resource (TAIR) (http://arabidopsis.org), which is accessible via commonly used web browsers and can be searched and downloaded in a number of ways. In the last two years, efforts have been focused on increasing data content and diversity, functionally annotating genes and gene products with controlled vocabularies, and improving data retrieval, analysis and visualization tools. New information includes sequence polymorphisms (with alleles, germplasms and phenotypes), Gene Ontology annotations, gene families, protein information, metabolic pathways, gene expression data from microarray experiments, and seed and DNA stocks. New data visualization and analysis tools include SeqViewer, which interactively displays the genome from the whole chromosome down to 10 kb of nucleotide sequence, and AraCyc, a metabolic pathway database and map tool that allows overlaying expression data onto the pathway diagrams. Finally, we have recently incorporated seed and DNA stock information from the Arabidopsis Biological Resource Center (ABRC) and implemented a shopping-cart style on-line ordering system.

Journal Article•DOI•

A PCR primer bank for quantitative gene expression analysis

Xiaowei Wang¹, Brian Seed¹•Institutions (1)

Harvard University¹

15 Dec 2003-Nucleic Acids Research

TL;DR: An experimentally validated algorithm for the identification of transcript-specific PCR primers on a genomic scale that can be applied to real-time PCR with sequence-independent detection methods is presented.

Abstract: Although gene expression profiling by microarray analysis is a useful tool for assessing global levels of transcriptional activity, variability associated with the data sets usually requires that observed differences be validated by some other method, such as real-time quantitative polymerase chain reaction (real-time PCR). However, non-specific amplification of non-target genes is frequently observed in the latter, confounding the analysis in approximately 40% of real-time PCR attempts when primer-specific labels are not used. Here we present an experimentally validated algorithm for the identification of transcript-specific PCR primers on a genomic scale that can be applied to real-time PCR with sequence-independent detection methods. An online database, PrimerBank, has been created for researchers to retrieve primer information for their genes of interest. PrimerBank currently contains 147 404 primers encompassing most known human and mouse genes. The primer design algorithm has been tested by conventional and real-time PCR for a subset of 112 primer pairs with a success rate of 98.2%.
