Showing papers in "Nucleic Acids Research in 2011"


Journal ArticleDOI
TL;DR: This work has focused on minimizing search times and the ability to rapidly display tabular results, regardless of the number of matches found, developing graphical summaries of the search results to provide quick, intuitive appraisement of them.
Abstract: HMMER is a software suite for protein sequence similarity searches using probabilistic methods. Previously, HMMER has mainly been available only as a computationally intensive UNIX command-line tool, restricting its use. Recent advances in the software, HMMER3, have resulted in a 100-fold speed gain relative to previous versions. It is now feasible to make efficient profile hidden Markov model (profile HMM) searches via the web. A HMMER web server (http://hmmer.janelia.org) has been designed and implemented such that most protein database searches return within a few seconds. Methods are available for searching either a single protein sequence, multiple protein sequence alignment or profile HMM against a target sequence database, and for searching a protein sequence against Pfam. The web server is designed to cater to a range of different user expertise and accepts batch uploading of multiple queries at once. All search methods are also available as RESTful web services, thereby allowing them to be readily integrated as remotely executed tasks in locally scripted workflows. We have focused on minimizing search times and the ability to rapidly display tabular results, regardless of the number of matches found, developing graphical summaries of the search results to provide quick, intuitive appraisement of them.

4,159 citations


Journal ArticleDOI
TL;DR: This work has mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings, which can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations.
Abstract: miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15,000 microRNA gene loci in over 140 species, and over 17,000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/.

3,618 citations


Journal ArticleDOI
TL;DR: A web server, KOBAS 2.0, is reported, which annotates an input set of genes with putative pathways and disease relationships based on mapping to genes with known annotations, which allows for both ID mapping and cross-species sequence similarity mapping.
Abstract: High-throughput experimental technologies often identify dozens to hundreds of genes related to, or changed in, a biological or pathological process. From these genes one wants to identify biological pathways that may be involved and diseases that may be implicated. Here, we report a web server, KOBAS 2.0, which annotates an input set of genes with putative pathways and disease relationships based on mapping to genes with known annotations. It allows for both ID mapping and cross-species sequence similarity mapping. It then performs statistical tests to identify statistically significantly enriched pathways and diseases. KOBAS 2.0 incorporates knowledge across 1327 species from 5 pathway databases (KEGG PATHWAY, PID, BioCyc, Reactome and Panther) and 5 human disease databases (OMIM, KEGG DISEASE, FunDO, GAD and NHGRI GWAS Catalog). KOBAS 2.0 can be accessed at http://kobas.cbi.pku.edu.cn.

3,293 citations
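
The KOBAS 2.0 abstract above mentions statistical tests for enriched pathways without naming them. As a minimal sketch, assuming a one-sided hypergeometric test (a common choice for over-representation analysis, not confirmed by the abstract), the p-value for one pathway could be computed as follows. The function name and parameters are hypothetical and are not part of KOBAS.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, pathway_genes, input_genes, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    Hypothetical helper, not KOBAS code. Assumes `input_genes` genes are
    drawn without replacement from a background of `total_genes`, of which
    `pathway_genes` are annotated to the pathway of interest.
    """
    max_overlap = min(pathway_genes, input_genes)
    denominator = comb(total_genes, input_genes)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, input_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Illustrative numbers only: 200 input genes, 10 of which fall in a
    # 100-gene pathway, against a 20,000-gene background.
    print(hypergeom_enrichment_p(20000, 100, 200, 10))
```

In practice such p-values would also be corrected for testing many pathways at once; that step is omitted here.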


Journal ArticleDOI
TL;DR: An update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING), which provides uniquely comprehensive coverage and ease of access to both experimental as well as predicted interaction information.
Abstract: An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and computational prediction techniques. However, public efforts to collect and present protein interaction information have struggled to keep up with the pace of interaction discovery, partly because protein-protein interaction information can be error-prone and require considerable effort to annotate. Here, we present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage and ease of access to both experimental as well as predicted interaction information. Interactions in STRING are provided with a confidence score, and accessory information such as protein domains and 3D structures is made available, all within a stable and consistent identifier space. New features in STRING include an interactive network viewer that can cluster networks on demand, updated on-screen previews of structural information including homology models, extensive data updates and strongly improved connectivity and integration with third-party resources. Version 9.0 of STRING covers more than 1100 completely sequenced organisms; the resource can be reached at http://string-db.org.

3,239 citations


Journal ArticleDOI
TL;DR: NCBI’s Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints.
Abstract: NCBI's Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints. CDD includes manually curated domain models that make use of protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent. As CDD also imports domain family models from a variety of external sources, it is a partially redundant collection. To simplify protein annotation, redundant models and models describing homologous families are clustered into superfamilies. By default, domain footprints are annotated with the corresponding superfamily designation, on top of which specific annotation may indicate high-confidence assignment of family membership. Pre-computed domain annotation is available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotation for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.

2,934 citations


Journal ArticleDOI
TL;DR: With all genomic information recently updated to GRCh37, COSMIC integrates many diverse types of mutation information and is making much closer links with Ensembl and other data resources.
Abstract: COSMIC (http://www.sanger.ac.uk/cosmic) curates comprehensive information on somatic mutations in human cancer. Release v48 (July 2010) describes over 136 000 coding mutations in almost 542 000 tumour samples; of the 18 490 genes documented, 4803 (26%) have one or more mutations. Full scientific literature curations are available on 83 major cancer genes and 49 fusion gene pairs (19 new cancer genes and 30 new fusion pairs this year) and this number is continually increasing. Key amongst these is TP53, now available through a collaboration with the IARC p53 database. In addition to data from the Cancer Genome Project (CGP) at the Sanger Institute, UK, and The Cancer Genome Atlas project (TCGA), large systematic screens are also now curated. Major website upgrades now make these data much more mineable, with many new selection filters and graphics. A Biomart is now available allowing more automated data mining and integration with other biological databases. Annotation of genomic features has become a significant focus; COSMIC has begun curating full-genome resequencing experiments, developing new web pages, export formats and graphics styles. With all genomic information recently updated to GRCh37, COSMIC integrates many diverse types of mutation information and is making much closer links with Ensembl and other data resources.

2,270 citations


Journal ArticleDOI
TL;DR: A method and reagents for efficiently assembling TALEN constructs with custom repeat arrays are presented and design guidelines based on naturally occurring TAL effectors and their binding sites are described.
Abstract: TALENs are important new tools for genome engineering. Fusions of transcription activator-like (TAL) effectors of plant pathogenic Xanthomonas spp. to the FokI nuclease, TALENs bind and cleave DNA in pairs. Binding specificity is determined by customizable arrays of polymorphic amino acid repeats in the TAL effectors. We present a method and reagents for efficiently assembling TALEN constructs with custom repeat arrays. We also describe design guidelines based on naturally occurring TAL effectors and their binding sites. Using software that applies these guidelines, in nine genes from plants, animals and protists, we found candidate cleavage sites on average every 35 bp. Each of 15 sites selected from this set was cleaved in a yeast-based assay with TALEN pairs constructed with our reagents. We used two of the TALEN pairs to mutate HPRT1 in human cells and ADH1 in Arabidopsis thaliana protoplasts. Our reagents include a plasmid construct for making custom TAL effectors and one for TAL effector fusions to additional proteins of interest. Using the former, we constructed de novo a functional analog of AvrHah1 of Xanthomonas gardneri. The complete plasmid set is available through the non-profit repository AddGene.

2,175 citations


Journal ArticleDOI
TL;DR: The content and structure of the SRA is presented, support for sequencing platforms and recommended data submission levels and formats are provided and the response to the challenge of data growth is outlined.
Abstract: The combination of significantly lower cost and increased speed of sequencing has resulted in an explosive growth of data submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). The preservation of experimental data is an important part of the scientific record, and increasing numbers of journals and funding agencies require that next-generation sequence data are deposited into the SRA. The SRA was established as a public repository for the next-generation sequence data and is operated by the International Nucleotide Sequence Database Collaboration (INSDC). INSDC partners include the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA, detail our support for sequencing platforms and provide recommended data submission levels and formats. We also briefly outline our response to the challenge of data growth.

2,169 citations


Journal ArticleDOI
TL;DR: New data highlights include seven new genome assemblies, a Neandertal genome data portal, phenotype and disease association data, a human RNA editing track, and a zebrafish Conservation track.
Abstract: The University of California, Santa Cruz Genome Browser (http://genome.ucsc.edu) offers online access to a database of genomic sequence and annotation data for a wide variety of organisms. The Browser also has many tools for visualizing, comparing and analyzing both publicly available and user-generated genomic data sets, aligning sequences and uploading user data. Among the features released this year are a gene search tool and annotation track drag-reorder functionality as well as support for BAM and BigWig/BigBed file formats. New display enhancements include overlay of multiple wiggle tracks through use of transparent coloring, options for displaying transformed wiggle data, a ‘mean+whiskers’ windowing function for display of wiggle data at high zoom levels, and more color schemes for microarray data. New data highlights include seven new genome assemblies, a Neandertal genome data portal, phenotype and disease association data, a human RNA editing track, and a zebrafish Conservation track. We also describe updates to existing tracks.

1,818 citations


Journal ArticleDOI
TL;DR: PHAge Search Tool (PHAST) is a web server designed to rapidly and accurately identify, annotate and graphically display prophage sequences within bacterial genomes or plasmids.
Abstract: PHAge Search Tool (PHAST) is a web server designed to rapidly and accurately identify, annotate and graphically display prophage sequences within bacterial genomes or plasmids. It accepts either raw DNA sequence data or partially annotated GenBank formatted data and rapidly performs a number of database comparisons as well as phage ‘cornerstone’ feature identification steps to locate, annotate and display prophage sequences and prophage features. Relative to other prophage identification tools, PHAST is up to 40 times faster and up to 15% more sensitive. It is also able to process and annotate both raw DNA sequence data and Genbank files, provide richly annotated tables on prophage features and prophage ‘quality’ and distinguish between intact and incomplete prophage. PHAST also generates downloadable, high quality, interactive graphics that display all identified prophage components in both circular and linear genomic views. PHAST is available at (http://phast.wishartlab.com).

1,767 citations


Journal ArticleDOI
TL;DR: DrugBank 3.0 represents the result of 2 years of manual annotation work aimed at making the database much more useful for a wide range of ‘omics’ applications, particularly with regard to drug target, drug description and drug action data.
Abstract: DrugBank (http://www.drugbank.ca) is a richly annotated database of drug and drug target information. It contains extensive data on the nomenclature, ontology, chemistry, structure, function, action, pharmacology, pharmacokinetics, metabolism and pharmaceutical properties of both small molecule and large molecule (biotech) drugs. It also contains comprehensive information on the target diseases, proteins, genes and organisms on which these drugs act. First released in 2006, DrugBank has become widely used by pharmacists, medicinal chemists, pharmaceutical researchers, clinicians, educators and the general public. Since its last update in 2008, DrugBank has been greatly expanded through the addition of new drugs, new targets and the inclusion of more than 40 new data fields per drug entry (a 40% increase in data ‘depth’). These data field additions include illustrated drug-action pathways, drug transporter data, drug metabolite data, pharmacogenomic data, adverse drug response data, ADMET data, pharmacokinetic data, computed property data and chemical classification data. DrugBank 3.0 also offers expanded database links, improved search tools for drug–drug and food–drug interaction, new resources for querying and viewing drug pathways and hundreds of new drug entries with detailed patent, pricing and manufacturer data. These additions have been complemented by enhancements to the quality and quantity of existing data, particularly with regard to drug target, drug description and drug action data. DrugBank 3.0 represents the result of 2 years of manual annotation work aimed at making the database much more useful for a wide range of ‘omics’ (i.e. pharmacogenomic, pharmacoproteomic, pharmacometabolomic and even pharmacoeconomic) applications.

Journal ArticleDOI
TL;DR: A new functional impact score (FIS) for amino acid residue changes using evolutionary conservation patterns is introduced, estimating that at least 5% of cancer-relevant mutations involve switch of function, rather than simply loss or gain of function.
Abstract: As large-scale re-sequencing of genomes reveals many protein mutations, especially in human cancer tissues, prediction of their likely functional impact becomes an important practical goal. Here, we introduce a new functional impact score (FIS) for amino acid residue changes using evolutionary conservation patterns. The information in these patterns is derived from aligned families and sub-families of sequence homologs within and between species using combinatorial entropy formalism. The score performs well on a large set of human protein mutations in separating disease-associated variants (∼19 200), assumed to be strongly functional, from common polymorphisms (∼35 600), assumed to be weakly functional (area under the receiver operating characteristic curve of ∼0.86). In cancer, using recurrence, multiplicity and annotation for ∼10 000 mutations in the COSMIC database, the method does well in assigning higher scores to more likely functional mutations (‘drivers’). To guide experimental prioritization, we report a list of about 1000 top human cancer genes frequently mutated in one or more cancer types ranked by likely functional impact; and, an additional 1000 candidate cancer genes with rare but likely functional mutations. In addition, we estimate that at least 5% of cancer-relevant mutations involve switch of function, rather than simply loss or gain of function.
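
The abstract above summarizes performance as an area under the receiver operating characteristic curve (AUC). As a rough illustration of what that number measures, the sketch below computes AUC as the fraction of (positive, negative) score pairs that are ranked correctly, counting ties as half. The function name and the toy scores are assumptions for illustration; this is not the authors' evaluation code.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the probability that a positive outscores a negative.

    Hypothetical helper: `positive_scores` would hold scores of
    disease-associated variants and `negative_scores` those of common
    polymorphisms. Ties contribute 0.5.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    print(roc_auc([2.9, 3.5, 1.2], [0.4, 1.5, 0.9]))
```

The pairwise loop is quadratic in the number of variants; for tens of thousands of variants a rank-based formulation would be the usual choice.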

Journal ArticleDOI
TL;DR: This is the first study to show that extracellular miRNAs are predominantly exosomes/microvesicles free and are associated with Ago proteins, and hypothesize that extracellular miRNAs are in the most part by-products of dead cells that remain in extracellular space due to the high stability of the Ago2 protein and Ago2-miRNA complex.
Abstract: MicroRNAs (miRNAs), a class of post-transcriptional gene expression regulators, have recently been detected in human body fluids, including peripheral blood plasma as extracellular nuclease resistant entities. However, the origin and function of extracellular circulating miRNA remain essentially unknown. Here, we confirmed that circulating mature miRNA in contrast to mRNA or snRNA is strikingly stable in blood plasma and cell culture media. Furthermore, we found that most miRNA in plasma and cell culture media completely passed through 0.22 µm filters but remained in the supernatant after ultracentrifugation at 110 000g indicating the non-vesicular origin of the extracellular miRNA. Furthermore, western blot immunoassay revealed that extracellular miRNA ultrafiltrated together with the 96 kDa Ago2 protein, a part of RNA-induced silencing complex. Moreover, miRNAs in both blood plasma and cell culture media co-immunoprecipitated with anti-Ago2 antibody in a detergent free environment. This is the first study to show that extracellular miRNAs are predominantly exosomes/microvesicles free and are associated with Ago proteins. We hypothesize that extracellular miRNAs are in the most part by-products of dead cells that remain in extracellular space due to the high stability of the Ago2 protein and Ago2-miRNA complex. Nevertheless, our data does not reject the possibility that some miRNAs can be associated with exosomes.

Journal ArticleDOI
TL;DR: The psRNATarget plant small RNA target analysis server is designed for high-throughput analysis of next-generation data with an efficient distributed computing back-end pipeline that runs on a Linux cluster.
Abstract: Plant endogenous non-coding short small RNAs (20–24 nt), including microRNAs (miRNAs) and a subset of small interfering RNAs (ta-siRNAs), play an important role in gene expression regulatory networks (GRNs). For example, many transcription factors and development-related genes have been reported as targets of these regulatory small RNAs. Although a number of miRNA target prediction algorithms and programs have been developed, most of them were designed for animal miRNAs which are significantly different from plant miRNAs in the target recognition process. These differences demand the development of separate plant miRNA (and ta-siRNA) target analysis tool(s). We present psRNATarget, a plant small RNA target analysis server, which features two important analysis functions: (i) reverse complementary matching between small RNA and target transcript using a proven scoring schema, and (ii) target-site accessibility evaluation by calculating unpaired energy (UPE) required to ‘open’ secondary structure around small RNA’s target site on mRNA. The psRNATarget incorporates recent discoveries in plant miRNA target recognition, e.g. it distinguishes translational and post-transcriptional inhibition, and it reports the number of small RNA/target site pairs that may affect small RNA binding activity to target transcript. The psRNATarget server is designed for high-throughput analysis of next-generation data with an efficient distributed computing back-end pipeline that runs on a Linux cluster. The server front-end integrates three simplified user-friendly interfaces to accept user-submitted or preloaded small RNAs and transcript sequences; and outputs a comprehensive list of small RNA/target pairs along with the online tools for batch downloading, key word searching and results sorting. The psRNATarget server is freely available at http://plantgrn.noble.org/psRNATarget/.
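
The first analysis function described above, reverse complementary matching between a small RNA and a transcript, can be illustrated with a minimal sketch. The code below slides the reverse complement of the small RNA along the transcript and reports windows with few mismatches. The plain mismatch count is an assumption standing in for the server's scoring schema, which the abstract does not specify, and all names here are hypothetical.

```python
# Watson-Crick complements for RNA bases (G:U wobble pairs are ignored here).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string over A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_target_sites(small_rna, transcript, max_mismatches=2):
    """List (start, mismatches) for windows that nearly pair with small_rna.

    Hypothetical helper: a simple mismatch count is used in place of
    psRNATarget's actual scoring schema.
    """
    ideal_site = reverse_complement(small_rna)
    site_length = len(ideal_site)
    hits = []
    for start in range(len(transcript) - site_length + 1):
        window = transcript[start:start + site_length]
        mismatches = sum(1 for a, b in zip(window, ideal_site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(find_target_sites("AUGC", "AAGCAUAA", max_mismatches=0))
```

The second function in the abstract, target-site accessibility via unpaired energy, requires RNA secondary-structure calculations and is not sketched here.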

Journal ArticleDOI
TL;DR: This work presents the first comprehensive pipeline capable of identifying biosynthetic loci covering the whole range of known secondary metabolite compound classes, and integrates or cross-links all previously available secondary-metabolite specific gene analysis methods in one interactive view.
Abstract: Bacterial and fungal secondary metabolism is a rich source of novel bioactive compounds with potential pharmaceutical applications as antibiotics, anti-tumor drugs or cholesterol-lowering drugs. To find new drug candidates, microbiologists are increasingly relying on sequencing genomes of a wide variety of microbes. However, rapidly and reliably pinpointing all the potential gene clusters for secondary metabolites in dozens of newly sequenced genomes has been extremely challenging, due to their biochemical heterogeneity, the presence of unknown enzymes and the dispersed nature of the necessary specialized bioinformatics tools and resources. Here, we present antiSMASH (antibiotics & Secondary Metabolite Analysis Shell), the first comprehensive pipeline capable of identifying biosynthetic loci covering the whole range of known secondary metabolite compound classes (polyketides, non-ribosomal peptides, terpenes, aminoglycosides, aminocoumarins, indolocarbazoles, lantibiotics, bacteriocins, nucleosides, beta-lactams, butyrolactones, siderophores, melanins and others). It aligns the identified regions at the gene cluster level to their nearest relatives from a database containing all other known gene clusters, and integrates or cross-links all previously available secondary-metabolite specific gene analysis methods in one interactive view. antiSMASH is available at http://antismash.secondarymetabolites.org.

Journal ArticleDOI
TL;DR: A new web site with improved tools for pathway browsing and data analysis is developed, and orthology-based inferences of pathways in non-human species are made, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species.
Abstract: Reactome (http://www.reactome.org) is a collaboration among groups at the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University School of Medicine and The European Bioinformatics Institute, to develop an open source curated bioinformatics database of human pathways and reactions. Recently, we developed a new web site with improved tools for pathway browsing and data analysis. The Pathway Browser is a Systems Biology Graphical Notation (SBGN)-based visualization system that supports zooming, scrolling and event highlighting. It exploits PSIQUIC web services to overlay our curated pathways with molecular interaction data from the Reactome Functional Interaction Network and external interaction databases such as IntAct, BioGRID, ChEMBL, iRefIndex, MINT and STRING. Our Pathway and Expression Analysis tools enable ID mapping, pathway assignment and overrepresentation analysis of user-supplied data sets. To support pathway annotation and analysis in other species, we continue to make orthology-based inferences of pathways in non-human species, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species. The resulting inferred pathway sets can be browsed and analyzed with our Species Comparison tool. Collaborations are also underway to create manually curated data sets on the Reactome framework for chicken, Drosophila and rice.

Journal ArticleDOI
TL;DR: The current version of iTOL introduces numerous new features and greatly expands the number of supported data set types.
Abstract: Interactive Tree Of Life (http://itol.embl.de) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. It is freely available and open to everyone. In addition to classical tree viewer functions, iTOL offers many novel ways of annotating trees with various additional data. The current version introduces numerous new features and greatly expands the number of supported data set types. Trees can be interactively manipulated and edited. A free personal account system is available, providing management and sharing of trees in user defined workspaces and projects. Export to various bitmap and vector graphics formats is supported. A batch access interface is available for programmatic access or inclusion of interactive trees into other web services.

Journal ArticleDOI
TL;DR: SwissDock, a web server dedicated to the docking of small molecules on target proteins, is presented, based on the EADock DSS engine, combined with setup scripts for curating common problems and for preparing both the target protein and the ligand input files.
Abstract: Most life science processes involve, at the atomic scale, recognition between two molecules. The prediction of such interactions at the molecular level, by so-called docking software, is a non-trivial task. Docking programs have a wide range of applications ranging from protein engineering to drug design. This article presents SwissDock, a web server dedicated to the docking of small molecules on target proteins. It is based on the EADock DSS engine, combined with setup scripts for curating common problems and for preparing both the target protein and the ligand input files. An efficient Ajax/HTML interface was designed and implemented so that scientists can easily submit dockings and retrieve the predicted complexes. For automated docking tasks, a programmatic SOAP interface has been set up and template programs can be downloaded in Perl, Python and PHP. The web site also provides an access to a database of manually curated complexes, based on the Ligand Protein Database. A wiki and a forum are available to the community to promote interactions between users. The SwissDock web site is available online at http://www.swissdock.ch. We believe it constitutes a step toward generalizing the use of docking tools beyond the traditional molecular modeling community.

Journal ArticleDOI
TL;DR: Compared with other similar, previously developed databases, miRTarBase contains the largest number of validated MTIs and can also provide a large number of positive samples to develop computational methods capable of identifying miRNA–target interactions.
Abstract: MicroRNAs (miRNAs), i.e. small non-coding RNA molecules (∼22 nt), can bind to one or more target sites on a gene transcript to negatively regulate protein expression, subsequently controlling many cellular mechanisms. A current and curated collection of miRNA-target interactions (MTIs) with experimental support is essential to thoroughly elucidating miRNA functions under different conditions and in different species. As a database, miRTarBase has accumulated more than 3500 MTIs by manually surveying pertinent literature after data mining of the text systematically to filter research articles related to functional studies of miRNAs. Generally, the collected MTIs are validated experimentally by reporter assays, western blot, or microarray experiments with overexpression or knockdown of miRNAs. miRTarBase curates 3576 experimentally verified MTIs between 657 miRNAs and 2297 target genes among 17 species. Compared with other similar, previously developed databases, miRTarBase contains the largest number of validated MTIs. The MTIs collected in miRTarBase can also provide a large number of positive samples to develop computational methods capable of identifying miRNA-target interactions. miRTarBase is now available at http://miRTarBase.mbc.nctu.edu.tw/, and is updated frequently by continuously surveying research articles.

Journal ArticleDOI
TL;DR: Recent database enhancements are described, including new search and data representation tools, as well as a brief review of how the community uses GEO data.
Abstract: A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20 000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.

Journal ArticleDOI
TL;DR: A web-based interface that enables biologists to browse and search a comprehensive collection of pathways from multiple sources represented in a common language, a download site that provides integrated bulk sets of pathway information in standard or convenient formats and a web service that software developers can use to conveniently query and access all data.
Abstract: Pathway Commons (http://www.pathwaycommons.org) is a collection of publicly available pathway data from multiple organisms. Pathway Commons provides a web-based interface that enables biologists to browse and search a comprehensive collection of pathways from multiple sources represented in a common language, a download site that provides integrated bulk sets of pathway information in standard or convenient formats and a web service that software developers can use to conveniently query and access all data. Database providers can share their pathway data via a common repository. Pathways include biochemical reactions, complex assembly, transport and catalysis events and physical interactions involving proteins, DNA, RNA, small molecules and complexes. Pathway Commons aims to collect and integrate all public pathway data available in standard formats. Pathway Commons currently contains data from nine databases with over 1400 pathways and 687,000 interactions and will be continually expanded and updated.

Journal ArticleDOI
TL;DR: A series of databases that run parallel to the Protein Data Bank, used for the analysis of properties of protein structures in areas ranging from structural genomics, to cancer biology and protein design, are presented.
Abstract: The Protein Data Bank (PDB) is the world-wide repository of macromolecular structure information. We present a series of databases that run parallel to the PDB. Each database holds one entry, if possible, for each PDB entry. DSSP holds the secondary structure of the proteins. PDBREPORT holds reports on the structure quality and lists errors. HSSP holds a multiple sequence alignment for all proteins. The PDBFINDER holds easy to parse summaries of the PDB file content, augmented with essentials from the other systems. PDB_REDO holds re-refined, and often improved, copies of all structures solved by X-ray. WHY_NOT summarizes why certain files could not be produced. All these systems are updated weekly. The data sets can be used for the analysis of properties of protein structures in areas ranging from structural genomics, to cancer biology and protein design.

Journal ArticleDOI
TL;DR: A new interface for T-Coffee, a consistency-based multiple sequence alignment program, is introduced that provides an easy and intuitive access to the most popular functionality of the package.
Abstract: This article introduces a new interface for T-Coffee, a consistency-based multiple sequence alignment program. This interface provides an easy and intuitive access to the most popular functionality of the package. These include the default T-Coffee mode for protein and nucleic acid sequences, the M-Coffee mode that allows combining the output of any other aligners, and template-based modes of T-Coffee that deliver high accuracy alignments while using structural or homology derived templates. These three available template modes are Expresso for the alignment of protein with a known 3D-Structure, R-Coffee to align RNA sequences with conserved secondary structures and PSI-Coffee to accurately align distantly related sequences using homology extension. The new server benefits from recent improvements of the T-Coffee algorithm and can align up to 150 sequences as long as 10 000 residues and is available from both http://www.tcoffee.org and its main mirror http://tcoffee.crg.cat.

Journal ArticleDOI
TL;DR: The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources that enable insights into conserved networks and pathways that are relevant to human health.
Abstract: The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions.

Journal ArticleDOI
TL;DR: The results show that active CRISPR/Cas systems can be transferred across distant genera and provide heterologous interference against invasive nucleic acids and can be leveraged to develop strains more robust against phage attack, and safer organisms less likely to uptake and disseminate plasmid-encoded undesirable genetic elements.
Abstract: The CRISPR/Cas adaptive immune system provides resistance against phages and plasmids in Archaea and Bacteria. CRISPR loci integrate short DNA sequences from invading genetic elements that provide small RNA-mediated interference in subsequent exposure to matching nucleic acids. In Streptococcus thermophilus, it was previously shown that the CRISPR1/Cas system can provide adaptive immunity against phages and plasmids by integrating novel spacers following exposure to these foreign genetic elements that subsequently direct the specific cleavage of invasive homologous DNA sequences. Here, we show that the S. thermophilus CRISPR3/Cas system can be transferred into Escherichia coli and provide heterologous protection against plasmid transformation and phage infection. We show that interference is sequence-specific, and that mutations in the vicinity or within the proto-spacer adjacent motif (PAM) allow plasmids to escape CRISPR-encoded immunity. We also establish that cas9 is the sole cas gene necessary for CRISPR-encoded interference. Furthermore, mutation analysis revealed that interference relies on the Cas9 McrA/HNH- and RuvC/RNaseH-motifs. Altogether, our results show that active CRISPR/Cas systems can be transferred across distant genera and provide heterologous interference against invasive nucleic acids. This can be leveraged to develop strains more robust against phage attack, and safer organisms less likely to uptake and disseminate plasmid-encoded undesirable genetic elements.

Journal ArticleDOI
TL;DR: The combination of high nuclease activity with reduced cytotoxicity and the simple design process marks TALENs as a key technology platform for targeted modifications of complex genomes.
Abstract: Sequence-specific nucleases represent valuable tools for precision genome engineering. Traditionally, zinc-finger nucleases (ZFNs) and meganucleases have been used to specifically edit complex genomes. Recently, the DNA binding domains of transcription activator-like effectors (TALEs) from the bacterial pathogen Xanthomonas have been harnessed to direct nuclease domains to desired genomic loci. In this study, we tested a panel of truncation variants based on the TALE protein AvrBs4 to identify TALE nucleases (TALENs) with high DNA cleavage activity. The most favorable parameters for efficient DNA cleavage were determined in vitro and in cellular reporter assays. TALENs were designed to disrupt an EGFP marker gene and the human loci CCR5 and IL2RG. Gene editing was achieved in up to 45% of transfected cells. A side-by-side comparison with ZFNs showed similar gene disruption activities by TALENs but significantly reduced nuclease-associated cytotoxicities. Moreover, the CCR5-specific TALEN revealed only minimal off-target activity at the CCR2 locus as compared to the corresponding ZFN, suggesting that the TALEN platform enables the design of nucleases with single-nucleotide specificity. The combination of high nuclease activity with reduced cytotoxicity and the simple design process marks TALENs as a key technology platform for targeted modifications of complex genomes.

Journal ArticleDOI
Anne Morgat, Rolf Apweiler, Maria Jesus Martin, Claire O'Donovan, Michele Magrane, Yasmin Alam-Faruque, Ricardo Antunes, Daniel Barrell, Benoit Bely, M Bingley, David Binns, Lynette Bower, Paul Browne, Chan Wm, E. Dimmer, Ruth Y. Eberhardt, F. Fazzini, A. Fedotov, Rebecca E. Foulger, John S. Garavelli, Castro Lg, Rachael P. Huntley, Julius O.B. Jacobsen, M. Kleen, Kati Laiho, Duncan Legge, Quan Lin, W Liu, Jie Luo, Sandra Orchard, S. Patient, Klemens Pichler, Diego Poggioli, Nikolas Pontikos, Manuela Pruess, Steven Rosanoff, Tony Sawford, H. Sehra, Edward Turner, M. Corbett, M Donnelly, Van Rensburg P, Ioannis Xenarios, Lydie Bougueleret, Andrea H. Auchincloss, Ghislaine Argoud-Puy, Kristian B. Axelsen, Amos Marc Bairoch, Delphine Baratin, Blatter Mc, Brigitte Boeckmann, Jerven Bolleman, L. Bollondi, Emmanuel Boutet, Quintaje Sb, Lionel Breuza, Alan Bridge, E. Decastro, Elisabeth Coudert, Isabelle Cusin, M Doche, Dolnide Dornevil, Séverine Duvaud, Anne Estreicher, L Famiglietti, M Feuermann, Sebastien Gehant, Serenella Ferro, Elisabeth Gasteiger, Alain Gateau, Vivienne Baillie Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Nicolas Hulo, J. James, S. Jimenez, Florence Jungo, T. Kappler, Guillaume Keller, Vicente Lara, P Lemercier, Damien Lieberherr, Xavier D. Martin, Patrick Masson, M. Moinat, Salvo Paesano, Ivo Pedruzzi, Sandrine Pilbout, Sylvain Poux, Monica Pozzato, Nicole Redaschi, Catherine Rivoire, Bernd Roechert, Maria Victoria Schneider, Christian J. A. Sigrist, K Sonesson, S Staehli, E. Stanley, Andre Stutz, Shyamala Sundaram, Michael Tognolli, Laure Verbregue, Veuthey Al, Wu Ch, Arighi Cn, Leslie Arminski, Barker Wc, Chuming Chen, Yongxing Chen, P. Dubey, He Huang, Raja Mazumder, Peter B. McGarvey, Natale Da, Natarajan Tg, J. Nchoutmboube, Roberts Nv, Suzek Be, U. Ugochukwu, Vinayaka Cr, Qiang Wang, Yuqi Wang, Yeh Ls, Jian Zhang
TL;DR: The primary mission of Universal Protein Resource (UniProt) is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community.
Abstract: The primary mission of Universal Protein Resource (UniProt) is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt is comprised of four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.

Journal ArticleDOI
Jian-Hua Yang, Jun-Hao Li, Peng Shao, Hui Zhou, Yue-Qin Chen, Liang-Hu Qu
TL;DR: A novel database, starBase (sRNA target Base), is introduced, which is developed to facilitate the comprehensive exploration of miRNA–target interaction maps from CLIP-Seq and Degradome-Seq data.
Abstract: MicroRNAs (miRNAs) represent an important class of small non-coding RNAs (sRNAs) that regulate gene expression by targeting messenger RNAs. However, assigning miRNAs to their regulatory target genes remains technically challenging. Recently, high-throughput CLIP-Seq and degradome sequencing (Degradome-Seq) methods have been applied to identify the sites of Argonaute interaction and miRNA cleavage sites, respectively. In this study, we introduce a novel database, starBase (sRNA target Base), which we have developed to facilitate the comprehensive exploration of miRNA–target interaction maps from CLIP-Seq and Degradome-Seq data. The current version includes high-throughput sequencing data generated from 21 CLIP-Seq and 10 Degradome-Seq experiments from six organisms. By analyzing millions of mapped CLIP-Seq and Degradome-Seq reads, we identified ~1 million Ago-binding clusters and ~2 million cleaved target clusters in animals and plants, respectively. Analyses of these clusters, and of target sites predicted by 6 miRNA target prediction programs, resulted in our identification of approximately 400 000 and approximately 66 000 miRNA-target regulatory relationships from CLIP-Seq and Degradome-Seq data, respectively. Furthermore, two web servers were provided to discover novel miRNA target sites from CLIP-Seq and Degradome-Seq data. Our web implementation supports diverse query types and exploration of common targets, gene ontologies and pathways. The starBase is available at http://starbase.sysu.edu.cn/.

Journal ArticleDOI
TL;DR: The expanded future role of The RNA Modification Database will be to serve as a primary information portal for researchers across the entire spectrum of RNA-related research.
Abstract: Since its inception in 1994, The RNA Modification Database (RNAMDB, http://rna-mdb.cas.albany.edu/RNAmods/) has served as a focal point for information pertaining to naturally occurring RNA modifications. In its current state, the database employs an easy-to-use, searchable interface for obtaining detailed data on the 109 currently known RNA modifications. Each entry provides the chemical structure, common name and symbol, elemental composition and mass, CA registry numbers and index name, phylogenetic source, type of RNA species in which it is found, and references to the first reported structure determination and synthesis. Though newly transferred in its entirety to The RNA Institute, the RNAMDB continues to grow with two notable additions, agmatidine and 8-methyladenosine, appended in the last year. The RNA Modification Database is staying up-to-date with significant improvements being prepared for inclusion within the next year and the following year. The expanded future role of The RNA Modification Database will be to serve as a primary information portal for researchers across the entire spectrum of RNA-related research.

Journal ArticleDOI
TL;DR: The allele frequency net database is an online repository that contains information on the frequencies of immune genes and their corresponding alleles in different populations that has been used in a wide variety of contexts, including clinical applications, epidemiology and population genetics.
Abstract: The allele frequency net database (http://www.allelefrequencies.net) is an online repository that contains information on the frequencies of immune genes and their corresponding alleles in different populations. The extensive variability observed in genes and alleles related to the immune system response and its significance in transplantation, disease association studies and diversity in populations led to the development of this electronic resource. At present, the system contains data from 1133 populations in 608 813 individuals on the frequency of genes from different polymorphic regions such as human leukocyte antigens, killer-cell immunoglobulin-like receptors, major histocompatibility complex Class I chain-related genes and a number of cytokine gene polymorphisms. The project was designed to create a central source for the storage of frequency data and provide individuals with a set of bioinformatics tools to analyze the occurrence of these variants in worldwide populations. The resource has been used in a wide variety of contexts, including clinical applications (histocompatibility, immunology, epidemiology and pharmacogenetics) and population genetics. Demographic information, frequency data and searching tools can be freely accessed through the website.