Showing papers in "Nucleic Acids Research in 2011"

Open Access

Journal Article•DOI•

HMMER web server: interactive sequence similarity searching

[...]

Robert D. Finn¹, Jody Clements¹, Sean R. Eddy¹•Institutions (1)

01 Jul 2011-Nucleic Acids Research

TL;DR: This work has focused on minimizing search times and the ability to rapidly display tabular results, regardless of the number of matches found, developing graphical summaries of the search results to provide quick, intuitive appraisement of them.

Abstract: HMMER is a software suite for protein sequence similarity searches using probabilistic methods. Previously, HMMER has mainly been available only as a computationally intensive UNIX command-line tool, restricting its use. Recent advances in the software, HMMER3, have resulted in a 100-fold speed gain relative to previous versions. It is now feasible to make efficient profile hidden Markov model (profile HMM) searches via the web. A HMMER web server (http://hmmer.janelia.org) has been designed and implemented such that most protein database searches return within a few seconds. Methods are available for searching either a single protein sequence, multiple protein sequence alignment or profile HMM against a target sequence database, and for searching a protein sequence against Pfam. The web server is designed to cater to a range of different user expertise and accepts batch uploading of multiple queries at once. All search methods are also available as RESTful web services, thereby allowing them to be readily integrated as remotely executed tasks in locally scripted workflows. We have focused on minimizing search times and the ability to rapidly display tabular results, regardless of the number of matches found, developing graphical summaries of the search results to provide quick, intuitive appraisement of them.

4,159 citations

Journal Article•DOI•

miRBase: integrating microRNA annotation and deep-sequencing data

[...]

Ana Kozomara¹, Sam Griffiths-Jones¹•Institutions (1)

University of Manchester¹

01 Jan 2011-Nucleic Acids Research

TL;DR: This work has mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings, which can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations.

Abstract: miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15,000 microRNA gene loci in over 140 species, and over 17,000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/.

3,618 citations

Journal Article•DOI•

KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases

[...]

Chen Xie¹, Xizeng Mao², Jiaju Huang², Yang Ding², Jianmin Wu², Shan Dong², Lei Kong², Ge Gao², Chuan-Yun Li², Liping Wei²•Institutions (2)

Peking University¹, Garvan Institute of Medical Research²

01 Jul 2011-Nucleic Acids Research

TL;DR: A web server, KOBAS 2.0, is reported, which annotates an input set of genes with putative pathways and disease relationships based on mapping to genes with known annotations, which allows for both ID mapping and cross-species sequence similarity mapping.

Abstract: High-throughput experimental technologies often identify dozens to hundreds of genes related to, or changed in, a biological or pathological process. From these genes one wants to identify biological pathways that may be involved and diseases that may be implicated. Here, we report a web server, KOBAS 2.0, which annotates an input set of genes with putative pathways and disease relationships based on mapping to genes with known annotations. It allows for both ID mapping and cross-species sequence similarity mapping. It then performs statistical tests to identify statistically significantly enriched pathways and diseases. KOBAS 2.0 incorporates knowledge across 1327 species from 5 pathway databases (KEGG PATHWAY, PID, BioCyc, Reactome and Panther) and 5 human disease databases (OMIM, KEGG DISEASE, FunDO, GAD and NHGRI GWAS Catalog). KOBAS 2.0 can be accessed at http://kobas.cbi.pku.edu.cn.
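The enrichment step described in the abstract can be illustrated with a one-sided hypergeometric test. This is a minimal sketch only: the abstract does not say which statistical tests KOBAS 2.0 applies, so the choice of test, the function name `hypergeom_enrichment_p` and all parameter names are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_input, n_overlap):
    """Return P(X >= n_overlap) for a hypergeometric draw.

    n_background: annotated genes in the background set (hypothetical input)
    n_pathway:    background genes annotated to the pathway
    n_input:      genes in the user's input set
    n_overlap:    input genes that fall in the pathway
    """
    upper = min(n_pathway, n_input)
    total = comb(n_background, n_input)
    # Sum the probability mass for overlaps at least as large as observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_input - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers, not taken from the paper.
    print(hypergeom_enrichment_p(10, 5, 5, 5))
```

A small p-value suggests that the pathway is over-represented in the input set; in practice the p-values for many pathways would also need a multiple-testing correction.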

3,293 citations

Journal Article•DOI•

The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored

[...]

Damian Szklarczyk¹, Andrea Franceschini², Michael Kuhn², Milan Simonovic², Alexander Roth², Pablo Minguez², Tobias Doerks², Manuel Stark², Jean Muller², Peer Bork², Lars Juhl Jensen², Christian von Mering²•Institutions (2)

University of Copenhagen¹, Swiss Institute of Bioinformatics²

01 Jan 2011-Nucleic Acids Research

TL;DR: An update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING), which provides uniquely comprehensive coverage and ease of access to both experimental as well as predicted interaction information.

Abstract: An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and computational prediction techniques. However, public efforts to collect and present protein interaction information have struggled to keep up with the pace of interaction discovery, partly because protein-protein interaction information can be error-prone and require considerable effort to annotate. Here, we present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage and ease of access to both experimental as well as predicted interaction information. Interactions in STRING are provided with a confidence score, and accessory information such as protein domains and 3D structures is made available, all within a stable and consistent identifier space. New features in STRING include an interactive network viewer that can cluster networks on demand, updated on-screen previews of structural information including homology models, extensive data updates and strongly improved connectivity and integration with third-party resources. Version 9.0 of STRING covers more than 1100 completely sequenced organisms; the resource can be reached at http://string-db.org.

3,239 citations

Journal Article•DOI•

CDD: a Conserved Domain Database for the functional annotation of proteins

[...]

Aron Marchler-Bauer¹, Shennan Lu¹, John B. Anderson¹, Farideh Chitsaz¹, Myra K. Derbyshire¹, Carol DeWeese-Scott¹, Jessica H. Fong¹, Lewis Y. Geer¹, Renata C. Geer¹, Noreen R. Gonzales¹, Marc Gwadz¹, David I. Hurwitz¹, John D. Jackson¹, Zhaoxi Ke¹, Christopher J. Lanczycki¹, Fu-Ping Lu¹, Gabriele H. Marchler¹, Mikhail Mullokandov¹, Marina V. Omelchenko¹, Cynthia L. Robertson¹, James S. Song¹, Narmada Thanki¹, Roxanne A. Yamashita¹, Dachuan Zhang¹, Naigong Zhang¹, Chanjuan Zheng¹, Stephen H. Bryant¹•Institutions (1)

National Institutes of Health¹

01 Jan 2011-Nucleic Acids Research

TL;DR: NCBI’s Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints.

Abstract: NCBI's Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints. CDD includes manually curated domain models that make use of protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent. As CDD also imports domain family models from a variety of external sources, it is a partially redundant collection. To simplify protein annotation, redundant models and models describing homologous families are clustered into superfamilies. By default, domain footprints are annotated with the corresponding superfamily designation, on top of which specific annotation may indicate high-confidence assignment of family membership. Pre-computed domain annotation is available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotation for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.

2,934 citations

Journal Article•DOI•

COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer.

[...]

Simon A. Forbes¹, Nidhi Bindal¹, Sally Bamford¹, Charlotte G. Cole¹, Chai Yin Kok¹, David Beare¹, Mingming Jia¹, Rebecca Shepherd¹, Kenric Leung¹, Andrew Menzies¹, Jon W. Teague¹, Peter J. Campbell¹, Michael R. Stratton¹, P. Andrew Futreal•Institutions (1)

Wellcome Trust Sanger Institute¹

01 Jan 2011-Nucleic Acids Research

TL;DR: With all genomic information recently updated to GRCh37, COSMIC integrates many diverse types of mutation information and is making much closer links with Ensembl and other data resources.

Abstract: COSMIC (http://www.sanger.ac.uk/cosmic) curates comprehensive information on somatic mutations in human cancer. Release v48 (July 2010) describes over 136 000 coding mutations in almost 542 000 tumour samples; of the 18 490 genes documented, 4803 (26%) have one or more mutations. Full scientific literature curations are available on 83 major cancer genes and 49 fusion gene pairs (19 new cancer genes and 30 new fusion pairs this year) and this number is continually increasing. Key amongst these is TP53, now available through a collaboration with the IARC p53 database. In addition to data from the Cancer Genome Project (CGP) at the Sanger Institute, UK, and The Cancer Genome Atlas project (TCGA), large systematic screens are also now curated. Major website upgrades now make these data much more mineable, with many new selection filters and graphics. A Biomart is now available allowing more automated data mining and integration with other biological databases. Annotation of genomic features has become a significant focus; COSMIC has begun curating full-genome resequencing experiments, developing new web pages, export formats and graphics styles. With all genomic information recently updated to GRCh37, COSMIC integrates many diverse types of mutation information and is making much closer links with Ensembl and other data resources.

2,270 citations

Journal Article•DOI•

Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting

[...]

Tomas Cermak¹, Erin L. Doyle², Michelle Christian², Li-Li Wang², Yong Zhang², Clarice Schmidt², Joshua A. Baller², Nikunj V. Somia², Adam J. Bogdanove², Daniel F. Voytas²•Institutions (2)

University of Minnesota¹, University of Electronic Science and Technology of China²

01 Jul 2011-Nucleic Acids Research

TL;DR: A method and reagents for efficiently assembling TALEN constructs with custom repeat arrays are presented and design guidelines based on naturally occurring TAL effectors and their binding sites are described.

Abstract: TALENs are important new tools for genome engineering. Fusions of transcription activator-like (TAL) effectors of plant pathogenic Xanthomonas spp. to the FokI nuclease, TALENs bind and cleave DNA in pairs. Binding specificity is determined by customizable arrays of polymorphic amino acid repeats in the TAL effectors. We present a method and reagents for efficiently assembling TALEN constructs with custom repeat arrays. We also describe design guidelines based on naturally occurring TAL effectors and their binding sites. Using software that applies these guidelines, in nine genes from plants, animals and protists, we found candidate cleavage sites on average every 35 bp. Each of 15 sites selected from this set was cleaved in a yeast-based assay with TALEN pairs constructed with our reagents. We used two of the TALEN pairs to mutate HPRT1 in human cells and ADH1 in Arabidopsis thaliana protoplasts. Our reagents include a plasmid construct for making custom TAL effectors and one for TAL effector fusions to additional proteins of interest. Using the former, we constructed de novo a functional analog of AvrHah1 of Xanthomonas gardneri. The complete plasmid set is available through the non-profit repository AddGene.

2,175 citations

Journal Article•DOI•

The sequence read archive.

[...]

Rasko Leinonen¹, Hideaki Sugawara¹, Martin Shumway¹•Institutions (1)

National Institute of Genetics¹

01 Jan 2011-Nucleic Acids Research

TL;DR: The content and structure of the SRA are presented, support for sequencing platforms is detailed, recommended data submission levels and formats are provided, and the response to the challenge of data growth is outlined.

Abstract: The combination of significantly lower cost and increased speed of sequencing has resulted in an explosive growth of data submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). The preservation of experimental data is an important part of the scientific record, and increasing numbers of journals and funding agencies require that next-generation sequence data are deposited into the SRA. The SRA was established as a public repository for the next-generation sequence data and is operated by the International Nucleotide Sequence Database Collaboration (INSDC). INSDC partners include the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA, detail our support for sequencing platforms and provide recommended data submission levels and formats. We also briefly outline our response to the challenge of data growth.

2,169 citations

Journal Article•DOI•

The UCSC Genome Browser database: update 2011

[...]

Pauline A. Fujita¹, Brooke Rhead¹, Ann S. Zweig¹, Angie S. Hinrichs¹, Donna Karolchik¹, Melissa S. Cline¹, Mary Goldman¹, Galt P. Barber¹, Hiram Clawson¹, Antonio Coelho¹, Mark Diekhans¹, Timothy R. Dreszer¹, Belinda Giardine¹, Rachel A. Harte¹, Jennifer Hillman-Jackson¹, Fan Hsu¹, Vanessa M. Kirkup¹, Robert M. Kuhn¹, Katrina Learned¹, Chin H. Li¹, Laurence R. Meyer¹, Andy Pohl¹, Brian J. Raney¹, Kate R. Rosenbloom¹, Kayla E. Smith¹, David Haussler¹, W. James Kent¹•Institutions (1)

University of California, Santa Cruz¹

01 Jan 2011-Nucleic Acids Research

TL;DR: New data highlights include seven new genome assemblies, a Neandertal genome data portal, phenotype and disease association data, a human RNA editing track, and a zebrafish Conservation track.

Abstract: The University of California, Santa Cruz Genome Browser (http://genome.ucsc.edu) offers online access to a database of genomic sequence and annotation data for a wide variety of organisms. The Browser also has many tools for visualizing, comparing and analyzing both publicly available and user-generated genomic data sets, aligning sequences and uploading user data. Among the features released this year are a gene search tool and annotation track drag-reorder functionality as well as support for BAM and BigWig/BigBed file formats. New display enhancements include overlay of multiple wiggle tracks through use of transparent coloring, options for displaying transformed wiggle data, a ‘mean+whiskers’ windowing function for display of wiggle data at high zoom levels, and more color schemes for microarray data. New data highlights include seven new genome assemblies, a Neandertal genome data portal, phenotype and disease association data, a human RNA editing track, and a zebrafish Conservation track. We also describe updates to existing tracks.

1,818 citations

Journal Article•DOI•

PHAST: A Fast Phage Search Tool

[...]

You-Xing Zhou¹, Yongjie Liang², Karlene H Lynch², Jonathan J. Dennis², David S. Wishart²•Institutions (2)

University of Alberta¹, National Institute for Nanotechnology²

01 Jul 2011-Nucleic Acids Research

TL;DR: PHAge Search Tool (PHAST) is a web server designed to rapidly and accurately identify, annotate and graphically display prophage sequences within bacterial genomes or plasmids.

Abstract: PHAge Search Tool (PHAST) is a web server designed to rapidly and accurately identify, annotate and graphically display prophage sequences within bacterial genomes or plasmids. It accepts either raw DNA sequence data or partially annotated GenBank formatted data and rapidly performs a number of database comparisons as well as phage ‘cornerstone’ feature identification steps to locate, annotate and display prophage sequences and prophage features. Relative to other prophage identification tools, PHAST is up to 40 times faster and up to 15% more sensitive. It is also able to process and annotate both raw DNA sequence data and Genbank files, provide richly annotated tables on prophage features and prophage ‘quality’ and distinguish between intact and incomplete prophage. PHAST also generates downloadable, high quality, interactive graphics that display all identified prophage components in both circular and linear genomic views. PHAST is available at (http://phast.wishartlab.com).

1,767 citations

Journal Article•DOI•

DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs

[...]

Craig Knox¹, Vivian Law², Timothy Jewison², Philip Liu², Son Ly², Alex Frolkis², Allison Pon², Kelly Banco², Christine Mak², Vanessa Neveu², Yannick Djoumbou², Roman Eisner², An Chi Guo², David S. Wishart²•Institutions (2)

University of Alberta¹, National Institute for Nanotechnology²

01 Jan 2011-Nucleic Acids Research

TL;DR: DrugBank 3.0 represents the result of 2 years of manual annotation work aimed at making the database much more useful for a wide range of ‘omics’ applications, particularly with regard to drug target, drug description and drug action data.

Abstract: DrugBank (http://www.drugbank.ca) is a richly annotated database of drug and drug target information. It contains extensive data on the nomenclature, ontology, chemistry, structure, function, action, pharmacology, pharmacokinetics, metabolism and pharmaceutical properties of both small molecule and large molecule (biotech) drugs. It also contains comprehensive information on the target diseases, proteins, genes and organisms on which these drugs act. First released in 2006, DrugBank has become widely used by pharmacists, medicinal chemists, pharmaceutical researchers, clinicians, educators and the general public. Since its last update in 2008, DrugBank has been greatly expanded through the addition of new drugs, new targets and the inclusion of more than 40 new data fields per drug entry (a 40% increase in data ‘depth’). These data field additions include illustrated drug-action pathways, drug transporter data, drug metabolite data, pharmacogenomic data, adverse drug response data, ADMET data, pharmacokinetic data, computed property data and chemical classification data. DrugBank 3.0 also offers expanded database links, improved search tools for drug–drug and food–drug interaction, new resources for querying and viewing drug pathways and hundreds of new drug entries with detailed patent, pricing and manufacturer data. These additions have been complemented by enhancements to the quality and quantity of existing data, particularly with regard to drug target, drug description and drug action data. DrugBank 3.0 represents the result of 2 years of manual annotation work aimed at making the database much more useful for a wide range of ‘omics’ (i.e. pharmacogenomic, pharmacoproteomic, pharmacometabolomic and even pharmacoeconomic) applications.

Journal Article•DOI•

Predicting the functional impact of protein mutations: application to cancer genomics

[...]

Boris Reva¹, Yevgeniy Antipin¹, Chris Sander¹•Institutions (1)

Memorial Sloan Kettering Cancer Center¹

01 Sep 2011-Nucleic Acids Research

TL;DR: A new functional impact score (FIS) for amino acid residue changes using evolutionary conservation patterns is introduced, estimating that at least 5% of cancer-relevant mutations involve switch of function, rather than simply loss or gain of function.

Abstract: As large-scale re-sequencing of genomes reveals many protein mutations, especially in human cancer tissues, prediction of their likely functional impact becomes an important practical goal. Here, we introduce a new functional impact score (FIS) for amino acid residue changes using evolutionary conservation patterns. The information in these patterns is derived from aligned families and sub-families of sequence homologs within and between species using combinatorial entropy formalism. The score performs well on a large set of human protein mutations in separating disease-associated variants (∼19 200), assumed to be strongly functional, from common polymorphisms (∼35 600), assumed to be weakly functional (area under the receiver operating characteristic curve of ∼0.86). In cancer, using recurrence, multiplicity and annotation for ∼10 000 mutations in the COSMIC database, the method does well in assigning higher scores to more likely functional mutations (‘drivers’). To guide experimental prioritization, we report a list of about 1000 top human cancer genes frequently mutated in one or more cancer types ranked by likely functional impact, and an additional 1000 candidate cancer genes with rare but likely functional mutations. In addition, we estimate that at least 5% of cancer-relevant mutations involve switch of function, rather than simply loss or gain of function.
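The area under the ROC curve quoted in the abstract measures how well a score separates two groups of variants. As a minimal sketch, it can be computed as the fraction of (disease variant, polymorphism) pairs in which the disease variant scores higher, with ties counted as half. The function name `roc_auc` and the toy scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(positive_scores, negative_scores):
    """Estimate AUC as P(random positive outscores random negative).

    Ties contribute 0.5, which matches the usual rank-based definition.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Hypothetical impact scores, for illustration only.
    disease_variants = [3.1, 2.4, 1.9]
    polymorphisms = [0.5, 2.0, 1.1]
    print(roc_auc(disease_variants, polymorphisms))
```

An AUC of 0.5 corresponds to no separation and 1.0 to perfect separation. The pairwise loop is quadratic, so a rank-based formulation would be preferable for sets as large as those in the abstract.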

Journal Article•DOI•

Characterization of extracellular circulating microRNA

[...]

Andrey Turchinovich¹, Ludmila Weiz¹, Anne Langheinz¹, Barbara Burwinkel•Institutions (1)

German Cancer Research Center¹

01 Sep 2011-Nucleic Acids Research

TL;DR: This is the first study to show that extracellular miRNAs are predominantly exosomes/microvesicles free and are associated with Ago proteins, and the authors hypothesize that extracellular miRNAs are for the most part by-products of dead cells that remain in extracellular space due to the high stability of the Ago2 protein and Ago2-miRNA complex.

Abstract: MicroRNAs (miRNAs), a class of post-transcriptional gene expression regulators, have recently been detected in human body fluids, including peripheral blood plasma, as extracellular nuclease-resistant entities. However, the origin and function of extracellular circulating miRNA remain essentially unknown. Here, we confirmed that circulating mature miRNA, in contrast to mRNA or snRNA, is strikingly stable in blood plasma and cell culture media. Furthermore, we found that most miRNA in plasma and cell culture media completely passed through 0.22 µm filters but remained in the supernatant after ultracentrifugation at 110 000g, indicating the non-vesicular origin of the extracellular miRNA. Furthermore, western blot immunoassay revealed that extracellular miRNA ultrafiltrated together with the 96 kDa Ago2 protein, a part of the RNA-induced silencing complex. Moreover, miRNAs in both blood plasma and cell culture media co-immunoprecipitated with anti-Ago2 antibody in a detergent-free environment. This is the first study to show that extracellular miRNAs are predominantly exosomes/microvesicles free and are associated with Ago proteins. We hypothesize that extracellular miRNAs are for the most part by-products of dead cells that remain in extracellular space due to the high stability of the Ago2 protein and Ago2-miRNA complex. Nevertheless, our data do not reject the possibility that some miRNAs can be associated with exosomes.

Journal Article•DOI•

psRNATarget: a plant small RNA target analysis server

[...]

Xinbin Dai, Patrick X. Zhao

01 Jul 2011-Nucleic Acids Research

TL;DR: The psRNATarget server, a plant small RNA target analysis server, is designed for high-throughput analysis of next-generation data with an efficient distributed computing back-end pipeline that runs on a Linux cluster.

Abstract: Plant endogenous non-coding short small RNAs (20–24 nt), including microRNAs (miRNAs) and a subset of small interfering RNAs (ta-siRNAs), play an important role in gene expression regulatory networks (GRNs). For example, many transcription factors and development-related genes have been reported as targets of these regulatory small RNAs. Although a number of miRNA target prediction algorithms and programs have been developed, most of them were designed for animal miRNAs, which are significantly different from plant miRNAs in the target recognition process. These differences demand the development of separate plant miRNA (and ta-siRNA) target analysis tool(s). We present psRNATarget, a plant small RNA target analysis server, which features two important analysis functions: (i) reverse complementary matching between small RNA and target transcript using a proven scoring schema, and (ii) target-site accessibility evaluation by calculating unpaired energy (UPE) required to ‘open’ secondary structure around small RNA’s target site on mRNA. The psRNATarget incorporates recent discoveries in plant miRNA target recognition, e.g. it distinguishes translational and post-transcriptional inhibition, and it reports the number of small RNA/target site pairs that may affect small RNA binding activity to target transcript. The psRNATarget server is designed for high-throughput analysis of next-generation data with an efficient distributed computing back-end pipeline that runs on a Linux cluster. The server front-end integrates three simplified user-friendly interfaces to accept user-submitted or preloaded small RNAs and transcript sequences, and outputs a comprehensive list of small RNA/target pairs along with the online tools for batch downloading, key word searching and results sorting. The psRNATarget server is freely available at http://plantgrn.noble.org/psRNATarget/.
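Reverse complementary matching, the first analysis function named in the abstract, can be sketched as sliding the reverse complement of the small RNA along a transcript. This is an illustration under stated assumptions: the abstract does not describe the scoring schema, so a plain mismatch count stands in for it, and the names `reverse_complement` and `candidate_sites`, the `max_mismatches` parameter and the toy sequences are all hypothetical. Inputs are assumed to use the RNA alphabet (A, C, G, U).

```python
# Translation table for Watson-Crick complements in the RNA alphabet.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def candidate_sites(small_rna, transcript, max_mismatches=2):
    """List (start, mismatches) for transcript windows that pair with small_rna.

    A window pairs perfectly when it equals the reverse complement of the
    small RNA; each differing position counts as one mismatch.
    """
    probe = reverse_complement(small_rna.upper())
    transcript = transcript.upper()
    hits = []
    for start in range(len(transcript) - len(probe) + 1):
        window = transcript[start:start + len(probe)]
        mismatches = sum(a != b for a, b in zip(probe, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    # Toy sequences, far shorter than real 20-24 nt small RNAs.
    print(candidate_sites("AUGC", "CCGCAUCC", max_mismatches=0))
```

The accessibility step (unpaired energy) would require an RNA folding tool and is not sketched here.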

Journal Article•DOI•

antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences

[...]

Marnix H. Medema¹, Kai Blin², Peter Cimermancic³, Victor de Jager⁴,⁵, Piotr Zakrzewski¹, Michael A. Fischbach³, Tilmann Weber², Eriko Takano, Rainer Breitling⁶•Institutions (6)

University of Groningen¹, University of Tübingen², University of California, San Francisco³, Wageningen University and Research Centre⁴, Radboud University Nijmegen⁵, University of Glasgow⁶

01 Jul 2011-Nucleic Acids Research

TL;DR: This work presents the first comprehensive pipeline capable of identifying biosynthetic loci covering the whole range of known secondary metabolite compound classes, and integrates or cross-links all previously available secondary-metabolite specific gene analysis methods in one interactive view.

Abstract: Bacterial and fungal secondary metabolism is a rich source of novel bioactive compounds with potential pharmaceutical applications as antibiotics, anti-tumor drugs or cholesterol-lowering drugs. To find new drug candidates, microbiologists are increasingly relying on sequencing genomes of a wide variety of microbes. However, rapidly and reliably pinpointing all the potential gene clusters for secondary metabolites in dozens of newly sequenced genomes has been extremely challenging, due to their biochemical heterogeneity, the presence of unknown enzymes and the dispersed nature of the necessary specialized bioinformatics tools and resources. Here, we present antiSMASH (antibiotics & Secondary Metabolite Analysis Shell), the first comprehensive pipeline capable of identifying biosynthetic loci covering the whole range of known secondary metabolite compound classes (polyketides, non-ribosomal peptides, terpenes, aminoglycosides, aminocoumarins, indolocarbazoles, lantibiotics, bacteriocins, nucleosides, beta-lactams, butyrolactones, siderophores, melanins and others). It aligns the identified regions at the gene cluster level to their nearest relatives from a database containing all other known gene clusters, and integrates or cross-links all previously available secondary-metabolite specific gene analysis methods in one interactive view. antiSMASH is available at http://antismash.secondarymetabolites.org.

Journal Article•DOI•

Reactome: a database of reactions, pathways and biological processes

[...]

David Croft¹, Gavin O'Kelly, Guanming Wu², Robin Haw², Marc Gillespie³, Lisa Matthews⁴, Michael Caudy², Phani V. Garapati, Gopal R. Gopinath⁵, Bijay Jassal, S Jupe, Irina Kalatskaya², Shahana S. Mahajan⁴,⁶, Bruce May², Nelson Ndegwa, Esther Schmidt, Veronica Shamovsky⁴, Christina K. Yung², Ewan Birney, Henning Hermjakob, Peter D'Eustachio⁴, Lincoln Stein²,⁷,⁸•Institutions (8)

European Bioinformatics Institute¹, Ontario Institute for Cancer Research², St. John's University³, New York University⁴, Food and Drug Administration⁵, City University of New York⁶, Cold Spring Harbor Laboratory⁷, University of Toronto⁸

01 Jan 2011-Nucleic Acids Research

TL;DR: A new web site with improved tools for pathway browsing and data analysis is developed, and orthology-based inferences of pathways in non-human species are made, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species.

Abstract: Reactome (http://www.reactome.org) is a collaboration among groups at the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University School of Medicine and The European Bioinformatics Institute, to develop an open source curated bioinformatics database of human pathways and reactions. Recently, we developed a new web site with improved tools for pathway browsing and data analysis. The Pathway Browser is a Systems Biology Graphical Notation (SBGN)-based visualization system that supports zooming, scrolling and event highlighting. It exploits PSIQUIC web services to overlay our curated pathways with molecular interaction data from the Reactome Functional Interaction Network and external interaction databases such as IntAct, BioGRID, ChEMBL, iRefIndex, MINT and STRING. Our Pathway and Expression Analysis tools enable ID mapping, pathway assignment and overrepresentation analysis of user-supplied data sets. To support pathway annotation and analysis in other species, we continue to make orthology-based inferences of pathways in non-human species, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species. The resulting inferred pathway sets can be browsed and analyzed with our Species Comparison tool. Collaborations are also underway to create manually curated data sets on the Reactome framework for chicken, Drosophila and rice.

Journal Article•DOI•

Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy

Ivica Letunic¹, Peer Bork¹•Institutions (1)

European Bioinformatics Institute¹

01 Jul 2011-Nucleic Acids Research

TL;DR: The current version of iTOL introduces numerous new features and greatly expands the number of supported data set types.

Abstract: Interactive Tree Of Life (http://itol.embl.de) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. It is freely available and open to everyone. In addition to classical tree viewer functions, iTOL offers many novel ways of annotating trees with various additional data. The current version introduces numerous new features and greatly expands the number of supported data set types. Trees can be interactively manipulated and edited. A free personal account system is available, providing management and sharing of trees in user-defined workspaces and projects. Export to various bitmap and vector graphics formats is supported. A batch access interface is available for programmatic access or inclusion of interactive trees into other web services.

Journal Article•DOI•

SwissDock, a protein-small molecule docking web service based on EADock DSS

Aurélien Grosdidier¹, Vincent Zoete², Olivier Michielin²•Institutions (2)

Swiss Institute of Bioinformatics¹, Ludwig Institute for Cancer Research²

01 Jul 2011-Nucleic Acids Research

TL;DR: SwissDock, a web server dedicated to the docking of small molecules on target proteins, is presented, based on the EADock DSS engine, combined with setup scripts for curating common problems and for preparing both the target protein and the ligand input files.

Abstract: Most life science processes involve, at the atomic scale, recognition between two molecules. The prediction of such interactions at the molecular level, by so-called docking software, is a non-trivial task. Docking programs have a wide range of applications, from protein engineering to drug design. This article presents SwissDock, a web server dedicated to the docking of small molecules on target proteins. It is based on the EADock DSS engine, combined with setup scripts for curating common problems and for preparing both the target protein and the ligand input files. An efficient Ajax/HTML interface was designed and implemented so that scientists can easily submit dockings and retrieve the predicted complexes. For automated docking tasks, a programmatic SOAP interface has been set up and template programs can be downloaded in Perl, Python and PHP. The web site also provides access to a database of manually curated complexes, based on the Ligand Protein Database. A wiki and a forum are available to the community to promote interactions between users. The SwissDock web site is available online at http://www.swissdock.ch. We believe it constitutes a step toward generalizing the use of docking tools beyond the traditional molecular modeling community.

Journal Article•DOI•

miRTarBase: a database curates experimentally validated microRNA–target interactions

Sheng Da Hsu¹, Feng-Mao Lin¹, Wei-Yun Wu¹, Chao Liang¹, Wei Chih Huang¹, Wen-Ling Chan¹, Wen-Ting Tsai¹, Goun-Zhou Chen¹, Chia-Jung Lee¹, Chih Min Chiu¹, Chia-Hung Chien¹, Ming-Chia Wu¹, Chi Ying F. Huang¹, Ann-Ping Tsou¹, Hsien Da Huang¹•Institutions (1)

National Chiao Tung University¹

01 Jan 2011-Nucleic Acids Research

TL;DR: miRTarBase contains more validated MTIs than other similar, previously developed databases, and can also provide a large number of positive samples for developing computational methods capable of identifying miRNA–target interactions.

Abstract: MicroRNAs (miRNAs), i.e. small non-coding RNA molecules (∼22 nt), can bind to one or more target sites on a gene transcript to negatively regulate protein expression, subsequently controlling many cellular mechanisms. A current and curated collection of miRNA-target interactions (MTIs) with experimental support is essential to thoroughly elucidating miRNA functions under different conditions and in different species. As a database, miRTarBase has accumulated more than 3500 MTIs by manually surveying pertinent literature after data mining of the text systematically to filter research articles related to functional studies of miRNAs. Generally, the collected MTIs are validated experimentally by reporter assays, western blot, or microarray experiments with overexpression or knockdown of miRNAs. miRTarBase curates 3576 experimentally verified MTIs between 657 miRNAs and 2297 target genes among 17 species. miRTarBase contains the largest amount of validated MTIs by comparing with other similar, previously developed databases. The MTIs collected in the miRTarBase can also provide a large amount of positive samples to develop computational methods capable of identifying miRNA-target interactions. miRTarBase is now available on http://miRTarBase.mbc.nctu.edu.tw/, and is updated frequently by continuously surveying research articles.

Journal Article•DOI•

NCBI GEO: archive for functional genomics data sets—10 years on

Tanya Barrett¹, Dennis B. Troup¹, Stephen E. Wilhite¹, Pierre Ledoux¹, Carlos Evangelista¹, Irene F. Kim¹, Maxim Tomashevsky¹, Kimberly A. Marshall¹, Katherine Phillippy¹, Patti M. Sherman¹, Rolf N. Muertter¹, Michelle Holko¹, Oluwabukunmi Ayanbule¹, Andrey Yefanov¹, Alexandra Soboleva¹•Institutions (1)

National Institutes of Health¹

01 Jan 2011-Nucleic Acids Research

TL;DR: Recent database enhancements are described, including new search and data representation tools, as well as a brief review of how the community uses GEO data.

Abstract: A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20 000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.

Journal Article•DOI•

Pathway Commons, a web resource for biological pathway data

Ethan Cerami¹, Benjamin Gross², Emek Demir², Igor Rodchenkov², Özgün Babur², Nadia Anwar², Nikolaus Schultz², Gary D. Bader², Chris Sander²•Institutions (2)

Memorial Sloan Kettering Cancer Center¹, University of Toronto²

01 Jan 2011-Nucleic Acids Research

TL;DR: Pathway Commons provides a web-based interface that enables biologists to browse and search a comprehensive collection of pathways from multiple sources represented in a common language, a download site that provides integrated bulk sets of pathway information in standard or convenient formats, and a web service that software developers can use to conveniently query and access all data.

Abstract: Pathway Commons (http://www.pathwaycommons.org) is a collection of publicly available pathway data from multiple organisms. Pathway Commons provides a web-based interface that enables biologists to browse and search a comprehensive collection of pathways from multiple sources represented in a common language, a download site that provides integrated bulk sets of pathway information in standard or convenient formats and a web service that software developers can use to conveniently query and access all data. Database providers can share their pathway data via a common repository. Pathways include biochemical reactions, complex assembly, transport and catalysis events and physical interactions involving proteins, DNA, RNA, small molecules and complexes. Pathway Commons aims to collect and integrate all public pathway data available in standard formats. Pathway Commons currently contains data from nine databases with over 1400 pathways and 687,000 interactions and will be continually expanded and updated.

Journal Article•DOI•

A series of PDB related databases for everyday needs.

Robbie P. Joosten¹, Tim A. H. te Beek², Elmar Krieger², Maarten L. Hekkelman², Rob Hooft², Reinhard Schneider², Chris Sander², Gert Vriend²•Institutions (2)

Netherlands Cancer Institute¹, Radboud University Nijmegen Medical Centre²

01 Jan 2011-Nucleic Acids Research

TL;DR: A series of databases that run parallel to the Protein Data Bank, used for the analysis of properties of protein structures in areas ranging from structural genomics, to cancer biology and protein design, are presented.

Abstract: The Protein Data Bank (PDB) is the world-wide repository of macromolecular structure information. We present a series of databases that run parallel to the PDB. Each database holds one entry, if possible, for each PDB entry. DSSP holds the secondary structure of the proteins. PDBREPORT holds reports on the structure quality and lists errors. HSSP holds a multiple sequence alignment for all proteins. The PDBFINDER holds easy to parse summaries of the PDB file content, augmented with essentials from the other systems. PDB_REDO holds re-refined, and often improved, copies of all structures solved by X-ray. WHY_NOT summarizes why certain files could not be produced. All these systems are updated weekly. The data sets can be used for the analysis of properties of protein structures in areas ranging from structural genomics, to cancer biology and protein design.

Journal Article•DOI•

T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension

Paolo Di Tommaso¹, Sébastien Moretti², Ioannis Xenarios², Miquel Orobitg², Alberto Montanyola², Jia-Ming Chang², Jean-François Taly², Cedric Notredame²•Institutions (2)

Pompeu Fabra University¹, Swiss Institute of Bioinformatics²

01 Jul 2011-Nucleic Acids Research

TL;DR: A new interface for T-Coffee, a consistency-based multiple sequence alignment program, is introduced that provides easy and intuitive access to the most popular functionality of the package.

Abstract: This article introduces a new interface for T-Coffee, a consistency-based multiple sequence alignment program. This interface provides easy and intuitive access to the most popular functionality of the package. This includes the default T-Coffee mode for protein and nucleic acid sequences, the M-Coffee mode that allows combining the output of any other aligners, and template-based modes of T-Coffee that deliver high accuracy alignments while using structural or homology derived templates. The three available template modes are Expresso for the alignment of proteins with a known 3D structure, R-Coffee to align RNA sequences with conserved secondary structures and PSI-Coffee to accurately align distantly related sequences using homology extension. The new server benefits from recent improvements of the T-Coffee algorithm, can align up to 150 sequences as long as 10 000 residues, and is available from both http://www.tcoffee.org and its main mirror http://tcoffee.crg.cat.

Journal Article•DOI•

The BioGRID Interaction Database: 2011 update

Chris Stark¹, Bobby-Joe Breitkreutz², Andrew Chatr-aryamontri², Lorrie Boucher², Rose Oughtred², Michael S. Livstone², Julie Nixon², Kimberly Van Auken², Xiaodong Wang², Xiaoqi Shi², Teresa Reguly², Jennifer M. Rust², Andrew G. Winter², Kara Dolinski², Mike Tyers²•Institutions (2)

Mount Sinai Hospital, Toronto¹, Ontario Institute for Cancer Research²

01 Jan 2011-Nucleic Acids Research

TL;DR: The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources, while current curation drives focus on areas of biology that give insights into conserved networks and pathways relevant to human health.

Abstract: The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions.

Journal Article•DOI•

The Streptococcus thermophilus CRISPR/Cas system provides immunity in Escherichia coli

Rimantas Sapranauskas¹, Giedrius Gasiunas¹, Christophe Fremaux¹, Rodolphe Barrangou¹, Philippe Horvath¹, Virginijus Siksnys¹•Institutions (1)

Vilnius University¹

01 Nov 2011-Nucleic Acids Research

TL;DR: The results show that active CRISPR/Cas systems can be transferred across distant genera and provide heterologous interference against invasive nucleic acids. This can be leveraged to develop strains more robust against phage attack, and safer organisms less likely to take up and disseminate plasmid-encoded undesirable genetic elements.

Abstract: The CRISPR/Cas adaptive immune system provides resistance against phages and plasmids in Archaea and Bacteria. CRISPR loci integrate short DNA sequences from invading genetic elements that provide small RNA-mediated interference in subsequent exposure to matching nucleic acids. In Streptococcus thermophilus, it was previously shown that the CRISPR1/Cas system can provide adaptive immunity against phages and plasmids by integrating novel spacers following exposure to these foreign genetic elements that subsequently direct the specific cleavage of invasive homologous DNA sequences. Here, we show that the S. thermophilus CRISPR3/Cas system can be transferred into Escherichia coli and provide heterologous protection against plasmid transformation and phage infection. We show that interference is sequence-specific, and that mutations in the vicinity of or within the proto-spacer adjacent motif (PAM) allow plasmids to escape CRISPR-encoded immunity. We also establish that cas9 is the sole cas gene necessary for CRISPR-encoded interference. Furthermore, mutation analysis revealed that interference relies on the Cas9 McrA/HNH- and RuvC/RNaseH-motifs. Altogether, our results show that active CRISPR/Cas systems can be transferred across distant genera and provide heterologous interference against invasive nucleic acids. This can be leveraged to develop strains more robust against phage attack, and safer organisms less likely to take up and disseminate plasmid-encoded undesirable genetic elements.

Journal Article•DOI•

A novel TALE nuclease scaffold enables high genome editing activity in combination with low toxicity

Claudio Mussolino¹, Robert Morbitzer², Fabienne Lütge², Nadine Dannemann², Thomas Lahaye², Toni Cathomen²•Institutions (2)

Hannover Medical School¹, Ludwig Maximilian University of Munich²

01 Nov 2011-Nucleic Acids Research

TL;DR: The combination of high nuclease activity with reduced cytotoxicity and the simple design process marks TALENs as a key technology platform for targeted modifications of complex genomes.

Abstract: Sequence-specific nucleases represent valuable tools for precision genome engineering. Traditionally, zinc-finger nucleases (ZFNs) and meganucleases have been used to specifically edit complex genomes. Recently, the DNA binding domains of transcription activator-like effectors (TALEs) from the bacterial pathogen Xanthomonas have been harnessed to direct nuclease domains to desired genomic loci. In this study, we tested a panel of truncation variants based on the TALE protein AvrBs4 to identify TALE nucleases (TALENs) with high DNA cleavage activity. The most favorable parameters for efficient DNA cleavage were determined in vitro and in cellular reporter assays. TALENs were designed to disrupt an EGFP marker gene and the human loci CCR5 and IL2RG. Gene editing was achieved in up to 45% of transfected cells. A side-by-side comparison with ZFNs showed similar gene disruption activities by TALENs but significantly reduced nuclease-associated cytotoxicities. Moreover, the CCR5-specific TALEN revealed only minimal off-target activity at the CCR2 locus as compared to the corresponding ZFN, suggesting that the TALEN platform enables the design of nucleases with single-nucleotide specificity. The combination of high nuclease activity with reduced cytotoxicity and the simple design process marks TALENs as a key technology platform for targeted modifications of complex genomes.

Journal Article•DOI•

Ongoing and future developments at the Universal Protein Resource

Anne Morgat¹, Rolf Apweiler, Maria Jesus Martin, Claire O'Donovan, Michele Magrane, Yasmin Alam-Faruque, Ricardo Antunes, Daniel Barrell, Benoit Bely, M Bingley, David Binns, Lynette Bower, Paul Browne, Chan Wm, E. Dimmer, Ruth Y. Eberhardt, F. Fazzini, A. Fedotov, Rebecca E. Foulger, John S. Garavelli, Castro Lg, Rachael P. Huntley, Julius O.B. Jacobsen, M. Kleen, Kati Laiho, Duncan Legge, Quan Lin, W Liu, Jie Luo, Sandra Orchard, S. Patient, Klemens Pichler, Diego Poggioli, Nikolas Pontikos, Manuela Pruess, Steven Rosanoff, Tony Sawford, H. Sehra, Edward Turner, M. Corbett, M Donnelly, Van Rensburg P, Ioannis Xenarios, Lydie Bougueleret, Andrea H. Auchincloss, Ghislaine Argoud-Puy, Kristian B. Axelsen, Amos Marc Bairoch, Delphine Baratin, Blatter Mc, Brigitte Boeckmann, Jerven Bolleman, L. Bollondi, Emmanuel Boutet, Quintaje Sb, Lionel Breuza, Alan Bridge, E. Decastro, Elisabeth Coudert, Isabelle Cusin, M Doche, Dolnide Dornevil, Séverine Duvaud, Anne Estreicher, L Famiglietti, M Feuermann, Sebastien Gehant, Serenella Ferro, Elisabeth Gasteiger, Alain Gateau, Vivienne Baillie Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Nicolas Hulo, J. James, S. Jimenez, Florence Jungo, T. Kappler, Guillaume Keller, Vicente Lara, P Lemercier, Damien Lieberherr, Xavier D. Martin, Patrick Masson, M. Moinat, Salvo Paesano, Ivo Pedruzzi, Sandrine Pilbout, Sylvain Poux, Monica Pozzato, Nicole Redaschi, Catherine Rivoire, Bernd Roechert, Maria Victoria Schneider, Christian J. A. Sigrist, K Sonesson, S Staehli, E. Stanley, Andre Stutz, Shyamala Sundaram, Michael Tognolli, Laure Verbregue, Veuthey Al, Wu Ch, Arighi Cn, Leslie Arminski, Barker Wc, Chuming Chen, Yongxing Chen, P. Dubey, He Huang, Raja Mazumder, Peter B. McGarvey, Natale Da, Natarajan Tg, J. Nchoutmboube, Roberts Nv, Suzek Be, U. Ugochukwu, Vinayaka Cr, Qiang Wang, Yuqi Wang, Yeh Ls, Jian Zhang•Institutions (1)

Swiss Institute of Bioinformatics¹

12 Mar 2011-Nucleic Acids Research

TL;DR: The primary mission of Universal Protein Resource (UniProt) is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community.

Abstract: The primary mission of the Universal Protein Resource (UniProt) is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.

Journal Article•DOI•

starBase: a database for exploring microRNA–mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data

Jian-Hua Yang¹, Jun-Hao Li¹, Peng Shao¹, Hui Zhou¹, Yue-Qin Chen¹, Liang-Hu Qu¹•Institutions (1)

Sun Yat-sen University¹

01 Jan 2011-Nucleic Acids Research

TL;DR: A novel database, starBase (sRNA target Base), is introduced, which is developed to facilitate the comprehensive exploration of miRNA–target interaction maps from CLIP-Seq and Degradome-Seq data.

Abstract: MicroRNAs (miRNAs) represent an important class of small non-coding RNAs (sRNAs) that regulate gene expression by targeting messenger RNAs. However, assigning miRNAs to their regulatory target genes remains technically challenging. Recently, high-throughput CLIP-Seq and degradome sequencing (Degradome-Seq) methods have been applied to identify the sites of Argonaute interaction and miRNA cleavage sites, respectively. In this study, we introduce a novel database, starBase (sRNA target Base), which we have developed to facilitate the comprehensive exploration of miRNA–target interaction maps from CLIP-Seq and Degradome-Seq data. The current version includes high-throughput sequencing data generated from 21 CLIP-Seq and 10 Degradome-Seq experiments from six organisms. By analyzing millions of mapped CLIP-Seq and Degradome-Seq reads, we identified ∼1 million Ago-binding clusters and ∼2 million cleaved target clusters in animals and plants, respectively. Analyses of these clusters, and of target sites predicted by 6 miRNA target prediction programs, resulted in our identification of approximately 400 000 and approximately 66 000 miRNA-target regulatory relationships from CLIP-Seq and Degradome-Seq data, respectively. Furthermore, two web servers were provided to discover novel miRNA target sites from CLIP-Seq and Degradome-Seq data. Our web implementation supports diverse query types and exploration of common targets, gene ontologies and pathways. The starBase is available at http://starbase.sysu.edu.cn/.

Journal Article•DOI•

The RNA modification database, RNAMDB: 2011 update

William A. Cantara¹, Pamela F. Crain, Jef Rozenski, James A. McCloskey, Kimberly A. Harris, Xiaonong Zhang, Franck A. P. Vendeix, Daniele Fabris, Paul F. Agris•Institutions (1)

State University of New York System¹

01 Jan 2011-Nucleic Acids Research

TL;DR: The expanded future role of The RNA Modification Database will be to serve as a primary information portal for researchers across the entire spectrum of RNA-related research.

Abstract: Since its inception in 1994, The RNA Modification Database (RNAMDB, http://rna-mdb.cas.albany.edu/RNAmods/) has served as a focal point for information pertaining to naturally occurring RNA modifications. In its current state, the database employs an easy-to-use, searchable interface for obtaining detailed data on the 109 currently known RNA modifications. Each entry provides the chemical structure, common name and symbol, elemental composition and mass, CA registry numbers and index name, phylogenetic source, type of RNA species in which it is found, and references to the first reported structure determination and synthesis. Though newly transferred in its entirety to The RNA Institute, the RNAMDB continues to grow with two notable additions, agmatidine and 8-methyladenosine, appended in the last year. The RNA Modification Database is staying up-to-date with significant improvements being prepared for inclusion within the next year and the following year. The expanded future role of The RNA Modification Database will be to serve as a primary information portal for researchers across the entire spectrum of RNA-related research.

Journal Article•DOI•

Allele frequency net: a database and online repository for immune gene frequencies in worldwide populations

Faviel F. Gonzalez-Galarza¹, Stephen E. Christmas¹, Derek Middleton¹, Andrew R. Jones¹•Institutions (1)

University of Liverpool¹

01 Jan 2011-Nucleic Acids Research

TL;DR: The allele frequency net database is an online repository that contains information on the frequencies of immune genes and their corresponding alleles in different populations. It has been used in a wide variety of contexts, including clinical applications and population genetics.

Abstract: The allele frequency net database (http://www.allelefrequencies.net) is an online repository that contains information on the frequencies of immune genes and their corresponding alleles in different populations. The extensive variability observed in genes and alleles related to the immune system response and its significance in transplantation, disease association studies and diversity in populations led to the development of this electronic resource. At present, the system contains data from 1133 populations in 608 813 individuals on the frequency of genes from different polymorphic regions such as human leukocyte antigens, killer-cell immunoglobulin-like receptors, major histocompatibility complex Class I chain-related genes and a number of cytokine gene polymorphisms. The project was designed to create a central source for the storage of frequency data and provide individuals with a set of bioinformatics tools to analyze the occurrence of these variants in worldwide populations. The resource has been used in a wide variety of contexts, including clinical applications (histocompatibility, immunology, epidemiology and pharmacogenetics) and population genetics. Demographic information, frequency data and searching tools can be freely accessed through the website.
