Showing papers in "Nucleic Acids Research in 2014"

Journal Article•DOI•

Robert D. Finn¹, Alex Bateman², Jody Clements¹, Penelope Coggill², Ruth Y. Eberhardt², Sean R. Eddy¹, Andreas Heger, Kirstie Hetherington³, Liisa Holm, Jaina Mistry², Erik L. L. Sonnhammer⁴, John Tate², Marco Punta²

Howard Hughes Medical Institute¹, European Bioinformatics Institute², Wellcome Trust Sanger Institute³, Stockholm University⁴

01 Jan 2014-Nucleic Acids Research

TL;DR: Pfam is a widely used database of protein families containing 14 831 manually curated entries in the current release, version 27.0; 1182 new families have been added since the 2012 update article.

Abstract: Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.

9,415 citations

Journal Article•DOI•

The Reactome Pathway Knowledgebase.

Antonio Fabregat¹, Konstantinos Sidiropoulos¹, Phani V. Garapati¹, Marc Gillespie², Marc Gillespie³, Kerstin Hausmann¹, Robin Haw³, Bijay Jassal³, S Jupe¹, Florian Korninger¹, Sheldon J. McKay³, Lisa Matthews⁴, Bruce May³, Marija Milacic³, Karen Rothfels³, Veronica Shamovsky⁴, Marissa Webber³, Joel Weiser³, Mark Williams¹, Guanming Wu³, Lincoln Stein³, Lincoln Stein⁵, Lincoln Stein⁶, Henning Hermjakob⁷, Henning Hermjakob¹, Peter D'Eustachio⁴

European Bioinformatics Institute¹, St. John's University², Ontario Institute for Cancer Research³, New York University⁴, Cold Spring Harbor Laboratory⁵, University of Toronto⁶, Protein Sciences⁷

01 Jan 2014-Nucleic Acids Research

TL;DR: The Reactome Knowledgebase provides molecular details of signal transduction, transport, DNA replication, metabolism and other cellular processes as an ordered network of molecular transformations—an extended version of a classic metabolic map, in a single consistent data model.

Abstract: The Reactome Knowledgebase (www.reactome.org) provides molecular details of signal transduction, transport, DNA replication, metabolism and other cellular processes as an ordered network of molecular transformations, an extended version of a classic metabolic map, in a single consistent data model. Reactome functions both as an archive of biological processes and as a tool for discovering unexpected functional relationships in data such as gene expression pattern surveys or somatic mutation catalogues from tumour cells. Over the last two years we redeveloped major components of the Reactome web interface to improve usability, responsiveness and data visualization. A new pathway diagram viewer provides a faster, clearer interface and smooth zooming from the entire reaction network to the details of individual reactions. Tool performance for analysis of user datasets has been substantially improved, now generating detailed results for genome-wide expression datasets within seconds. The analysis module can now be accessed through a RESTful interface, facilitating its inclusion in third party applications. A new overview module allows the visualization of analysis results on a genome-wide Reactome pathway hierarchy using a single screen page. The search interface now provides auto-completion as well as a faceted search to narrow result lists efficiently.

5,065 citations

Journal Article•DOI•

The carbohydrate-active enzymes database (CAZy) in 2013

Vincent Lombard¹, Hemalatha Golaconda Ramulu¹, Elodie Drula¹, Pedro M. Coutinho¹, Bernard Henrissat¹

Centre national de la recherche scientifique¹

01 Jan 2014-Nucleic Acids Research

TL;DR: The changes that have occurred in CAZy during the past 5 years are outlined and a novel effort to display the resolution and the carbohydrate ligands in crystallographic complexes of CAZymes is presented.

Abstract: The Carbohydrate-Active Enzymes database (CAZy; http://www.cazy.org) provides online and continuously updated access to a sequence-based family classification linking the sequence to the specificity and 3D structure of the enzymes that assemble, modify and break down oligo- and polysaccharides. Functional and 3D structural information is added and curated on a regular basis based on the available literature. In addition to the use of the database by enzymologists seeking curated information on CAZymes, the dissemination of a stable nomenclature for these enzymes is probably a major contribution of CAZy. The past few years have seen the expansion of the CAZy classification scheme to new families, the development of subfamilies in several families and the power of CAZy for the analysis of genomes and metagenomes. This article outlines the changes that have occurred in CAZy during the past 5 years and presents our novel effort to display the resolution and the carbohydrate ligands in crystallographic complexes of CAZymes.

4,997 citations

Journal Article•DOI•

Deciphering key features in protein structures with the new ENDscript server

Xavier Robert¹, Patrice Gouet¹

University of Lyon¹

01 Jul 2014-Nucleic Acids Research

TL;DR: ENDscript 2 is a major upgrade of the ENDscript web server, fully re-engineered to enhance speed, accuracy and usability with interactive 3D visualization; it builds on ESPript 3, which has been improved to handle large amounts of data with reduced computation time.

Abstract: ENDscript 2 is a friendly Web server for extracting and rendering a comprehensive analysis of primary to quaternary protein structure information in an automated way. This major upgrade has been fully re-engineered to enhance speed, accuracy and usability with interactive 3D visualization. It takes advantage of the new version 3 of ESPript, our well-known sequence alignment renderer, improved to handle a large number of data with reduced computation time. From a single PDB entry or file, ENDscript produces high quality figures displaying multiple sequence alignment of proteins homologous to the query, colored according to residue conservation. Furthermore, the experimental secondary structure elements and a detailed set of relevant biophysical and structural data are depicted. All this information and more are now mapped on interactive 3D PyMOL representations. Thanks to its adaptive and rigorous algorithm, beginner to expert users can modify settings to fine-tune ENDscript to their needs. ENDscript has also been upgraded as an open platform for the visualization of multiple biochemical and structural data coming from external biotool Web servers, with both 2D and 3D representations. ENDscript 2 and ESPript 3 are freely available at http://endscript.ibcp.fr and http://espript.ibcp.fr, respectively.

4,722 citations

Journal Article•DOI•

miRBase: annotating high confidence microRNAs using deep sequencing data.

Ana Kozomara¹, Sam Griffiths-Jones¹

University of Manchester¹

01 Jan 2014-Nucleic Acids Research

TL;DR: An update of the miRBase database is described, including the collation and use of deep sequencing data sets to assign levels of confidence to miRBase entries; a high confidence subset of miRBase entries is provided, based on the pattern of mapped reads.

Abstract: We describe an update of the miRBase database (http://www.mirbase.org/), the primary microRNA sequence repository. The latest miRBase release (v20, June 2013) contains 24 521 microRNA loci from 206 species, processed to produce 30 424 mature microRNA products. The rate of deposition of novel microRNAs and the number of researchers involved in their discovery continue to increase, driven largely by small RNA deep sequencing experiments. In the face of these increases, and a range of microRNA annotation methods and criteria, maintaining the quality of the microRNA sequence data set is a significant challenge. Here, we describe recent developments of the miRBase database to address this issue. In particular, we describe the collation and use of deep sequencing data sets to assign levels of confidence to miRBase entries. We now provide a high confidence subset of miRBase entries, based on the pattern of mapped reads. The high confidence microRNA data set is available alongside the complete microRNA collection at http://www.mirbase.org/. We also describe embedding microRNA-specific Wikipedia pages on the miRBase website to encourage the microRNA community to contribute and share textual and functional information.

4,705 citations

Journal Article•DOI•

SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information

Marco Biasini¹, Stefan Bienert¹, Andrew Waterhouse¹, Konstantin Arnold¹, Gabriel Studer¹, Tobias Schmidt¹, Florian Kiefer¹, Tiziano Gallo Cassarino¹, Martino Bertoni¹, Lorenza Bordoli¹, Torsten Schwede², Torsten Schwede¹

Swiss Institute of Bioinformatics¹, University of Basel²

01 Jul 2014-Nucleic Acids Research

TL;DR: The latest version of the SWISS-MODEL expert system for protein structure modelling is described, which makes extensive use of model quality estimation for selection of the most suitable templates and provides estimates of the expected accuracy of the resulting models.

Abstract: Protein structure homology modelling has become a routine technique to generate 3D models for proteins when experimental structures are not available. Fully automated servers such as SWISS-MODEL with user-friendly web interfaces generate reliable models without the need for complex software packages or downloading large databases. Here, we describe the latest version of the SWISS-MODEL expert system for protein structure modelling. The SWISS-MODEL template library provides annotation of quaternary structure and essential ligands and co-factors to allow for building of complete structural models, including their oligomeric structure. The improved SWISS-MODEL pipeline makes extensive use of model quality estimation for selection of the most suitable templates and provides estimates of the expected accuracy of the resulting models. The accuracy of the models generated by SWISS-MODEL is continuously evaluated by the CAMEO system. The new web site allows users to interactively search for templates, cluster them by sequence similarity, structurally compare alternative templates and select the ones to be used for model building. In cases where multiple alternative template structures are available for a protein of interest, a user-guided template selection step allows building models in different functional states. SWISS-MODEL is available at http://swissmodel.expasy.org/.

4,235 citations

Journal Article•DOI•

starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein–RNA interaction networks from large-scale CLIP-Seq data

Jun-Hao Li¹, Shun Liu¹, Hui Zhou¹, Liang-Hu Qu¹, Jian-Hua Yang¹

Sun Yat-sen University¹

01 Jan 2014-Nucleic Acids Research

TL;DR: This study developed starBase v2.0, which has been updated to provide the most comprehensive CLIP-Seq experimentally supported miRNA-mRNA and miRNA-lncRNA interaction networks to date, and developed miRFunction and ceRNAFunction web servers to predict the function of miRNAs and other ncRNAs from the miRNA-mediated regulatory networks.

Abstract: Although microRNAs (miRNAs), other non-coding RNAs (ncRNAs) (e.g. lncRNAs, pseudogenes and circRNAs) and competing endogenous RNAs (ceRNAs) have been implicated in cell-fate determination and in various human diseases, surprisingly little is known about the regulatory interaction networks among the multiple classes of RNAs. In this study, we developed starBase v2.0 (http://starbase.sysu.edu.cn/) to systematically identify the RNA-RNA and protein-RNA interaction networks from 108 CLIP-Seq (PAR-CLIP, HITS-CLIP, iCLIP, CLASH) data sets generated by 37 independent studies. By analyzing millions of RNA-binding protein binding sites, we identified ∼9000 miRNA-circRNA, 16 000 miRNA-pseudogene and 285,000 protein-RNA regulatory relationships. Moreover, starBase v2.0 has been updated to provide the most comprehensive CLIP-Seq experimentally supported miRNA-mRNA and miRNA-lncRNA interaction networks to date. We identified ∼10,000 ceRNA pairs from CLIP-supported miRNA target sites. By combining 13 functional genomic annotations, we developed miRFunction and ceRNAFunction web servers to predict the function of miRNAs and other ncRNAs from the miRNA-mediated regulatory networks. Finally, we developed interactive web implementations to provide visualization, analysis and downloading of the aforementioned large-scale data sets. This study will greatly expand our understanding of ncRNA functions and their coordinated regulatory networks.

3,597 citations

Journal Article•DOI•

Ribosomal Database Project: data and tools for high throughput rRNA analysis

James R. Cole¹, Qiong Wang¹, Jordan A. Fish¹, Benli Chai¹, Donna M. McGarrell¹, Yanni Sun¹, C. Titus Brown¹, Andrea Porras-Alfaro¹, Cheryl R. Kuske¹, James M. Tiedje¹

Los Alamos National Laboratory¹

01 Jan 2014-Nucleic Acids Research

TL;DR: RDP now includes a collection of fungal large subunit rRNA genes, and most tools are now available as open source packages for download and local use by researchers with high-volume needs or who would like to develop custom analysis pipelines.

Abstract: Ribosomal Database Project (RDP; http://rdp.cme.msu.edu/) provides the research community with aligned and annotated rRNA gene sequence data, along with tools to allow researchers to analyze their own rRNA gene sequences in the RDP framework. RDP data and tools are utilized in fields as diverse as human health, microbial ecology, environmental microbiology, nucleic acid chemistry, taxonomy and phylogenetics. In addition to aligned and annotated collections of bacterial and archaeal small subunit rRNA genes, RDP now includes a collection of fungal large subunit rRNA genes. RDP tools, including Classifier and Aligner, have been updated to work with this new fungal collection. The use of high-throughput sequencing to characterize environmental microbial populations has exploded in the past several years, and as sequence technologies have improved, the sizes of environmental datasets have increased. With release 11, RDP is providing an expanded set of tools to facilitate analysis of high-throughput data, including both single-stranded and paired-end reads. In addition, most tools are now available as open source packages for download and local use by researchers with high-volume needs or who would like to develop custom analysis pipelines.

3,443 citations

Journal Article•DOI•

The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST)

Ross Overbeek¹, Robert Olson¹, Gordon D. Pusch¹, Gary J. Olsen¹, James J. Davis¹, Terry Disz¹, Robert Edwards², Svetlana Gerdes¹, Bruce Parrello¹, Maulik Shukla³, Veronika Vonstein¹, Alice R. Wattam³, Fangfang Xia¹, Rick Stevens¹

University of Illinois at Urbana–Champaign¹, San Diego State University², Virginia Tech³

01 Jan 2014-Nucleic Acids Research

TL;DR: The interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources are described.

Abstract: In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.

3,415 citations

Journal Article•DOI•

Data, information, knowledge and principle: back to metabolism in KEGG

Minoru Kanehisa¹, Susumu Goto¹, Yoko Sato¹, Masayuki Kawashima¹, Miho Furumichi¹, Mao Tanabe¹

Kyoto University¹

01 Jan 2014-Nucleic Acids Research

TL;DR: The reaction modules, which represent chemical units of reactions, have been used to analyze design principles of metabolic networks and also to improve the definition of K numbers and associated annotations.

Abstract: In the hierarchy of data, information and knowledge, computational methods play a major role in the initial processing of data to extract information, but they alone become less effective to compile knowledge from information. The Kyoto Encyclopedia of Genes and Genomes (KEGG) resource (http://www.kegg.jp/ or http://www.genome.jp/kegg/) has been developed as a reference knowledge base to assist this latter process. In particular, the KEGG pathway maps are widely used for biological interpretation of genome sequences and other high-throughput data. The link from genomes to pathways is made through the KEGG Orthology system, a collection of manually defined ortholog groups identified by K numbers. To better automate this interpretation process the KEGG modules defined by Boolean expressions of K numbers have been expanded and improved. Once genes in a genome are annotated with K numbers, the KEGG modules can be computationally evaluated revealing metabolic capacities and other phenotypic features. The reaction modules, which represent chemical units of reactions, have been used to analyze design principles of metabolic networks and also to improve the definition of K numbers and associated annotations. For translational bioinformatics, the KEGG MEDICUS resource has been developed by integrating drug labels (package inserts) used in society.
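
The idea of evaluating a module defined by a Boolean expression of K numbers can be illustrated with a minimal Python sketch. It is not KEGG's implementation: the nested-tuple representation, the function name `module_complete` and the K numbers used are all assumptions chosen for illustration, and KEGG's own module definition syntax is not parsed here.

```python
from __future__ import annotations

from typing import Union

# A module definition is either a K number (str) or a tuple
# ("and" | "or", operand, operand, ...). This nested form is a
# simplification for illustration, not KEGG's definition syntax.
Expr = Union[str, tuple]


def module_complete(expr: Expr, annotated: set[str]) -> bool:
    """Return True if the Boolean expression is satisfied by the annotated K numbers."""
    if isinstance(expr, str):
        # Leaf: the ortholog group must be present in the genome annotation.
        return expr in annotated
    op, *operands = expr
    if op == "and":
        return all(module_complete(e, annotated) for e in operands)
    if op == "or":
        return any(module_complete(e, annotated) for e in operands)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical module: step 1 AND (step 2a OR step 2b) AND step 3.
# The K numbers are placeholders, not a real KEGG module.
EXAMPLE_MODULE = ("and", "K00001", ("or", "K00002", "K00003"), "K00004")

genome_a = {"K00001", "K00003", "K00004"}  # has an alternative for step 2
genome_b = {"K00001", "K00004"}            # lacks both alternatives for step 2

print(module_complete(EXAMPLE_MODULE, genome_a))
print(module_complete(EXAMPLE_MODULE, genome_b))
```

Under these assumptions, a genome annotated with either alternative for the middle step satisfies the module, while a genome missing both does not.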

2,808 citations

Journal Article•DOI•

The NHGRI GWAS Catalog, a curated resource of SNP-trait associations

Danielle Welter¹, Jacqueline A. L. MacArthur¹, Joannella Morales¹, Tony Burdett¹, Peggy Hall¹, Heather Junkins¹, Alan Klemm¹, Paul Flicek¹, Teri A. Manolio¹, Lucia A. Hindorff¹, Helen Parkinson¹

National Institutes of Health¹

01 Jan 2014-Nucleic Acids Research

TL;DR: A number of recent improvements to the NHGRI Catalog of Published Genome-Wide Association Studies are presented, including novel ways for users to interact with the Catalog and changes to the curation infrastructure.

Abstract: The National Human Genome Research Institute (NHGRI) Catalog of Published Genome-Wide Association Studies (GWAS Catalog) provides a publicly available, manually curated collection of published GWAS assaying at least 100 000 single-nucleotide polymorphisms (SNPs) and all SNP-trait associations with P < 1 × 10⁻⁵. The Catalog includes 1751 curated publications of 11 912 SNPs. In addition to the SNP-trait association data, the Catalog also publishes a quarterly diagram of all SNP-trait associations mapped to the SNPs’ chromosomal locations. The Catalog can be accessed via a tabular web interface, via a dynamic visualization on the human karyotype, as a downloadable tab-delimited file and as an OWL knowledge base. This article presents a number of recent improvements to the Catalog, including novel ways for users to interact with the Catalog and changes to the curation infrastructure.

Journal Article•DOI•

ClinVar: public archive of relationships among sequence variation and human phenotype

Melissa J. Landrum¹, Jennifer M. Lee¹, George R. Riley¹, Wonhee Jang¹, Wendy S. Rubinstein¹, Deanna M. Church¹, Donna Maglott¹

National Institutes of Health¹

01 Jan 2014-Nucleic Acids Research

TL;DR: To facilitate evaluation of the medical importance of each variant, ClinVar aggregates submissions with the same variation/phenotype combination, adds value from other NCBI databases, assigns a distinct accession of the format RCV000000000.0 and reports if there are conflicting clinical interpretations.

Abstract: ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/) provides a freely available archive of reports of relationships among medically important variants and phenotypes. ClinVar accessions submissions reporting human variation, interpretations of the relationship of that variation to human health and the evidence supporting each interpretation. The database is tightly coupled with dbSNP and dbVar, which maintain information about the location of variation on human assemblies. ClinVar is also based on the phenotypic descriptions maintained in MedGen (http://www.ncbi.nlm.nih.gov/medgen). Each ClinVar record represents the submitter, the variation and the phenotype, i.e. the unit that is assigned an accession of the format SCV000000000.0. The submitter can update the submission at any time, in which case a new version is assigned. To facilitate evaluation of the medical importance of each variant, ClinVar aggregates submissions with the same variation/phenotype combination, adds value from other NCBI databases, assigns a distinct accession of the format RCV000000000.0 and reports if there are conflicting clinical interpretations. Data in ClinVar are available in multiple formats, including html, download as XML, VCF or tab-delimited subsets. Data from ClinVar are provided as annotation tracks on genomic RefSeqs and are used in tools such as Variation Reporter (http://www.ncbi.nlm.nih.gov/variation/tools/reporter), which reports what is known about variation based on user-supplied locations.
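
The two accession formats named in the abstract (SCV000000000.0 for a submission, RCV000000000.0 for an aggregated record) can be told apart with a short parser. The following Python sketch is only an illustration: the regular expression is inferred from those two examples (three-letter prefix, nine digits, a dot, an integer version), and the names `ClinVarAccession` and `parse_accession` are hypothetical, not part of any NCBI tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern inferred from the two example accessions in the abstract:
# a three-letter prefix, nine digits, a dot, and an integer version.
_ACCESSION_RE = re.compile(r"^(SCV|RCV)(\d{9})\.(\d+)$")


@dataclass(frozen=True)
class ClinVarAccession:
    prefix: str   # "SCV" (single submission) or "RCV" (aggregated record)
    number: int   # numeric part, leading zeros dropped
    version: int  # version suffix after the dot

    @property
    def is_aggregate(self) -> bool:
        return self.prefix == "RCV"


def parse_accession(text: str) -> Optional[ClinVarAccession]:
    """Parse an accession string; return None if it does not match the assumed format."""
    match = _ACCESSION_RE.match(text.strip())
    if match is None:
        return None
    prefix, number, version = match.groups()
    return ClinVarAccession(prefix, int(number), int(version))


# Example values are made up for illustration.
print(parse_accession("SCV000012345.2"))
print(parse_accession("not-an-accession"))
```

Returning `None` for non-matching input, rather than raising, is a design choice that makes it easy to filter mixed identifier columns.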

Journal Article•DOI•

The SILVA and "All-species Living Tree Project (LTP)" taxonomic frameworks.

Pelin Yilmaz¹, Laura Wegener Parfrey¹, Pablo Yarza², Jan E. Gerken², Elmar Pruesse², Christian Quast², Timmy Schweer², Jörg Peplies², Wolfgang Ludwig¹, Frank Oliver Glöckner²

University of British Columbia¹, Max Planck Society²

01 Jan 2014-Nucleic Acids Research

TL;DR: The improvements the SILVA taxonomy has undergone in the last 3 years are described, focusing on the curation process, the various resources used for curation and the comparison of the SILVA taxonomy with Greengenes and RDP-II taxonomies.

Abstract: SILVA (from Latin silva, forest, http://www.arb-silva.de) is a comprehensive resource for up-to-date quality-controlled databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains and supplementary online services. SILVA provides a manually curated taxonomy for all three domains of life, based on representative phylogenetic trees for the small- and large-subunit rRNA genes. This article describes the improvements the SILVA taxonomy has undergone in the last 3 years. Specifically we are focusing on the curation process, the various resources used for curation and the comparison of the SILVA taxonomy with Greengenes and RDP-II taxonomies. Our comparisons not only revealed a reasonable overlap between the taxa names, but also pointed to significant differences in both names and numbers of taxa between the three resources.

Journal Article•DOI•

deepTools: a flexible platform for exploring deep-sequencing data

Fidel Ramírez¹, Friederike Dündar¹, Sarah Diehl¹, Björn Grüning², Thomas Manke¹

Max Planck Society¹, University of Freiburg²

01 Jul 2014-Nucleic Acids Research

TL;DR: A Galaxy based web server for processing and visualizing deeply sequenced data, called deepTools, that enables users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting and can be used without registration.

Abstract: We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server’s core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straightforward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy.

Journal Article•DOI•

DrugBank 4.0: shedding new light on drug metabolism

Vivian Law¹, Craig Knox¹, Yannick Djoumbou¹, Timothy Jewison¹, An Chi Guo¹, Yifeng Liu¹, Adam Maciejewski¹, David Arndt¹, Michael Wilson¹, Vanessa Neveu¹, Alexandra Tang¹, Geraldine Gabriel¹, Carol Ly¹, Sakina Adamjee¹, Zerihun T. Dame¹, Beomsoo Han¹, You Zhou¹, David S. Wishart¹

National Institute for Nanotechnology¹

01 Jan 2014-Nucleic Acids Research

TL;DR: The latest update of DrugBank, DrugBank 4.0, has been further expanded to contain data on drug metabolism, absorption, distribution, metabolism, excretion and toxicity (ADMET) and other kinds of quantitative structure activity relationships (QSAR) information.

Abstract: DrugBank (http://www.drugbank.ca) is a comprehensive online database containing extensive biochemical and pharmacological information about drugs, their mechanisms and their targets. Since it was first described in 2006, DrugBank has rapidly evolved, both in response to user requests and in response to changing trends in drug research and development. Previous versions of DrugBank have been widely used to facilitate drug and in silico drug target discovery. The latest update, DrugBank 4.0, has been further expanded to contain data on drug metabolism, absorption, distribution, metabolism, excretion and toxicity (ADMET) and other kinds of quantitative structure activity relationships (QSAR) information. These enhancements are intended to facilitate research in xenobiotic metabolism (both prediction and characterization), pharmacokinetics, pharmacodynamics and drug design/discovery. For this release, >1200 drug metabolites (including their structures, names, activity, abundance and other detailed data) have been added along with >1300 drug metabolism reactions (including metabolizing enzymes and reaction types) and dozens of drug metabolism pathways. Another 30 predicted or measured ADMET parameters have been added to each DrugCard, bringing the average number of quantitative ADMET values for Food and Drug Administration-approved drugs close to 40. Referential nuclear magnetic resonance and MS spectra have been added for almost 400 drugs as well as spectral and mass matching tools to facilitate compound identification. This expanded collection of drug information is complemented by a number of new or improved search tools, including one that provides a simple analysis of drug-target, -enzyme and -transporter associations to provide insight on drug-drug interactions.

Journal Article•DOI•

Easy quantitative assessment of genome editing by sequence trace decomposition

Eva K. Brinkman¹, Tao Chen¹, Mario Amendola¹, Bas van Steensel¹

Netherlands Cancer Institute¹

16 Dec 2014-Nucleic Acids Research

TL;DR: TIDE, a method that requires only a pair of PCR reactions and two standard capillary sequencing runs to identify the major induced mutations in the projected editing site and accurately determines their frequency in a cell population, is presented.

Abstract: The efficacy and the mutation spectrum of genome editing methods can vary substantially depending on the targeted sequence. A simple, quick assay to accurately characterize and quantify the induced mutations is therefore needed. Here we present TIDE, a method for this purpose that requires only a pair of PCR reactions and two standard capillary sequencing runs. The sequence traces are then analyzed by a specially developed decomposition algorithm that identifies the major induced mutations in the projected editing site and accurately determines their frequency in a cell population. This method is cost-effective and quick, and it provides much more detailed information than current enzyme-based assays. An interactive web tool for automated decomposition of the sequence traces is available. TIDE greatly facilitates the testing and rational design of genome editing strategies.

Journal Article•DOI•

The MIntAct project--IntAct as a common curation platform for 11 molecular interaction databases.

Sandra Orchard, Mais G. Ammari¹, Bruno Aranda, Lionel Breuza², Leonardo Briganti³, Fiona Broackes-Carter⁴, Nancy H. Campbell⁵, Gayatri Chavali, Carol Chen⁶, Noemi del-Toro, Margaret Duesbury, Marine Dumousseau, Eugenia Galeota³, Ursula Hinz², Marta Iannuccelli³, Sruthi Jagannathan⁷, Rafael C. Jimenez, Jyoti Khadake, Astrid Lagreid⁸, Luana Licata³, Ruth C. Lovering⁵, Birgit H M Meldal, Anna N. Melidoni⁵, Mila Milagros, Daniele Peluso, Livia Perfetto³, Pablo Porras, Arathi Raghunath, Sylvie Ricard-Blum⁹, Bernd Roechert², Andre Stutz², Michael Tognolli², Kim Van Roey, Gianni Cesareni, Henning Hermjakob

University of Arizona¹, University of Geneva², University of Rome Tor Vergata³, University of Toronto⁴, University College London⁵, University of British Columbia⁶, National University of Singapore⁷, Norwegian University of Science and Technology⁸, Claude Bernard University Lyon 1⁹

01 Jan 2014-Nucleic Acids Research

TL;DR: All data manually curated by the MINT curators have been moved into the IntAct database at EMBL-EBI and are merged with the existing IntAct dataset.

Abstract: IntAct (freely available at http://www.ebi.ac.uk/intact) is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. IntAct has developed a sophisticated web-based curation tool, capable of supporting both IMEx- and MIMIx-level curation. This tool is now utilized by multiple additional curation teams, all of whom annotate data directly into the IntAct database. Members of the IntAct team supply appropriate levels of training, perform quality control on entries and take responsibility for long-term data maintenance. Recently, the MINT and IntAct databases decided to merge their separate efforts to make optimal use of limited developer resources and maximize the curation output. All data manually curated by the MINT curators have been moved into the IntAct database at EMBL-EBI and are merged with the existing IntAct dataset. Both IntAct and MINT are active contributors to the IMEx consortium (http://www.imexconsortium.org).

Journal Article•DOI•

The ChEMBL bioactivity database: an update

A. Patrícia Bento¹, Anna Gaulton¹, Anne Hersey¹, Louisa J. Bellis¹, Jon Chambers¹, Mark Davies¹, Felix A. Kruger¹, Yvonne Light¹, Lora Mak¹, Shaun McGlinchey¹, Michal Nowotka¹, George Papadatos¹, Rita Santos¹, John P. Overington¹

European Bioinformatics Institute¹

01 Jan 2014-Nucleic Acids Research

TL;DR: More comprehensive tracking of compounds from research stages through clinical development to market is provided through the inclusion of data from United States Adopted Name applications and a new richer data model for representing drug targets has been developed.

Abstract: ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 Nucleic Acids Research Database Issue. Since then, a variety of new data sources and improvements in functionality have contributed to the growth and utility of the resource. In particular, more comprehensive tracking of compounds from research stages through clinical development to market is provided through the inclusion of data from United States Adopted Name applications; a new richer data model for representing drug targets has been developed; and a number of methods have been put in place to allow users to more easily identify reliable data. Finally, access to ChEMBL is now available via a new Resource Description Framework format, in addition to the web-based interface, data downloads and web services.

Journal Article•DOI•

LPSN—list of prokaryotic names with standing in nomenclature

Aidan C. Parte

01 Jan 2014-Nucleic Acids Research

TL;DR: The List of Prokaryotic Names with Standing in Nomenclature (LPSN) is a database that lists the names of prokaryotes (Bacteria and Archaea) that have been validly published in the International Journal of Systematic and Evolutionary Microbiology directly or by inclusion in a Validation List.

Abstract: The List of Prokaryotic Names with Standing in Nomenclature (LPSN; http://www.bacterio.net) is a database that lists the names of prokaryotes (Bacteria and Archaea) that have been validly published in the International Journal of Systematic and Evolutionary Microbiology directly or by inclusion in a Validation List, under the Rules of the International Code of Nomenclature of Bacteria. Currently there are 15 974 taxa listed. In addition, LPSN has an up-to-date classification of prokaryotes and information on prokaryotic nomenclature and culture collections.

Journal Article•DOI•

PATRIC, the bacterial bioinformatics database and analysis resource

Alice R. Wattam¹, David Abraham¹, Oral Dalay¹, Terry Disz¹, Timothy P. Driscoll¹, Joseph L. Gabbard¹, Joseph J. Gillespie¹, Roger Gough¹, Deborah Hix¹, Ronald W. Kenyon¹, Dustin Machi¹, Chunhong Mao¹, Eric K. Nordberg¹, Robert Olson¹, Ross Overbeek¹, Gordon D. Pusch¹, Maulik Shukla¹, Julie R. Schulman¹, Rick Stevens¹, Daniel E. Sullivan¹, Veronika Vonstein¹, Andrew S. Warren¹, Rebecca Will¹, Meredith J. C. Wilson¹, Hyunseung Yoo¹, Chengdong Zhang¹, Yan Zhang¹, Bruno W. S. Sobral¹

University of Maryland, Baltimore¹

01 Jan 2014-Nucleic Acids Research

TL;DR: The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC); the paper describes updates to PATRIC since its initial report in the 2007 NAR Database Issue.

Abstract: The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein–protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Data types are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10 000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to PATRIC since its initial report in the 2007 NAR Database Issue.

Journal Article•DOI•

The Database of Genomic Variants: a curated collection of structural variation in the human genome

Jeffrey R. MacDonald¹, Robert Ziman¹, Ryan K. C. Yuen¹, Lars Feuk¹, Stephen W. Scherer¹

The Centre for Applied Genomics¹

01 Jan 2014-Nucleic Acids Research

TL;DR: The core visualization tool (gbrowse) has been upgraded with additional functions to facilitate data analysis and comparison, and a new query tool has been developed to provide flexible and interactive access to the data.

Abstract: Over the past decade, the Database of Genomic Variants (DGV; http://dgv.tcag.ca/) has provided a publicly accessible, comprehensive curated catalogue of structural variation (SV) found in the genomes of control individuals from worldwide populations. Here, we describe updates and new features, which have expanded the utility of DGV for both the basic research and clinical diagnostic communities. The current version of DGV consists of 55 published studies, comprising >2.5 million entries identified in >22 300 genomes. Studies included in DGV are selected from the accessioned data sets in the archival SV databases dbVar (NCBI) and DGVa (EBI), and then further curated for accuracy and validity. The core visualization tool (gbrowse) has been upgraded with additional functions to facilitate data analysis and comparison, and a new query tool has been developed to provide flexible and interactive access to the data. The content from DGV is regularly incorporated into other large-scale genome reference databases and represents a standard data resource for new product and database development, in particular for copy number variation testing in clinical labs. The accurate cataloguing of variants in DGV will continue to enable medical genetics and genome sequencing research.

Journal Article•DOI•

MycoCosm portal: gearing up for 1000 fungal genomes

Igor V. Grigoriev¹, Roman Nikitin¹, Sajeet Haridas¹, Alan Kuo¹, Robin A. Ohm¹, Robert Otillar¹, Robert Riley¹, Asaf Salamov¹, Xueling Zhao¹, Frank Korzeniewski¹, Tatyana Smirnova¹, Henrik P. Nordberg¹, Inna Dubchak¹, Igor Shabalov¹

United States Department of Energy¹

01 Jan 2014-Nucleic Acids Research

TL;DR: MycoCosm is a fungal genomics portal developed by the US Department of Energy Joint Genome Institute to support integration, analysis and dissemination of fungal genome sequences and other 'omics' data by providing interactive web-based tools.

Abstract: MycoCosm is a fungal genomics portal (http://jgi.doe.gov/fungi), developed by the US Department of Energy Joint Genome Institute to support integration, analysis and dissemination of fungal genome sequences and other 'omics' data by providing interactive web-based tools. MycoCosm also promotes and facilitates user community participation through the nomination of new species of fungi for sequencing, and the annotation and analysis of resulting data. By efficiently filling gaps in the Fungal Tree of Life, MycoCosm will help address important problems associated with energy and the environment, taking advantage of growing fungal genomics resources.

Journal Article•DOI•

JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles.

Anthony Mathelier, Xiaobei Zhao, Allen W. Zhang, François Parcy, Rebecca Worsley-Hunt, David J. Arenillas, Sorana Buchman, Chih-Yu Chen, Alice Yi Chou, Hans Ienasescu, Jonathan S. Lim, Casper Shyr, Ge Tan, Michelle Zhou, Boris Lenhard, Albin Sandelin, Wyeth W. Wasserman

01 Jan 2014-Nucleic Acids Research

TL;DR: The fifth major release greatly expands the heart of JASPAR, the JASPAR CORE subcollection (which contains curated, non-redundant profiles), with 135 new curated profiles, mainly derived from published chromatin immunoprecipitation-seq experimental datasets.

Abstract: JASPAR (http://jaspar.genereg.net) is the largest open-access database of matrix-based nucleotide profiles describing the binding preference of transcription factors from multiple species. The fifth major release greatly expands the heart of JASPAR, the JASPAR CORE subcollection (which contains curated, non-redundant profiles), with 135 new curated profiles (74 in vertebrates, 8 in Drosophila melanogaster, 10 in Caenorhabditis elegans and 43 in Arabidopsis thaliana; a 30% increase in total) and 43 older updated profiles (36 in vertebrates, 3 in D. melanogaster and 4 in A. thaliana; a 9% update in total). The new and updated profiles are mainly derived from published chromatin immunoprecipitation-seq experimental datasets. In addition, the web interface has been enhanced with advanced capabilities in browsing, searching and subsetting. Finally, the new JASPAR release is accompanied by a new BioPython package, a new R tool package and a new R/Bioconductor data package to facilitate access for both manual and automated methods.
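
To make "matrix-based nucleotide profiles" concrete, the sketch below shows one common way such a profile can be used: converting a position frequency matrix into log-odds weights and scoring a candidate site. The matrix values, the pseudocount, the uniform background and the function names are all assumptions made for illustration; they are not taken from JASPAR or from the packages named in the abstract.

```python
import math

# Hypothetical position frequency matrix (base counts at each of 4 positions).
# The numbers are invented for illustration and are not a JASPAR profile.
PFM = {
    "A": [8, 0, 1, 7],
    "C": [1, 0, 8, 1],
    "G": [0, 10, 0, 1],
    "T": [1, 0, 1, 1],
}


def pfm_to_pwm(pfm, pseudocount=0.25, background=0.25):
    """Convert counts to log2 odds against a uniform background (assumed)."""
    length = len(next(iter(pfm.values())))
    pwm = {base: [] for base in pfm}
    for i in range(length):
        # Column total including one pseudocount per base.
        total = sum(pfm[b][i] for b in pfm) + pseudocount * len(pfm)
        for b in pfm:
            freq = (pfm[b][i] + pseudocount) / total
            pwm[b].append(math.log2(freq / background))
    return pwm


def score(pwm, site):
    """Sum the per-position weights for a site of the same length as the matrix."""
    return sum(pwm[base][i] for i, base in enumerate(site))


PWM = pfm_to_pwm(PFM)
print(score(PWM, "AGCA"))  # site matching the most frequent base at every position
print(score(PWM, "TTTT"))  # site matching a rare base at every position
```

A site that follows the preferred base at each position scores above zero, while a site built from rare bases scores below zero; real tools typically add thresholds and scan both strands.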

Journal Article•DOI•

RefSeq: an update on mammalian reference sequences

Kim D. Pruitt¹, Garth Brown¹, Susan M. Hiatt¹, Françoise Thibaud-Nissen¹, Alexander Astashyn¹, Olga Ermolaeva¹, Catherine M. Farrell¹, Jennifer Hart¹, Melissa J. Landrum¹, Kelly M. McGarvey¹, Michael R. Murphy¹, Nuala A. O'Leary¹, Shashikant Pujar¹, Bhanu Rajput¹, Sanjida H. Rangwala¹, Lillian D. Riddick¹, Andrei Shkeda¹, Hanzhen Sun¹, Pamela Tamez¹, Raymond E. Tully¹, Craig Wallin¹, David Webb¹, Janet Weber¹, Wendy Wu¹, Michael DiCuccio¹, Paul Kitts¹, Donna Maglott¹, Terence Murphy¹, James Ostell¹

National Institutes of Health¹

01 Jan 2014-Nucleic Acids Research

TL;DR: The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration.

Abstract: The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI’s eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI’s eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.

Journal Article•DOI•

CHOPCHOP: a CRISPR/Cas9 and TALEN web tool for genome editing

Tessa G. Montague¹, José M. Cruz¹, James A. Gagnon¹, George M. Church², Eivind Valen¹

Harvard University¹, Wyss Institute for Biologically Inspired Engineering²

01 Jul 2014-Nucleic Acids Research

TL;DR: An online tool, CHOPCHOP, that uses efficient sequence alignment algorithms to minimize search times, and rigorously predicts off-target binding of single-guide RNAs (sgRNAs) and TALENs, making it a valuable tool for genome engineering.

Abstract: Major advances in genome editing have recently been made possible with the development of the TALEN and CRISPR/Cas9 methods. The speed and ease of implementing these technologies has led to an explosion of mutant and transgenic organisms. A rate-limiting step in efficiently applying TALEN and CRISPR/Cas9 methods is the selection and design of targeting constructs. We have developed an online tool, CHOPCHOP (https://chopchop.rc.fas.harvard.edu), to expedite the design process. CHOPCHOP accepts a wide range of inputs (gene identifiers, genomic regions or pasted sequences) and provides an array of advanced options for target selection. It uses efficient sequence alignment algorithms to minimize search times, and rigorously predicts off-target binding of single-guide RNAs (sgRNAs) and TALENs. Each query produces an interactive visualization of the gene with candidate target sites displayed at their genomic positions and color-coded according to quality scores. In addition, for each possible target site, restriction sites and primer candidates are visualized, facilitating a streamlined pipeline of mutant generation and validation. The ease-of-use and speed of CHOPCHOP make it a valuable tool for genome engineering.
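
As a rough illustration of what "candidate target sites" means for CRISPR/Cas9, the sketch below scans a pasted sequence for 20-nucleotide stretches followed by an NGG motif. The 20-nt length and the NGG PAM are assumptions based on the commonly used SpCas9 system and are not stated in the abstract; the function name is hypothetical, only the forward strand is scanned, and no off-target prediction or quality scoring is attempted, so this is not CHOPCHOP's algorithm.

```python
import re


def find_sgrna_sites(seq, protospacer_len=20):
    """Return (start, protospacer, pam) for every forward-strand NGG site.

    Assumes an SpCas9-style target: a protospacer immediately followed by NGG.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Toy input: 2 flanking bases, a 20-nt protospacer, a TGG PAM, 2 flanking bases.
example = "TT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AA"
for start, protospacer, pam in find_sgrna_sites(example):
    print(start, protospacer, pam)
```

A fuller design tool would also scan the reverse complement and check each candidate against the genome for near-matches.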

Journal Article•DOI•

HMDD v2.0: a database for experimentally supported human microRNA and disease associations

Yang Li¹, Chengxiang Qiu¹, Jian Tu¹, Bin Geng¹, Jichun Yang¹, Tianzi Jiang¹, Qinghua Cui¹

Chinese Academy of Sciences¹

01 Jan 2014-Nucleic Acids Research

TL;DR: The Human microRNA Disease Database (HMDD) v2.0 update presented several novel options for users to facilitate exploration of the data in the database, and presented more data that were generated based on concepts derived from the miRNA–disease association data, including disease spectrum width of miRNAs and miRNA spectrum width of human diseases.

Abstract: The Human microRNA Disease Database (HMDD; available via the Web site at http://cmbi.bjmu.edu.cn/hmdd and http://202.38.126.151/hmdd/tools/hmdd2.html) is a collection of experimentally supported human microRNA (miRNA) and disease associations. Here, we describe the HMDD v2.0 update that presented several novel options for users to facilitate exploration of the data in the database. In the updated database, miRNA-disease association data were annotated in more detail. For example, miRNA-disease association data from genetics, epigenetics, circulating miRNAs and miRNA-target interactions were integrated into the database. In addition, HMDD v2.0 presented more data that were generated based on concepts derived from the miRNA-disease association data, including disease spectrum width of miRNAs and miRNA spectrum width of human diseases. Moreover, we provided users with a link to download all the data in the HMDD v2.0 and a link to submit novel data into the database. Meanwhile, we also maintained the old version of HMDD. By keeping data sets up-to-date, HMDD should continue to serve as a valuable resource for investigating the roles of miRNAs in human disease.

Journal Article•DOI•

The IUPHAR/BPS Guide to PHARMACOLOGY: an expert-driven knowledgebase of drug targets and their ligands

Adam J. Pawson¹, Joanna L. Sharman¹, Helen E. Benson¹, Elena Faccenda, Stephen P.H. Alexander, Peter Buneman, Anthony P. Davenport, John C. McGrath, John A. Peters, Christopher Southan, Michael Spedding, Wenyuan Yu, Anthony J. Harmar, NC-IUPHAR

University of Edinburgh¹

01 Jan 2014-Nucleic Acids Research

TL;DR: The International Union of Basic and Clinical Pharmacology/British Pharmacological Society (IUPHAR/BPS) Guide to PHARMACOLOGY is a new open access resource providing pharmacological, chemical, genetic, functional and pathophysiological data on the targets of approved and experimental drugs.

Abstract: The International Union of Basic and Clinical Pharmacology/British Pharmacological Society (IUPHAR/BPS) Guide to PHARMACOLOGY (http://www.guidetopharmacology.org) is a new open access resource providing pharmacological, chemical, genetic, functional and pathophysiological data on the targets of approved and experimental drugs. Created under the auspices of the IUPHAR and the BPS, the portal provides concise, peer-reviewed overviews of the key properties of a wide range of established and potential drug targets, with in-depth information for a subset of important targets. The resource is the result of curation and integration of data from the IUPHAR Database (IUPHAR-DB) and the published BPS 'Guide to Receptors and Channels' (GRAC) compendium. The data are derived from a global network of expert contributors, and the information is extensively linked to relevant databases, including ChEMBL, DrugBank, Ensembl, PubChem, UniProt and PubMed. Each of the ∼6000 small molecule and peptide ligands is annotated with manually curated 2D chemical structures or amino acid sequences, nomenclature and database links. Future expansion of the resource will complete the coverage of all the targets of currently approved drugs and future candidate targets, alongside educational resources to guide scientists and students in pharmacological principles and techniques.

Journal Article•DOI•

SwissTargetPrediction: A web server for target prediction of bioactive small molecules

David Gfeller¹, Aurélien Grosdidier¹, Matthias Wirth¹, Antoine Daina¹, Olivier Michielin², Vincent Zoete¹

Swiss Institute of Bioinformatics¹, Ludwig Institute for Cancer Research²

01 Jul 2014-Nucleic Acids Research

TL;DR: SwissTargetPrediction is introduced, a web server to accurately predict the targets of bioactive molecules based on a combination of 2D and 3D similarity measures with known ligands, which can be carried out in five different organisms.

Abstract: Bioactive small molecules, such as drugs or metabolites, bind to proteins or other macro-molecular targets to modulate their activity, which in turn results in the observed phenotypic effects. For this reason, mapping the targets of bioactive small molecules is a key step toward unraveling the molecular mechanisms underlying their bioactivity and predicting potential side effects or cross-reactivity. Recently, large datasets of protein-small molecule interactions have become available, providing a unique source of information for the development of knowledge-based approaches to computationally identify new targets for uncharacterized molecules or secondary targets for known molecules. Here, we introduce SwissTargetPrediction, a web server to accurately predict the targets of bioactive molecules based on a combination of 2D and 3D similarity measures with known ligands. Predictions can be carried out in five different organisms, and mapping predictions by homology within and between different species is enabled for close paralogs and orthologs. SwissTargetPrediction is accessible free of charge and without login requirement at http://www.swisstargetprediction.ch.
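
The idea of ranking known ligands by 2D similarity to a query molecule can be sketched with the Tanimoto coefficient on fingerprint bit sets. The abstract does not say which similarity measure the server uses, so Tanimoto is an assumption here, and the fingerprints, ligand names and function name are invented for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Hypothetical fingerprints: each set holds the indices of bits that are set.
query = {1, 4, 7, 9}
known_ligands = {
    "ligand_A": {1, 4, 7, 8},
    "ligand_B": {2, 3, 5},
}

# Rank known ligands from most to least similar to the query.
ranked = sorted(
    known_ligands,
    key=lambda name: tanimoto(query, known_ligands[name]),
    reverse=True,
)
for name in ranked:
    print(name, tanimoto(query, known_ligands[name]))
```

In a similarity-based approach, the targets of the top-ranked ligands become the predicted targets of the query; a 3D measure would be combined with this score in some way that the abstract does not specify.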

Journal Article•DOI•

The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data

Sebastian Köhler¹, Sandra C. Doelken¹, Christopher J. Mungall², Sebastian Bauer¹, Helen V. Firth³, Helen V. Firth⁴, Isabelle Bailleul-Forestier⁵, Graeme C.M. Black⁶, Danielle L. Brown⁷, Michael Brudno⁸, Jennifer Campbell⁷, Jennifer Campbell⁹, David R. FitzPatrick, Janan T. Eppig, Andrew P. Jackson, Kathleen Freson¹⁰, Marta Girdea⁸, Ingo Helbig¹¹, Jane A. Hurst¹², Johanna A. Jähn¹¹, Laird G. Jackson¹³, Anne M. Kelly¹⁴, David H. Ledbetter¹⁵, Sahar Mansour¹⁶, Christa Lese Martin¹⁵, Celia Moss, Andrew D Mumford¹⁷, Willem H. Ouwehand¹⁴, Willem H. Ouwehand³, Soo Mi Park⁴, Erin Rooney Riggs¹⁵, Richard H. Scott¹², Sanjay M. Sisodiya¹², Steven Van Vooren, Ronald J. Wapner¹⁸, Andrew O.M. Wilkie¹⁹, Caroline F. Wright³, Anneke T. Vulto-van Silfhout²⁰, Nicole de Leeuw²⁰, Bert B.A. de Vries²⁰, Nicole L. Washington², Cynthia L. Smith, Monte Westerfield²¹, Paul N. Schofield¹⁴, Barbara J. Ruef²¹, Georgios V. Gkoutos²², Melissa A. Haendel, Damian Smedley³, Suzanna E. Lewis², Peter N. Robinson¹, Peter N. Robinson²³

01 Jan 2014-Nucleic Acids Research

TL;DR: The updated HPO database is described, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO, allowing integration of existing datasets and interoperability with multiple biomedical resources.

Abstract: The Human Phenotype Ontology (HPO) project, available at http://www.human-phenotype-ontology.org, provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online.
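
Because the ontology is described as classes linked by subclass relations, an annotation to a specific class also implies its more general parent classes. The sketch below walks such relations to collect all ancestors of a term. The term identifiers and the toy hierarchy are invented for illustration (they are not real HPO identifiers), and the function name is hypothetical.

```python
# Hypothetical toy ontology: each term maps to its direct parent terms.
SUBCLASS_OF = {
    "term:root": [],
    "term:abnormal_eye": ["term:root"],
    "term:abnormal_lens": ["term:abnormal_eye"],
    "term:cataract": ["term:abnormal_lens"],
}


def ancestors(term, subclass_of):
    """Collect every term reachable by following subclass relations upward."""
    seen = set()
    stack = list(subclass_of.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            # Keep climbing; a term may have more than one parent.
            stack.extend(subclass_of.get(parent, []))
    return seen


print(sorted(ancestors("term:cataract", SUBCLASS_OF)))
```

The same traversal works when a term has several parents, since the `seen` set prevents any class from being visited twice.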

Journal Article•DOI•

PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors

Jinpu Jin¹, He Zhang¹, Lei Kong¹, Ge Gao¹, Jingchu Luo¹

Peking University¹

01 Jan 2014-Nucleic Acids Research

TL;DR: The plant TF database PlantTFDB is updated to version 3.0, with more annotations for identified TFs, including expression, regulation, interaction, conserved elements, phenotype information, expert-curated descriptions derived from UniProt, TAIR and NCBI GeneRIF, as well as references to provide clues for functional studies of TFs.

Abstract: With the aim of providing a resource for the functional and evolutionary study of plant transcription factors (TFs), we updated the plant TF database PlantTFDB to version 3.0 (http://planttfdb.cbi.pku.edu.cn). After refining the TF classification pipeline, we systematically identified 129 288 TFs from 83 species, of which 67 species have genome sequences, covering the main lineages of green plants. Besides the abundant annotation provided in the previous version, we generated more annotations for identified TFs, including expression, regulation, interaction, conserved elements, phenotype information, expert-curated descriptions derived from UniProt, TAIR and NCBI GeneRIF, as well as references to provide clues for functional studies of TFs. To help identify evolutionary relationships among identified TFs, we assigned 69 450 TFs into 3924 orthologous groups, and constructed 9217 phylogenetic trees for TFs within the same families or the same orthologous groups. In addition, we set up a TF prediction server in this version for users to identify TFs from their own sequences.
