Showing papers in "Nucleic Acids Research in 2019"


Open Access

Journal Article

STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets.

Damian Szklarczyk¹, Annika L. Gable¹, David Lyon¹, Alexander Junge², Stefan Wyder¹, Jaime Huerta-Cepas³, Milan Simonovic¹, Nadezhda Tsankova Doncheva², John H. Morris⁴, Peer Bork, Lars Juhl Jensen², Christian von Mering¹

Swiss Institute of Bioinformatics¹, University of Copenhagen², Technical University of Madrid³, University of California, San Francisco⁴

08 Jan 2019-Nucleic Acids Research

TL;DR: The latest version of STRING more than doubles the number of organisms it covers, and offers an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input.


Abstract: Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING database aims to collect, score and integrate all publicly available sources of protein-protein interaction information, and to complement these with computational predictions. Its goal is to achieve a comprehensive and objective global network, including direct (physical) as well as indirect (functional) interactions. The latest version of STRING (11.0) more than doubles the number of organisms it covers, to 5090. The most important new feature is an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input. For the enrichment analysis, STRING implements well-known classification systems such as Gene Ontology and KEGG, but also offers additional, new classification systems based on high-throughput text-mining as well as on a hierarchical clustering of the association network itself. The STRING resource is available online at https://string-db.org/.


10,584 citations

Journal Article

The PRIDE database and related tools and resources in 2019: improving support for quantification data.

Yasset Perez-Riverol¹, Attila Csordas¹, Jingwen Bai¹, Manuel Bernal-Llinares¹, Suresh Hewapathirana¹, Deepti J. Kundu¹, Avinash Inuganti¹, Johannes Griss¹,², Gerhard Mayer³, Martin Eisenacher³, Enrique Perez¹, Julian Uszkoreit³, Julianus Pfeuffer⁴, Timo Sachsenberg⁴, Şule Yılmaz⁵, Shivani Tiwary⁵, Juergen Cox⁵, Enrique Audain, Mathias Walzer¹, Andrew F. Jarnuczak¹, Tobias Ternent¹, Alvis Brazma¹, Juan Antonio Vizcaíno¹

European Bioinformatics Institute¹, Medical University of Vienna², Ruhr University Bochum³, University of Tübingen⁴, Max Planck Society⁵

08 Jan 2019-Nucleic Acids Research

TL;DR: Key statistics on the current data contents and volume of downloads are outlined, along with how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.

Abstract: The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world’s largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, data re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault tolerant storage backend, Application Programming Interface and web interface have been implemented, as a part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. At last, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.


5,735 citations

Journal Article

UniProt: A worldwide hub of protein knowledge

Alex Bateman

01 Jan 2019-Nucleic Acids Research

5,284 citations

Journal Article

Interactive Tree Of Life (iTOL) v4: recent updates and new developments.

Ivica Letunic, Peer Bork¹

European Bioinformatics Institute¹

02 Jul 2019-Nucleic Acids Research

TL;DR: The current version of iTOL v4 introduces four new dataset types, together with numerous new features, and is the first tool which supports direct visualization of Qiime 2 trees and associated annotations.


Abstract: The Interactive Tree Of Life (https://itol.embl.de) is an online tool for the display, manipulation and annotation of phylogenetic and other trees. It is freely available and open to everyone. The current version introduces four new dataset types, together with numerous new features. Annotation options have been expanded and new control options added for many display elements. An interactive spreadsheet-like editor has been implemented, providing dataset creation and editing directly in the web interface. Font support has been rewritten with full support for UTF-8 character encoding throughout the user interface. Google Web Fonts are now fully supported in the tree text labels. iTOL v4 is the first tool which supports direct visualization of Qiime 2 trees and associated annotations. The user account system has been streamlined and expanded with new navigation options, and currently handles >700 000 trees from more than 40 000 individual users. Full batch access has been implemented allowing programmatic upload and export of trees and annotations.


4,233 citations

Journal Article

The Pfam protein families database in 2019.

Sara El-Gebali¹, Jaina Mistry¹, Alex Bateman¹, Sean R. Eddy², Aurelien Luciani¹, Simon C. Potter¹, Matloob Qureshi¹, Lorna Richardson¹, Gustavo A. Salazar¹, Alfredo Smart¹, Erik L. L. Sonnhammer³, Layla Hirsh⁴,⁵, Lisanna Paladin⁴, Damiano Piovesan⁴, Silvio C. E. Tosatto⁴, Robert D. Finn¹

European Bioinformatics Institute¹, Harvard University², Science for Life Laboratory³, University of Padua⁴, Pontifical Catholic University of Peru⁵

08 Jan 2019-Nucleic Acids Research

TL;DR: A comparison with the Evolutionary Classification of Protein Domains (ECOD) structural classification database led to the creation of 825 new families based on ECOD's set of uncharacterized families (EUFs), and Pfam entries were connected to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms.

Abstract: The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison to the structural classification database, namely the Evolutionary Classification of Protein Domains (ECOD) that led to the creation of 825 new families based on their set of uncharacterized families (EUFs). Furthermore, we also connected Pfam entries to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled the linking between authorship of all Pfam entries with the corresponding authors' ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link them to their ORCID record.


3,617 citations

Journal Article

The EMBL-EBI search and sequence analysis tools APIs in 2019

Fábio Madeira¹, Youngmi Park¹, Joon Lee¹, Nicola Buso¹, Tamer Gur¹, Nandana Madhusoodanan¹, Prasad Basutkar¹, Adrian R N Tivey¹, Simon C. Potter¹, Robert D. Finn¹, Rodrigo Lopez¹

European Bioinformatics Institute¹

02 Jul 2019-Nucleic Acids Research

TL;DR: The latest improvements made to the frameworks which enhance the interconnectivity between public EMBL-EBI resources and ultimately enhance biological data discoverability, accessibility, interoperability and reusability are described.


Abstract: The EMBL-EBI provides free access to popular bioinformatics sequence analysis applications as well as to a full-featured text search engine with powerful cross-referencing and data retrieval capabilities. Access to these services is provided via user-friendly web interfaces and via established RESTful and SOAP Web Services APIs (https://www.ebi.ac.uk/seqdb/confluence/display/JDSAT/EMBL-EBI+Web+Services+APIs+-+Data+Retrieval). Both systems have been developed with the same core principles that allow them to integrate an ever-increasing volume of biological data, making them an integral part of many popular data resources provided at the EMBL-EBI. Here, we describe the latest improvements made to the frameworks which enhance the interconnectivity between public EMBL-EBI resources and ultimately enhance biological data discoverability, accessibility, interoperability and reusability.


3,529 citations

Journal Article

g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update).

Uku Raudvere¹, Liis Kolberg¹, Ivan Kuzmin¹, Tambet Arak¹, Priit Adler¹, Hedi Peterson¹, Jaak Vilo¹

University of Tartu¹

02 Jul 2019-Nucleic Acids Research

TL;DR: g:Profiler is now capable of analysing data from any organism, including vertebrates, plants, fungi, insects and parasites, and the 2019 update introduces an extensive technical rewrite making the services faster and more flexible.

Abstract: Biological data analysis often deals with lists of genes arising from various studies. The g:Profiler toolset is widely used for finding biological categories enriched in gene lists, conversions between gene identifiers and mappings to their orthologs. The mission of g:Profiler is to provide a reliable service based on up-to-date high quality data in a convenient manner across many evidence types, identifier spaces and organisms. g:Profiler relies on Ensembl as a primary data source and follows their quarterly release cycle while updating the other data sources simultaneously. The current update provides a better user experience due to a modern responsive web interface, standardised API and libraries. The results are delivered through an interactive and configurable web design. Results can be downloaded as publication ready visualisations or delimited text files. In the current update we have extended the support to 467 species and strains, including vertebrates, plants, fungi, insects and parasites. By supporting user uploaded custom GMT files, g:Profiler is now capable of analysing data from any organism. All past releases are maintained for reproducibility and transparency. The 2019 update introduces an extensive technical rewrite making the services faster and more flexible. g:Profiler is freely available at https://biit.cs.ut.ee/gprofiler.
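The enrichment workflow described in this abstract (a gene list tested against categories, optionally supplied as a custom GMT file) can be illustrated with a minimal sketch. It assumes the common GMT layout of one gene set per line (name, description, then gene identifiers, all tab-separated) and uses a one-sided hypergeometric test as the over-representation statistic. The abstract does not state which test g:Profiler applies, so the test and the function names `parse_gmt`, `hypergeom_pvalue` and `enrich` are illustrative assumptions, not g:Profiler's implementation.

```python
from math import comb


def parse_gmt(text):
    """Parse GMT text: each line is name<TAB>description<TAB>gene1<TAB>gene2..."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the category."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrich(query, gene_sets, background):
    """Return (set name, overlap size, p-value) tuples, most significant first.
    Hypothetical helper: no multiple-testing correction is applied here."""
    query = set(query) & background
    results = []
    for name, members in gene_sets.items():
        members = members & background
        overlap = query & members
        p = hypergeom_pvalue(len(overlap), len(query), len(members), len(background))
        results.append((name, len(overlap), p))
    return sorted(results, key=lambda r: r[2])
```

In practice the raw p-values from such a test would still need a multiple-testing correction before interpretation, since many categories are tested at once.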


2,959 citations

Journal Article

The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.

Annalisa Buniello¹, Jacqueline A. L. MacArthur¹, Maria Cerezo¹, Laura W. Harris¹, James D. Hayhurst¹, Cinzia Malangone¹, Aoife McMahon¹, Joannella Morales¹, Edward Mountjoy²,³, Elliot Sollis¹, Daniel Suveges¹, Olga Vrousgou¹, Patricia L. Whetzel¹, M. Ridwan Amode¹, Jose A. Guillen¹, Harpreet Singh Riat¹, Stephen J. Trevanion¹, Peggy Hall⁴, Heather Junkins⁴, Paul Flicek¹, Tony Burdett¹, Lucia A. Hindorff⁴, Fiona Cunningham¹, Helen Parkinson¹

European Bioinformatics Institute¹, Wellcome Trust Sanger Institute², University of Oxford³, National Institutes of Health⁴

08 Jan 2019-Nucleic Acids Research

TL;DR: Data access is improved with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database.

Abstract: The GWAS Catalog delivers a high-quality curated collection of all published genome-wide association studies enabling investigations to identify causal variants, understand disease mechanisms, and establish targets for novel therapies. The scope of the Catalog has also expanded to targeted and exome arrays with 1000 new associations added for these technologies. As of September 2018, the Catalog contains 5687 GWAS comprising 71673 variant-trait associations from 3567 publications. New content includes 284 full P-value summary statistics datasets for genome-wide and new targeted array studies, representing 6 × 10⁹ individual variant-trait statistics. In the last 12 months, the Catalog's user interface was accessed by ∼90000 unique users who viewed >1 million pages. We have improved data access with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database. Summary statistics provision is supported by a new format proposed as a community standard for summary statistics data representation. This format was derived from our experience in standardizing heterogeneous submissions, mapping formats and in harmonizing content. Availability: https://www.ebi.ac.uk/gwas/.

2,878 citations

Journal Article

COSMIC: the Catalogue Of Somatic Mutations In Cancer

John Tate¹, Sally Bamford¹, Harry Jubb¹,², Zbyslaw Sondka¹, David Beare¹, Nidhi Bindal¹, Harry Boutselakis¹, Charlotte G. Cole¹, Celestino Creatore¹, Elisabeth Dawson¹, Peter Fish¹, Bhavana Harsha¹, Charlie Hathaway¹, Steve C Jupe¹, Chai Yin Kok¹, Kate Noble¹, Laura Ponting¹, Christopher C Ramshaw¹, Claire Rye¹, Helen E. Speedy¹, Raymund Stefancsik¹, Sam Thompson¹, Shicai Wang¹, Sari Ward¹, Peter J. Campbell¹, Simon A. Forbes¹

Wellcome Trust Sanger Institute¹, Astex²

08 Jan 2019-Nucleic Acids Research

TL;DR: Alongside improvements to the public website and data-download systems, new functionality in COSMIC-3D allows exploration of mutations within three-dimensional protein structures, their protein structural and functional impacts, and implications for druggability.

Abstract: COSMIC, the Catalogue Of Somatic Mutations In Cancer (https://cancer.sanger.ac.uk) is the most detailed and comprehensive resource for exploring the effect of somatic mutations in human cancer. The latest release, COSMIC v86 (August 2018), includes almost 6 million coding mutations across 1.4 million tumour samples, curated from over 26 000 publications. In addition to coding mutations, COSMIC covers all the genetic mechanisms by which somatic mutations promote cancer, including non-coding mutations, gene fusions, copy-number variants and drug-resistance mutations. COSMIC is primarily hand-curated, ensuring quality, accuracy and descriptive data capture. Building on our manual curation processes, we are introducing new initiatives that allow us to prioritize key genes and diseases, and to react more quickly and comprehensively to new findings in the literature. Alongside improvements to the public website and data-download systems, new functionality in COSMIC-3D allows exploration of mutations within three-dimensional protein structures, their protein structural and functional impacts, and implications for druggability. In parallel with COSMIC's deep and broad variant coverage, the Cancer Gene Census (CGC) describes a curated catalogue of genes driving every form of human cancer. Currently describing 719 genes, the CGC has recently introduced functional descriptions of how each gene drives disease, summarized into the 10 cancer Hallmarks.


2,626 citations

Journal Article

miRBase: from microRNA sequences to function

Ana Kozomara¹, Maria Birgaoanu¹, Sam Griffiths-Jones¹

University of Manchester¹

08 Jan 2019-Nucleic Acids Research

TL;DR: Improvements to the database and website are described that provide more information about the quality of microRNA gene annotations and the cellular functions of their products, and that improve the availability of microRNA functional information.

Abstract: miRBase catalogs, names and distributes microRNA gene sequences. The latest release of miRBase (v22) contains microRNA sequences from 271 organisms: 38 589 hairpin precursors and 48 860 mature microRNAs. We describe improvements to the database and website to provide more information about the quality of microRNA gene annotations, and the cellular functions of their products. We have collected 1493 small RNA deep sequencing datasets and mapped a total of 5.5 billion reads to microRNA sequences. The read mapping patterns provide strong support for the validity of between 20% and 65% of microRNA annotations in different well-studied animal genomes, and evidence for the removal of >200 sequences from the database. To improve the availability of microRNA functional information, we are disseminating Gene Ontology terms annotated against miRBase sequences. We have also used a text-mining approach to search for microRNA gene names in the full-text of open access articles. Over 500 000 sentences from 18 542 papers contain microRNA names. We score these sentences for functional information and link them with 12 519 microRNA entries. The sentences themselves, and word clouds built from them, provide effective summaries of the functional information about specific microRNAs. miRBase is publicly and freely available at http://mirbase.org/.
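The text-mining step mentioned in this abstract (finding microRNA gene names in article text and linking the sentences to entries) can be sketched as follows. The abstract does not describe the actual matching rules or the scoring scheme, so the regular expression `MIRNA_NAME` and the helpers `find_mirna_names` and `sentences_by_mirna` are hypothetical. They only cover common name shapes such as `hsa-miR-21-5p`, `miR-155` and `let-7a`, and they do no scoring.

```python
import re
from collections import defaultdict

# Illustrative pattern only: optional 3-4 letter species prefix, a miR/mir/let/lin
# stem, a number, an optional letter suffix, an optional copy number, an optional arm.
MIRNA_NAME = re.compile(
    r"\b(?:[a-z]{3,4}-)?(?:miR|mir|let|lin)-\d+[a-z]?(?:-\d+)?(?:-[35]p)?\b"
)


def find_mirna_names(text):
    """Return every microRNA-like name in the text, in order of appearance."""
    return MIRNA_NAME.findall(text)


def sentences_by_mirna(text):
    """Map each matched name to the sentences that mention it.
    Sentences are split naively after '.', '!' or '?' followed by whitespace."""
    index = defaultdict(list)
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # dict.fromkeys de-duplicates names while preserving first-seen order
        for name in dict.fromkeys(find_mirna_names(sentence)):
            index[name].append(sentence)
    return dict(index)
```

A production pipeline would also need to normalize name variants to database entries and rank sentences for functional content, neither of which is attempted here.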


2,508 citations

Journal Article

PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools

Huaiyu Mi¹, Anushya Muruganujan¹, Dustin Ebert¹, Xiaosong Huang¹,², Paul Thomas¹

University of Southern California¹, Guangzhou University²

08 Jan 2019-Nucleic Acids Research

TL;DR: PANTHER (Protein Analysis Through Evolutionary Relationships) is a resource for the evolutionary and functional classification of genes from organisms across the tree of life; version 14 adds an entirely new PANTHER GO-slim containing over four times as many Gene Ontology terms as the previous GO-slim.

Abstract: PANTHER (Protein Analysis Through Evolutionary Relationships, http://pantherdb.org) is a resource for the evolutionary and functional classification of genes from organisms across the tree of life. We report the improvements we have made to the resource during the past two years. For evolutionary classifications, we have added more prokaryotic and plant genomes to the phylogenetic gene trees, expanding the representation of gene evolution in these lineages. We have refined many protein family boundaries, and have aligned PANTHER with the MEROPS resource for protease and protease inhibitor families. For functional classifications, we have developed an entirely new PANTHER GO-slim, containing over four times as many Gene Ontology terms as our previous GO-slim, as well as curated associations of genes to these terms. Lastly, we have made substantial improvements to the enrichment analysis tools available on the PANTHER website: users can now analyze over 900 different genomes, using updated statistical tests with false discovery rate corrections for multiple testing. The overrepresentation test is also available as a web service, for easy addition to third-party sites.
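The abstract mentions false discovery rate corrections for multiple testing without naming the procedure. As a minimal sketch, the Benjamini-Hochberg step-up adjustment below is assumed as a representative FDR method. It is not taken from the PANTHER paper, and `benjamini_hochberg` is an illustrative function name.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjustment
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A term would then be reported as enriched when its adjusted value falls below the chosen FDR threshold, for example 0.05.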


Journal Article

The Gene Ontology Resource: 20 years and still GOing strong

Seth Carbon¹, Eric Douglass¹, Nathan Dunn¹, Benjamin M. Good¹ and 189 more (19 institutions)

08 Jan 2019-Nucleic Acids Research

TL;DR: GO-CAM, a new framework for representing gene function that is more expressive than standard GO annotations, has been released, and users can now explore the growing repository of these models.


Abstract: The Gene Ontology resource (GO; http://geneontology.org) provides structured, computable knowledge regarding the functions of genes and gene products. Founded in 1998, GO has become widely adopted in the life sciences, and its contents are under continual improvement, both in quantity and in quality. Here, we report the major developments of the GO resource during the past two years. Each monthly release of the GO resource is now packaged and given a unique identifier (DOI), enabling GO-based analyses on a specific release to be reproduced in the future. The molecular function ontology has been refactored to better represent the overall activities of gene products, with a focus on transcription regulator activities. Quality assurance efforts have been ramped up to address potentially out-of-date or inaccurate annotations. New evidence codes for high-throughput experiments now enable users to filter out annotations obtained from these sources. GO-CAM, a new framework for representing gene function that is more expressive than standard GO annotations, has been released, and users can now explore the growing repository of these models. We also provide the ‘GO ribbon’ widget for visualizing GO annotations to a gene; the widget can be easily embedded in any web page.


Journal Article

GENCODE reference annotation for the human and mouse genomes.

Adam Frankish¹, Mark Diekhans², Anne-Maud Ferreira³, Rory Johnson⁴, Irwin Jungreis⁵,⁶, Jane E. Loveland¹, Jonathan M. Mudge¹, Cristina Sisu⁷,⁸, James C. Wright, Joel Armstrong², If Barnes¹, Andrew Berry¹, Alexandra Bignell¹, Silvia Carbonell Sala, Jacqueline Chrast³, Fiona Cunningham¹, Tomás Di Domenico, Sarah Donaldson¹, Ian T. Fiddes², Carlos García Girón¹, Jose Manuel Gonzalez¹, Tiago Grego¹, Matthew P. Hardy¹, Thibaut Hourlier¹, Toby Hunt¹, Osagie G. Izuogu¹, Julien Lagarde, Fergal J. Martin¹, Laura Martinez, Shamika Mohanan¹, Paul R. Muir⁸, Fabio C. P. Navarro⁸, Anne Parker¹, Baikang Pei⁸, Fernando Pozo, Magali Ruffier¹, Bianca M. Schmitt¹, Eloise Stapleton¹, Marie-Marthe Suner¹, Irina Sycheva¹, Barbara Uszczynska-Ratajczak⁹, Jinuri Xu⁸, Andrew D. Yates¹, Daniel R. Zerbino¹, Yan Zhang⁸,¹⁰, Bronwen Aken¹, Jyoti S. Choudhary, Mark Gerstein⁸, Roderic Guigó¹¹, Tim Hubbard¹², Manolis Kellis⁵,⁶, Benedict Paten², Alexandre Reymond³, Michael L. Tress, Paul Flicek¹

European Bioinformatics Institute¹, University of California, Santa Cruz², University of Lausanne³, University of Bern⁴, Massachusetts Institute of Technology⁵, Broad Institute⁶, Brunel University London⁷, Yale University⁸, University of Warsaw⁹, Ohio State University¹⁰, Pompeu Fabra University¹¹, King's College London¹²

08 Jan 2019-Nucleic Acids Research

TL;DR: This work generates primary data, creates bioinformatics tools and provides analysis to support the work of expert manual gene annotators and automated gene annotation pipelines to identify and characterise gene loci to the highest standard.


Abstract: The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.


Journal Article

CADD: predicting the deleteriousness of variants throughout the human genome.

Philipp Rentzsch¹, Daniela Witten², Gregory M. Cooper, Jay Shendure², Martin Kircher¹,²

Charité¹, University of Washington²

08 Jan 2019-Nucleic Acids Research

TL;DR: The latest updates to CADD are reviewed, including the most recent version, 1.4, which supports the human genome build GRCh38, along with updates to the website that include simplified variant lookup, extended documentation, an Application Program Interface and improved mechanisms for integrating CADD scores into other tools or applications.

Abstract: Combined Annotation-Dependent Depletion (CADD) is a widely used measure of variant deleteriousness that can effectively prioritize causal variants in genetic analyses, particularly highly penetrant contributors to severe Mendelian disorders. CADD is an integrative annotation built from more than 60 genomic features, and can score human single nucleotide variants and short insertion and deletions anywhere in the reference assembly. CADD uses a machine learning model trained on a binary distinction between simulated de novo variants and variants that have arisen and become fixed in human populations since the split between humans and chimpanzees; the former are free of selective pressure and may thus include both neutral and deleterious alleles, while the latter are overwhelmingly neutral (or, at most, weakly deleterious) by virtue of having survived millions of years of purifying selection. Here we review the latest updates to CADD, including the most recent version, 1.4, which supports the human genome build GRCh38. We also present updates to our website that include simplified variant lookup, extended documentation, an Application Program Interface and improved mechanisms for integrating CADD scores into other tools or applications. CADD scores, software and documentation are available at https://cadd.gs.washington.edu.


Journal Article

antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline

Kai Blin¹, Simon Shaw¹, Katharina Steinke², Rasmus Villebro¹, Nadine Ziemert², Sang Yup Lee¹,³, Marnix H. Medema⁴, Tilmann Weber¹

Technical University of Denmark¹, University of Tübingen², KAIST³, Wageningen University and Research Centre⁴

02 Jul 2019-Nucleic Acids Research

TL;DR: antiSMASH 5 adds detection rules for clusters encoding the biosynthesis of acyl-amino acids, β-lactones, fungal RiPPs, RaS-RiPPs, polybrominated diphenyl ethers, C-nucleosides, PPY-like ketones and lipolanthines and provides more detailed predictions for type II polyketide synthase-encoding gene clusters.

Abstract: Secondary metabolites produced by bacteria and fungi are an important source of antimicrobials and other bioactive compounds. In recent years, genome mining has seen broad applications in identifying and characterizing new compounds as well as in metabolic engineering. Since 2011, the 'antibiotics and secondary metabolite analysis shell-antiSMASH' (https://antismash.secondarymetabolites.org) has assisted researchers in this, both as a web server and a standalone tool. It has established itself as the most widely used tool for identifying and analysing biosynthetic gene clusters (BGCs) in bacterial and fungal genome sequences. Here, we present an entirely redesigned and extended version 5 of antiSMASH. antiSMASH 5 adds detection rules for clusters encoding the biosynthesis of acyl-amino acids, β-lactones, fungal RiPPs, RaS-RiPPs, polybrominated diphenyl ethers, C-nucleosides, PPY-like ketones and lipolanthines. For type II polyketide synthase-encoding gene clusters, antiSMASH 5 now offers more detailed predictions. The HTML output visualization has been redesigned to improve the navigation and visual representation of annotations. We have again improved the runtime of analysis steps, making it possible to deliver comprehensive annotations for bacterial genomes within a few minutes. A new output file in the standard JavaScript object notation (JSON) format is aimed at downstream tools that process antiSMASH results programmatically.


Journal Article

PubChem 2019 update: improved access to chemical data

Sunghwan Kim¹, Jie Chen¹, Tiejun Cheng¹, Asta Gindulyte¹, Jia He¹, Siqian He¹, Qingliang Li¹, Benjamin A. Shoemaker¹, Paul A. Thiessen¹, Bo Yu¹, Leonid Zaslavsky¹, Jian Zhang¹, Evan E Bolton¹

National Institutes of Health¹

08 Jan 2019-Nucleic Acids Research

TL;DR: This paper describes the new developments in PubChem, a key chemical information resource for the biomedical research community, which released new web interfaces, such as PubChem Target View page, Sources page, Bioactivity dyad pages and Patent View page.


Abstract: PubChem (https://pubchem.ncbi.nlm.nih.gov) is a key chemical information resource for the biomedical research community. Substantial improvements were made in the past few years. New data content was added, including spectral information, scientific articles mentioning chemicals, and information for food and agricultural chemicals. PubChem released new web interfaces, such as PubChem Target View page, Sources page, Bioactivity dyad pages and Patent View page. PubChem also released a major update to PubChem Widgets and introduced a new programmatic access interface, called PUG-View. This paper describes these new developments in PubChem.


Journal Article

GEPIA2: an enhanced web server for large-scale expression profiling and interactive analysis.

Zefang Tang¹, Boxi Kang¹, Chenwei Li¹, Tianxiang Chen¹, Zemin Zhang¹

Peking University¹

02 Jul 2019-Nucleic Acids Research

TL;DR: GEPIA2 has adopted new analysis techniques of gene signature quantification inspired by single-cell sequencing studies, and provides customized analysis where users can upload their own RNA-seq data and compare them with TCGA and GTEx samples.


Abstract: Introduced in 2017, the GEPIA (Gene Expression Profiling Interactive Analysis) web server has been a valuable and highly cited resource for gene expression analysis based on tumor and normal samples from the TCGA and the GTEx databases. Here, we present GEPIA2, an updated and enhanced version to provide insights with higher resolution and more functionalities. Featuring 198 619 isoforms and 84 cancer subtypes, GEPIA2 has extended gene expression quantification from the gene level to the transcript level, and supports analysis of a specific cancer subtype, and comparison between subtypes. In addition, GEPIA2 has adopted new analysis techniques of gene signature quantification inspired by single-cell sequencing studies, and provides customized analysis where users can upload their own RNA-seq data and compare them with TCGA and GTEx samples. We also offer an API for batch process and easy retrieval of the analysis results. The updated web server is publicly accessible at http://gepia2.cancer-pku.cn/.

Journal Article•DOI•

eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses.

Jaime Huerta-Cepas¹, Damian Szklarczyk², Davide Heller², Ana Hernández-Plaza¹, Sofia K. Forslund³, Helen Cook⁴, Daniel R. Mende⁵, Ivica Letunic, Thomas Rattei⁶, Lars Juhl Jensen⁴, Christian von Mering², Peer Bork • Institutions (6)

Technical University of Madrid¹, Swiss Institute of Bioinformatics², Max Delbrück Center for Molecular Medicine³, University of Copenhagen⁴, University of Hawaii⁵, University of Vienna⁶

08 Jan 2019-Nucleic Acids Research

TL;DR: eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations; version 5.0 brings a major update of the underlying genome sets, which have been expanded to 4445 representative bacteria and 168 archaea derived from 25 038 genomes.

Abstract: eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations. Here, we present version 5.0, featuring a major update of the underlying genome sets, which have been expanded to 4445 representative bacteria and 168 archaea derived from 25 038 genomes, as well as 477 eukaryotic organisms and 2502 viral proteomes that were selected for diversity and filtered by genome quality. In total, 4.4M orthologous groups (OGs) distributed across 379 taxonomic levels were computed together with their associated sequence alignments, phylogenies, HMM models and functional descriptors. Precomputed evolutionary analysis provides fine-grained resolution of duplication/speciation events within each OG. Our benchmarks show that, despite doubling the number of genomes, the quality of orthology assignments and functional annotations (80% coverage) has persisted without significant changes across this update. Finally, we improved eggNOG online services for fast functional annotation and orthology prediction of custom genomics or metagenomics datasets. All precomputed data are publicly available for downloading or via API queries at http://eggnog.embl.de.

Journal Article•DOI•

WebGestalt 2019: gene set analysis toolkit with revamped UIs and APIs.

Yuxing Liao¹, Jing Wang¹, Eric J. Jaehnig¹, Zhiao Shi¹, Bing Zhang¹ • Institutions (1)

Baylor College of Medicine¹

02 Jul 2019-Nucleic Acids Research

TL;DR: In the 2019 update, WebGestalt supports 12 organisms, 342 gene identifiers and 155 175 functional categories, as well as user-uploaded functional databases, and its result visualizations and user interfaces have been completely redesigned to improve user-friendliness and to provide multiple types of interactive and publication-ready figures.

Abstract: WebGestalt is a popular tool for the interpretation of gene lists derived from large scale -omics studies. In the 2019 update, WebGestalt supports 12 organisms, 342 gene identifiers and 155 175 functional categories, as well as user-uploaded functional databases. To address the growing and unique need for phosphoproteomics data interpretation, we have implemented phosphosite set analysis to identify important kinases from phosphoproteomics data. We have completely redesigned result visualizations and user interfaces to improve user-friendliness and to provide multiple types of interactive and publication-ready figures. To facilitate comprehension of the enrichment results, we have implemented two methods to reduce redundancy between enriched gene sets. We introduced a web API for other applications to get data programmatically from the WebGestalt server or pass data to WebGestalt for analysis. We also wrapped the core computation into an R package called WebGestaltR for users to perform analysis locally or in third party workflows. WebGestalt can be freely accessed at http://www.webgestalt.org.

Journal Article•DOI•

The UNITE database for molecular identification of fungi: handling dark taxa and parallel taxonomic classifications.

Rolf Henrik Nilsson¹, Karl-Henrik Larsson², Andy F. S. Taylor³, Johan Bengtsson-Palme¹, Johan Bengtsson-Palme⁴, Thomas Stjernegaard Jeppesen⁵, Dmitry Schigel⁵, Peter G. Kennedy⁶, Kathryn T. Picard⁷, Frank Oliver Glöckner⁸, Leho Tedersoo⁹, Irja Saar⁹, Urmas Kõljalg⁹, Kessy Abarenkov² • Institutions (9)

University of Gothenburg¹, American Museum of Natural History², University of Aberdeen³, University of Wisconsin-Madison⁴, Global Biodiversity Information Facility⁵, University of Minnesota⁶, National Museum of Natural History⁷, Jacobs University Bremen⁸, University of Tartu⁹

08 Jan 2019-Nucleic Acids Research

TL;DR: UNITE is a web-based database and sequence management environment for the molecular identification of fungi that targets the formal fungal barcode—the nuclear ribosomal internal transcribed spacer region—and offers all public fungal ITS sequences for reference.

Abstract: Alfred P. Sloan Foundation [G-2015-14062]; Swedish Research Council of Environment, Agricultural Sciences, and Spatial Planning [FORMAS, 215-2011-498]; European Regional Development Fund (Centre of Excellence EcolChange) [TK131]; Estonian Research Council [IUT20-30]. Funding for open access charge: Swedish Research Council of Environment, Agricultural Sciences and Spatial Planning.

Journal Article•DOI•

CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database

Brian Alcock¹, Amogelang R. Raphenya¹, Tammy T. Y. Lau¹, Kara K. Tsang¹, Mégane Bouchard¹, Arman Edalatmand¹, William Huynh¹, Anna-Lisa V. Nguyen¹, Annie A. Cheng¹, Sihan Liu¹, Sally Y. Min¹, Anatoly Miroshnichenko¹, Hiu-Ki R Tran¹, Rafik El Werfalli¹, Jalees A. Nasir¹, Martins Oloni¹, David Speicher¹, Alexandra Florescu¹, Bhavya Singh¹, Mateusz Faltyn¹, Anastasia Hernández-Koutoucheva², Arjun N. Sharma¹, Emily Bordeleau¹, Andrew C. Pawlowski³, Haley L. Zubyk¹, Damion M. Dooley⁴, Emma Griffiths⁵, Finlay Maguire⁶, Geoffrey L. Winsor⁵, Robert G. Beiko⁶, Fiona S. L. Brinkman⁵, William W. L. Hsiao⁵, William W. L. Hsiao⁴, Gary Van Domselaar⁷, Gary Van Domselaar⁸, Andrew G. McArthur¹ • Institutions (8)

McMaster University¹, National Autonomous University of Mexico², Harvard University³, University of British Columbia⁴, Simon Fraser University⁵, Dalhousie University⁶, Public Health Agency of Canada⁷, University of Manitoba⁸

29 Oct 2019-Nucleic Acids Research

TL;DR: A new Resistomes & Variants module provides analysis and statistical summary of in silico predicted resistance variants from 82 pathogens and over 100 000 genomes, making it possible to summarize predicted resistance using the information included in CARD, identify trends in AMR mobility and determine previously undescribed and novel resistance variants.

Abstract: The Comprehensive Antibiotic Resistance Database (CARD; https://card.mcmaster.ca) is a curated resource providing reference DNA and protein sequences, detection models and bioinformatics tools on the molecular basis of bacterial antimicrobial resistance (AMR). CARD focuses on providing high-quality reference data and molecular sequences within a controlled vocabulary, the Antibiotic Resistance Ontology (ARO), designed by the CARD biocuration team to integrate with software development efforts for resistome analysis and prediction, such as CARD's Resistance Gene Identifier (RGI) software. Since 2017, CARD has expanded through extensive curation of reference sequences, revision of the ontological structure, curation of over 500 new AMR detection models, development of a new classification paradigm and expansion of analytical tools. Most notably, a new Resistomes & Variants module provides analysis and statistical summary of in silico predicted resistance variants from 82 pathogens and over 100 000 genomes. By adding these resistance variants to CARD, we are able to summarize predicted resistance using the information included in CARD, identify trends in AMR mobility and determine previously undescribed and novel resistance variants. Here, we describe updates and recent expansions to CARD and its biocuration process, including new resources for community biocuration of AMR molecular reference data.

Journal Article•DOI•

The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads.

Yang Liao¹, Yang Liao², Gordon K. Smyth², Gordon K. Smyth¹, Wei Shi¹, Wei Shi² • Institutions (2)

Walter and Eliza Hall Institute of Medical Research¹, University of Melbourne²

07 May 2019-Nucleic Acids Research

TL;DR: Rsubread is a Bioconductor software package that provides high-performance alignment and read counting functions for RNA-seq reads; it integrates read mapping and quantification in a single package and has no software dependencies other than R itself.

Abstract: We present Rsubread, a Bioconductor software package that provides high-performance alignment and read counting functions for RNA-seq reads. Rsubread is based on the successful Subread suite with the added ease-of-use of the R programming environment, creating a matrix of read counts directly as an R object ready for downstream analysis. It integrates read mapping and quantification in a single package and has no software dependencies other than R itself. We demonstrate Rsubread's ability to detect exon-exon junctions de novo and to quantify expression at the level of either genes, exons or exon junctions. The resulting read counts can be input directly into a wide range of downstream statistical analyses using other Bioconductor packages. Using SEQC data and simulations, we compare Rsubread to TopHat2, STAR and HTSeq as well as to counting functions in the Bioconductor infrastructure packages. We consider the performance of these tools on the combined quantification task starting from raw sequence reads through to summary counts, and in particular evaluate the performance of different combinations of alignment and counting algorithms. We show that Rsubread is faster and uses less memory than competitor tools and produces read count summaries that more accurately correlate with true values.

Journal Article•DOI•

New approach for understanding genome variations in KEGG.

Minoru Kanehisa¹, Yoko Sato², Miho Furumichi¹, Kanae Morishima¹, Mao Tanabe¹ • Institutions (2)

Kyoto University¹, Fujitsu²

08 Jan 2019-Nucleic Acids Research

TL;DR: This work has introduced a new approach where human gene variants are explicitly incorporated into what it calls ‘network variants’ in the recently released KEGG NETWORK database, which allows accumulation of knowledge about disease-related perturbed molecular networks caused not only by gene variants, but also by viruses and other pathogens, environmental factors and drugs.

Abstract: KEGG (Kyoto Encyclopedia of Genes and Genomes; https://www.kegg.jp/ or https://www.genome.jp/kegg/) is a reference knowledge base for biological interpretation of genome sequences and other high-throughput data. It is an integrated database consisting of three generic categories of systems information, genomic information and chemical information, and an additional human-specific category of health information. KEGG pathway maps, BRITE hierarchies and KEGG modules have been developed as generic molecular networks with KEGG Orthology nodes of functional orthologs so that KEGG pathway mapping and other procedures can be applied to any cellular organism. Unfortunately, however, this generic approach was inadequate for knowledge representation in the health information category, where variations of human genomes, especially disease-related variations, had to be considered. Thus, we have introduced a new approach where human gene variants are explicitly incorporated into what we call 'network variants' in the recently released KEGG NETWORK database. This allows accumulation of knowledge about disease-related perturbed molecular networks caused not only by gene variants, but also by viruses and other pathogens, environmental factors and drugs. We expect that KEGG NETWORK will become another reference knowledge base for the basic understanding of disease mechanisms and practical use in clinical sequencing and drug development.

Journal Article•DOI•

SwissTargetPrediction: updated data and new features for efficient prediction of protein targets of small molecules

Antoine Daina¹, Olivier Michielin¹, Olivier Michielin², Vincent Zoete¹, Vincent Zoete³ • Institutions (3)

Swiss Institute of Bioinformatics¹, University Hospital of Lausanne², University of Lausanne³

02 Jul 2019-Nucleic Acids Research

TL;DR: The 2019 version of SwissTargetPrediction is described, which represents a major update in terms of underlying data, backend and web interface, and high levels of predictive performance were maintained despite more extended biological and chemical spaces to be explored.

Abstract: SwissTargetPrediction is a web tool, on-line since 2014, that aims to predict the most probable protein targets of small molecules. Predictions are based on the similarity principle, through reverse screening. Here, we describe the 2019 version, which represents a major update in terms of underlying data, backend and web interface. The bioactivity data were updated, the model retrained and similarity thresholds redefined. In the new version, the predictions are performed by searching for similar molecules, in 2D and 3D, within a larger collection of 376 342 compounds known to be experimentally active on an extended set of 3068 macromolecular targets. An efficient backend implementation speeds up the process, which returns results for a druglike molecule on human proteins in 15-20 s. The refreshed web interface enhances user experience with new features for easy input and improved analysis. Interoperability capacity enables straightforward submission of any input or output molecule to other on-line computer-aided drug design tools developed by the SIB Swiss Institute of Bioinformatics. High levels of predictive performance were maintained despite more extended biological and chemical spaces to be explored, e.g. achieving at least one correct human target in the top 15 predictions for >70% of external compounds. The new SwissTargetPrediction is available free of charge (www.swisstargetprediction.ch).

Journal Article•DOI•

JASPAR 2020: update of the open-access database of transcription factor binding profiles

Oriol Fornes¹, Jaime A. Castro-Mondragon², Aziz Khan², Robin van der Lee¹, Xi Zhang¹, Phillip A. Richmond¹, Bhavi P. Modi¹, Solenne Correard¹, Marius Gheorghe², Damir Baranasic³, Walter Santana-Garcia⁴, Ge Tan⁵, Jeanne Chèneby⁶, Benoit Ballester⁶, François Parcy⁷, Albin Sandelin⁸, Boris Lenhard⁹, Boris Lenhard³, Wyeth W. Wasserman¹, Anthony Mathelier¹⁰, Anthony Mathelier² • Institutions (10)

University of British Columbia¹, University of Oslo², Imperial College London³, École Normale Supérieure⁴, ETH Zurich⁵, Aix-Marseille University⁶, University of Grenoble⁷, University of Copenhagen⁸, University of Bergen⁹, Oslo University Hospital¹⁰

08 Nov 2019-Nucleic Acids Research

TL;DR: In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs and 156 PFMs were updated; the genomic tracks, inference tool, and TF-binding profile similarity clusters were also updated.

Abstract: JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 156 PFMs were updated (125 for vertebrates, 28 for plants and 3 for insects). These new profiles represent an 18% expansion compared to the previous release. JASPAR 2020 comes with a novel collection of unvalidated TF-binding profiles for which our curators did not find orthogonal supporting evidence in the literature. This collection has a dedicated web form to engage the community in the curation of unvalidated TF-binding profiles. Moreover, we created a Q&A forum to ease the communication between the user community and JASPAR curators. Finally, we updated the genomic tracks, inference tool, and TF-binding profile similarity clusters. All the data is available through the JASPAR website, its associated RESTful API, and through the JASPAR2020 R/Bioconductor package.

Journal Article•DOI•

The DisGeNET knowledge platform for disease genomics: 2019 update.

Janet Piñero¹, Juan Manuel Ramírez-Anguita¹, Josep Saüch-Pitarch¹, Francesco Ronzano¹, Emilio Centeno¹, Ferran Sanz¹, Laura I. Furlong¹ • Institutions (1)

Pompeu Fabra University¹

04 Nov 2019-Nucleic Acids Research

TL;DR: The DisGeNET platform, a knowledge management platform integrating and standardizing data about disease associated genes and variants from multiple sources, is an interoperable resource supporting a variety of applications in genomic medicine and drug R&D.

Abstract: One of the most pressing challenges in genomic medicine is to understand the role played by genetic variation in health and disease. Thanks to the exploration of genomic variants at large scale, hundreds of thousands of disease-associated loci have been uncovered. However, the identification of variants of clinical relevance is a significant challenge that requires comprehensive interrogation of previous knowledge and linkage to new experimental results. To assist in this complex task, we created DisGeNET (http://www.disgenet.org/), a knowledge management platform integrating and standardizing data about disease associated genes and variants from multiple sources, including the scientific literature. DisGeNET covers the full spectrum of human diseases as well as normal and abnormal traits. The current release covers more than 24 000 diseases and traits, 17 000 genes and 117 000 genomic variants. The latest developments of DisGeNET include new sources of data, novel data attributes and prioritization metrics, a redesigned web interface and recently launched APIs. Thanks to the data standardization, the combination of expert curated information with data automatically mined from the scientific literature, and a suite of tools for accessing its publicly available data, DisGeNET is an interoperable resource supporting a variety of applications in genomic medicine and drug R&D.

Journal Article•DOI•

InterPro in 2019: improving coverage, classification and access to protein sequence annotations.

Alex L. Mitchell¹, Teresa K. Attwood², Patricia C. Babbitt³, Matthias Blum¹, Peer Bork, Alan Bridge⁴, Shoshana D. Brown³, Hsin-Yu Chang¹, Sara El-Gebali¹, Matthew Fraser¹, Julian Gough⁵, David R. Haft⁶, Hongzhan Huang⁷, Ivica Letunic, Rodrigo Lopez¹, Aurelien Luciani¹, Fábio Madeira¹, Aron Marchler-Bauer⁸, Huaiyu Mi⁹, Darren A. Natale¹⁰, Marco Necci¹¹, Marco Necci¹², Gift Nuka¹, Christine A. Orengo¹³, Arun Prasad Pandurangan⁵, Typhaine Paysan-Lafosse¹, Sebastien Pesseat¹, Simon C. Potter¹, Matloob Qureshi¹, Neil D. Rawlings¹, Nicole Redaschi⁴, Lorna Richardson¹, Catherine Rivoire⁴, Gustavo A. Salazar¹, Amaia Sangrador-Vegas¹, Christian J. A. Sigrist⁴, Ian Sillitoe¹³, Granger G. Sutton⁶, Narmada Thanki⁸, Paul Thomas⁹, Silvio C. E. Tosatto¹¹, Siew-Yit Yong¹, Robert D. Finn¹ • Institutions (13)

European Bioinformatics Institute¹, University of Manchester², University of California, San Francisco³, Swiss Institute of Bioinformatics⁴, Laboratory of Molecular Biology⁵, J. Craig Venter Institute⁶, University of Delaware⁷, National Institutes of Health⁸, University of Southern California⁹, Georgetown University Medical Center¹⁰, University of Padua¹¹, University of Udine¹², University College London¹³

08 Jan 2019-Nucleic Acids Research

TL;DR: Recent developments with InterPro (version 70.0) and its associated software are reported, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website.

Abstract: The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.

Journal Article•DOI•

The Immune Epitope Database (IEDB): 2018 update.

Randi Vita¹, Swapnil Mahajan¹, James A. Overton, Sandeep Kumar Dhanda¹, Sheridan Martini¹, Jason R. Cantrell², Daniel K. Wheeler², Alessandro Sette¹, Alessandro Sette³, Bjoern Peters¹, Bjoern Peters³ • Institutions (3)

La Jolla Institute for Allergy and Immunology¹, Leidos², University of California, San Diego³

08 Jan 2019-Nucleic Acids Research

TL;DR: The recent focus of the IEDB has been improved query and reporting functionality to meet the needs of users to access and summarize data that continues to grow in quantity and complexity.

Abstract: The Immune Epitope Database (IEDB, iedb.org) captures experimental data confined in figures, text and tables of the scientific literature, making it freely available and easily searchable to the public. The scope of the IEDB extends across immune epitope data related to all species studied and includes antibody, T cell, and MHC binding contexts associated with infectious, allergic, autoimmune, and transplant related diseases. Having been publicly accessible for >10 years, the recent focus of the IEDB has been improved query and reporting functionality to meet the needs of our users to access and summarize data that continues to grow in quantity and complexity. Here we present an update on our current efforts and future goals.

Journal Article•DOI•

The BioGRID interaction database: 2019 update

Rose Oughtred¹, Chris Stark², Bobby-Joe Breitkreutz², Jennifer M. Rust¹, Lorrie Boucher², Christie S. Chang¹, Nadine Kolas², Lara O'Donnell², Genie Leung², Rochelle McAdam, Frederick Zhang, Sonam Dolma, Andrew Willems², Jasmin Coulombe-Huntington³, Andrew Chatr-aryamontri³, Kara Dolinski¹, Mike Tyers², Mike Tyers³ • Institutions (3)

Princeton University¹, Lunenfeld-Tanenbaum Research Institute², Université de Montréal³

08 Jan 2019-Nucleic Acids Research

TL;DR: A new dedicated aspect of BioGRID annotates genome-wide CRISPR/Cas9-based screens that report gene–phenotype and gene–gene relationships, and captures chemical interaction data, including chemical–protein interactions for human drug targets drawn from the DrugBank database and manually curated bioactive compounds reported in the literature.

Abstract: The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the curation and archival storage of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2018 (build 3.4.164), BioGRID contains records for 1 598 688 biological interactions manually annotated from 55 809 publications for 71 species, as classified by an updated set of controlled vocabularies for experimental detection methods. BioGRID also houses records for >700 000 post-translational modification sites. BioGRID now captures chemical interaction data, including chemical-protein interactions for human drug targets drawn from the DrugBank database and manually curated bioactive compounds reported in the literature. A new dedicated aspect of BioGRID annotates genome-wide CRISPR/Cas9-based screens that report gene-phenotype and gene-gene relationships. An extension of the BioGRID resource called the Open Repository for CRISPR Screens (ORCS) database (https://orcs.thebiogrid.org) currently contains over 500 genome-wide screens carried out in human or mouse cell lines. All data in BioGRID is made freely available without restriction, is directly downloadable in standard formats and can be readily incorporated into existing applications via our web service platforms. BioGRID data are also freely distributed through partner model organism databases and meta-databases.

Journal Article•DOI•

IPD-IMGT/HLA Database.

James Robinson¹, James Robinson², Dominic J. Barker², Xenia Georgiou², Michael A Cooper², Paul Flicek³, Steven G.E. Marsh¹, Steven G.E. Marsh² • Institutions (3)

University College London¹, Anthony Nolan², European Bioinformatics Institute³

31 Oct 2019-Nucleic Acids Research

TL;DR: The challenge for the IPD-IMGT/HLA Database is to continue to provide a highly curated database of sequence variants, while supporting the increased number of submissions and complexity of sequences.

Abstract: The IPD-IMGT/HLA Database, http://www.ebi.ac.uk/ipd/imgt/hla/, currently contains over 25 000 allele sequences for 45 genes, which are located within the Major Histocompatibility Complex (MHC) of the human genome. This region is the most polymorphic region of the human genome, and the levels of polymorphism seen exceed those of most other genes. Some of the genes have several thousand variants and are now termed hyperpolymorphic, rather than just simply polymorphic. The IPD-IMGT/HLA Database has provided a stable, highly accessible, user-friendly repository for this information, providing the scientific and medical community access to the many variant sequences of this gene system, which are critical for the successful outcome of transplantation. The number of currently known variants, and the dramatic increase in the number of new variants being identified, have necessitated a dedicated resource with custom tools for curation and publication. The challenge for the database is to continue to provide a highly curated database of sequence variants, while supporting the increased number of submissions and complexity of sequences. In order to do this, traditional methods of accessing and presenting data will be challenged, and new methods will need to be utilized to keep pace with new discoveries.
