Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species.
Paul J. Kersey, James E. Allen, Alexis Allot, Matthieu Barba, Sanjay Boddu, Bruce J. Bolt, Denise Carvalho-Silva, Mikkel B. Christensen, Paul Davis, Christoph Grabmueller, Navin Kumar, Zicheng Liu, Thomas Maurel, Ben Moore, Mark D. McDowall, Uma Maheswari, Guy Naamati, Victoria L. Newman, Chuang Kee Ong, Michael Paulini, Helder Pedro, Emily Perry, Matthew Russell, Helen Sparrow, Electra Tapanari, Kieron Taylor, Alessandro Vullo, Gareth Williams, Amonida Zadissia, Andrew Olson, Joshua C. Stein, Sharon Wei, Marcela K. Tello-Ruiz, Doreen Ware, Aurelien Luciani, Simon C. Potter, Robert D. Finn, Martin Urban, Kim E. Hammond-Kosack, Dan Bolser, Nishadi De Silva, Kevin L. Howe, Nicholas Langridge, Gareth Maslen, Daniel M. Staines, Andrew D. Yates
TLDR
This paper provides an update to the previous publications about the Ensembl Genomes resource, with a focus on recent developments and expansions, including the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data.
Abstract:
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including genome sequence, gene models, transcript sequence, genetic variation, and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments and expansions. These include the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data, which have been aligned to genomic sequence and made available for visualization. Other advances since 2015 include the release of the database in Resource Description Framework (RDF) format, a large increase in community-derived curation, a new high-performance protein sequence search, additional cross-references, improved annotation of non-protein-coding genes, and the launch of pre-release and archival sites. Collectively, these changes are part of a continuing response to the increasing quantity of publicly available genome-scale data, and the consequent need to archive, integrate, annotate and disseminate these using automated, scalable methods.