Showing papers in "Nucleic Acids Research in 2022"

Search and sequence analysis tools services from EMBL-EBI in 2022

[...]

23 Mar 2022-Nucleic Acids Research

TL;DR: The DAVID Gene system was rebuilt to cover more organisms, increasing taxonomy coverage from 17 399 to 55 464, and the number of gene-term records for most annotation types in the updated Knowledgebase has increased significantly.

Abstract: DAVID is a popular bioinformatics resource system including a web server and web service for functional annotation and enrichment analyses of gene lists. It consists of a comprehensive knowledgebase and a set of functional analysis tools. Here, we report all updates made in 2021. The DAVID Gene system was rebuilt to gain coverage of more organisms, which increased the taxonomy coverage from 17 399 to 55 464. All existing annotation types have been updated, if available, based on the new DAVID Gene system. Compared with the last version, the number of gene-term records for most annotation types within the updated Knowledgebase has significantly increased. Moreover, we have incorporated new annotations in the Knowledgebase including small molecule-gene interactions from PubChem, drug-gene interactions from DrugBank, tissue expression information from the Human Protein Atlas, disease information from DisGeNET, and pathways from WikiPathways and PathBank. Eight of ten subgroups split from Uniprot Keyword annotation were assigned to specific types. Finally, we added a species parameter for uploading a list of gene symbols to minimize the ambiguity between species, which increases the efficiency of the list upload and eliminates confusion for users. These current updates have significantly expanded the Knowledgebase and enhanced the discovery power of DAVID.

797 citations

Journal Article•DOI•

[...]

Fábio Madeira, Matt Pearce, Adrian Tivey, Prasad Basutkar, Joon Seung Lee, Ossama Edbali, Nandana Madhusoodanan, A. Kolesnikov, Rodrigo Lopez

12 Apr 2022-Nucleic Acids Research

TL;DR: Recent improvements to the EBI Search and Job Dispatcher tools frameworks are described, along with updates made to accommodate the increasing data requirements during the COVID-19 pandemic.

Abstract: The EMBL-EBI search and sequence analysis tools frameworks provide integrated access to EMBL-EBI’s data resources and core bioinformatics analytical tools. EBI Search (https://www.ebi.ac.uk/ebisearch) provides a full-text search engine across nearly 5 billion entries, while the Job Dispatcher tools framework (https://www.ebi.ac.uk/services) enables the scientific community to perform a diverse range of sequence analysis using popular bioinformatics applications. Both allow users to interact through user-friendly web applications, as well as via RESTful and SOAP-based APIs. Here, we describe recent improvements to these services and updates made to accommodate the increasing data requirements during the COVID-19 pandemic.

540 citations

Journal Article•DOI•

KEGG for taxonomy-based analysis of pathways and genomes

[...]

Minoru Kanehisa, Miho Furumichi, Yoko Sato, Masayuki Kawashima, Mari Ishiguro-Watanabe

27 Oct 2022-Nucleic Acids Research

TL;DR: An increasing number of eukaryotic genomes have been included in KEGG for better representation of organisms in the taxonomic tree, and the Brite hierarchy viewer is used for taxonomy mapping.

Abstract: KEGG (https://www.kegg.jp) is a manually curated database resource integrating various biological objects categorized into systems, genomic, chemical and health information. Each object (database entry) is identified by the KEGG identifier (kid), which generally takes the form of a prefix followed by a five-digit number, and can be retrieved by appending /entry/kid to the URL. The KEGG pathway map viewer, the Brite hierarchy viewer and the newly released KEGG genome browser can be launched by appending /pathway/kid, /brite/kid and /genome/kid, respectively, to the URL. Together with an improved annotation procedure for KO (KEGG Orthology) assignment, an increasing number of eukaryotic genomes have been included in KEGG for better representation of organisms in the taxonomic tree. Multiple taxonomy files are generated for classification of KEGG organisms and viruses, and the Brite hierarchy viewer is used for taxonomy mapping, a variant of Brite mapping in the new KEGG Mapper suite. The taxonomy mapping enables analysis of, for example, how functional links of genes in the pathway and physical links of genes on the chromosome are conserved among organism groups.
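The URL scheme described in the KEGG abstract can be sketched in a few lines of Python. This is only an illustration of the stated pattern (base URL, then a view name, then the KEGG identifier); the helper name `kegg_url` and the example identifiers are assumptions chosen for the sketch, not part of the paper.

```python
# Minimal sketch of the URL pattern described in the abstract:
# https://www.kegg.jp/<view>/<kid>
# The function name and the example identifiers are illustrative assumptions.

KEGG_BASE = "https://www.kegg.jp"

# The four views named in the abstract.
KEGG_VIEWS = ("entry", "pathway", "brite", "genome")


def kegg_url(view: str, kid: str) -> str:
    """Build a KEGG URL by appending /<view>/<kid> to the base URL."""
    if view not in KEGG_VIEWS:
        raise ValueError(f"unknown view {view!r}; expected one of {KEGG_VIEWS}")
    if not kid:
        raise ValueError("a KEGG identifier (kid) is required")
    return f"{KEGG_BASE}/{view}/{kid}"


if __name__ == "__main__":
    # Example identifiers follow the "prefix + five-digit number" form.
    print(kegg_url("entry", "K00001"))
    print(kegg_url("pathway", "map00010"))
```

The function only formats strings; it does not contact the server or check that an identifier exists.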

520 citations

Journal Article•DOI•

UniProt: the Universal Protein Knowledgebase in 2023

[...]

497 citations

Journal Article•DOI•

[...]

Alex Bateman, Maria Jesus Martin, Sandra Orchard, Michele Magrane, Shadab Ahmad, Emanuele Alpi, Emily H Bowler-Barnett, Ramona Britto, Hema Bye-a-Jee, Austra Cukura, P. Denny, Tunca Doğan, ThankGod Ebenezer, Jun Fan, Penelope Garmiri, Leonardo Jose da Costa Gonzales, Emma Hatton-Ellis, Abdulrahman Hussein, Alexandr Ignatchenko, Giuseppe Insana, Rizwan Ishtiaq, Vishal Joshi, Dushyanth Jyothi, Swaathi Kandasaamy, Antonia Lock, Aurelien Luciani, Marija Lugarić, Jie Luo, Y. Lussi, Alistair MacDougall, Fábio Madeira, Mahdi Mahmoudy, Alok Mishra, Katie Moulang, Andrew Nightingale, Sangya Pundir, Guoying Qi, Shri K. Raman Raj, Pedro Duarte da Silva Fonseca Gândara Raposo, Daniel Rice, Rabie Saidi, Rafael Santos, Elena Speretta, James Stephenson, Prabhat Totoo, Edward Turner, N. Tyagi, Preethi Vasudev, Kate Warner, Xavier Watkins, Rossana Zaru, Hermann Zellner, Alan Bridge, Lucila Aimo, Ghislaine Argoud-Puy, Andrea H. Auchincloss, Kristian B. Axelsen, Parit Bansal, Delphine Baratin, Teresa M Batista Neto, Marie-Claude Blatter, Jerven Bolleman, Emmanuel Boutet, Lionel Breuza, B. Gil, C. Casals-Casas, Kamal Chikh Echioukh, Elisabeth Coudert, Béatrice A. Cuche, Edouard de Castro, Anne Estreicher, Maria Livia Famiglietti, Marc Feuermann, Elisabeth Gasteiger, Pascale Gaudet, Sebastien Gehant, Vivienne Baillie Gerritsen, Arnaud Gos, Nadine M. Gruaz, Chantal Hulo, Nevila Hyka-Nouspikel, Florence Jungo, Arnaud Kerhornou, Philippe Le Mercier, Damien Lieberherr, Patrick Masson, Anne Morgat, Venkatesh Muthukrishnan, Salvo Paesano, Ivo Pedruzzi, Sandrine Pilbout, Lucille Pourcel, Sylvain Poux, Monica Pozzato, Manuela Pruess, Nicole Redaschi, Catherine Rivoire, Christian J. A. Sigrist, K Sonesson, Shyamala Sundaram, Cathy H. Wu, Cecilia N. Arighi, Leslie Arminski, Chuming Chen, Yongxing Chen, Hongzhan Huang, Kati Laiho, Peter B. McGarvey, Darren A. Natale, Karen F. Ross, C. R. Vinayaka, Qinghua Wang, Yuqi Wang, Jian Zhang

21 Nov 2022-Nucleic Acids Research

Abstract: The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations provided by automated systems using a variety of machine-learning techniques. In addition, the scientific community continues their contributions of publications and annotations to UniProt entries of their interest. Finally, we describe our new website (https://www.uniprot.org/), designed to enhance our users’ experience and make our data easily accessible to the research community. This interface includes access to AlphaFold structures for more than 85% of all entries as well as improved visualisations for subcellular localisation of proteins.

332 citations

Journal Article•DOI•

Dali server: structural unification of protein families

[...]

Liisa Holm

TL;DR: The two most recent upgrades to the Dali server for 3D protein structure comparison are reported: the foldomes of key organisms in the AlphaFold Database (version 1) are searchable by Dali, and structural alignments are annotated with protein families.

Abstract: Protein structure is key to understanding biological function. Structure comparison deciphers deep phylogenies, providing insight into functional conservation and functional shifts during evolution. Until recently, structural coverage of the protein universe was limited by the cost and labour involved in experimental structure determination. Recent breakthroughs in deep learning revolutionized structural bioinformatics by providing accurate structural models of numerous protein families for which no structural information existed. The Dali server for 3D protein structure comparison is widely used by crystallographers to relate new structures to pre-existing ones. Here, we report the two most recent upgrades to the web server: (i) the foldomes of key organisms in the AlphaFold Database (version 1) are searchable by Dali, (ii) structural alignments are annotated with protein families. Using these new features, we discovered a novel functionally diverse subgroup within the WRKY/GCM1 clan. This was accomplished by linking the structurally characterized SWI/SNF and NAM families as well as the structural models of the CG-1 family and uncharacterized proteins to the structure of Gti1/Pac2, a previously known member of the WRKY/GCM1 clan. The Dali server is available at http://ekhidna2.biocenter.helsinki.fi/dali. This website is free and open to all users and there is no login requirement.

209 citations

Journal Article•DOI•

[...]

200 citations

Journal Article•DOI•

InterPro in 2022

[...]

The IPD-IMGT/HLA Database

TL;DR: The InterPro database provides an integrative classification of protein sequences into families and identifies functionally important domains and conserved sites; recent developments give more user-friendly access to the data.

Abstract: The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide more user-friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.

172 citations

Journal Article•DOI•

[...]

Dominic J. Barker, Giuseppe Maccari, Xenia Georgiou, Michael A Cooper, Paul Flicek, James Robinson, Steven G.E. Marsh

The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest

TL;DR: The IPD-IMGT/HLA database as mentioned in this paper provides a stable and user-friendly repository of highly curated HLA sequences, which includes over 35 000 alleles of the human Major Histocompatibility Complex (MHC).

Abstract: It is 24 years since the IPD-IMGT/HLA Database, http://www.ebi.ac.uk/ipd/imgt/hla/, was first released, providing the HLA community with a searchable repository of highly curated HLA sequences. The database now contains over 35 000 alleles of the human Major Histocompatibility Complex (MHC) named by the WHO Nomenclature Committee for Factors of the HLA System. This complex contains the most polymorphic genes in the human genome and is now considered hyperpolymorphic. The IPD-IMGT/HLA Database provides a stable and user-friendly repository for this information. Uptake of Next Generation Sequencing technology in recent years has driven an increase in the number of alleles and the length of sequences submitted. As the size of the database has grown, the traditional methods of accessing and presenting this data have been challenged. In response, we have developed a suite of tools providing an enhanced user experience to our traditional web-based users while creating new programmatic access for our bioinformatics user base. This suite of tools is powered by the IPD-API, an Application Programming Interface (API), providing scalable and flexible access to the database. The IPD-API provides a stable platform for our future development, allowing us to meet the future challenges of the HLA field and needs of the community.

154 citations

Journal Article•DOI•

[...]

Damian Szklarczyk, Rebecca Kirsch, Mikaela Koutrouli, Katerina C. Nastou, Farrokh Mehryary, Radja Hachilif, Annika L. Gable, Tao Fang, Nadezhda Tsankova Doncheva, Sampo Pyysalo, Peer Bork, Lars Juhl Jensen, Christian von Mering

12 Nov 2022-Nucleic Acids Research

TL;DR: STRING as mentioned in this paper collects and integrates protein-protein interactions, both physical interactions as well as functional associations, from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources.

Abstract: Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein–protein interactions, both physical interactions and functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web-interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now fully available also for user-submitted genomes.

Journal Article•DOI•

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update

[...]

Enis Afgan, Anton Nekrutenko, Björn Grüning, Daniel Blankenberg, Jeremy Goecks, Michael C. Schatz, Alexander E. Ostrovsky, Alexandru Mahmoud, Andrew Lonie, Anna Syme, Anne Fouilloux, Anthony Bretaudeau, Anup Kumar, Arthur C. Eschenlauer, Assunta D. Desanto, Aysam Guerler, Beatriz Serrano-Solano, Bérénice Batut, Bradley W. Langhorst, Bridget Carr, Bryan Raubenolt, Cameron J. Hyde, Catherine J. Bromhead, Christopher B. Barnett, Coline Royaux, Cristóbal L. García Gallardo, Daniel Fornika, Dannon Baker, Dave Bouvier, Dave Clements, David A. de Lima Morais, David Lopez Tabernero, Delphine Larivière, E. Nasr, Federico Zambelli, Florian Heyl, Fotis Psomopoulos, Frederik Coppens, Gareth Price, Gianmauro Cuccuru, Gildas Le Corguillé, Gregory Von Kuster, Gulsum Gudukbay, Helena Rasche, Hans-Rudolf Hotz, Ignacio Eguinoa, Igor V. Makunin, Isuru Ranawaka, James Taylor, Jayadev Joshi, Jennifer Hillman-Jackson, John Chilton, Kaivan Kamali, Keith Suderman, Krzysztof Poterlowicz, Yvan Le Bras, Lucille Lopez-Delisle, Luke Sargent, Madeline E. Bassetti, M. A. Tangaro, Marius Van Den Beek, Martin Čech, Matthias Bernt, Matthias Fahrner, Mehmet Tekman, Melanie Föll, Michael R. Crusoe, Miguel Angel Roncoroni, N. K. Kucher, Nathaniel Coraor, Nicholas Stoler, Nick Rhodes, Nicola Soranzo, Niko Pinter, Nuwan Goonasekera, Pablo Moreno, Pavankumar Videm, Petera Melanie, Pietro Mandreoli, Pratik D. Jagtap, Qiang Gu, Ralf J. M. Weber, Ross Lazarus, Ruben H.P. Vorderman, Saskia Hiltemann, Sergey Golitsynskiy, Shilpa Garg, Simon Bray, Simon Gladman, Simone Leo, Subina Mehta, Timothy J. Griffin, Vahid Jalili, Yves Vandenbrouck, Vi-Kwei Wen, Vijaykrishna Nagampalli, W. Bacon, W. L. De Koning, Wolf-Martin Maier, P. J. Briggs

21 Apr 2022-Nucleic Acids Research

TL;DR: Key Galaxy technical developments include an improved user interface for launching large-scale analyses with many files, interactive tools for exploratory data analysis, and a complete suite of machine learning tools.

Abstract: Galaxy is a mature, browser accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely accessible analysis and training services. The Galaxy Training Network supports free, self-directed, virtual training with >230 integrated tutorials. Project engagement metrics have continued to grow over the last 2 years, including source code contributions, publications, software packages wrapped as tools, registered users and their daily analysis jobs, and new independent specialized servers. Key Galaxy technical developments include an improved user interface for launching large-scale analyses with many files, interactive tools for exploratory data analysis, and a complete suite of machine learning tools. Important scientific developments enabled by Galaxy include Vertebrate Genome Project (VGP) assembly workflows and global SARS-CoV-2 collaborations.

Journal Article•DOI•

PubChem 2023 update.

[...]

Sunghwan Kim, Jie Chen, Tiejun Cheng, Asta Gindulyte, Jia He, Siqian He, Qingliang Li, Benjamin A Shoemaker, Paul A Thiessen, Bo Yu, Leonid Zaslavsky, Jian Zhang, Evan E Bolton

28 Oct 2022-Nucleic Acids Research

TL;DR: An overview of changes made to PubChem in the past two years is provided, including the integration of Google Patents data into PubChem, which greatly expanded the coverage of the PubChem Patent data collection; the creation of the Cell Line and Taxonomy data collections, which provide quick and easy access to chemical information for a given cell line and taxon.

Abstract: PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves a wide range of use cases. In the past two years, a number of changes were made to PubChem. Data from more than 120 data sources was added to PubChem. Some major highlights include: the integration of Google Patents data into PubChem, which greatly expanded the coverage of the PubChem Patent data collection; the creation of the Cell Line and Taxonomy data collections, which provide quick and easy access to chemical information for a given cell line and taxon, respectively; and the update of the bioassay data model. In addition, new functionalities were added to the PubChem programmatic access protocols, PUG-REST and PUG-View, including support for target-centric data download for a given protein, gene, pathway, cell line, and taxon and the addition of the 'standardize' option to PUG-REST, which returns the standardized form of an input chemical structure. A significant update was also made to PubChemRDF. The present paper provides an overview of these changes.

Journal Article•DOI•

CB-Dock2: improved protein–ligand blind docking by integrating cavity detection, docking and homologous template fitting

[...]

Journal Article•DOI•

[...]

Yang Liu, Xiao-Chun Yang, Jianhong Gan, Shuang Chen, Zhi-Xiong Xiao, Yang Cao

TL;DR: This updated docking server, named CB-Dock2, reconfigured the input and output web interfaces, together with a highly automatic docking pipeline, making it a particularly efficient and easy-to-use tool for the bioinformatics and cheminformatics communities.

Abstract: Protein-ligand blind docking is a powerful method for exploring the binding sites of receptors and the corresponding binding poses of ligands. It has seen wide application in pharmaceutical and biological research. Previously, we proposed a blind docking server, CB-Dock, which has been under heavy use (over 200 submissions per day) by researchers worldwide since 2019. Here, we substantially improved the docking method by combining CB-Dock with our template-based docking engine to enhance the accuracy in binding site identification and binding pose prediction. In the benchmark tests, it yielded a success rate of ∼85% for binding pose prediction (RMSD < 2.0 Å), which outperformed the original CB-Dock and most popular blind docking tools. This updated docking server, named CB-Dock2, reconfigured the input and output web interfaces, together with a highly automatic docking pipeline, making it a particularly efficient and easy-to-use tool for the bioinformatics and cheminformatics communities. The web server is freely available at https://cadd.labshare.cn/cb-dock2/.
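The success criterion quoted in the abstract (a predicted pose counts as a success when its RMSD to the reference pose is below 2.0 Å) can be illustrated with a short Python sketch. This is not the CB-Dock2 benchmark code. It assumes that predicted and reference ligand atoms are listed in the same order and in the same receptor coordinate frame (so no superposition is applied), and it ignores ligand symmetry. The function names and coordinates are hypothetical.

```python
import math

# Illustrative sketch of an RMSD-based success criterion (not the authors' code).
# Assumptions: matched atom order, shared coordinate frame, no symmetry handling.


def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z)."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


def success_rate(pairs, cutoff=2.0):
    """Fraction of (predicted, reference) pairs with RMSD strictly below cutoff (in Å)."""
    if not pairs:
        raise ValueError("at least one (predicted, reference) pair is required")
    hits = sum(1 for predicted, reference in pairs if rmsd(predicted, reference) < cutoff)
    return hits / len(pairs)


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    near = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]  # shifted by 0.5 Å along x
    far = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]   # shifted by 3.0 Å along x
    print(success_rate([(near, reference), (far, reference)]))
```

Real benchmarks usually compare heavy atoms only and account for symmetric ligands, which this sketch leaves out.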

Journal Article•DOI•

[...]

The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource

Journal Article•DOI•

[...]

TL;DR: The NHGRI-EBI GWAS Catalog (www.ebi.ac.uk/gwas) is a FAIR knowledgebase providing detailed, structured, standardised and interoperable genome-wide association study (GWAS) data to >200 000 users per year from academic research, healthcare and industry.

Abstract: The NHGRI-EBI GWAS Catalog (www.ebi.ac.uk/gwas) is a FAIR knowledgebase providing detailed, structured, standardised and interoperable genome-wide association study (GWAS) data to >200 000 users per year from academic research, healthcare and industry. The Catalog contains variant-trait associations and supporting metadata for >45 000 published GWAS across >5000 human traits, and >40 000 full P-value summary statistics datasets. Content is curated from publications or acquired via author submission of prepublication summary statistics through a new submission portal and validation tool. GWAS data volume has vastly increased in recent years. We have updated our software to meet this scaling challenge and to enable rapid release of submitted summary statistics. The scope of the repository has expanded to include additional data types of high interest to the community, including sequencing-based GWAS, gene-based analyses and copy number variation analyses. Community outreach has increased the number of shared datasets from under-represented traits, e.g. cancer, and we continue to contribute to awareness of the lack of population diversity in GWAS. Interoperability of the Catalog has been enhanced through links to other resources including the Polygenic Score Catalog and the International Mouse Phenotyping Consortium, refinements to GWAS trait annotation, and the development of a standard format for GWAS data.

Journal Article•DOI•

DeepLoc 2.0: multi-label subcellular localization prediction using protein language models

[...]

17 May 2022-Nucleic Acids Research

TL;DR: SynergyFinder as discussed by the authors is a free web-application for interactive analysis and visualization of multi-drug combination response data, which has become a popular tool for multi-dose combination data analytics, partly because the development of its functionality and graphical interface has been driven by a diverse user community.

Abstract: SynergyFinder (https://synergyfinder.fimm.fi) is a free web-application for interactive analysis and visualization of multi-drug combination response data. Since its first release in 2017, SynergyFinder has become a popular tool for multi-dose combination data analytics, partly because the development of its functionality and graphical interface has been driven by a diverse user community, including both chemical biologists and computational scientists. Here, we describe the latest upgrade of this community-effort, SynergyFinder release 3.0, introducing a number of novel features that support interactive multi-sample analysis of combination synergy, a novel consensus synergy score that combines multiple synergy scoring models, and an improved outlier detection functionality that eliminates false positive results, along with many other post-analysis options such as weighting of synergy by drug concentrations and distinguishing between different modes of synergy (potency and efficacy). Based on user requests, several additional improvements were also implemented, including new data visualizations and export options for multi-drug combinations. With these improvements, SynergyFinder 3.0 supports robust identification of consistent combinatorial synergies for multi-drug combinatorial discovery and clinical translation.

...read moreread less

Journal Article•DOI•

[...]

Vineet Thumuluri, Jose Juan Almagro Armenteros, Alexander Rosenberg Johansen, Henrik Nielsen, Ole Winther - Show less +1 more

30 Apr 2022-Nucleic Acids Research

TL;DR: An update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability is proposed, and it is found that the attention output correlates well with the position of sorting signals.

Abstract: The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0.
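
As a rough illustration of what "multi-localization prediction" means in general, the sketch below treats each compartment as an independent yes/no decision, so one protein can receive several labels. It is not DeepLoc 2.0's implementation: the function name `predict_locations`, the label names, the raw scores and the 0.5 default threshold are all assumptions made for this example.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_locations(logits: dict, thresholds: dict = None) -> list:
    """Return every label whose probability reaches its threshold.

    Unlike a softmax/argmax classifier, which picks exactly one class,
    each label is scored independently, so zero, one or many labels
    can be returned. Labels without an explicit threshold use 0.5
    (an assumption of this sketch).
    """
    thresholds = thresholds or {}
    return [
        label
        for label, logit in logits.items()
        if sigmoid(logit) >= thresholds.get(label, 0.5)
    ]


# Hypothetical raw scores for one protein (illustrative values only).
example_logits = {"Nucleus": 2.0, "Cytoplasm": 0.3, "Mitochondrion": -3.0}
predicted = predict_locations(example_logits)
```

With these made-up scores, two labels pass the default threshold; raising the threshold for a single label would remove only that label without affecting the others.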

Journal Article•DOI•

Introducing the Bacterial and Viral Bioinformatics Resource Center (BV-BRC): a resource combining PATRIC, IRD and ViPR.

[...]

TL;DR: The authors describe the merger of the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD) and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC).

Abstract: The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Center (BRC) program to assist researchers with analyzing the growing body of genome sequence and other omics-related data. In this report, we describe the merger of the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD) and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC; https://www.bv-brc.org/). The combined BV-BRC leverages the functionality of the bacterial and viral resources to provide a unified data model, enhanced web-based visualization and analysis tools, bioinformatics services, and a powerful suite of command line tools that benefit the bacterial and viral research communities.

Journal Article•DOI•

ChIP-Atlas 2021 update: a data-mining suite for exploring epigenomic landscapes by fully integrating ChIP-seq, ATAC-seq and Bisulfite-seq data

[...]

Zhaonan Zou, Tazro Ohta, Fumihito Miura, Shinya Oki

24 Mar 2022-Nucleic Acids Research

TL;DR: This update adds all the ATAC-seq and whole-genome bisulfite-seq data for six model organisms with the latest genome assemblies, so that ChIP-Atlas provides a panoramic view of the whole epigenomic landscape.

Abstract: ChIP-Atlas (https://chip-atlas.org) is a web service providing both GUI- and API-based data-mining tools to reveal the architecture of the transcription regulatory landscape. ChIP-Atlas is powered by comprehensively integrating all data sets from high-throughput ChIP-seq and DNase-seq, a method for profiling chromatin regions accessible to DNase. In this update, we further collected all the ATAC-seq and whole-genome bisulfite-seq data for six model organisms (human, mouse, rat, fruit fly, nematode, and budding yeast) with the latest genome assemblies. These together with ChIP-seq data can be visualized with the Peak Browser tool and a genome browser to explore the epigenomic landscape of a query genomic locus, such as its chromatin accessibility, DNA methylation status, and protein–genome interactions. This epigenomic landscape can also be characterized for multiple genes and genomic loci by querying with the Enrichment Analysis tool, which, for example, revealed that inflammatory bowel disease-associated SNPs are the most significantly hypo-methylated in neutrophils. Therefore, ChIP-Atlas provides a panoramic view of the whole epigenomic landscape. All datasets are free to download via either a simple button on the web page or an API.

Journal Article•DOI•

Comparative Toxicogenomics Database (CTD): update 2023

[...]

Allan Peter Davis, Thomas C. Wiegers, Robin J. Johnson, Daniela Sciaky, Jolene Wiegers, Carolyn J. Mattingly - Show less +2 more

28 Sep 2022-Nucleic Acids Research

TL;DR: This update reports a 20% increase in overall CTD content and presents a novel tool that computationally generates four-unit information blocks connecting a chemical, gene, phenotype, and disease to construct potential molecular mechanistic pathways.

Abstract: The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) harmonizes cross-species heterogeneous data for chemical exposures and their biological repercussions by manually curating and interrelating chemical, gene, phenotype, anatomy, disease, taxa, and exposure content from the published literature. This curated information is integrated to generate inferences, providing potential molecular mediators to develop testable hypotheses and fill in knowledge gaps for environmental health. This dual nature, acting as both a knowledgebase and a discoverybase, makes CTD a unique resource for the scientific community. Here, we report a 20% increase in overall CTD content for 17 100 chemicals, 54 300 genes, 6100 phenotypes, 7270 diseases and 202 000 exposure statements. We also present CTD Tetramers, a novel tool that computationally generates four-unit information blocks connecting a chemical, gene, phenotype, and disease to construct potential molecular mechanistic pathways. Finally, we integrate terms for human biological media used in the CTD Exposure module to corresponding CTD Anatomy pages, allowing users to survey the chemical profiles for any tissue-of-interest and see how these environmental biomarkers are related to phenotypes for any anatomical site. These, and other webpage visual enhancements, continue to promote CTD as a practical, user-friendly, and innovative resource for finding information and generating testable hypotheses about environmental health.
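
The idea of a four-unit block can be sketched with a toy join over pairwise statements. The abstract does not state which pairwise links CTD requires, so the rule used here (the chemical and the gene must be linked to each other, and each must also be linked to the phenotype and to the disease) is an assumption, as are the function name `build_tetramers` and the placeholder identifiers such as `chemA` and `gene1`.

```python
from itertools import product


def build_tetramers(chem_gene, chem_pheno, chem_disease, gene_disease, gene_pheno):
    """Combine pairwise statements into (chemical, gene, phenotype, disease) tuples.

    Each argument is a set of 2-tuples. A tuple is emitted only when every
    pairwise link assumed by this sketch is present in the input.
    """
    tetramers = set()
    for chem, gene in chem_gene:
        # Phenotypes linked to both the chemical and the gene.
        phenos = {p for c, p in chem_pheno if c == chem} & {
            p for g, p in gene_pheno if g == gene
        }
        # Diseases linked to both the chemical and the gene.
        diseases = {d for c, d in chem_disease if c == chem} & {
            d for g, d in gene_disease if g == gene
        }
        for pheno, disease in product(phenos, diseases):
            tetramers.add((chem, gene, pheno, disease))
    return sorted(tetramers)


# Placeholder statements (not real curated data).
chem_gene = {("chemA", "gene1"), ("chemA", "gene2"), ("chemB", "gene1")}
chem_pheno = {("chemA", "pheno1"), ("chemB", "pheno1")}
chem_disease = {("chemA", "disease1")}
gene_disease = {("gene1", "disease1"), ("gene2", "disease1")}
gene_pheno = {("gene1", "pheno1")}

tetramers = build_tetramers(
    chem_gene, chem_pheno, chem_disease, gene_disease, gene_pheno
)
```

In this toy input only one combination has all of the assumed links; `gene2` lacks a phenotype link and `chemB` lacks a disease link, so neither yields a block.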

Journal Article•DOI•

SynergyFinder 3.0: an interactive analysis and consensus interpretation of multi-drug synergies across multiple samples

[...]

Aleksandr Ianevski, Anil K. Giri, Tero Aittokallio

17 May 2022-Nucleic Acids Research

TL;DR: The latest upgrade of this community-effort SynergyFinder release 3.0 is described, introducing a number of novel features that support interactive multi-sample analysis of combination synergy, a novel consensus synergy score that combines multiple synergy scoring models, and an improved outlier detection functionality that eliminates false positive results.

Abstract: SynergyFinder (https://synergyfinder.fimm.fi) is a free web-application for interactive analysis and visualization of multi-drug combination response data. Since its first release in 2017, SynergyFinder has become a popular tool for multi-dose combination data analytics, partly because the development of its functionality and graphical interface has been driven by a diverse user community, including both chemical biologists and computational scientists. Here, we describe the latest upgrade of this community-effort, SynergyFinder release 3.0, introducing a number of novel features that support interactive multi-sample analysis of combination synergy, a novel consensus synergy score that combines multiple synergy scoring models, and an improved outlier detection functionality that eliminates false positive results, along with many other post-analysis options such as weighting of synergy by drug concentrations and distinguishing between different modes of synergy (potency and efficacy). Based on user requests, several additional improvements were also implemented, including new data visualizations and export options for multi-drug combinations. With these improvements, SynergyFinder 3.0 supports robust identification of consistent combinatorial synergies for multi-drug combinatorial discovery and clinical translation.

Journal Article•DOI•

NetSurfP-3.0: accurate and fast prediction of protein structural features by protein language models and deep learning

[...]

M. H. Høie, Erik Nicolas Kiehl, Bent Petersen, Morten Nielsen, Ole Winther, Henrik Nielsen, Jeppe Hallgren, Paolo Marcatili - Show less +4 more

01 Jun 2022-Nucleic Acids Research

TL;DR: This NetSurfP update exploits recent advances in pre-trained protein language models to drastically improve the runtime of its predecessor by two orders of magnitude, while displaying similar prediction performance.

Abstract: Recent advances in machine learning and natural language processing have made it possible to profoundly advance our ability to accurately predict protein structures and their functions. While such improvements are significantly impacting the fields of biology and biotechnology at large, such methods have the downside of high demands in terms of computing power and runtime, hampering their applicability to large datasets. Here, we present NetSurfP-3.0, a tool for predicting solvent accessibility, secondary structure, structural disorder and backbone dihedral angles for each residue of an amino acid sequence. This NetSurfP update exploits recent advances in pre-trained protein language models to drastically improve the runtime of its predecessor by two orders of magnitude, while displaying similar prediction performance. We assessed the accuracy of NetSurfP-3.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features, with a runtime that is up to 600 times faster than the most commonly available methods performing the same tasks. The tool is freely available as a web server with a user-friendly interface to navigate the results, as well as a standalone downloadable package.

Journal Article•DOI•