Improving the utility of the Tox21 dataset by deep metadata annotations and constructing reusable benchmarked chemical reference signatures
TLDR
The importance of data standards in reporting screening results, and of high-quality annotations that enable re-use and interpretation of these data, is demonstrated.
Abstract:
The Toxicology in the 21st Century (Tox21) project seeks to develop and test methods for high-throughput examination of the effects of chemical compounds on biological systems. Although primary and toxicity assay data were readily available for multiple reporter-gene-modified cell lines, extensive annotation and curation were required to make these datasets more FAIR (Findable, Accessible, Interoperable, and Reusable). In this study, we fully annotated the published Tox21 data with relevant, accepted controlled vocabularies. After removing unreliable data points, we aggregated the results and created three sets of signatures reflecting, respectively, activity in the reporter gene assays, cytotoxicity, and selective reporter gene activity. We benchmarked these signatures using the chemical structures of the tested compounds and obtained generally high receiver operating characteristic (ROC) scores, suggesting that the signatures and the underlying data are of good quality and utility. We analyzed the results to identify promiscuous individual compounds and chemotypes for each of the three signature categories, and interpreted them to illustrate the utility and re-usability of the datasets. With this study, we aimed to demonstrate the importance of data standards in reporting screening results and of high-quality annotations for enabling re-use and interpretation of these data. To improve the data with respect to all FAIR criteria, all assay annotations, cleaned and aggregated datasets, and signatures were made available as standardized dataset packages (Aggregated Tox21 bioactivity data, 2019).
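The abstract does not specify how replicate calls were aggregated, how cytotoxicity was separated from reporter activity, or which structure-based model produced the ROC scores. The Python sketch below illustrates one plausible version of that workflow under stated assumptions; it is not the authors' method. It assumes a majority vote over reliable replicates (inconclusive calls dropped as "unreliable"), defines a selective call as "active in the reporter readout and inactive in a paired viability readout", and substitutes a leave-one-out nearest-neighbour Tanimoto similarity for a trained structure-based model. All compound names, assay labels, and fingerprint bits are hypothetical.

```python
from collections import defaultdict

# Hypothetical replicate calls: (compound, assay, call). "inconclusive" stands
# in for an unreliable data point that is removed before aggregation.
RAW_CALLS = [
    ("cmpA", "reporter", "active"), ("cmpA", "reporter", "active"),
    ("cmpA", "reporter", "inconclusive"),
    ("cmpB", "reporter", "active"), ("cmpB", "reporter", "inactive"),
    ("cmpB", "reporter", "active"),
    ("cmpC", "reporter", "inactive"), ("cmpC", "reporter", "inactive"),
    ("cmpD", "reporter", "inactive"), ("cmpD", "reporter", "active"),
    ("cmpA", "viability", "inactive"), ("cmpB", "viability", "inactive"),
    ("cmpC", "viability", "inactive"), ("cmpD", "viability", "active"),
]

# Hypothetical fingerprints as sets of "on" bit indices (e.g. hashed ECFP bits).
FINGERPRINTS = {
    "cmpA": {1, 2, 3, 4}, "cmpB": {1, 2, 3, 5},
    "cmpC": {7, 8, 9}, "cmpD": {7, 8, 10},
}


def aggregate(raw_calls):
    """Majority vote over reliable replicates -> {(compound, assay): 0 or 1}."""
    votes = defaultdict(list)
    for compound, assay, call in raw_calls:
        if call == "inconclusive":  # drop unreliable data points
            continue
        votes[(compound, assay)].append(1 if call == "active" else 0)
    # A tie counts as inactive; this is an illustrative choice.
    return {key: int(2 * sum(v) > len(v)) for key, v in votes.items()}


def selective_signature(calls, compounds):
    """Active in the reporter readout and inactive in the viability readout."""
    return {
        c: int(calls.get((c, "reporter"), 0) == 1
               and calls.get((c, "viability"), 0) == 0)
        for c in compounds
    }


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def loo_scores(signature, fingerprints):
    """Score each compound by its max similarity to the *other* actives."""
    actives = [c for c, y in signature.items() if y == 1]
    return {
        c: max((tanimoto(fingerprints[c], fingerprints[a])
                for a in actives if a != c), default=0.0)
        for c in signature
    }


def roc_auc(labels, scores):
    """ROC AUC as the probability that an active outranks an inactive (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one active and one inactive")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


calls = aggregate(RAW_CALLS)
signature = selective_signature(calls, sorted(FINGERPRINTS))
scores = loo_scores(signature, FINGERPRINTS)
compounds = sorted(signature)
auc = roc_auc([signature[c] for c in compounds], [scores[c] for c in compounds])
print(signature, auc)
```

On this toy data, cmpD's reporter replicates tie and it is flagged in the viability readout, so only cmpA and cmpB carry the selective signature. Because they share most fingerprint bits, the similarity ranking separates them perfectly from the inactives. A real benchmark would use many compounds, a held-out split, and a fitted classifier rather than this nearest-neighbour stand-in.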
Citations
Journal Article
LINCS Data Portal 2.0: next generation access point for perturbation-response signatures
Vasileios Stathias, John Paul Turner, Amar Koleti, Dusica Vidovic, Daniel J. Cooper, Mehdi Fazel-Najafabadi, Marcin Pilarczyk, Raymond Terryn, Caty Chung, Afoma C. Umeano, Daniel J.B. Clarke, Alexander Lachmann, John Erol Evangelista, Avi Ma'ayan, Mario Medvedovic, Stephan C. Schürer
TL;DR: The cornerstone of this update has been the decision to reprocess all high-level LINCS datasets and make them accessible at the data point level enabling users to directly access and download any subset of signatures across the entire library independent from the originating source, project or assay.
Journal Article
An integrated chemical environment with tools for chemical safety testing.
Shannon M. Bell, Jaleh Abedini, Patricia Ceger, Xiaoqing Chang, Bethany Cook, Agnes L. Karmaus, Isabel Lea, Kamel Mansouri, Jason Phillips, Eric McAfee, Ruhi Rai, John P. Rooney, Catherine S. Sprankle, Arpit Tandon, David Allen, Warren Casey, Nicole Kleinstreuer
TL;DR: Improved accessibility and interpretability of in vitro data via mechanistic target mapping, together with enhanced interactive tools for in vitro to in vivo extrapolation (IVIVE), are described, including applications of an expanded data space and efforts to build confidence in non-animal approaches.
Journal Article
DeepSnap-Deep Learning Approach Predicts Progesterone Receptor Antagonist Activity With High Performance.
TL;DR: The proposed deep learning (DL)-based quantitative structure-activity relationship (QSAR) strategy, which uses transfer learning to build prediction models for agonists and antagonists, predicted progesterone receptor (PR) antagonists with high performance after optimization of several parameters and adjustment of the images generated from 3D structures.