Open Access Journal Article

Improving the utility of the Tox21 dataset by deep metadata annotations and constructing reusable benchmarked chemical reference signatures

Daniel J. Cooper, +1 more
23 Apr 2019 - Vol. 24, Iss. 8, pp 1604
TLDR
The importance of data standards in reporting screening results, and of high-quality annotations that enable re-use and interpretation of these data, is demonstrated.
Abstract
The Toxicology in the 21st Century (Tox21) project seeks to develop and test methods for high-throughput examination of the effect certain chemical compounds have on biological systems. Although primary and toxicity assay data were readily available for multiple reporter gene modified cell lines, extensive annotation and curation was required to improve these datasets with respect to how FAIR (Findable, Accessible, Interoperable, and Reusable) they are. In this study, we fully annotated the Tox21 published data with relevant and accepted controlled vocabularies. After removing unreliable data points, we aggregated the results and created three sets of signatures reflecting activity in the reporter gene assays, cytotoxicity, and selective reporter gene activity, respectively. We benchmarked these signatures using the chemical structures of the tested compounds and obtained generally high receiver operating characteristic (ROC) scores, suggesting good quality and utility of these signatures and the underlying data. We analyzed the results to identify promiscuous individual compounds and chemotypes for the three signature categories and interpreted the results to illustrate the utility and re-usability of the datasets. With this study, we aimed to demonstrate the importance of data standards in reporting screening results and high-quality annotations to enable re-use and interpretation of these data. To improve the data with respect to all FAIR criteria, all assay annotations, cleaned and aggregate datasets, and signatures were made available as standardized dataset packages (Aggregated Tox21 bioactivity data, 2019).
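
The benchmarking step described in the abstract can be illustrated with a minimal sketch. It assumes a binary activity signature (1 = active, 0 = inactive) and a score per compound from some structure-based model. The function name `roc_auc` and the toy data are hypothetical and are not taken from the paper. The ROC score is computed here as the fraction of active/inactive pairs in which the active compound receives the higher score, with ties counted as half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # active ranked above inactive
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, invented for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.3f}")
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from inactives. The pairwise loop is quadratic in the number of compounds, so a rank-based implementation would be preferable for large signatures.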


Citations
Journal Article

LINCS Data Portal 2.0: next generation access point for perturbation-response signatures

TL;DR: The cornerstone of this update has been the decision to reprocess all high-level LINCS datasets and make them accessible at the data point level enabling users to directly access and download any subset of signatures across the entire library independent from the originating source, project or assay.
Journal Article

An integrated chemical environment with tools for chemical safety testing.

TL;DR: Improved accessibility and interpretability of in vitro data via mechanistic target mapping and enhanced interactive tools for in vitro to in vivo extrapolation (IVIVE) are described, including improved accessibility and interpretability of the applications of an expanded data space and building confidence in non-animal approaches.
Journal Article

DeepSnap-Deep Learning Approach Predicts Progesterone Receptor Antagonist Activity With High Performance.

TL;DR: The proposed novel deep learning (DL)-based quantitative structure-activity relationship (QSAR) strategy, which uses transfer learning to build prediction models for agonists and antagonists, predicted PR antagonists with high performance after optimization of some parameters and adjustment of images generated from 3D structures.
References
Journal Article

The FAIR Guiding Principles for scientific data management and stewardship

TL;DR: The FAIR Data Principles are a set of data reuse principles that focus on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals.
Journal Article

Extended-Connectivity Fingerprints

TL;DR: A description of the implementation of extended-connectivity fingerprints (ECFPs) has not previously been presented in the literature; ECFPs can be calculated very rapidly and can represent an essentially infinite number of different molecular features.
Journal Article

From core referencing to data re-use: two French national initiatives to reinforce paleodata stewardship (National Cyber Core Repository and LTER France Retro-Observatory)

TL;DR: ROZA was developed under the umbrella of LTER-France (Long Term Ecological Research) to facilitate the re-use of data and samples, and it will favor the use of paleodata by non-paleodata scientists, in particular ecologists.
Journal Article

A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles.

TL;DR: The expanded CMap is reported, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that is shown to be highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts.
Journal Article

Expansion of the Gene Ontology knowledgebase and resources

TL;DR: The current contents of the GO knowledgebase are summarized, several new features and improvements that have been made to the ontology, the annotations and the tools are presented, and extensions to the resource are described, including increased support for descriptions of causal models of biological systems and network biology.