Improving the utility of the Tox21 dataset by deep metadata annotations and constructing reusable benchmarked chemical reference signatures

doi:10.3390/MOLECULES24081604

Open Access Journal Article

Daniel J. Cooper, +1 more

- 23 Apr 2019 -

Molecules

- Vol. 24, Iss: 8, pp 1604

TL;DR: The importance of data standards in reporting screening results, and of high-quality annotations that enable re-use and interpretation of these data, is demonstrated.

Abstract:

The Toxicology in the 21st Century (Tox21) project seeks to develop and test methods for high-throughput examination of the effect certain chemical compounds have on biological systems. Although primary and toxicity assay data were readily available for multiple reporter gene modified cell lines, extensive annotation and curation were required to improve these datasets with respect to how FAIR (Findable, Accessible, Interoperable, and Reusable) they are. In this study, we fully annotated the Tox21 published data with relevant and accepted controlled vocabularies. After removing unreliable data points, we aggregated the results and created three sets of signatures reflecting activity in the reporter gene assays, cytotoxicity, and selective reporter gene activity, respectively. We benchmarked these signatures using the chemical structures of the tested compounds and obtained generally high receiver operating characteristic (ROC) scores, suggesting good quality and utility of these signatures and the underlying data. We analyzed the results to identify promiscuous individual compounds and chemotypes for the three signature categories and interpreted the results to illustrate the utility and re-usability of the datasets. With this study, we aimed to demonstrate the importance of data standards in reporting screening results and high-quality annotations to enable re-use and interpretation of these data. To improve the data with respect to all FAIR criteria, all assay annotations, cleaned and aggregated datasets, and signatures were made available as standardized dataset packages (Aggregated Tox21 bioactivity data, 2019).
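
The ROC score mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the benchmark scores were computed, so everything below is an assumption made for illustration: the function name `roc_auc`, the binary activity labels, and the per-compound scores (for example, a hypothetical structure-similarity score against a signature) are not taken from the paper. The sketch computes the area under the ROC curve as the fraction of active/inactive compound pairs in which the active compound receives the higher score, counting ties as half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for an active compound, 0 for an inactive one (assumed encoding).
    scores: hypothetical per-compound scores; higher means "more likely active".
    """
    active = [s for y, s in zip(labels, scores) if y == 1]
    inactive = [s for y, s in zip(labels, scores) if y == 0]
    if not active or not inactive:
        raise ValueError("need at least one active and one inactive compound")

    wins = 0.0
    for a in active:
        for i in inactive:
            if a > i:
                wins += 1.0  # active compound ranked above inactive one
            elif a == i:
                wins += 0.5  # ties count as half a win
    return wins / (len(active) * len(inactive))


# Illustrative data only: six compounds, three active and three inactive.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
example_auc = roc_auc(example_labels, example_scores)
```

A value of 1.0 means every active compound outranks every inactive one, and 0.5 corresponds to random ranking. The pairwise loop is quadratic in the number of compounds; a rank-based formulation would scale better for large sets.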

Citations

Journal Article

LINCS Data Portal 2.0: next generation access point for perturbation-response signatures

Vasileios Stathias, +15 more

- 08 Jan 2020 -

Nucleic Acids Research

TL;DR: The cornerstone of this update has been the decision to reprocess all high-level LINCS datasets and make them accessible at the data point level enabling users to directly access and download any subset of signatures across the entire library independent from the originating source, project or assay.

Journal Article

An integrated chemical environment with tools for chemical safety testing.

Shannon M. Bell, +16 more

- 14 Jun 2020 -

Toxicology in Vitro

TL;DR: Improved accessibility and interpretability of in vitro data via mechanistic target mapping and enhanced interactive tools for in vitro to in vivo extrapolation (IVIVE) are described, including applications of an expanded data space and building confidence in non-animal approaches.

Journal Article

DeepSnap-Deep Learning Approach Predicts Progesterone Receptor Antagonist Activity With High Performance.

Yasunari Matsuzaka, +1 more

- 22 Jan 2020 -

Frontiers in Bioengineering and Biotechn...

TL;DR: The proposed novel DL-based quantitative structure-activity relationship (QSAR) strategy, which uses transfer learning to build prediction models for agonists and antagonists, predicted PR antagonists with high performance after optimization of some parameters and adjustment of images generated from 3D structures.

References

Journal Article

The FAIR Guiding Principles for scientific data management and stewardship

Mark Wilkinson, +53 more

- 15 Mar 2016 -

Scientific Data

TL;DR: The FAIR Data Principles are a set of data reuse principles that focus on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals.

Journal Article

Extended-Connectivity Fingerprints

David Rogers, +1 more

- 28 Apr 2010 -

Journal of Chemical Information and Mode...

TL;DR: A description of the implementation of extended-connectivity fingerprints (ECFPs) has not previously been presented in the literature; ECFPs can be very rapidly calculated and can represent an essentially infinite number of different molecular features.

Journal Article

From core referencing to data re-use: two French national initiatives to reinforce paleodata stewardship (National Cyber Core Repository and LTER France Retro-Observatory)

Fabien Arnaud, +15 more

TL;DR: ROZA was developed under the umbrella of LTER-France (Long Term Ecological Research) to facilitate the re-use of data and samples, and will favor the use of paleodata by non-paleodata scientists, in particular ecologists.

Journal Article

A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles.

Aravind Subramanian, +64 more

- 30 Nov 2017 -

Cell

TL;DR: The expanded CMap is reported, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that is shown to be highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts.

Journal Article

Expansion of the Gene Ontology knowledgebase and resources

Seth Carbon, +7 more

- 04 Jan 2017 -

Nucleic Acids Research

TL;DR: The current contents of the GO knowledgebase are summarized, several new features and improvements that have been made to the ontology, the annotations and the tools are presented, and extensions to the resource are described, including increased support for descriptions of causal models of biological systems and network biology.
