
Showing papers by "Richard S. Judson published in 2018"


Journal ArticleDOI
TL;DR: This study aims to develop robust QSAR/QSPR models for chemical properties of environmental interest that can be used for regulatory purposes and uses data from the publicly available PHYSPROP database, a set of 13 common physicochemical and environmental fate properties.
Abstract: The collection of chemical structure information and associated experimental data for quantitative structure–activity/property relationship (QSAR/QSPR) modeling is facilitated by an increasing number of public databases containing large amounts of useful data. However, the performance of QSAR models depends strongly on the quality of the data and the modeling methodology used. This study aims to develop robust QSAR/QSPR models for chemical properties of environmental interest that can be used for regulatory purposes. This study primarily uses data from the publicly available PHYSPROP database consisting of a set of 13 common physicochemical and environmental fate properties. These datasets have undergone extensive curation using an automated workflow to select only high-quality data, and the chemical structures were standardized prior to calculation of the molecular descriptors. The modeling procedure was developed based on the five Organisation for Economic Co-operation and Development (OECD) principles for QSAR models. A weighted k-nearest neighbor approach was adopted using a minimum number of required descriptors calculated using PaDEL, an open-source software. The genetic algorithms selected only the most pertinent and mechanistically interpretable descriptors (2–15, with an average of 11 descriptors). The sizes of the modeled datasets varied from 150 chemicals for biodegradability half-life to 14,050 chemicals for logP, with an average of 3222 chemicals across all endpoints. The optimal models were built on randomly selected training sets (75%) and validated using fivefold cross-validation (CV) and test sets (25%). The CV Q² of the models varied from 0.72 to 0.95, with an average of 0.86, and the test-set R² varied from 0.71 to 0.96, with an average of 0.82. Modeling and performance details are described in QSAR model reporting format and were validated by the European Commission's Joint Research Centre to be OECD compliant. All models are freely available as an open-source, command-line application called OPEn structure–activity/property Relationship App (OPERA). OPERA models were applied to more than 750,000 chemicals to produce freely available predicted data on the U.S. Environmental Protection Agency's CompTox Chemistry Dashboard.
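The weighted k-nearest-neighbor scheme described above can be illustrated with a short sketch. Below is a minimal example in Python using scikit-learn, assuming a generic descriptor matrix and a synthetic endpoint; it is not the OPERA implementation, and the descriptor count, neighbor count, and data are placeholders.

```python
# Minimal weighted kNN QSAR sketch with a 75/25 split and fivefold CV,
# loosely mirroring the validation scheme described above (synthetic data).
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                                   # placeholder for 12 GA-selected descriptors
y = 1.5 * X[:, 0] - X[:, 3] + rng.normal(scale=0.3, size=500)    # synthetic endpoint (e.g. a logP-like value)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Distance-weighted kNN: closer training chemicals contribute more to each prediction.
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5, weights="distance"))

cv_q2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")   # fivefold CV Q²
model.fit(X_train, y_train)
print("Fivefold CV Q2: %.2f +/- %.2f" % (cv_q2.mean(), cv_q2.std()))
print("External test R2: %.2f" % model.score(X_test, y_test))
```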

271 citations


Journal ArticleDOI
20 Feb 2018-PLOS ONE
TL;DR: A hybrid method for gene selection is presented herein that combines data-driven and knowledge-driven concepts into one cohesive method and indicates that the sentinel genes selected can be used to accurately predict pathway perturbations and biological relationships for samples under study.
Abstract: Changes in gene expression can help reveal the mechanisms of disease processes and the mode of action for toxicities and adverse effects on cellular responses induced by exposures to chemicals, drugs, and environmental agents. The U.S. Tox21 Federal collaboration, which currently quantifies the biological effects of nearly 10,000 chemicals via quantitative high-throughput screening (qHTS) in in vitro model systems, is now making an effort to incorporate gene expression profiling into the existing battery of assays. Whole-transcriptome analyses performed on large numbers of samples using microarrays or RNA-Seq are currently cost-prohibitive. Accordingly, the Tox21 Program is pursuing a high-throughput transcriptomics (HTT) method that focuses on the targeted detection of gene expression for a carefully selected subset of the transcriptome, which can potentially reduce the cost roughly 10-fold, allowing for the analysis of larger numbers of samples. To identify the optimal transcriptome subset, genes were sought that are (1) representative of the highly diverse biological space, (2) capable of serving as a proxy for expression changes in unmeasured genes, and (3) sufficient to provide coverage of well-described biological pathways. A hybrid method for gene selection is presented herein that combines data-driven and knowledge-driven concepts into one cohesive method. Our approach is modular, applicable to any species, and facilitates a robust, quantitative evaluation of performance. In particular, we were able to perform gene selection such that the resulting set of “sentinel genes” adequately represents all known canonical pathways from the Molecular Signature Database (MSigDB v4.0) and can be used to infer expression changes for the remainder of the transcriptome. The resulting computational model allowed us to choose a purely data-driven subset of 1500 sentinel genes, referred to as the S1500 set, which was then augmented using a knowledge-driven selection of additional genes to create the final S1500+ gene set. Our results indicate that the sentinel genes selected can be used to accurately predict pathway perturbations and biological relationships for samples under study.
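A rough sketch of the data-driven half of such a sentinel-gene selection is shown below, assuming a synthetic expression matrix: genes are clustered by expression profile, one representative per cluster is kept, and the representatives are then used to infer a non-sentinel gene. This illustrates the idea only and is not the actual S1500 selection algorithm.

```python
# Data-driven sentinel-gene selection sketch: cluster genes, keep one
# representative per cluster, and check how well the representatives
# predict an unmeasured gene. Expression data are synthetic placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_samples, n_genes, n_sentinels = 200, 1000, 50
latent = rng.normal(size=(n_samples, 20))                     # hidden biological factors
expr = latent @ rng.normal(size=(20, n_genes)) + rng.normal(scale=0.5, size=(n_samples, n_genes))

# Cluster genes (columns) and take the gene nearest each cluster center as the sentinel.
km = KMeans(n_clusters=n_sentinels, n_init=10, random_state=0).fit(expr.T)
sentinels = []
for k in range(n_sentinels):
    members = np.where(km.labels_ == k)[0]
    dists = np.linalg.norm(expr.T[members] - km.cluster_centers_[k], axis=1)
    sentinels.append(members[np.argmin(dists)])
sentinels = np.array(sentinels)

# Infer a non-sentinel gene from the sentinel set with a simple linear model.
target = next(g for g in range(n_genes) if g not in set(sentinels))
reg = Ridge().fit(expr[:, sentinels], expr[:, target])
print("R2 for inferring gene %d from sentinels: %.2f" % (target, reg.score(expr[:, sentinels], expr[:, target])))
```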

94 citations


Journal ArticleDOI
TL;DR: In this paper, the Life Cycle Initiative, hosted at the United Nations Environment Programme, selected human toxicity impacts from exposure to chemical substances as an impact category that requires guidance.
Abstract: Background: The Life Cycle Initiative, hosted at the United Nations Environment Programme, selected human toxicity impacts from exposure to chemical substances as an impact category that requires g...

37 citations


Journal ArticleDOI
TL;DR: An integrated analysis of chemical-mediated effects on steroidogenesis in the HT-H295R assay is developed and the maximum mean Mahalanobis distance (maxmMd) values were high for strong modulators (prochloraz, mifepristone) and lower for moderate modulators (atrazine, molinate).
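The maxmMd statistic can be sketched as follows, assuming synthetic hormone measurements: for each test concentration, the mean Mahalanobis distance of treated wells from the control distribution across all hormones is computed, and the maximum across concentrations is taken. This is an illustrative reconstruction, not the published implementation.

```python
# Mean Mahalanobis distance (mMd) sketch for multi-hormone steroidogenesis data,
# with the maximum across concentrations (maxmMd). All data are synthetic.
import numpy as np

rng = np.random.default_rng(2)
n_hormones = 11
controls = rng.normal(size=(60, n_hormones))                  # control wells
cov_inv = np.linalg.inv(np.cov(controls, rowvar=False))
ctrl_mean = controls.mean(axis=0)

def mean_mahalanobis(treated):
    """Mean Mahalanobis distance of treated wells from the control centroid."""
    diffs = treated - ctrl_mean
    d = np.sqrt(np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs))
    return d.mean()

# One chemical tested at several concentrations (3 replicate wells each).
conc_blocks = [ctrl_mean + shift * rng.normal(size=(3, n_hormones))
               for shift in (0.1, 0.5, 1.0, 2.0)]
mMd = [mean_mahalanobis(block) for block in conc_blocks]
print("maxmMd = %.2f" % max(mMd))
```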

36 citations


Journal ArticleDOI
25 Jul 2018-PLOS ONE
TL;DR: This work improves the confidence of predictions made using HTS data, increasing the ability to use this data in risk assessment, and uses nonparametric bootstrap resampling to calculate uncertainties in concentration-response parameters from a variety of HTS assays.
Abstract: High throughput screening (HTS) projects like the U.S. Environmental Protection Agency's ToxCast program are required to address the large and rapidly increasing number of chemicals for which we have little to no toxicity measurements. Concentration-response parameters such as potency and efficacy are extracted from HTS data using nonlinear regression, and models and analyses built from these parameters are used to predict in vivo and in vitro toxicity of thousands of chemicals. How these predictions are impacted by uncertainties that stem from parameter estimation and propagate through the models and analyses has not been well explored. While data size and complexity make uncertainty quantification computationally expensive for HTS datasets, continued advancements in computational resources have allowed these challenges to be met. This study uses nonparametric bootstrap resampling to calculate uncertainties in concentration-response parameters from a variety of HTS assays. Using the ToxCast estrogen receptor model for bioactivity as a case study, we highlight how these uncertainties can be propagated through models to quantify the uncertainty in model outputs. Uncertainty quantification in model outputs is used to identify potential false positives and false negatives and to determine the distribution of model values around semi-arbitrary activity cutoffs, increasing confidence in model predictions. At the individual chemical-assay level, curves with high variability are flagged for manual inspection or retesting, focusing subject-matter-expert time on results that need further input. This work improves confidence in predictions made using HTS data, increasing the ability to use these data in risk assessment.
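A compact sketch of the bootstrap idea follows, assuming synthetic concentration-response data and a simple two-parameter Hill-type curve; the actual ToxCast pipeline fits several candidate models and uses its own conventions.

```python
# Nonparametric bootstrap of a concentration-response fit: resample the
# observations with replacement, refit, and summarize the parameter spread.
import numpy as np
from scipy.optimize import curve_fit

def hill(conc, top, ac50):
    """Simple two-parameter Hill-type concentration-response curve."""
    return top * conc / (ac50 + conc)

rng = np.random.default_rng(3)
conc = np.repeat([0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0], 3)     # 3 replicates per concentration
resp = hill(conc, top=80.0, ac50=5.0) + rng.normal(scale=5.0, size=conc.size)

boot_params = []
for _ in range(1000):
    idx = rng.integers(0, conc.size, conc.size)                   # resample points with replacement
    try:
        p, _ = curve_fit(hill, conc[idx], resp[idx], p0=[80.0, 5.0], maxfev=2000)
        boot_params.append(p)
    except RuntimeError:
        continue                                                  # skip resamples that fail to converge

boot_params = np.array(boot_params)
lo, hi = np.percentile(boot_params[:, 1], [2.5, 97.5])
print("Bootstrap 95%% interval for AC50: [%.1f, %.1f]" % (lo, hi))
```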

34 citations


Journal ArticleDOI
TL;DR: Considering the lack of reproducibility of the in vivo Hershberger assay, the in vitro AR model may better predict specific AR interaction and can rapidly and cost-effectively screen thousands of chemicals without using animals.

24 citations


01 Jan 2018
TL;DR: The task force focuses on two major issues that emerged from the workshops, namely considering near-field exposures and improving dose–response modeling, and proposes a set of recommendations for improving the characterization of human exposure and toxicity impacts in LCIA and other comparative assessment frameworks.
Abstract: Background: The Life Cycle Initiative, hosted at the United Nations Environment Programme, selected human toxicity impacts from exposure to chemical substances as an impact category that requires g...

20 citations


Journal ArticleDOI
TL;DR: A semi-automated process for selecting and annotating reference chemicals in a standardized format allows rapid development of candidate reference chemical lists for a wide variety of targets, facilitating performance evaluation of in vitro assays as a critical step in building confidence in alternative approaches.
Abstract: Instilling confidence in use of in vitro assays for predictive toxicology requires evaluation of assay performance. Performance is typically assessed using reference chemicals: compounds with defined activity against the test system target. However, developing reference chemical lists has historically been very resource-intensive. We developed a semi-automated process for selecting and annotating reference chemicals across many targets in a standardized format and demonstrate the workflow here. A series of required fields defines the potential reference chemical: the in vitro molecular target, pathway, or phenotype affected; and the chemical's mode (e.g. agonist, antagonist, inhibitor). Activity information was computationally extracted into a database from multiple public sources including non-curated scientific literature and curated chemical-biological databases, resulting in the identification of chemical activity in 2995 biological targets. Sample data from literature sources covering 54 molecular targets ranging from data-poor to data-rich were manually checked for accuracy. Precision rates were 82.7% from curated data sources and 39.5% from automated literature extraction. We applied the final reference chemical lists to evaluating the performance of in vitro bioassays in EPA's ToxCast program. The level of support, i.e. the number of independent reports in the database linking a chemical to a target, was found to strongly correlate with the likelihood of positive results in the ToxCast assays, although individual assay performance had considerable variation. This overall approach allows rapid development of candidate reference chemical lists for a wide variety of targets that can facilitate performance evaluation of in vitro assays as a critical step in imparting confidence in alternative approaches.
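The bookkeeping behind the precision and level-of-support analyses might look like the hypothetical sketch below; the records, column names, and values are invented for illustration and do not come from the actual database.

```python
# Sketch: precision of automated chemical-target extraction against manual review,
# and fraction of assay-positive results by level of support (hypothetical records).
import pandas as pd

records = pd.DataFrame({
    "chemical":   ["A", "B", "C", "D", "E", "F"],
    "source":     ["curated", "curated", "literature", "literature", "curated", "literature"],
    "auto_hit":   [True, True, True, True, True, True],       # extracted as active
    "manual_hit": [True, True, False, True, False, False],    # manual verification
    "n_reports":  [5, 3, 1, 2, 4, 1],                         # level of support
    "assay_pos":  [True, True, False, True, True, False],     # in vitro assay outcome
})

# Precision of the automated extraction by source type, judged against manual review.
records["correct"] = records["auto_hit"] & records["manual_hit"]
print(records.groupby("source")["correct"].mean())

# Fraction of assay-positive results as a function of the level of support.
print(records.groupby("n_reports")["assay_pos"].mean())
```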

17 citations


Posted ContentDOI
08 Oct 2018-bioRxiv
TL;DR: A repository of mathematical models for anatomical and physiological quantities of interest provides a basis for PBPK models of human pregnancy and gestation, and can ultimately be used to support decision-making with respect to optimal pharmacological dosing and risk assessment for pregnant women and their developing fetuses.
Abstract: Many parameters treated as constants in traditional physiologically based pharmacokinetic models must be formulated as time-varying quantities when modeling pregnancy and gestation due to the dramatic physiological and anatomical changes that occur during this period. While several collections of empirical models for such parameters have been published, each has shortcomings. We sought to create a repository of empirical models for tissue volumes, blood flow rates, and other quantities that undergo substantial changes in a human mother and her fetus during the time between conception and birth, and to address deficiencies with similar, previously published repositories. We used maximum likelihood estimation to calibrate various models for the time-varying quantities of interest, and then used the Akaike information criterion to select an optimal model for each quantity. For quantities of interest for which time-course data were not available, we constructed composite models using percentages and/or models describing related quantities. In this way, we developed a comprehensive collection of formulae describing parameters essential for constructing a PBPK model of a human mother and her fetus throughout the approximately 40 weeks of pregnancy and gestation. We included models describing blood flow rates through various fetal blood routes that have no counterparts in adults. Our repository of mathematical models for anatomical and physiological quantities of interest provides a basis for PBPK models of human pregnancy and gestation, and as such, it can ultimately be used to support decision-making with respect to optimal pharmacological dosing and risk assessment for pregnant women and their developing fetuses. The views expressed in this article are those of the authors and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
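The calibrate-then-select recipe (maximum likelihood fits scored by the Akaike information criterion) can be sketched as below for a single synthetic quantity; the candidate model forms and data are illustrative assumptions, not models from the repository.

```python
# Fit two candidate empirical models for a time-varying quantity (a synthetic
# "tissue volume" vs. gestational week) by maximum likelihood, then keep the
# model with the lower AIC. Data and model forms are placeholders.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(4)
week = np.linspace(0, 40, 41)
volume = 0.002 * week**3 + rng.normal(scale=1.5, size=week.size)    # synthetic observations

def quadratic(t, a, b):
    return a * t + b * t**2

def cubic(t, a, b, c):
    return a * t + b * t**2 + c * t**3

def gaussian_aic(model, params):
    """AIC for a least-squares fit with Gaussian errors (variance counted as a parameter)."""
    resid = volume - model(week, *params)
    n, k = week.size, len(params) + 1
    sigma2 = np.mean(resid**2)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * k - 2 * loglik

fits = {}
for name, model, p0 in [("quadratic", quadratic, [1, 1]), ("cubic", cubic, [1, 1, 1])]:
    params, _ = curve_fit(model, week, volume, p0=p0)
    fits[name] = gaussian_aic(model, params)

best = min(fits, key=fits.get)
print({k: round(v, 1) for k, v in fits.items()}, "-> selected:", best)
```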

17 citations


Journal ArticleDOI
TL;DR: This review of efforts at the US EPA and the National Toxicology Program to develop assays and models to predict estrogen, androgen, and steroidogenesis activity shows these approaches are robust enough to move from research to application in regulatory risk assessment.

15 citations


Journal ArticleDOI
TL;DR: The final model was enriched for biological descriptors that indicated xenobiotic metabolism gene expression, oxidative stress, and cytotoxicity, demonstrating the importance of accounting for kinetics and non-specific bioactivity in predicting systemic effect levels.
Abstract: In an effort to address a major challenge in chemical safety assessment, the need for alternative approaches to characterizing systemic effect levels, a predictive model was developed. Systemic effect levels were curated from ToxRefDB, HESS-DB, and COSMOS-DB from numerous study types totaling 4379 in vivo studies for 1247 chemicals. Observed systemic effects in mammalian models are a complex function of chemical dynamics, kinetics, and inter- and intra-individual variability. To address this complex problem, systemic effect levels were modeled at the study level by leveraging study covariates (e.g., study type, strain, administration route) in addition to multiple descriptor sets, including chemical (ToxPrint, PaDEL, and Physchem), biological (ToxCast), and kinetic descriptors. Using random forest modeling with cross-validation and external validation procedures, study-level covariates alone accounted for approximately 15% of the variance, reducing the root mean squared error (RMSE) from 0.96 log10 to 0.85 log10 mg/kg/day and providing a baseline performance metric (lower expectation of model performance). A consensus model developed using a combination of study-level covariates, chemical, biological, and kinetic descriptors explained a total of 43% of the variance with an RMSE of 0.69 log10 mg/kg/day. A benchmark model (upper expectation of model performance) was also developed with an RMSE of 0.5 log10 mg/kg/day by incorporating study-level covariates and the mean effect level per chemical. To achieve a representative chemical-level prediction, the minimum study-level predicted and observed effect levels per chemical were compared, reducing the RMSE from 1.0 to 0.73 log10 mg/kg/day, equivalent to 87% of predictions falling within an order of magnitude of the observed value. Although biological descriptors did not improve model performance, the final model was enriched for biological descriptors that indicated xenobiotic metabolism gene expression, oxidative stress, and cytotoxicity, demonstrating the importance of accounting for kinetics and non-specific bioactivity in predicting systemic effect levels. Herein, we generated an externally predictive model of systemic effect levels for use as a safety assessment tool and have generated forward predictions for over 30,000 chemicals.
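A minimal sketch of such a random forest setup is shown below, assuming synthetic covariates, descriptors, and log10 effect levels rather than ToxRefDB data; the feature encodings and sizes are placeholders.

```python
# Random forest regression of study-level effect levels from encoded study
# covariates plus chemical descriptors, with cross-validated RMSE (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(5)
n_studies = 800
covariates = rng.integers(0, 4, size=(n_studies, 3))      # e.g. study type, strain, route (encoded)
descriptors = rng.normal(size=(n_studies, 20))            # chemical/biological descriptors
X = np.hstack([covariates, descriptors])
y = 2.0 - 0.4 * descriptors[:, 0] + 0.2 * covariates[:, 0] + rng.normal(scale=0.7, size=n_studies)

rf = RandomForestRegressor(n_estimators=200, random_state=0)
pred = cross_val_predict(rf, X, y, cv=5)
rmse = np.sqrt(np.mean((pred - y) ** 2))
print("Cross-validated RMSE: %.2f log10 mg/kg/day (synthetic data)" % rmse)
```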

Journal ArticleDOI
TL;DR: High-throughput screening data from the ToxCast program, coupled with chemical structural information, are used to generate chemical clusters with three similarity methods pertaining to nine MIEs within an AOP network for hepatic steatosis, illustrating how the AOP framework can support an iterative process whereby in vitro toxicity testing and chemical structure are combined to improve toxicity predictions.
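Structure-based clustering of this kind might be sketched as below, with random binary fingerprints standing in for real structural bits; the Tanimoto coefficient with hierarchical clustering is one common choice, not necessarily one of the three similarity methods used in the paper.

```python
# Cluster chemicals by structural similarity: Tanimoto distances between
# binary fingerprints, then average-linkage hierarchical clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(6)
fps = rng.integers(0, 2, size=(30, 64)).astype(bool)       # 30 chemicals, 64-bit placeholder fingerprints

def tanimoto(a, b):
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

n = fps.shape[0]
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = 1.0 - tanimoto(fps[i], fps[j])

labels = fcluster(linkage(squareform(dist), method="average"), t=5, criterion="maxclust")
print("Cluster assignments:", labels)
```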

Journal ArticleDOI
TL;DR: A QSAR model built on top of the k-nearest neighbor model and based on custom distance metrics in the structure-activity space is explored, incorporating non-linearity not only in the chemical structure space but also in the biological activity space.
Abstract: Quantitative structure-activity relationship (QSAR) models are important tools used in discovering new drug candidates and identifying potentially harmful environmental chemicals. These models often face two fundamental challenges: limited amount of available biological activity data and noise or uncertainty in the activity data themselves. To address these challenges, we introduce and explore a QSAR model based on custom distance metrics in the structure-activity space. The model is built on top of the k-nearest neighbor model, incorporating non-linearity not only in the chemical structure space, but also in the biological activity space. The model is tuned and evaluated using activity data for human estrogen receptor from the US EPA ToxCast and Tox21 databases. The model closely trails the CERAPP consensus model (built on top of 48 individual human estrogen receptor activity models) in agonist activity predictions and consistently outperforms the CERAPP consensus model in antagonist activity predictions. We suggest that incorporating non-linear distance metrics may significantly improve QSAR model performance when the available biological activity data are limited.
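The custom-metric idea can be sketched with scikit-learn's support for callable distance metrics; the particular metric below (an exponential transform of Euclidean descriptor distance) and the data are illustrative assumptions, not the metric or dataset from the paper.

```python
# kNN QSAR classifier with a custom, non-linear distance metric (synthetic data).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 10))                                                    # chemical descriptors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)   # active / inactive

def custom_metric(a, b, gamma=0.5):
    """Non-linear (saturating) transform of Euclidean descriptor-space distance."""
    return 1.0 - np.exp(-gamma * np.linalg.norm(a - b))

knn = KNeighborsClassifier(n_neighbors=7, weights="distance",
                           metric=custom_metric, algorithm="brute")
print("Fivefold CV balanced accuracy: %.2f" %
      cross_val_score(knn, X, y, cv=5, scoring="balanced_accuracy").mean())
```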