
Showing papers by "Philip E. Bourne" published in 2023


Posted ContentDOI
02 May 2023-bioRxiv
TL;DR: DeepUrfold is a variational Bayesian approach to analyzing protein structure relationships. It leverages the embeddings of its deep generative model, which represent a distilled, lower-dimensional space of a given protein and its amalgamation of sequence, structure and biophysical properties.
Abstract: Our views of fold space implicitly rest upon many assumptions that impact how we analyze, interpret and understand biological systems—from protein structure comparison and classification to function prediction and evolutionary analyses. For instance, is there an optimal granularity at which to view protein structural similarities (e.g., architecture, topology or some other level)? If so, how does it vary with the type of question being asked? Similarly, the discrete/continuous dichotomy of fold space is central in structural bioinformatics, but remains unresolved. Discrete views of fold space bin ‘similar’ folds into distinct, non-overlapping groups; unfortunately, such binning may inherently miss many remote relationships. While hierarchical databases like CATH, SCOP and ECOD represent major steps forward in protein classification, a scalable, objective and conceptually flexible method, with less reliance on assumptions and heuristics, could enable a more systematic and nuanced exploration of fold space, particularly as regards evolutionarily-distant relationships. Building upon a recent ‘Urfold’ model of protein structure, we have developed a new approach to analyze protein structure relationships. Termed ‘DeepUrfold’, this method is rooted in deep generative modeling via variational Bayesian inference, and we find it to be useful for comparative analysis across the protein universe. Critically, DeepUrfold leverages its deep generative model’s embeddings, which represent a distilled, lower-dimensional space of a given protein and its amalgamation of sequence, structure and biophysical properties. Notably, DeepUrfold is structure-guided, versus being purely structure-based, and its architecture allows each trained model to learn protein features (structural and otherwise) that, in a sense, ‘define’ different superfamilies. 
Deploying DeepUrfold with CATH suggests a new, mostly-continuous view of fold space, one that extends beyond simple 3D structural/geometric similarity, towards the realm of integrated sequence↔structure↔function properties. We find that such an approach can quantitatively represent and detect evolutionarily-remote relationships that evade existing methods. Availability: Our detailed results can be explored at https://bournelab.org/research/DeepUrfold/; the DeepUrfold code is available at http://www.github.com/bouralab/DeepUrfold and data are available at https://doi.org/10.5281/zenodo.6916524.

3 citations


Posted ContentDOI
20 Mar 2023-bioRxiv
TL;DR: In this article, the authors investigate the design and discovery of macrocyclic kinase inhibitors (MKIs) starting from initial acyclic compounds by performing microsecond-scale atomistic simulations for multiple MKIs, constructing an MKI database, and analyzing MKIs using hierarchical cluster analysis.
Abstract: Macrocyclic kinase inhibitors (MKIs) are gaining attention due to their favorable selectivity and potential to overcome drug resistance, yet they remain challenging to design because of their novel structures. To facilitate the design and discovery of MKIs, we investigate MKI rational design starting from initial acyclic compounds by performing microsecond-scale atomistic simulations for multiple MKIs, constructing an MKI database, and analyzing MKIs using hierarchical cluster analysis. Our studies demonstrate that the binding modes of MKIs are similar to those of their corresponding acyclic counterparts against the same kinase targets. Importantly, within the respective binding sites, the MKI scaffolds retain the same conformations as their corresponding acyclic counterparts, demonstrating the rigidity of scaffolds before and after molecular cyclization. The MKI database includes 641 nanomole-level MKIs from 56 human kinases, elucidating the features of rigid scaffolds and the tendency of core structures among MKIs. Collectively, these results and resources can facilitate MKI development.

Journal ArticleDOI
TL;DR: In this paper, a fold change visualization called mirrored axis distortion of fold change (MAD-FC) is proposed that exhibits readability, proportionality, and symmetry of fold changes.
Abstract: We propose a fold change visualization that demonstrates a combination of properties from log and linear plots of fold change. A useful fold change visualization can exhibit: (1) readability, where fold change values are recoverable from datapoint position; (2) proportionality, where fold change values of the same direction are proportionally distant from the point of no change; (3) symmetry, where positive and negative fold changes are equidistant to the point of no change; and (4) high dynamic range, where datapoint values are discernable across orders of magnitude. A linear visualization has readability and partial proportionality but lacks high dynamic range and symmetry (because negative direction fold changes are bound between [0, 1] while positive are between [1, $\infty$]). Log plots of fold change have partial readability, high dynamic range, and symmetry, but lack proportionality because of the log transform. We outline a new transform and visualization, named mirrored axis distortion of fold change (MAD-FC), that extends a linear visualization of fold change data to exhibit readability, proportionality, and symmetry (but still has the limited dynamic range of linear plots). We illustrate the use of MAD-FC with biomedical data using various fold change charts. We argue that MAD-FC plots may be a more useful visualization than log or linear plots for applications that require a limited dynamic range (approximately $\pm$2 orders of magnitude or $\pm$8 units in log2 space).
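The symmetry and proportionality properties listed in this abstract can be illustrated with a minimal Python sketch of one mirrored transform of fold change. The abstract does not state the MAD-FC formula, so the mapping used here (FC - 1 for increases, 1 - 1/FC for decreases) and the function name `mirrored_fold_change` are assumptions for illustration only, not the authors' definition.

```python
import math


def mirrored_fold_change(fc: float) -> float:
    """Map a fold change ratio onto a symmetric, linear-style scale.

    Assumed mapping (hypothetical, not taken from the paper):
      fc >= 1  ->  fc - 1        (a 3x increase maps to +2)
      fc <  1  ->  1 - 1 / fc    (a 3x decrease, fc = 1/3, maps to about -2)
    """
    if fc <= 0:
        raise ValueError("fold change must be positive")
    return fc - 1.0 if fc >= 1.0 else 1.0 - 1.0 / fc


def log2_fold_change(fc: float) -> float:
    """Conventional log2 fold change, shown for comparison."""
    return math.log2(fc)


if __name__ == "__main__":
    # Print the ratio, the mirrored value, and the log2 value side by side.
    for fc in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(fc, mirrored_fold_change(fc), log2_fold_change(fc))
```

Under this assumed mapping, a 4x increase and a 4x decrease sit at +3 and -3, equidistant from the point of no change (symmetry), and a 4x change sits three times as far from zero as a 2x change (proportionality). On a log2 axis the same two changes sit at 2 and 1.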

Journal ArticleDOI
TL;DR: In this paper, the authors propose PortalCG, a deep learning framework that combines a 3D ligand-binding-site-enhanced sequence pre-training strategy, which encodes the evolutionary links between ligand-binding sites across gene families, with a new out-of-cluster meta-learning algorithm that extracts and accumulates information learned from predicting ligands of distinct gene families (meta-data) and applies the meta-data to a dark gene family.
Abstract: Systematically discovering protein-ligand interactions across the entire human and pathogen genomes is critical in chemical genomics, protein function prediction, drug discovery, and many other areas. However, more than 90% of gene families remain “dark”—i.e., their small-molecule ligands are undiscovered due to experimental limitations or human/historical biases. Existing computational approaches typically fail when the dark protein differs from those with known ligands. To address this challenge, we have developed a deep learning framework, called PortalCG, which consists of four novel components: (i) a 3-dimensional ligand binding site enhanced sequence pre-training strategy to encode the evolutionary links between ligand-binding sites across gene families; (ii) an end-to-end pretraining-fine-tuning strategy to reduce the impact of inaccuracy of predicted structures on function predictions by recognizing the sequence-structure-function paradigm; (iii) a new out-of-cluster meta-learning algorithm that extracts and accumulates information learned from predicting ligands of distinct gene families (meta-data) and applies the meta-data to a dark gene family; and (iv) a stress model selection step, using different gene families in the test data from those in the training and development data sets to facilitate model deployment in a real-world scenario. In extensive and rigorous benchmark experiments, PortalCG considerably outperformed state-of-the-art techniques of machine learning and protein-ligand docking when applied to dark gene families, and demonstrated its generalization power for target identifications and compound screenings under out-of-distribution (OOD) scenarios. Furthermore, in an external validation for the multi-target compound screening, the performance of PortalCG surpassed the rational design from medicinal chemists. 
Our results also suggest that a differentiable sequence-structure-function deep learning framework, where protein structural information serves as an intermediate layer, could be superior to conventional methodology where predicted protein structures were used for the compound screening. We applied PortalCG to two case studies to exemplify its potential in drug discovery: designing selective dual-antagonists of dopamine receptors for the treatment of opioid use disorder (OUD), and illuminating the understudied human genome for target diseases that do not yet have effective and safe therapeutics. Our results suggested that PortalCG is a viable solution to the OOD problem in exploring understudied regions of protein functional space.
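The out-of-distribution evaluation described in component (iv), where test gene families differ from those used for training, can be sketched as a group-held-out split. This is a generic illustration, not PortalCG code: the function name `split_by_family`, the `(family_id, item)` tuple layout, and the 20% default are all assumptions.

```python
import random
from collections import defaultdict


def split_by_family(samples, test_fraction=0.2, seed=0):
    """Split (family_id, item) pairs so that no family appears in both sets.

    All names and defaults here are illustrative assumptions,
    not values taken from the paper.
    """
    by_family = defaultdict(list)
    for family, item in samples:
        by_family[family].append(item)

    # Shuffle the family identifiers, not the individual samples.
    families = sorted(by_family)
    random.Random(seed).shuffle(families)

    # Hold out at least one whole family for testing.
    n_test = max(1, round(len(families) * test_fraction))
    test_families = set(families[:n_test])

    train = [(f, x) for f, x in samples if f not in test_families]
    test = [(f, x) for f, x in samples if f in test_families]
    return train, test


if __name__ == "__main__":
    data = [("kinase", "p1"), ("kinase", "p2"), ("gpcr", "p3"),
            ("protease", "p4"), ("ion_channel", "p5"), ("gpcr", "p6")]
    train, test = split_by_family(data, test_fraction=0.25)
    print("train families:", sorted({f for f, _ in train}))
    print("test families:", sorted({f for f, _ in test}))
```

Splitting on the family identifier rather than on individual proteins is what keeps the test set out of distribution: a random per-sample split would let members of the same family land on both sides.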

Journal ArticleDOI
TL;DR: In this article, the features of KRAS binding pockets and the ligand-binding characteristics of allosteric KRAS complexes are characterized using a structural systems pharmacology approach.
Abstract: KRAS, a common human oncogene, has been recognized as a critical drug target in treating multiple cancers. After four decades of effort, one allosteric KRAS drug (Sotorasib) has been approved, inspiring more KRAS-targeted drug research. Here, we provide the features of KRAS binding pockets and ligand-binding characteristics of KRAS complexes using a structural systems pharmacology approach. Three distinct binding sites (conserved nucleotide-binding site, shallow Switch-I/II pocket, and allosteric Switch-II/α3 pocket) are characterized. Ligand-binding features are determined based on encoded KRAS-inhibitor interaction fingerprints. Finally, the flexibility of the three distinct binding sites to accommodate different potential ligands, based on MD simulation, is discussed. Collectively, these findings are intended to facilitate rational KRAS drug design.
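To make the idea of comparing encoded interaction fingerprints concrete, here is a minimal Python sketch. The abstract does not say how the KRAS-inhibitor fingerprints are encoded or compared, so the binary encoding (one bit per residue-interaction pair) and the use of the Tanimoto coefficient are assumptions, and the example bit patterns are invented for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two equal-length bit lists."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Two all-zero fingerprints are treated as identical by convention.
    return both / either if either else 1.0


if __name__ == "__main__":
    # Hypothetical fingerprints: each position flags one residue-interaction pair.
    ligand_a = [1, 1, 0, 1, 0, 0]
    ligand_b = [1, 0, 0, 1, 1, 0]
    print(tanimoto(ligand_a, ligand_b))
```

A score near 1 would indicate that two ligands contact the pocket in nearly the same way, while a score near 0 would indicate largely different contacts.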

Posted ContentDOI
03 Feb 2023-bioRxiv
TL;DR: In this paper, the features of KRAS binding pockets and the ligand-binding characteristics of allosteric KRAS complexes are characterized using a structural systems pharmacology approach, and the flexibility of the three distinct binding sites to accommodate different potential ligands, based on MD simulation, is discussed.
Abstract: KRAS, a common human oncogene, has been recognized as a critical drug target in treating multiple cancers. After four decades of effort, one allosteric KRAS drug (Sotorasib) has been approved, inspiring more KRAS-targeted drug research. Here we provide the features of KRAS binding pockets and ligand-binding characteristics of KRAS complexes using a structural systems pharmacology approach. Three distinct binding sites (conserved nucleotide-binding site, shallow Switch-I/II pocket, and allosteric Switch-II/α3 pocket) are characterized. Ligand-binding features are determined based on encoded KRAS-inhibitor interaction fingerprints. Finally, the flexibility of the three distinct binding sites to accommodate different potential ligands, based on MD simulation, is discussed. Collectively, these findings are intended to facilitate rational KRAS drug design.

Journal ArticleDOI
TL;DR: This entry is a PLOS Computational Biology Methods paper; no further summary is given.
Abstract: This is a PLOS Computational Biology Methods paper

16 Mar 2023
TL;DR: In this article, the authors extend the use of contra plots to determine which results have evidence of negligible (near-zero) effect size, which is important for eliminating alternative scientific explanations and identifying approximate independence between an intervention and the variable measured.
Abstract: Scientific experiments study interventions that show evidence of an effect size that is meaningfully large, negligibly small, or inconclusively broad. Previously, we proposed contra-analysis as a decision-making process to help determine which interventions have a meaningfully large effect by using contra plots to compare effect size across broadly related experiments. Here, we extend the use of contra plots to determine which results have evidence of negligible (near-zero) effect size. Determining if an effect size is negligible is important for eliminating alternative scientific explanations and identifying approximate independence between an intervention and the variable measured. We illustrate that contra plots can score negligible effect size across studies, inform the selection of a threshold for negligible effect based on broadly related results, and determine which results have evidence of negligible effect with a hypothesis test. No other data visualization can carry out all three of these tasks for analyzing negligible effect size. We demonstrate this analysis technique on real data from biomedical research. This new application of contra plots can differentiate statistically insignificant results with high strength (narrow and near-zero interval estimate of effect size) from those with low strength (broad interval estimate of effect size). Such a designation could help resolve the File Drawer problem in science, where statistically insignificant results are underreported because their interpretation is ambiguous and nonstandard. With our proposed procedure, results designated with negligible effect will be considered strong and publishable evidence of near-zero effect size.
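The three outcomes named in this abstract (meaningfully large, negligibly small, inconclusively broad) can be illustrated by checking an interval estimate against a negligible-effect threshold. The sketch below is a generic interval-inclusion check with a normal-approximation interval, not the contra-plot statistic or the authors' hypothesis test; the function names, the threshold, and the sample values are assumptions.

```python
from statistics import NormalDist, mean, stdev


def diff_interval(control, treated, confidence=0.95):
    """Normal-approximation interval for mean(treated) - mean(control)."""
    diff = mean(treated) - mean(control)
    se = (stdev(control) ** 2 / len(control)
          + stdev(treated) ** 2 / len(treated)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * se, diff + z * se


def classify_effect(interval, threshold):
    """Label an interval estimate relative to a negligible-effect threshold."""
    low, high = interval
    if -threshold < low and high < threshold:
        return "negligible"    # whole interval lies inside (-threshold, threshold)
    if low > threshold or high < -threshold:
        return "meaningful"    # whole interval lies outside the band
    return "inconclusive"      # interval straddles a threshold


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    control = [10.0, 10.2, 9.8, 10.1, 9.9]
    treated = [10.1, 10.0, 9.9, 10.2, 10.0]
    interval = diff_interval(control, treated)
    print(interval, classify_effect(interval, threshold=0.5))
```

The point of the distinction is that a narrow interval around zero and a broad interval that merely includes zero are both statistically insignificant, yet only the first supports a claim of near-zero effect.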