
Showing papers in "PLOS Computational Biology in 2019"


Journal ArticleDOI
TL;DR: A series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release are described.
Abstract: Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.

2,045 citations


Journal ArticleDOI
TL;DR: This work presents a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph.
Abstract: Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, there are limited open-source tools available. Errors, particularly inversions and fusions across chromosomes, remain higher than alternate scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than the state-of-the-art methods across a variety of Hi-C library preparations and input assembly sizes. The Python and C++ code for our method is openly available at https://github.com/machinegun/SALSA.

391 citations


Journal ArticleDOI
TL;DR: New features and enhancements of TCGAbiolinks are introduced, including i) more accurate and flexible pipelines for differential expression analyses, ii) different methods for tumor purity estimation and filtering, iii) integration of normal samples from other platforms, and iv) support for other genomics datasets, exemplified by the TARGET data.
Abstract: The advent of Next-Generation Sequencing (NGS) technologies has opened new perspectives in deciphering the genetic mechanisms underlying complex diseases. Nowadays, the amount of genomic data is massive and substantial efforts and new tools are required to unveil the information hidden in the data. The Genomic Data Commons (GDC) Data Portal is a platform that contains different genomic studies including the ones from The Cancer Genome Atlas (TCGA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiatives, accounting for more than 40 tumor types originating from nearly 30,000 patients. Such platforms, although very attractive, must make sure the stored data are easily accessible and adequately harmonized. Moreover, their primary focus is on storing the data in a single place, and they do not provide a comprehensive toolkit for analysis and interpretation of the data. To fulfill this urgent need, comprehensive but easily accessible computational methods for integrative analyses of genomic data that do not renounce a robust statistical and theoretical framework are required. In this context, the R/Bioconductor package TCGAbiolinks was developed, offering a variety of bioinformatics functionalities. Here we introduce new features and enhancements of TCGAbiolinks in terms of i) more accurate and flexible pipelines for differential expression analyses, ii) different methods for tumor purity estimation and filtering, iii) integration of normal samples from other platforms, and iv) support for other genomics datasets, exemplified here by the TARGET data. Evidence has shown that accounting for tumor purity is essential in the study of tumorigenesis, as this factor can confound differential expression analysis. With this in mind, we implemented these filtering procedures in TCGAbiolinks. Moreover, a limitation of some of the TCGA datasets is the unavailability or paucity of corresponding normal samples. We thus integrated into TCGAbiolinks the possibility to use normal samples from the Genotype-Tissue Expression (GTEx) project, which is another large-scale repository cataloging gene expression from healthy individuals. The new functionalities are available in TCGAbiolinks version 2.8 and higher, released in Bioconductor version 3.7.

265 citations


Journal ArticleDOI
TL;DR: DeepConv-DTI is a deep learning-based DTI prediction model that captures local residue patterns of proteins participating in DTIs and can detect binding sites of proteins for DTIs.
Abstract: Identification of drug-target interactions (DTIs) plays a key role in drug discovery. The high cost and labor-intensive nature of in vitro and in vivo experiments have highlighted the importance of in silico-based DTI prediction approaches. In several computational models, conventional protein descriptors have been shown to not be sufficiently informative to predict accurate DTIs. Thus, in this study, we propose a deep learning based DTI prediction model capturing local residue patterns of proteins participating in DTIs. When we employ a convolutional neural network (CNN) on raw protein sequences, we perform convolution on various lengths of amino acids subsequences to capture local residue patterns of generalized protein classes. We train our model with large-scale DTI information and demonstrate the performance of the proposed model using an independent dataset that is not seen during the training phase. As a result, our model performs better than previous protein descriptor-based models. Also, our model performs better than the recently developed deep learning models for massive prediction of DTIs. By examining pooled convolution results, we confirmed that our model can detect binding sites of proteins for DTIs. In conclusion, our prediction model for detecting local residue patterns of target proteins successfully enriches the protein features of a raw protein sequence, yielding better prediction results than previous approaches. Our code is available at https://github.com/GIST-CSBL/DeepConv-DTI.
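A minimal sketch of the core idea, assuming a toy sequence and random filter weights (this is not the authors' DeepConv-DTI implementation or its trained parameters): a protein sequence is one-hot encoded, scanned with convolution windows of several lengths, and global max-pooling turns the variable-length sequence into a fixed-length feature vector.

```python
# Illustrative sketch (not the authors' implementation): one-hot encode a protein
# sequence and apply 1D convolutions of several window sizes followed by global
# max-pooling, mimicking the idea of capturing local residue patterns.
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(seq):
    """Encode a protein sequence as a (length x 20) one-hot matrix."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        if aa in AA_INDEX:
            x[pos, AA_INDEX[aa]] = 1.0
    return x

def conv_maxpool(x, filters):
    """Apply filters of shape (n_filters, window, 20) and global max-pool."""
    n_filters, window, _ = filters.shape
    n_windows = x.shape[0] - window + 1
    out = np.empty((n_filters, n_windows))
    for f in range(n_filters):
        for i in range(n_windows):
            out[f, i] = np.sum(x[i:i + window] * filters[f])
    return out.max(axis=1)          # global max-pooling over positions

rng = np.random.default_rng(0)
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # toy sequence
features = []
for window in (8, 12, 16):                   # several subsequence lengths
    filters = rng.normal(size=(4, window, 20)) * 0.1
    features.append(conv_maxpool(one_hot(seq), filters))
protein_vector = np.concatenate(features)    # fixed-length protein representation
print(protein_vector.shape)                  # (12,)
```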

253 citations


Journal ArticleDOI
TL;DR: It is shown that dense phases are best described as droplet-spanning networks that are characterized by reversible physical crosslinks among multivalent proteins, and the concept of saturation concentration thresholds does not apply to multicomponent systems where obligate heterotypic interactions drive phase transitions.
Abstract: Many biomolecular condensates form via spontaneous phase transitions that are driven by multivalent proteins. These molecules are biological instantiations of associative polymers that conform to a so-called stickers-and-spacers architecture. The stickers are protein-protein or protein-RNA interaction motifs and/or domains that can form reversible, non-covalent crosslinks with one another. Spacers are interspersed between stickers and their preferential interactions with solvent molecules determine the cooperativity of phase transitions. Here, we report the development of an open source computational engine known as LASSI (LAttice simulation engine for Sticker and Spacer Interactions) that enables the calculation of full phase diagrams for multicomponent systems comprising coarse-grained representations of multivalent proteins. LASSI is designed to enable computationally efficient phenomenological modeling of spontaneous phase transitions of multicomponent mixtures comprising multivalent proteins and RNA molecules. We demonstrate the application of LASSI using simulations of linear and branched multivalent proteins. We show that dense phases are best described as droplet-spanning networks that are characterized by reversible physical crosslinks among multivalent proteins. We connect recent observations regarding correlations between apparent stoichiometry and dwell times of condensates to being proxies for the internal structural organization, specifically the convolution of internal density and extent of networking, within condensates. Finally, we demonstrate that the concept of saturation concentration thresholds does not apply to multicomponent systems where obligate heterotypic interactions drive phase transitions. This emerges from the ellipsoidal structures of phase diagrams for multicomponent systems and it has direct implications for the regulation of biomolecular condensates in vivo.

240 citations


Journal ArticleDOI
TL;DR: Multi-layer convolutional neural networks (CNNs) set the new state of the art for predicting neural responses to natural images in primate V1 and deep features learned for object recognition are better explanations for V1 computation than all previous filter bank theories.
Abstract: Despite great efforts over several decades, our best models of primary visual cortex (V1) still predict spiking activity quite poorly when probed with natural stimuli, highlighting our limited understanding of the nonlinear computations in V1. Recently, two approaches based on deep learning have emerged for modeling these nonlinear computations: transfer learning from artificial neural networks trained on object recognition and data-driven convolutional neural network models trained end-to-end on large populations of neurons. Here, we test the ability of both approaches to predict spiking activity in response to natural images in V1 of awake monkeys. We found that the transfer learning approach performed similarly well to the data-driven approach and both outperformed classical linear-nonlinear and wavelet-based feature representations that build on existing theories of V1. Notably, transfer learning using a pre-trained feature space required substantially less experimental time to achieve the same performance. In conclusion, multi-layer convolutional neural networks (CNNs) set the new state of the art for predicting neural responses to natural images in primate V1 and deep features learned for object recognition are better explanations for V1 computation than all previous filter bank theories. This finding strengthens the necessity of V1 models that are multiple nonlinearities away from the image domain and it supports the idea of explaining early visual cortex based on high-level functional goals.
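To make the transfer-learning approach concrete, here is an illustrative sketch in which random features stand in for the activations of a pretrained object-recognition network and a cross-validated ridge readout is fit to simulated spike counts; the actual study uses CNN features, an output nonlinearity, and Poisson-appropriate losses, so treat every choice here as an assumption.

```python
# Sketch of the transfer-learning idea: fit a regularized linear readout from a
# fixed (pretrained) feature space to neuronal responses. Here random features
# stand in for the output of a CNN layer.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_images, n_features, n_neurons = 500, 200, 5
X = rng.normal(size=(n_images, n_features))            # pretrained-feature activations
W_true = rng.normal(size=(n_features, n_neurons)) * 0.1
rates = np.exp(X @ W_true)                              # ground-truth firing rates
Y = rng.poisson(rates)                                  # simulated spike counts

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
readout = RidgeCV(alphas=np.logspace(-2, 3, 10)).fit(X_tr, Y_tr)
pred = readout.predict(X_te)

# Per-neuron correlation between predicted and observed responses
for n in range(n_neurons):
    r = np.corrcoef(pred[:, n], Y_te[:, n])[0, 1]
    print(f"neuron {n}: r = {r:.2f}")
```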

233 citations


Journal ArticleDOI
TL;DR: This review describes how machine learning and constraint-based modeling can be combined, surveying recent work at the intersection of both domains and discussing the mathematical and practical aspects involved, as well as overlapping systematic classifications from both frameworks.
Abstract: Omic data analysis is steadily growing as a driver of basic and applied molecular biology research. Core to the interpretation of complex and heterogeneous biological phenotypes are computational approaches in the fields of statistics and machine learning. In parallel, constraint-based metabolic modeling has established itself as the main tool to investigate large-scale relationships between genotype, phenotype, and environment. The development and application of these methodological frameworks have occurred independently for the most part, whereas the potential of their integration for biological, biomedical, and biotechnological research is less known. Here, we describe how machine learning and constraint-based modeling can be combined, reviewing recent works at the intersection of both domains and discussing the mathematical and practical aspects involved. We overlap systematic classifications from both frameworks, making them accessible to nonexperts. Finally, we delineate potential future scenarios, propose new joint theoretical frameworks, and suggest concrete points of investigation for this joint subfield. A multiview approach merging experimental and knowledge-driven omic data through machine learning methods can incorporate key mechanistic information in an otherwise biologically-agnostic learning process.

178 citations


Journal ArticleDOI
TL;DR: A set of useful guidelines for practitioners specifying how to correctly perform dimensionality reduction, interpret its output, and communicate results are presented.
Abstract: Dimensionality reduction (DR) is frequently applied during the analysis of high-dimensional data. Both a means of denoising and simplification, it can be beneficial for the majority of modern biological datasets, in which it’s not uncommon to have hundreds or even millions of simultaneous measurements collected for a single sample. Because of “the curse of dimensionality,” many statistical methods lack power when applied to high-dimensional data. Even if the number of collected data points is large, they remain sparsely submerged in a voluminous high-dimensional space that is practically impossible to explore exhaustively (see chapter 12 [1]). By reducing the dimensionality of the data, you can often alleviate this challenging and troublesome phenomenon. Low-dimensional data representations that remove noise but retain the signal of interest can be instrumental in understanding hidden structures and patterns. Original high-dimensional data often contain measurements on uninformative or redundant variables. DR can be viewed as a method for latent feature extraction. It is also frequently used for data compression, exploration, and visualization. Although many DR techniques have been developed and implemented in standard data analytic pipelines, they are easy to misuse, and their results are often misinterpreted in practice. This article presents a set of useful guidelines for practitioners specifying how to correctly perform DR, interpret its output, and communicate results. Note that this is not a review article, and we recommend some important reviews in the references.
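As a small concrete companion to these guidelines, the sketch below (on a standard toy dataset) illustrates two of the recurring recommendations: put variables on comparable scales before a linear DR method such as PCA, and report how much variance the displayed components actually explain before interpreting a 2-D plot.

```python
# Minimal sketch of two of the guidelines: scale features before linear DR and
# inspect how much variance each component explains.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = load_iris().data
X_scaled = StandardScaler().fit_transform(X)   # put variables on comparable scales

pca = PCA()
scores = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_

print("variance explained per component:", np.round(explained, 3))
print("cumulative (first 2 PCs):", round(explained[:2].sum(), 3))
# Only interpret the 2-D projection if the first components capture enough signal;
# report the explained variance alongside any PCA plot.
```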

169 citations


Journal ArticleDOI
TL;DR: A novel image reconstruction method, in which the pixel values of an image are optimized to make its DNN features similar to those decoded from human brain activity at multiple layers, which suggests that the method can effectively combine hierarchical neural representations to reconstruct perceptual and subjective images.
Abstract: The mental contents of perception and imagery are thought to be encoded in hierarchical representations in the brain, but previous attempts to visualize perceptual contents have failed to capitalize on multiple levels of the hierarchy, leaving it challenging to reconstruct internal imagery. Recent work showed that visual cortical activity measured by functional magnetic resonance imaging (fMRI) can be decoded (translated) into the hierarchical features of a pre-trained deep neural network (DNN) for the same input image, providing a way to make use of the information from hierarchical visual features. Here, we present a novel image reconstruction method, in which the pixel values of an image are optimized to make its DNN features similar to those decoded from human brain activity at multiple layers. We found that our method was able to reliably produce reconstructions that resembled the viewed natural images. A natural image prior introduced by a deep generator neural network effectively rendered semantically meaningful details to the reconstructions. Human judgment of the reconstructions supported the effectiveness of combining multiple DNN layers to enhance the visual quality of generated images. While our model was solely trained with natural images, it successfully generalized to artificial shapes, indicating that our model was not simply matching to exemplars. The same analysis applied to mental imagery demonstrated rudimentary reconstructions of the subjective content. Our results suggest that our method can effectively combine hierarchical neural representations to reconstruct perceptual and subjective images, providing a new window into the internal contents of the brain.

160 citations


Journal ArticleDOI
TL;DR: A novel computational method named Ensemble of Decision Tree based MiRNA-Disease Association prediction (EDTMDA) is proposed, which innovatively built a computational framework integrating ensemble learning and dimensionality reduction.
Abstract: In recent years, increasing associations between microRNAs (miRNAs) and human diseases have been identified. Based on accumulating biological data, many computational models for potential miRNA-disease association inference have been developed, which saves time and expenditure on experimental studies, making great contributions to researching the molecular mechanisms of human diseases and developing new drugs for disease treatment. In this paper, we propose a novel computational method named Ensemble of Decision Tree based MiRNA-Disease Association prediction (EDTMDA), which innovatively builds a computational framework integrating ensemble learning and dimensionality reduction. For each miRNA-disease pair, the feature vector was extracted by calculating the statistical measures, graph theoretical measures, and matrix factorization results for the miRNA and disease, respectively. Then multiple base learners were built to yield many decision trees (DTs) based on random selection of negative samples and miRNA/disease features. In particular, Principal Component Analysis was applied to each base learner to reduce feature dimensionality and hence remove noise or redundancy. An averaging strategy was adopted for these DTs to get the final association scores between miRNAs and diseases. In model performance evaluation, EDTMDA showed an AUC of 0.9309 in global leave-one-out cross validation (LOOCV) and an AUC of 0.8524 in local LOOCV. Additionally, an AUC of 0.9192+/-0.0009 in 5-fold cross validation proved the model's reliability and stability. Furthermore, three types of case studies for four human diseases were implemented. As a result, 94% (Esophageal Neoplasms), 86% (Kidney Neoplasms), 96% (Breast Neoplasms) and 88% (Carcinoma Hepatocellular) of the top 50 predicted miRNAs were confirmed by experimental evidence in the literature.
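The following schematic re-implementation, on synthetic data, illustrates the ensemble structure described above (random negative sampling, random feature subsets, per-learner PCA, decision trees, score averaging); all sizes and hyperparameters are arbitrary choices, not those of EDTMDA.

```python
# Schematic re-implementation of the ensemble idea (not the authors' code): each
# base learner gets a random subset of unlabeled pairs as negatives and a random
# feature subset, applies PCA, fits a decision tree, and the final association
# score is the average of the trees' predicted probabilities.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_pairs, n_feat = 1000, 60
X = rng.normal(size=(n_pairs, n_feat))                 # miRNA-disease pair features (toy)
pos = rng.choice(n_pairs, size=100, replace=False)     # known associations
unlabeled = np.setdiff1d(np.arange(n_pairs), pos)

def train_base_learner():
    neg = rng.choice(unlabeled, size=len(pos), replace=False)  # sampled negatives
    feat = rng.choice(n_feat, size=40, replace=False)          # random feature subset
    idx = np.concatenate([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    pca = PCA(n_components=10).fit(X[idx][:, feat])
    tree = DecisionTreeClassifier(max_depth=5).fit(pca.transform(X[idx][:, feat]), y)
    return feat, pca, tree

ensemble = [train_base_learner() for _ in range(20)]

def score(pairs):
    """Average the decision trees' probabilities to get association scores."""
    probs = [tree.predict_proba(pca.transform(X[pairs][:, feat]))[:, 1]
             for feat, pca, tree in ensemble]
    return np.mean(probs, axis=0)

print(score(np.arange(5)))
```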

151 citations


Journal ArticleDOI
TL;DR: Apollo is an open source software package that enables researchers to efficiently inspect and refine the precise structure and role of genomic features in a graphical browser-based platform, allowing distributed users to simultaneously edit the same encoded features while also instantly seeing the updates made by other researchers on the same region.
Abstract: Genome annotation is the process of identifying the location and function of a genome's encoded features. Improving the biological accuracy of annotation is a complex and iterative process requiring researchers to review and incorporate multiple sources of information such as transcriptome alignments, predictive models based on sequence profiles, and comparisons to features found in related organisms. Because rapidly decreasing costs are enabling an ever-growing number of scientists to incorporate sequencing as a routine laboratory technique, there is widespread demand for tools that can assist in the deliberative analytical review of genomic information. To this end, we present Apollo, an open source software package that enables researchers to efficiently inspect and refine the precise structure and role of genomic features in a graphical browser-based platform. Some of Apollo's newer user interface features include support for real-time collaboration, allowing distributed users to simultaneously edit the same encoded features while also instantly seeing the updates made by other researchers on the same region in a manner similar to Google Docs. Its technical architecture enables Apollo to be integrated into multiple existing genomic analysis pipelines and heterogeneous laboratory workflow platforms. Finally, we consider the implications that Apollo and related applications may have on how the results of genome research are published and made accessible.

Journal ArticleDOI
TL;DR: This project shows that collaborative efforts between research teams to develop ensemble forecasting approaches can bring measurable improvements in forecast accuracy and important reductions in the variability of performance from year to year.
Abstract: Seasonal influenza results in substantial annual morbidity and mortality in the United States and worldwide. Accurate forecasts of key features of influenza epidemics, such as the timing and severity of the peak incidence in a given season, can inform public health response to outbreaks. As part of ongoing efforts to incorporate data and advanced analytical methods into public health decision-making, the United States Centers for Disease Control and Prevention (CDC) has organized seasonal influenza forecasting challenges since the 2013/2014 season. In the 2017/2018 season, 22 teams participated. A subset of four teams created a research consortium called the FluSight Network in early 2017. During the 2017/2018 season they worked together to produce a collaborative multi-model ensemble that combined 21 separate component models into a single model using a machine learning technique called stacking. This approach creates a weighted average of predictive densities where the weight for each component is determined by maximizing overall ensemble accuracy over past seasons. In the 2017/2018 influenza season, one of the largest seasonal outbreaks in the last 15 years, this multi-model ensemble performed better on average than all individual component models and placed second overall in the CDC challenge. It also outperformed the baseline multi-model ensemble created by the CDC that took a simple average of all models submitted to the forecasting challenge. This project shows that collaborative efforts between research teams to develop ensemble forecasting approaches can bring measurable improvements in forecast accuracy and important reductions in the variability of performance from year to year. Efforts such as this, that emphasize real-time testing and evaluation of forecasting models and facilitate the close collaboration between public health officials and modeling researchers, are essential to improving our understanding of how best to use forecasts to improve public health response to seasonal and emerging epidemic threats.
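A toy version of the stacking step is sketched below: given the probability each component model assigned to the outcomes of past targets, non-negative weights summing to one are chosen to maximize the average log score of the weighted mixture. The numbers are synthetic and the optimization details differ from the FluSight Network's actual pipeline.

```python
# Toy illustration of the stacking step: find non-negative weights summing to one
# that maximize the average log score of a weighted mixture of component models'
# predictive densities on held-out (past-season) targets.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_targets, n_models = 200, 4
# densities[i, m]: probability model m assigned to the observed outcome of target i
densities = rng.uniform(0.01, 1.0, size=(n_targets, n_models))

def neg_log_score(w):
    mixture = densities @ w                    # weighted average of densities
    return -np.mean(np.log(mixture))

w0 = np.full(n_models, 1.0 / n_models)
res = minimize(neg_log_score, w0, method="SLSQP",
               bounds=[(0.0, 1.0)] * n_models,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])

weights = res.x
print("ensemble weights:", np.round(weights, 3))
# Forecasts for a new season would combine the components' predictive densities
# with these fixed weights.
```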

Journal ArticleDOI
TL;DR: A new Logistic Model Tree model for predicting miRNA-disease associations (LMTRDA) is proposed by fusing multi-source information, including miRNA sequences, miRNA functional similarity, disease semantic similarity, and known miRNA-disease associations; miRNA sequence information, with features extracted using natural language processing techniques, is introduced for the first time in a miRNA-disease prediction model.
Abstract: Emerging evidence has shown that microRNAs (miRNAs) play an important role in human disease research. Identifying potential associations between them is significant for the development of pathology, diagnosis, and therapy. However, only a tiny portion of all miRNA-disease pairs in the current datasets are experimentally validated. This prompts the development of high-precision computational methods to predict real interaction pairs. In this paper, we propose a new Logistic Model Tree model for predicting miRNA-disease associations (LMTRDA) by fusing multi-source information including miRNA sequences, miRNA functional similarity, disease semantic similarity, and known miRNA-disease associations. In particular, we introduce miRNA sequence information and extract its features using natural language processing techniques, for the first time in a miRNA-disease prediction model. In the cross-validation experiment, LMTRDA obtained 90.51% prediction accuracy with 92.55% sensitivity at an AUC of 90.54% on the HMDD V3.0 dataset. To further evaluate the performance of LMTRDA, we compared it with different classifier and feature descriptor models. In addition, we also validated the predictive ability of LMTRDA in human diseases including Breast Neoplasms, Breast Neoplasms and Lymphoma. As a result, 28, 27 and 26 out of the top 30 miRNAs associated with these diseases were verified by experiments in different kinds of case studies. These experimental results demonstrate that LMTRDA is a reliable model for predicting associations between miRNAs and diseases.
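As an illustration of the sequence-feature idea (not the LMTRDA code), the snippet below treats overlapping 3-mers of miRNA sequences as "words" and builds a simple bag-of-words representation of the kind that could feed a downstream classifier; the example sequences and the choice of k are arbitrary.

```python
# Hedged sketch of the sequence-feature idea: treat overlapping k-mers of a miRNA
# sequence as "words" and build a bag-of-words representation, as one would before
# feeding sequence features into a classifier such as a logistic model tree
# (the tree itself is not reproduced here).
from sklearn.feature_extraction.text import CountVectorizer

mirnas = [
    "UGAGGUAGUAGGUUGUAUAGUU",    # example mature miRNA sequences (toy)
    "UAAAGUGCUUAUAGUGCAGGUAG",
    "UGGAAGACUAGUGAUUUUGUUGU",
]

# Character 3-mers play the role of words in a natural-language-processing view.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(3, 3))
X = vectorizer.fit_transform(mirnas)

print(X.shape)                          # (3 sequences, number of distinct 3-mers)
print(vectorizer.get_feature_names_out()[:10])   # get_feature_names() in older scikit-learn
```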

Journal ArticleDOI
TL;DR: Analysis of the model’s energy function uncovers distinct mechanisms for chromatin folding at various length scales and suggests a need to go beyond simple A/B compartment types to predict specific contacts between regulatory elements using polymer simulations.
Abstract: We introduce a computational model to simulate chromatin structure and dynamics. Starting from one-dimensional genomics and epigenomics data that are available for hundreds of cell types, this model enables de novo prediction of chromatin structures at five-kilo-base resolution. Simulated chromatin structures recapitulate known features of genome organization, including the formation of chromatin loops, topologically associating domains (TADs) and compartments, and are in quantitative agreement with chromosome conformation capture experiments and super-resolution microscopy measurements. Detailed characterization of the predicted structural ensemble reveals the dynamical flexibility of chromatin loops and the presence of cross-talk among neighboring TADs. Analysis of the model’s energy function uncovers distinct mechanisms for chromatin folding at various length scales and suggests a need to go beyond simple A/B compartment types to predict specific contacts between regulatory elements using polymer simulations.

Journal ArticleDOI
TL;DR: Both parameter recovery and the stability of model-based estimates were poor but improved substantially when both choice and RT were used (compared to choice only), and when more trials were included in the analysis.
Abstract: A well-established notion in cognitive neuroscience proposes that multiple brain systems contribute to choice behaviour. These include: (1) a model-free system that uses values cached from the outcome history of alternative actions, and (2) a model-based system that considers action outcomes and the transition structure of the environment. The widespread use of this distinction, across a range of applications, renders it important to index their distinct influences with high reliability. Here we consider the two-stage task, widely considered as a gold standard measure for the contribution of model-based and model-free systems to human choice. We tested the internal/temporal stability of measures from this task, including those estimated via an established computational model, as well as an extended model using drift-diffusion. Drift-diffusion modeling suggested that both choice in the first stage, and RTs in the second stage, are directly affected by a model-based/free trade-off parameter. Both parameter recovery and the stability of model-based estimates were poor but improved substantially when both choice and RT were used (compared to choice only), and when more trials (than conventionally used in research practice) were included in our analysis. The findings have implications for interpretation of past and future studies based on the use of the two-stage task, as well as for characterising the contribution of model-based processes to choice behaviour.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce a stochastic model for network communication that combines local and global information about the network topology to generate biased random walks on the network and investigate the effects of varying the global information bias on the communication cost.
Abstract: Communication of signals among nodes in a complex network poses fundamental problems of efficiency and cost. Routing of messages along shortest paths requires global information about the topology, while spreading by diffusion, which operates according to local topological features, is informationally "cheap" but inefficient. We introduce a stochastic model for network communication that combines local and global information about the network topology to generate biased random walks on the network. The model generates a continuous spectrum of dynamics that converge onto shortest-path and random-walk (diffusion) communication processes at the limiting extremes. We implement the model on two cohorts of human connectome networks and investigate the effects of varying the global information bias on the network's communication cost. We identify routing strategies that approach a (highly efficient) shortest-path communication process with a relatively small global information bias on the system's dynamics. Moreover, we show that the cost of routing messages from and to hub nodes varies as a function of the global information bias driving the system's dynamics. Finally, we implement the model to identify individual subject differences from a communication dynamics point of view. The present framework departs from the classical shortest paths vs. diffusion dichotomy, unifying both models under a single family of dynamical processes that differ by the extent to which global information about the network topology influences the routing patterns of neural signals traversing the network.
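The sketch below implements one plausible reading of such a biased random walk on a toy graph: the walker favors neighbors closer (in shortest-path distance) to the target, with a parameter lambda controlling how strongly global information biases the walk; lambda = 0 gives pure diffusion and large lambda approaches shortest-path routing. The specific bias function is an assumption and may differ from the paper's formulation.

```python
# Illustrative model: a walker at node i moves to neighbor j with probability
# proportional to exp(-lam * d(j, target)), where d is shortest-path distance.
import numpy as np
from collections import deque

A = np.array([[0, 1, 1, 0, 0],      # small undirected toy network
              [1, 0, 1, 1, 0],
              [1, 1, 0, 0, 1],
              [0, 1, 0, 0, 1],
              [0, 0, 1, 1, 0]])

def bfs_distances(A, target):
    """Shortest-path (hop) distances from every node to the target."""
    d = np.full(len(A), np.inf)
    d[target] = 0
    queue = deque([target])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(A[u]):
            if d[v] == np.inf:
                d[v] = d[u] + 1
                queue.append(v)
    return d

def walk_length(A, source, target, lam, rng):
    d = bfs_distances(A, target)
    node, steps = source, 0
    while node != target:
        nbrs = np.flatnonzero(A[node])
        p = np.exp(-lam * d[nbrs])          # global-information bias
        node = rng.choice(nbrs, p=p / p.sum())
        steps += 1
    return steps

rng = np.random.default_rng(0)
for lam in (0.0, 1.0, 5.0):
    lengths = [walk_length(A, 0, 4, lam, rng) for _ in range(2000)]
    print(f"lambda={lam}: mean path length = {np.mean(lengths):.2f}")
```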

Journal ArticleDOI
TL;DR: This work shows that complex microbial communities generically exhibit a transition as a function of available energy fluxes from a “resource-limited” regime where community structure and stability is shaped by energetic and metabolic considerations to a diverse regime where the dominant force shaping microbial communities is the overlap between species’ consumption preferences.
Abstract: A fundamental goal of microbial ecology is to understand what determines the diversity, stability, and structure of microbial ecosystems. The microbial context poses special conceptual challenges because of the strong mutual influences between the microbes and their chemical environment through the consumption and production of metabolites. By analyzing a generalized consumer resource model that explicitly includes cross-feeding, stochastic colonization, and thermodynamics, we show that complex microbial communities generically exhibit a transition as a function of available energy fluxes from a "resource-limited" regime where community structure and stability is shaped by energetic and metabolic considerations to a diverse regime where the dominant force shaping microbial communities is the overlap between species' consumption preferences. These two regimes have distinct species abundance patterns, different functional profiles, and respond differently to environmental perturbations. Our model reproduces large-scale ecological patterns observed across multiple experimental settings such as nestedness and differential beta diversity patterns along energy gradients. We discuss the experimental implications of our results and possible connections with disorder-induced phase transitions in statistical physics.
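For orientation, the code below integrates a bare-bones MacArthur-style consumer-resource model; it deliberately omits the cross-feeding, stochastic colonization, and thermodynamic ingredients of the full model and only shows, under these simplifying assumptions, how consumption-preference overlap and resource supply shape which species persist.

```python
# Minimal consumer-resource dynamics (a simplification of the generalized model
# described above): species grow according to the resources they consume, minus a
# maintenance cost; resources are supplied, diluted, and consumed.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
n_species, n_resources = 6, 4
C = rng.uniform(0, 1, size=(n_species, n_resources))   # consumption preferences
supply = np.array([2.0, 1.0, 0.5, 0.25])               # resource supply rates
m = 0.5                                                # maintenance cost

def dynamics(t, x):
    N, R = x[:n_species], x[n_species:]
    growth = C @ R - m                                  # per-capita growth rates
    dN = N * growth
    dR = supply - R - R * (N @ C)                       # supply, dilution, consumption
    return np.concatenate([dN, dR])

x0 = np.concatenate([np.full(n_species, 0.1), np.full(n_resources, 1.0)])
sol = solve_ivp(dynamics, (0, 500), x0, rtol=1e-8, atol=1e-10)
N_final = sol.y[:n_species, -1]
print("surviving species:", np.flatnonzero(N_final > 1e-3))
print("final abundances:", np.round(N_final, 3))
```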

Journal ArticleDOI
TL;DR: A planar model containing 9 degrees of freedom and 18 musculotendon actuators was trained to walk using a custom optimization framework and was robust to all deficits, finding a stable gait in all cases.
Abstract: Deficits in the ankle plantarflexor muscles, such as weakness and contracture, occur commonly in conditions such as cerebral palsy, stroke, muscular dystrophy, Charcot-Marie-Tooth disease, and sarcopenia. While these deficits likely contribute to observed gait pathologies, determining cause-effect relationships is difficult due to the often co-occurring biomechanical and neural deficits. To elucidate the effects of weakness and contracture, we systematically introduced isolated deficits into a musculoskeletal model and generated simulations of walking to predict gait adaptations due to these deficits. We trained a planar model containing 9 degrees of freedom and 18 musculotendon actuators to walk using a custom optimization framework through which we imposed simple objectives, such as minimizing cost of transport while avoiding falling and injury, and maintaining head stability. We first generated gaits at prescribed speeds between 0.50 m/s and 2.00 m/s that reproduced experimentally observed kinematic, kinetic, and metabolic trends for walking. We then generated a gait at self-selected walking speed; quantitative comparisons between our simulation and experimental data for joint angles, joint moments, and ground reaction forces showed root-mean-squared errors of less than 1.6 standard deviations and normalized cross-correlations above 0.8 except for knee joint moment trajectories. Finally, we applied mild, moderate, and severe levels of muscle weakness or contracture to either the soleus (SOL) or gastrocnemius (GAS) or both of these major plantarflexors (PF) and retrained the model to walk at a self-selected speed. The model was robust to all deficits, finding a stable gait in all cases. Severe PF weakness caused the model to adopt a slower, "heel-walking" gait. Severe contracture of only SOL or both PF yielded similar results: the model adopted a "toe-walking" gait with excessive hip and knee flexion during stance. These results highlight how plantarflexor weakness and contracture may contribute to observed gait patterns.

Journal ArticleDOI
TL;DR: It is shown that the cross design coupled with the CSS sensitivity and S synergy scoring methods may provide a robust and accurate characterization of both drug combination sensitivity and synergy levels, with minimal experimental materials required.
Abstract: High-throughput drug screening has facilitated the discovery of drug combinations in cancer. Many existing studies adopted a full matrix design, aiming for the characterization of drug pair effects for cancer cells. However, the full matrix design may be suboptimal as it requires a drug pair to be combined at multiple concentrations in a full factorial manner. Furthermore, many of the computational tools assess only the synergy but not the sensitivity of drug combinations, which might lead to false positive discoveries. We proposed a novel cross design to enable a more cost-effective and simultaneous testing of drug combination sensitivity and synergy. We developed a drug combination sensitivity score (CSS) to determine the sensitivity of a drug pair, and showed that the CSS is highly reproducible between the replicates and thus supported its usage as a robust metric. We further showed that CSS can be predicted using machine learning approaches which determined the top pharmaco-features to cluster cancer cell lines based on their drug combination sensitivity profiles. To assess the degree of drug interactions using the cross design, we developed an S synergy score based on the difference between the drug combination and the single drug dose-response curves. We showed that the S score is able to detect true synergistic and antagonistic drug combinations at an accuracy level comparable to that using the full matrix design. Taken together, we showed that the cross design coupled with the CSS sensitivity and S synergy scoring methods may provide a robust and accurate characterization of both drug combination sensitivity and synergy levels, with minimal experimental materials required. Our experimental-computational approach could be utilized as an efficient pipeline for improving the discovery rate in high-throughput drug combination screening, particularly for primary patient samples which are difficult to obtain.
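A simplified numerical illustration of the sensitivity/synergy idea is given below: each response series is summarized by the normalized area under its dose-response curve, and a synergy-like score is the excess of the combination curve over the best single agent. The actual CSS and S definitions in the paper are more involved, and the measurements here are invented.

```python
# Schematic computation in the spirit of the cross design: compare the area under
# the combination dose-response curve (one drug varied while the other is fixed)
# with the single-drug curves.
import numpy as np

conc = np.array([0.01, 0.1, 1.0, 10.0, 100.0])     # tested concentrations (toy, uM)
# Percent inhibition measured at each concentration (made-up numbers)
inhibition_drug1 = np.array([2, 10, 35, 60, 80])
inhibition_drug2 = np.array([1, 5, 20, 45, 70])
inhibition_combo = np.array([5, 25, 55, 80, 95])    # drug1 series with drug2 held fixed

def normalized_auc(inhib, conc):
    """Area under the %-inhibition curve over log10 concentration, scaled to 0-1."""
    logc = np.log10(conc)
    area = np.trapz(inhib, logc)
    return area / (100.0 * (logc[-1] - logc[0]))

css_like = normalized_auc(inhibition_combo, conc)            # sensitivity of the combo
expected = max(normalized_auc(inhibition_drug1, conc),
               normalized_auc(inhibition_drug2, conc))       # best single agent
s_like = css_like - expected                                 # > 0 suggests synergy

print(f"combination sensitivity ~ {css_like:.2f}, synergy-like score ~ {s_like:.2f}")
```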

Journal ArticleDOI
TL;DR: Among the many thematic overviews of precision oncology, this review innovates by further comprehensively including precision pharmacology, and within this framework, articulating its protein structural landscape and consequences to cellular signaling pathways.
Abstract: At the root of the so-called precision medicine or precision oncology, which is our focus here, is the hypothesis that cancer treatment would be considerably better if therapies were guided by a tumor’s genomic alterations. This hypothesis has sparked major initiatives focusing on whole-genome and/or exome sequencing, creation of large databases, and developing tools for their statistical analyses—all aspiring to identify actionable alterations, and thus molecular targets, in a patient. At the center of the massive amount of collected sequence data is their interpretations that largely rest on statistical analysis and phenotypic observations. Statistics is vital, because it guides identification of cancer-driving alterations. However, statistics of mutations do not identify a change in protein conformation; therefore, it may not define sufficiently accurate actionable mutations, neglecting those that are rare. Among the many thematic overviews of precision oncology, this review innovates by further comprehensively including precision pharmacology, and within this framework, articulating its protein structural landscape and consequences to cellular signaling pathways. It provides the underlying physicochemical basis, thereby also opening the door to a broader community.

Journal ArticleDOI
TL;DR: This research presents a novel and scalable approach called “Smart Towns” to solve the challenge of integrating bioinformatics and data science into the design and engineering of smart devices.

Journal ArticleDOI
TL;DR: This work proves mathematically that the resulting optimal metabolic flux distribution is described by a limited number of subnetworks, known as Elementary Flux Modes (EFMs), and finds that the maximal number of flux-carrying EFMs is determined only by the number of imposed constraints on enzyme expression, not by the size, kinetics or topology of the network.
Abstract: Growth rate is a near-universal selective pressure across microbial species. High growth rates require hundreds of metabolic enzymes, each with different nonlinear kinetics, to be precisely tuned within the bounds set by physicochemical constraints. Yet, the metabolic behaviour of many species is characterized by simple relations between growth rate, enzyme expression levels and metabolic rates. We asked if this simplicity could be the outcome of optimisation by evolution. Indeed, when the growth rate is maximized (in a static environment, under mass-conservation and enzyme expression constraints), we prove mathematically that the resulting optimal metabolic flux distribution is described by a limited number of subnetworks, known as Elementary Flux Modes (EFMs). We show that, because EFMs are the minimal subnetworks leading to growth, a small active number automatically leads to the simple relations that are measured. We find that the maximal number of flux-carrying EFMs is determined only by the number of imposed constraints on enzyme expression, not by the size, kinetics or topology of the network. This minimal-EFM extremum principle is illustrated in a graphical framework, which explains qualitative changes in microbial behaviours, such as overflow metabolism and co-consumption, and provides a method for identification of the enzyme expression constraints that limit growth under the prevalent conditions. The extremum principle applies to all microorganisms that are selected for maximal growth rates under protein concentration constraints, for example the solvent capacities of cytosol, membrane or periplasmic space.
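A toy linear program, sketched below, illustrates the flavor of the extremum principle: with two candidate pathways (two EFMs) and two constraints (substrate uptake and an enzyme budget), growth maximization lands on a vertex of the feasible region, so flux is concentrated in at most as many EFMs as there are binding constraints. This is only an illustration under invented yields and costs, not the paper's general proof.

```python
# Toy FBA-style linear program: two alternative pathways convert substrate to
# biomass with different yields and enzyme costs; maximize growth subject to an
# uptake limit and an enzyme budget.
import numpy as np
from scipy.optimize import linprog

yield_A, yield_B = 1.0, 0.6      # biomass per unit flux (assumed)
cost_A, cost_B = 2.0, 0.8        # enzyme demand per unit flux (assumed)
uptake_max, enzyme_budget = 10.0, 8.0

# Maximize yield_A*vA + yield_B*vB  ->  minimize the negative
c = [-yield_A, -yield_B]
A_ub = [[1.0, 1.0],              # substrate uptake:  vA + vB <= uptake_max
        [cost_A, cost_B]]        # enzyme budget:     cost_A*vA + cost_B*vB <= budget
b_ub = [uptake_max, enzyme_budget]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
vA, vB = res.x
print(f"flux through pathway A: {vA:.2f}, pathway B: {vB:.2f}")
print(f"growth rate: {-res.fun:.2f}")
```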

Journal ArticleDOI
TL;DR: In this article, patient-specific brain network models of 15 drug-resistant epilepsy patients with implanted stereotactic electroencephalography (SEEG) electrodes were derived from structural data of magnetic resonance imaging (MRI) and diffusion tensor weighted imaging (DTI).
Abstract: Information transmission in the human brain is a fundamentally dynamic network process. In partial epilepsy, this process is perturbed and highly synchronous seizures originate in a local network, the so-called epileptogenic zone (EZ), before recruiting other close or distant brain regions. We studied patient-specific brain network models of 15 drug-resistant epilepsy patients with implanted stereotactic electroencephalography (SEEG) electrodes. Each personalized brain model was derived from structural data of magnetic resonance imaging (MRI) and diffusion tensor weighted imaging (DTI), comprising 88 nodes equipped with region specific neural mass models capable of demonstrating a range of epileptiform discharges. Each patient’s virtual brain was further personalized through the integration of the clinically hypothesized EZ. Subsequent simulations and connectivity modulations were performed and uncovered a finite repertoire of seizure propagation patterns. Across patients, we found that (i) patient-specific network connectivity is predictive for the subsequent seizure propagation pattern; (ii) seizure propagation is characterized by a systematic sequence of brain states; (iii) propagation can be controlled by an optimal intervention on the connectivity matrix; (iv) the degree of invasiveness can be significantly reduced via the proposed seizure control as compared to traditional resective surgery. To stop seizures, neurosurgeons typically resect the EZ completely. We showed that stability analysis of the network dynamics, employing structural and dynamical information, estimates reliably the spatiotemporal properties of seizure propagation. This suggests novel less invasive paradigms of surgical interventions to treat and manage partial epilepsy.

Journal ArticleDOI
TL;DR: An evaluation approach that disentangles different components of forecasting ability using metrics that separately assess the calibration, sharpness and bias of forecasts is proposed, which suggests that forecasts may have been of good enough quality to inform decision making based on predictions a few weeks ahead of time but not longer.
Abstract: Real-time forecasts based on mathematical models can inform critical decision-making during infectious disease outbreaks. Yet, epidemic forecasts are rarely evaluated during or after the event, and there is little guidance on the best metrics for assessment. Here, we propose an evaluation approach that disentangles different components of forecasting ability using metrics that separately assess the calibration, sharpness and bias of forecasts. This makes it possible to assess not just how close a forecast was to reality but also how well uncertainty has been quantified. We used this approach to analyse the performance of weekly forecasts we generated in real time for Western Area, Sierra Leone, during the 2013-16 Ebola epidemic in West Africa. We investigated a range of forecast model variants based on the model fits generated at the time with a semi-mechanistic model, and found that good probabilistic calibration was achievable at short time horizons of one or two weeks ahead but model predictions were increasingly unreliable at longer forecasting horizons. This suggests that forecasts may have been of good enough quality to inform decision making based on predictions a few weeks ahead of time but not longer, reflecting the high level of uncertainty in the processes driving the trajectory of the epidemic. Comparing forecasts based on the semi-mechanistic model to simpler null models showed that the best semi-mechanistic model variant performed better than the null models with respect to probabilistic calibration, and that this would have been identified from the earliest stages of the outbreak. As forecasts become a routine part of the toolkit in public health, standards for evaluation of performance will be important for assessing quality and improving credibility of mathematical models, and for elucidating difficulties and trade-offs when aiming to make the most useful and reliable forecasts.
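Bare-bones versions of the three evaluation components (calibration via probability-integral-transform values, sharpness via prediction-interval width, and bias) are sketched below on simulated forecasts; the paper uses more formal scoring rules and calibration tests, so these are simplified analogues.

```python
# Simple analogues of the three forecast-evaluation components, applied to Monte
# Carlo samples from a predictive distribution.
import numpy as np

rng = np.random.default_rng(0)
n_weeks, n_samples = 30, 1000
truth = rng.poisson(50, size=n_weeks)                   # observed weekly case counts (toy)
# Predictive samples for each week (a forecast that runs systematically high)
forecast = rng.poisson(55, size=(n_weeks, n_samples))

# Calibration: PIT values should look roughly uniform if uncertainty is well quantified
pit = np.mean(forecast <= truth[:, None], axis=1)
# Sharpness: average width of the central 90% prediction interval (smaller = sharper)
sharpness = np.mean(np.quantile(forecast, 0.95, axis=1) - np.quantile(forecast, 0.05, axis=1))
# Bias: how often the forecast exceeds the data, centered at 0 for unbiased forecasts
bias = 2 * np.mean(forecast > truth[:, None]) - 1

print("PIT deciles:", np.round(np.histogram(pit, bins=10, range=(0, 1))[0] / n_weeks, 2))
print(f"sharpness (mean 90% interval width): {sharpness:.1f}")
print(f"bias: {bias:+.2f}")
```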

Journal ArticleDOI
TL;DR: A computational method that adopts a zero-truncated Poisson regression framework to explicitly remove systematic biases in the PLAC-seq and HiChIP datasets, and then uses the normalized chromatin contact frequencies to identify significant chromatin interactions anchored at genomic regions bound by the protein of interest.
Abstract: Hi-C and chromatin immunoprecipitation (ChIP) have been combined to identify long-range chromatin interactions genome-wide at reduced cost and enhanced resolution, but extracting information from the resulting datasets has been challenging. Here we describe a computational method, MAPS, Model-based Analysis of PLAC-seq and HiChIP, to process the data from such experiments and identify long-range chromatin interactions. MAPS adopts a zero-truncated Poisson regression framework to explicitly remove systematic biases in the PLAC-seq and HiChIP datasets, and then uses the normalized chromatin contact frequencies to identify significant chromatin interactions anchored at genomic regions bound by the protein of interest. MAPS shows superior performance over existing software tools in the analysis of chromatin interactions from multiple PLAC-seq and HiChIP datasets centered on different transcriptional factors and histone marks. MAPS is freely available at https://github.com/ijuric/MAPS.
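The statistical core of the normalization step can be sketched as a zero-truncated Poisson regression fit by maximum likelihood, as below on simulated counts; this is a stand-alone illustration, not the MAPS code, and it omits the genomic bias covariates MAPS actually uses.

```python
# Stand-alone sketch of a zero-truncated Poisson regression: mu_i = exp(X_i beta),
# with the likelihood conditioned on counts being >= 1.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
beta_true = np.array([0.5, 0.8])
mu = np.exp(X @ beta_true)

# Simulate zero-truncated counts by rejecting zeros
y = rng.poisson(mu)
keep = y > 0
X, y = X[keep], y[keep]

def neg_loglik(beta):
    mu = np.exp(X @ beta)
    # log P(Y = y | Y >= 1) = y*log(mu) - mu - log(y!) - log(1 - exp(-mu))
    ll = y * np.log(mu) - mu - gammaln(y + 1) - np.log1p(-np.exp(-mu))
    return -ll.sum()

fit = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print("estimated coefficients:", np.round(fit.x, 3), "true:", beta_true)
```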

Journal ArticleDOI
TL;DR: The integration of single-cell RNA-seq profiles of cells derived from lung adenocarcinoma and breast cancer patients into a multi-scale stoichiometric model of a cancer cell population reduces the space of feasible single-cell fluxomes and points out the possible metabolic interactions among cells via exchange of metabolites.
Abstract: Metabolic reprogramming is a general feature of cancer cells. Regrettably, the comprehensive quantification of metabolites in biological specimens does not promptly translate into knowledge on the utilization of metabolic pathways. By estimating fluxes across metabolic pathways, computational models hold the promise to bridge this gap between data and biological functionality. These models, however, currently portray the average behavior of cell populations, masking the inherent heterogeneity that is part and parcel of tumorigenesis as much as drug resistance. To remove this limitation, we propose single-cell Flux Balance Analysis (scFBA) as a computational framework to translate single-cell transcriptomes into single-cell fluxomes. We show that the integration of single-cell RNA-seq profiles of cells derived from lung adenocarcinoma and breast cancer patients into a multi-scale stoichiometric model of a cancer cell population: 1) significantly reduces the space of feasible single-cell fluxomes; 2) allows us to identify clusters of cells with different growth rates within the population; 3) points out the possible metabolic interactions among cells via exchange of metabolites. The scFBA suite of MATLAB functions is available at https://github.com/BIMIB-DISCo/scFBA, as well as the case study datasets.

Journal ArticleDOI
TL;DR: DeepDrug3D is presented, a new approach to characterize and classify binding pockets in proteins with deep learning that employs a state-of-the-art convolutional neural network in which biomolecular structures are represented as voxels assigned interaction energy-based attributes.
Abstract: Comprehensive characterization of ligand-binding sites is invaluable to infer molecular functions of hypothetical proteins, trace evolutionary relationships between proteins, engineer enzymes to achieve a desired substrate specificity, and develop drugs with improved selectivity profiles. These research efforts pose significant challenges owing to the fact that similar pockets are commonly observed across different folds, leading to the high degree of promiscuity of ligand-protein interactions at the system-level. On that account, novel algorithms to accurately classify binding sites are needed. Deep learning is attracting a significant attention due to its successful applications in a wide range of disciplines. In this communication, we present DeepDrug3D, a new approach to characterize and classify binding pockets in proteins with deep learning. It employs a state-of-the-art convolutional neural network in which biomolecular structures are represented as voxels assigned interaction energy-based attributes. The current implementation of DeepDrug3D, trained to detect and classify nucleotide- and heme-binding sites, not only achieves a high accuracy of 95%, but also has the ability to generalize to unseen data as demonstrated for steroid-binding proteins and peptidase enzymes. Interestingly, the analysis of strongly discriminative regions of binding pockets reveals that this high classification accuracy arises from learning the patterns of specific molecular interactions, such as hydrogen bonds, aromatic and hydrophobic contacts. DeepDrug3D is available as an open-source program at https://github.com/pulimeng/DeepDrug3D with the accompanying TOUGH-C1 benchmarking dataset accessible from https://osf.io/enz69/.
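The snippet below shows a minimal voxelization step of the kind such a network consumes, binning toy atomic coordinates into an element-channel 3-D grid; DeepDrug3D itself assigns interaction energy-based attributes rather than raw atom counts, so treat this purely as an illustration of the input representation.

```python
# Minimal voxelization sketch (an assumption, not the DeepDrug3D pipeline): atoms
# of a binding pocket are binned into a cubic grid centered on the pocket, giving
# the kind of 3-D tensor a convolutional network consumes.
import numpy as np

rng = np.random.default_rng(0)
n_atoms = 120
coords = rng.normal(scale=6.0, size=(n_atoms, 3))     # toy atomic coordinates (angstroms)
elements = rng.choice(["C", "N", "O", "S"], size=n_atoms)

grid_size, resolution = 32, 1.0                       # 32^3 voxels, 1 angstrom per voxel
channels = {"C": 0, "N": 1, "O": 2, "S": 3}
voxels = np.zeros((len(channels), grid_size, grid_size, grid_size))

center = coords.mean(axis=0)
for xyz, el in zip(coords, elements):
    idx = np.floor((xyz - center) / resolution).astype(int) + grid_size // 2
    if np.all((idx >= 0) & (idx < grid_size)):        # drop atoms outside the box
        voxels[channels[el], idx[0], idx[1], idx[2]] += 1.0

print("input tensor shape:", voxels.shape)            # (channels, x, y, z)
print("occupied voxels:", int(np.count_nonzero(voxels)))
```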

Journal ArticleDOI
TL;DR: While the best tools had excellent performance, the poorest method predicted more than one third of the benign variants to be disease-causing; the results allow choosing reliable methods for benign variant interpretation, for both research and clinical purposes, and provide a benchmark for method developers.
Abstract: Computational tools are widely used for interpreting variants detected in sequencing projects. The choice of these tools is critical for reliable variant impact interpretation for precision medicine and should be based on systematic performance assessment. The performance of the methods varies widely in different performance assessments, for example due to the contents and sizes of test datasets. To address this issue, we obtained 63,160 common amino acid substitutions (allele frequency ≥1% and <25%) from the Exome Aggregation Consortium (ExAC) database, which contains variants from 60,706 genomes or exomes. We evaluated the specificity, the capability to detect benign variants, for 10 variant interpretation tools. In addition to overall specificity of the tools, we tested their performance for variants in six geographical populations. PON-P2 had the best performance (95.5%) followed by FATHMM (86.4%) and VEST (83.5%). While these tools had excellent performance, the poorest method predicted more than one third of the benign variants to be disease-causing. The results allow choosing reliable methods for benign variant interpretation, for both research and clinical purposes, as well as provide a benchmark for method developers.
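Since the benchmark contains only benign variants, specificity reduces to the fraction of variants a tool predicts as benign; the toy snippet below makes that computation explicit with made-up predictions.

```python
# Tiny illustration of the evaluation metric: on a benign-only variant set,
# specificity is the fraction predicted benign (true negatives over all negatives).
import numpy as np

# 0 = predicted benign, 1 = predicted pathogenic, for a benign-only variant set (toy)
predictions = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0])
specificity = np.mean(predictions == 0)
print(f"specificity = {specificity:.3f}")   # 12/15 = 0.800
```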

Journal ArticleDOI
TL;DR: This work shows how cell-generated contractile forces produce substantial irreversible changes to the density and architecture of physiologically relevant ECMs–collagen I and fibrin–in a matter of minutes and confirms that plasticity, as a mechanical law to capture remodeling in these networks, is fundamentally tied to material damage via force-driven unbinding of fiber crosslinks.
Abstract: The mechanical properties of the extracellular matrix (ECM), a complex, 3D, fibrillar scaffold of cells in physiological environments, modulate cell behavior and can drive tissue morphogenesis, regeneration, and disease progression. For simplicity, it is often convenient to assume these properties to be time-invariant. In living systems, however, cells dynamically remodel the ECM and create time-dependent local microenvironments. Here, we show how cell-generated contractile forces produce substantial irreversible changes to the density and architecture of physiologically relevant ECMs (collagen I and fibrin) in a matter of minutes. We measure the 3D deformation profiles of the ECM surrounding cancer and endothelial cells during stages when force generation is active or inactive. We further correlate these ECM measurements to both discrete fiber simulations that incorporate fiber crosslink unbinding kinetics and continuum-scale simulations that account for viscoplastic and damage features. Our findings further confirm that plasticity, as a mechanical law to capture remodeling in these networks, is fundamentally tied to material damage via force-driven unbinding of fiber crosslinks. These results characterize in a multiscale manner the dynamic nature of the mechanical environment of physiologically mimicking cell-in-gel systems.

Journal ArticleDOI
TL;DR: To systematically map the mutational tolerance of an antibody variable fragment, the authors developed and applied AbLIFT, an automated web server that designs multipoint core mutations to improve contacts between specific Fv light and heavy chains.
Abstract: Antibodies developed for research and clinical applications may exhibit suboptimal stability, expressibility, or affinity. Existing optimization strategies focus on surface mutations, whereas natural affinity maturation also introduces mutations in the antibody core, simultaneously improving stability and affinity. To systematically map the mutational tolerance of an antibody variable fragment (Fv), we performed yeast display and applied deep mutational scanning to an anti-lysozyme antibody and found that many of the affinity-enhancing mutations clustered at the variable light-heavy chain interface, within the antibody core. Rosetta design combined enhancing mutations, yielding a variant with tenfold higher affinity and substantially improved stability. To make this approach broadly accessible, we developed AbLIFT, an automated web server that designs multipoint core mutations to improve contacts between specific Fv light and heavy chains (http://AbLIFT.weizmann.ac.il). We applied AbLIFT to two unrelated antibodies targeting the human antigens VEGF and QSOX1. Strikingly, the designs improved stability, affinity, and expression yields. The results provide proof-of-principle for bypassing laborious cycles of antibody engineering through automated computational affinity and stability design.