Showing papers in "Cell systems in 2021"


Journal ArticleDOI
TL;DR: A web-based tool (covid-omics.app) is presented that enables interactive exploration of the compendium, and its utility is illustrated through a machine learning approach for prediction of COVID-19 severity.
Abstract: We performed RNA-seq and high-resolution mass spectrometry on 128 blood samples from COVID-19-positive and COVID-19-negative patients with diverse disease severities and outcomes. Quantified transcripts, proteins, metabolites, and lipids were associated with clinical outcomes in a curated relational database, uniquely enabling systems analysis and cross-ome correlations to molecules and patient prognoses. We mapped 219 molecular features with high significance to COVID-19 status and severity, many of which were involved in complement activation, dysregulated lipid transport, and neutrophil activation. We identified sets of covarying molecules, e.g., protein gelsolin and metabolite citrate or plasmalogens and apolipoproteins, offering pathophysiological insights and therapeutic suggestions. The observed dysregulation of platelet function, blood coagulation, acute phase response, and endotheliopathy further illuminated the unique COVID-19 phenotype. We present a web-based tool (covid-omics.app) enabling interactive exploration of our compendium and illustrate its utility through a machine learning approach for prediction of COVID-19 severity.

361 citations


Journal ArticleDOI
TL;DR: Protein language models have recently emerged as a powerful machine-learning approach for distilling information from massive protein sequence databases; from sequence data alone, they can discover evolutionary, structural, and functional organization across protein space.
Abstract: Language models have recently emerged as a powerful machine-learning approach for distilling information from massive protein sequence databases. From readily available sequence data alone, these models discover evolutionary, structural, and functional organization across protein space. Using language models, we can encode amino-acid sequences into distributed vector representations that capture their structural and functional properties, as well as evaluate the evolutionary fitness of sequence variants. We discuss recent advances in protein language modeling and their applications to downstream protein property prediction problems. We then consider how these models can be enriched with prior biological knowledge and introduce an approach for encoding protein structural knowledge into the learned representations. The knowledge distilled by these models allows us to improve downstream function prediction through transfer learning. Deep protein language models are revolutionizing protein biology. They suggest new ways to approach protein and therapeutic design. However, further developments are needed to encode strong biological priors into protein language models and to increase their accessibility to the broader community.

126 citations


Journal ArticleDOI
TL;DR: In this paper, the authors characterized the time-dependent progression of the disease in 139 COVID-19 inpatients by measuring 86 accredited diagnostic parameters, such as blood cell counts and enzyme activities, as well as untargeted plasma proteomes at 687 sampling points.
Abstract: COVID-19 is highly variable in its clinical presentation, ranging from asymptomatic infection to severe organ damage and death. We characterized the time-dependent progression of the disease in 139 COVID-19 inpatients by measuring 86 accredited diagnostic parameters, such as blood cell counts and enzyme activities, as well as untargeted plasma proteomes at 687 sampling points. We report an initial spike in a systemic inflammatory response, which is gradually alleviated and followed by a protein signature indicative of tissue repair, metabolic reconstitution, and immunomodulation. We identify prognostic marker signatures for devising risk-adapted treatment strategies and use machine learning to classify therapeutic needs. We show that the machine learning models based on the proteome are transferable to an independent cohort. Our study presents a map linking routinely used clinical diagnostic parameters to plasma proteomes and their dynamics in an infectious disease.

96 citations


Journal ArticleDOI
TL;DR: In this article, the BXD family of mice is expanded to 140 fully isogenic strains, creating a uniquely powerful model for precision medicine; because each member can be replicated, heritable traits can be mapped with high power and precision.
Abstract: The challenge of precision medicine is to model complex interactions among DNA variants, phenotypes, development, environments, and treatments. We address this challenge by expanding the BXD family of mice to 140 fully isogenic strains, creating a uniquely powerful model for precision medicine. This family segregates for 6 million common DNA variants, a level that exceeds many human populations. Because each member can be replicated, heritable traits can be mapped with high power and precision. Current BXD phenomes are unsurpassed in coverage and include much omics data and thousands of quantitative traits. BXDs can be extended by a single-generation cross to as many as 19,460 isogenic F1 progeny, and this extended BXD family is an effective platform for testing causal modeling and for predictive validation. BXDs are a unique core resource for the field of experimental precision medicine.

82 citations


Journal ArticleDOI
TL;DR: This study compared doublet-detection methods regarding detection accuracy under various experimental settings, impacts on downstream analyses, and computational efficiencies and showed that existing methods exhibited diverse performance and distinct advantages in different aspects.
Abstract: In single-cell RNA sequencing (scRNA-seq), doublets form when two cells are encapsulated into one reaction volume. The existence of doublets, which appear to be, but are not, real cells, is a key confounder in scRNA-seq data analysis. Computational methods have been developed to detect doublets in scRNA-seq data; however, the scRNA-seq field lacks a comprehensive benchmarking of these methods, making it difficult for researchers to choose an appropriate method for specific analyses. We conducted a systematic benchmark study of nine cutting-edge computational doublet-detection methods. Our study included 16 real datasets, which contained experimentally annotated doublets, and 112 realistic synthetic datasets. We compared doublet-detection methods regarding detection accuracy under various experimental settings, impacts on downstream analyses, and computational efficiencies. Our results show that existing methods exhibited diverse performance and distinct advantages in different aspects. Overall, the DoubletFinder method has the best detection accuracy, and the cxds method has the highest computational efficiency. A record of this paper's transparent peer review process is included in the Supplemental Information.

73 citations


Journal ArticleDOI
TL;DR: In this paper, the authors focus on mass spectrometry (MS)-based proteomics and describe how machine learning and, in particular, deep learning now predicts experimental peptide measurements from amino acid sequences alone.
Abstract: There is an avalanche of biomedical data generation and a parallel expansion in computational capabilities to analyze and make sense of these data. Starting with genome sequencing and widely employed deep sequencing technologies, these trends have now taken hold in all omics disciplines and increasingly call for multi-omics integration as well as data interpretation by artificial intelligence technologies. Here, we focus on mass spectrometry (MS)-based proteomics and describe how machine learning and, in particular, deep learning now predicts experimental peptide measurements from amino acid sequences alone. This will dramatically improve the quality and reliability of analytical workflows because experimental results should agree with predictions in a multi-dimensional data landscape. Machine learning has also become central to biomarker discovery from proteomics data, which now starts to outperform existing best-in-class assays. Finally, we discuss model transparency and explainability and data privacy that are required to deploy MS-based biomarkers in clinical settings.

66 citations


Journal ArticleDOI
TL;DR: In this paper, a path-independent machine learning-assisted directed evolution (MLDE) protocol that allows in silico screening of full combinatorial libraries is investigated and optimized; the optimized protocol achieves the global fitness maximum up to 81-fold more frequently than single-step greedy optimization.
Abstract: Directed evolution of proteins often involves a greedy optimization in which the mutation in the highest-fitness variant identified in each round of single-site mutagenesis is fixed. The efficiency of such a single-step greedy walk depends on the order in which beneficial mutations are identified; that is, the process is path dependent. Here, we investigate and optimize a path-independent machine learning-assisted directed evolution (MLDE) protocol that allows in silico screening of full combinatorial libraries. In particular, we evaluate the importance of different protein encoding strategies, training procedures, models, and training set design strategies on MLDE outcome, finding the most important consideration to be the implementation of strategies that reduce inclusion of minimally informative "holes" (protein variants with zero or extremely low fitness) in training data. When applied to an epistatic, hole-filled, four-site combinatorial fitness landscape, our optimized protocol achieved the global fitness maximum up to 81-fold more frequently than single-step greedy optimization. A record of this paper's transparent peer review process is included in the supplemental information.

57 citations
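
The path dependence described in the abstract above can be illustrated with a toy example. The sketch below is not the MLDE protocol: it uses a hypothetical three-site, two-letter landscape with made-up fitness values (the `LANDSCAPE` table, the alphabet, and all function names are assumptions for illustration) and compares a simple greedy walk, which here stops when no single mutant improves, with scoring every variant in the full combinatorial library.

```python
from itertools import product

# Hypothetical fitness values for a toy three-site, two-letter landscape.
# Zero-fitness variants play the role of "holes"; none of these numbers
# come from the paper.
LANDSCAPE = {
    "AAA": 1.0, "BAA": 2.0, "ABA": 1.5, "AAB": 0.0,
    "BBA": 2.5, "BAB": 0.0, "ABB": 5.0, "BBB": 1.0,
}
ALPHABET = "AB"


def single_site_mutants(variant):
    """Return every variant that differs from `variant` at exactly one site."""
    mutants = []
    for pos, current in enumerate(variant):
        for letter in ALPHABET:
            if letter != current:
                mutants.append(variant[:pos] + letter + variant[pos + 1:])
    return mutants


def greedy_walk(start):
    """Move to the best single mutant each round until no mutant improves."""
    current = start
    while True:
        best = max(single_site_mutants(current), key=LANDSCAPE.get)
        if LANDSCAPE[best] <= LANDSCAPE[current]:
            return current
        current = best


def exhaustive_screen():
    """Score the full combinatorial library and return the best variant."""
    library = ["".join(v) for v in product(ALPHABET, repeat=3)]
    return max(library, key=LANDSCAPE.get)


print(greedy_walk("AAA"))    # greedy path: AAA -> BAA -> BBA
print(exhaustive_screen())   # global maximum of the toy landscape
```

In this toy landscape the greedy walk gets stuck at a local optimum because the first beneficial mutation it fixes leads away from the best combination. In practice the full library cannot be measured directly, which is why the abstract describes screening it in silico with a trained model.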


Journal ArticleDOI
TL;DR: In this paper, a hybrid approach that combines explicit mathematical models of cell dynamics with a machine learning framework, implemented in TensorFlow, is proposed to identify combinatorial perturbations of potential therapeutic interest.
Abstract: Systematic perturbation of cells followed by comprehensive measurements of molecular and phenotypic responses provides informative data resources for constructing computational models of cell biology. Models that generalize well beyond training data can be used to identify combinatorial perturbations of potential therapeutic interest. Major challenges for machine learning on large biological datasets are to find global optima in a complex multidimensional space and mechanistically interpret the solutions. To address these challenges, we introduce a hybrid approach that combines explicit mathematical models of cell dynamics with a machine-learning framework, implemented in TensorFlow. We tested the modeling framework on a perturbation-response dataset of a melanoma cell line after drug treatments. The models can be efficiently trained to describe cellular behavior accurately. Even though completely data driven and independent of prior knowledge, the resulting de novo network models recapitulate some known interactions. The approach is readily applicable to various kinetic models of cell biology. A record of this paper's Transparent Peer Review process is included in the Supplemental Information.

53 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose a general approach called Hotspot that operates directly on a given metric of cell-cell similarity, allowing for its integration with any method (linear or non-linear) for identifying the primary axes of transcriptional variation between cells.
Abstract: Two fundamental aims that emerge when analyzing single-cell RNA-seq data are identifying which genes vary in an informative manner and determining how these genes organize into modules. Here, we propose a general approach to these problems, called "Hotspot," that operates directly on a given metric of cell-cell similarity, allowing for its integration with any method (linear or non-linear) for identifying the primary axes of transcriptional variation between cells. In addition, we show that when using multimodal data, Hotspot can be used to identify genes whose expression reflects alternative notions of similarity between cells, such as physical proximity in a tissue or clonal relatedness in a cell lineage tree. In this manner, we demonstrate that while Hotspot is capable of identifying genes that reflect nuanced transcriptional variability between T helper cells, it can also identify spatially dependent patterns of gene expression in the cerebellum as well as developmentally heritable expression programs during embryogenesis. Hotspot is implemented as an open-source Python package and is available for use at http://www.github.com/yoseflab/hotspot. A record of this paper's transparent peer review process is included in the supplemental information.

47 citations


Journal ArticleDOI
TL;DR: In this paper, an interpretable and generalizable deep-learning model, D-SCRIPT, is proposed that predicts interaction between two proteins using only their sequence and maintains high accuracy with limited training data and across species.
Abstract: We combine advances in neural language modeling and structurally motivated design to develop D-SCRIPT, an interpretable and generalizable deep-learning model, which predicts interaction between two proteins using only their sequence and maintains high accuracy with limited training data and across species. We show that a D-SCRIPT model trained on 38,345 human PPIs enables significantly improved functional characterization of fly proteins compared with the state-of-the-art approach. Evaluating the same D-SCRIPT model on protein complexes with known 3D structure, we find that the inter-protein contact map output by D-SCRIPT has significant overlap with the ground truth. We apply D-SCRIPT to screen for PPIs in cow (Bos taurus) at a genome-wide scale and, focusing on rumen physiology, identify functional gene modules related to metabolism and immune response. The predicted interactions can then be leveraged for function prediction at scale, addressing the genome-to-phenome challenge, especially in species where little data are available.

47 citations


Journal ArticleDOI
TL;DR: The engineered tissues possess superior liver identity when compared with other PSC-derived liver organoids and show the presence of hepatocyte, biliary, endothelial, and stellate-like cell populations in single-cell RNA-seq analysis.
Abstract: Pluripotent stem cell (PSC)-derived organoids have emerged as novel multicellular models of human tissue development but display immature phenotypes, aberrant tissue fates, and a limited subset of cells. Here, we demonstrate that integrated analysis and engineering of gene regulatory networks (GRNs) in PSC-derived multilineage human liver organoids direct maturation and vascular morphogenesis in vitro. Overexpression of PROX1 and ATF5, combined with targeted CRISPR-based transcriptional activation of endogenous CYP3A4, reprograms tissue GRNs and improves native liver functions, such as FXR signaling, CYP3A4 enzymatic activity, and stromal cell reactivity. The engineered tissues possess superior liver identity when compared with other PSC-derived liver organoids and show the presence of hepatocyte, biliary, endothelial, and stellate-like cell populations in single-cell RNA-seq analysis. Finally, they show hepatic functions when studied in vivo. Collectively, our approach provides an experimental framework to direct organogenesis in vitro by systematically probing molecular pathways and transcriptional networks that promote tissue development.

Journal ArticleDOI
Mustafa Khammash
TL;DR: In this article, the author examines the structural constraints underlying robust perfect adaptation (RPA), elucidates them using biological examples from systems and synthetic biology, and argues that understanding these constraints allows us to look past implementation details and offers a compelling means to unravel regulatory biological complexity.
Abstract: A distinctive feature of many biological systems is their ability to adapt to persistent stimuli or disturbances that would otherwise drive them away from a desirable steady state. The resulting stasis enables organisms to function reliably while being subjected to very different external environments. This perspective concerns a stringent type of biological adaptation, robust perfect adaptation (RPA), that is resilient to certain network and parameter perturbations. As in engineered control systems, RPA requires that the regulating network satisfy certain structural constraints that cannot be avoided. We elucidate these ideas using biological examples from systems and synthetic biology. We then argue that understanding the structural constraints underlying RPA allows us to look past implementation details and offers a compelling means to unravel regulatory biological complexity.

Journal ArticleDOI
TL;DR: This work demonstrates that CRISPRi screens can reveal global sources of metabolic robustness and identify local regulatory mechanisms that buffer decreases of specific enzymes.
Abstract: Enzymes maintain metabolism, and their concentration affects cellular fitness: high enzyme levels are costly, and low enzyme levels can limit metabolic flux. Here, we used CRISPR interference (CRISPRi) to study the consequences of decreasing E. coli enzymes below wild-type levels. A pooled CRISPRi screen with 7,177 strains demonstrates that metabolism buffers fitness defects for hours after the induction of CRISPRi. We characterized the metabolome and proteome responses in 30 CRISPRi strains and elucidated three gene-specific buffering mechanisms: ornithine buffered the knockdown of carbamoyl phosphate synthetase (CarAB) by increasing CarAB activity, S-adenosylmethionine buffered the knockdown of homocysteine transmethylase (MetE) by de-repressing expression of the methionine pathway, and 6-phosphogluconate buffered the knockdown of 6-phosphogluconate dehydrogenase (Gnd) by activating a bypass. In total, this work demonstrates that CRISPRi screens can reveal global sources of metabolic robustness and identify local regulatory mechanisms that buffer decreases of specific enzymes. A record of this paper’s transparent peer review process is included in the Supplemental Information.

Journal ArticleDOI
TL;DR: In this paper, the authors combined principled statistical methods with a framework based on catastrophe theory and approximate Bayesian computation to formulate a quantitative dynamical landscape that accurately predicts cell fate outcomes.
Abstract: Fate decisions in developing tissues involve cells transitioning between discrete cell states, each defined by distinct gene expression profiles. The Waddington landscape, in which the development of a cell is viewed as a ball rolling through a valley filled terrain, is an appealing way to describe differentiation. To construct and validate accurate landscapes, quantitative methods based on experimental data are necessary. We combined principled statistical methods with a framework based on catastrophe theory and approximate Bayesian computation to formulate a quantitative dynamical landscape that accurately predicts cell fate outcomes of pluripotent stem cells exposed to different combinations of signaling factors. Analysis of the landscape revealed two distinct ways in which cells make a binary choice between one of two fates. We suggest that these represent archetypal designs for developmental decisions. The approach is broadly applicable for the quantitative analysis of differentiation and for determining the logic of developmental decisions.

Journal ArticleDOI
TL;DR: AutoGeneS is introduced, a platform that automatically extracts discriminative genes and reveals the cellular heterogeneity of bulk RNA samples. It requires no prior knowledge about marker genes and selects genes by simultaneously optimizing multiple criteria: minimizing the correlation and maximizing the distance between cell types.
Abstract: Knowing cell-type proportions in a tissue is very important to identify which cells or cell types are targeted by a disease or perturbation. Hence, several deconvolution methods have been proposed to infer cell-type proportions from bulk RNA samples. Their performance with noisy reference profiles and closely correlated cell types highly depends on the set of genes undergoing deconvolution. In this work, we introduce AutoGeneS, a platform that automatically extracts discriminative genes and reveals the cellular heterogeneity of bulk RNA samples. AutoGeneS requires no prior knowledge about marker genes and selects genes by simultaneously optimizing multiple criteria: minimizing the correlation and maximizing the distance between cell types. AutoGeneS can be applied to reference profiles from various sources like single-cell experiments or sorted cell populations. Ground truth cell proportions analyzed by flow cytometry confirmed the accuracy of AutoGeneS in identifying cell-type proportions. AutoGeneS is available for use via a standalone Python package (https://github.com/theislab/AutoGeneS).

Journal ArticleDOI
TL;DR: In this article, the authors employed isotope tracing and mass spectrometry to probe age-related changes in NAD+ metabolism across tissues, and observed modest tissue NAD+ depletion (median decrease ∼30%).
Abstract: NAD+ is an essential coenzyme for all living cells. NAD+ concentrations decline with age, but whether this reflects impaired production or accelerated consumption remains unclear. We employed isotope tracing and mass spectrometry to probe age-related changes in NAD+ metabolism across tissues. In aged mice, we observed modest tissue NAD+ depletion (median decrease ∼30%). Circulating NAD+ precursors were not significantly changed, and isotope tracing showed the unimpaired synthesis of nicotinamide from tryptophan. In most tissues of aged mice, turnover of the smaller tissue NAD+ pool was modestly faster such that absolute NAD+ biosynthetic flux was maintained, consistent with more active NAD+-consuming enzymes. Calorie restriction partially mitigated age-associated NAD+ decline by decreasing consumption. Acute inflammatory stress induced by LPS decreased NAD+ by impairing synthesis in both young and aged mice. Thus, the decline in NAD+ with normal aging is relatively subtle and occurs despite maintained NAD+ production, likely due to increased consumption.
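
The relationship between pool size, turnover, and flux in the abstract above can be made concrete with a small calculation. The sketch assumes a single well-mixed pool at steady state, where flux equals pool size times fractional turnover rate. The normalized values and function names are illustrative assumptions, not measurements from the paper; only the roughly 30% pool decrease is taken from the abstract.

```python
def steady_state_flux(pool_size, turnover_rate):
    """Flux through a steady-state pool: pool size times fractional turnover rate."""
    return pool_size * turnover_rate


def rate_needed_to_maintain_flux(reference_flux, pool_size):
    """Turnover rate a pool of the given size needs to carry the reference flux."""
    return reference_flux / pool_size


# Normalized, hypothetical units: the young pool and its turnover rate are set to 1.
young_flux = steady_state_flux(pool_size=1.0, turnover_rate=1.0)

# A pool that is about 30% smaller (0.7) must turn over faster to keep the same flux.
aged_rate = rate_needed_to_maintain_flux(young_flux, pool_size=0.7)
print(round(aged_rate, 2))  # 1.43
```

Under this simplified model, a smaller pool with unchanged biosynthetic flux implies faster fractional turnover, which is the pattern the abstract attributes to increased consumption.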

Journal ArticleDOI
TL;DR: In this article, the reproducibility and variability of microbial community assembly were investigated in replicate glucose-limited habitats and it was shown that the previously observed family-level convergence in these habitats reflects a reproducible metabolic organization.
Abstract: For microbiome biology to become a more predictive science, we must identify which descriptive features of microbial communities are reproducible and predictable, which are not, and why. We address this question by experimentally studying parallelism and convergence in microbial community assembly in replicate glucose-limited habitats. Here, we show that the previously observed family-level convergence in these habitats reflects a reproducible metabolic organization, where the ratio of the dominant metabolic groups can be explained from a simple resource-partitioning model. In turn, taxonomic divergence among replicate communities arises from multistability in population dynamics. Multistability can also lead to alternative functional states in closed ecosystems but not in metacommunities. Our findings empirically illustrate how the evolutionary conservation of quantitative metabolic traits, multistability, and the inherent stochasticity of population dynamics may all conspire to generate the patterns of reproducibility and variability at different levels of organization that are commonplace in microbial community assembly.

Journal ArticleDOI
TL;DR: Pulse-chase proteomics on mouse brains in three genetic models of AD reveals that the presynaptic terminal is particularly vulnerable and represents a critical site for manifestation of initial AD etiology.
Abstract: Compromised protein homeostasis underlies accumulation of plaques and tangles in Alzheimer's disease (AD). To observe protein turnover at early stages of amyloid beta (Aβ) proteotoxicity, we performed pulse-chase proteomics on mouse brains in three genetic models of AD that knock in alleles of amyloid precursor protein (APP) prior to the accumulation of plaques and during disease progression. At initial stages of Aβ accumulation, the turnover of proteins associated with presynaptic terminals is selectively impaired. Presynaptic proteins with impaired turnover, particularly synaptic vesicle (SV)-associated proteins, have elevated levels, misfold in both a plaque-dependent and -independent manner, and interact with APP and Aβ. Concurrent with elevated levels of SV-associated proteins, we found an enlargement of the SV pool as well as enhancement of presynaptic potentiation. Together, our findings reveal that the presynaptic terminal is particularly vulnerable and represents a critical site for manifestation of initial AD etiology. A record of this paper's transparent peer review process is included in the Supplemental Information.

Journal ArticleDOI
TL;DR: In this article, the authors show that parietal cortex, striatum, and thalamus contributed more than frontal cortex to decoding differences in consciousness, highlighting the importance of integration between parietal and subcortical structures and challenging a key role for frontal cortex in consciousness.
Abstract: The neural substrates of consciousness remain elusive. Competing theories that attempt to explain consciousness disagree on the contribution of frontal versus posterior cortex and omit subcortical influences. This lack of understanding impedes the ability to monitor consciousness, which can lead to adverse clinical consequences. To test substrates and measures of consciousness, we recorded simultaneously from frontal cortex, parietal cortex, and subcortical structures, the striatum and thalamus, in awake, sleeping, and anesthetized macaques. We manipulated consciousness on a finer scale using thalamic stimulation, rousing macaques from continuously administered anesthesia. Our results show that, unlike measures targeting complexity, a measure additionally capturing neural integration (Φ∗) robustly correlated with changes in consciousness. Machine learning approaches show parietal cortex, striatum, and thalamus contributed more than frontal cortex to decoding differences in consciousness. These findings highlight the importance of integration between parietal and subcortical structures and challenge a key role for frontal cortex in consciousness.

Journal ArticleDOI
TL;DR: It is found that SARS-CoV-2 subunit peptides may not be robustly displayed by the Major Histocompatibility Complex (MHC) molecules in certain individuals.
Abstract: Subunit vaccines induce immunity to a pathogen by presenting a component of the pathogen and thus inherently limit the representation of pathogen peptides for cellular immunity-based memory. We find that severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) subunit peptides may not be robustly displayed by the major histocompatibility complex (MHC) molecules in certain individuals. We introduce an augmentation strategy for subunit vaccines that adds a small number of SARS-CoV-2 peptides to a vaccine to improve the population coverage of pathogen peptide display. Our population coverage estimates integrate clinical data on peptide immunogenicity in convalescent COVID-19 patients and machine learning predictions. We evaluate the population coverage of 9 different subunits of SARS-CoV-2, including 5 functional domains and 4 full proteins, and augment each of them to fill a predicted coverage gap.

Journal ArticleDOI
TL;DR: In this paper, the authors define objectives in learning perturbation response in single-cell omics and survey existing approaches, resources, and datasets (https://github.com/theislab/sc-pert).
Abstract: Cell biology is fundamentally limited in its ability to collect complete data on cellular phenotypes and the wide range of responses to perturbation. Areas such as computer vision and speech recognition have addressed this problem of characterizing unseen or unlabeled conditions with the combined advances of big data, deep learning, and computing resources in the past 5 years. Similarly, recent advances in machine learning approaches enabled by single-cell data start to address prediction tasks in perturbation response modeling. We first define objectives in learning perturbation response in single-cell omics; survey existing approaches, resources, and datasets (https://github.com/theislab/sc-pert); and discuss how a perturbation atlas can enable deep learning models to construct an informative perturbation latent space. We then examine future avenues toward more powerful and explainable modeling using deep neural networks, which enable the integration of disparate information sources and an understanding of heterogeneous, complex, and unseen systems.

Journal ArticleDOI
TL;DR: A positive-unlabeled (PU) learning framework to infer sequence-function relationships from large-scale DMS data is developed and the estimated parameters pinpoint key residues that dictate protein structure and function.
Abstract: Machine learning can infer how protein sequence maps to function without requiring a detailed understanding of the underlying physical or biological mechanisms. It is challenging to apply existing supervised learning frameworks to large-scale experimental data generated by deep mutational scanning (DMS) and related methods. DMS data often contain high-dimensional and correlated sequence variables, experimental sampling error and bias, and the presence of missing data. Notably, most DMS data do not contain examples of negative sequences, making it challenging to directly estimate how sequence affects function. Here, we develop a positive-unlabeled (PU) learning framework to infer sequence-function relationships from large-scale DMS data. Our PU learning method displays excellent predictive performance across ten large-scale sequence-function datasets, representing proteins of different folds, functions, and library types. The estimated parameters pinpoint key residues that dictate protein structure and function. Finally, we apply our statistical sequence-function model to design highly stabilized enzymes.

Journal ArticleDOI
TL;DR: In this paper, a minimizer-space de Bruijn graph-based representation of 661,405 bacterial genomes, comprising 16 million nodes and 45 million edges, was constructed and searched for anti-microbial resistance (AMR) genes in 12 min.
Abstract: DNA sequencing data continue to progress toward longer reads with increasingly lower sequencing error rates. Here, we define an algorithmic approach, mdBG, that makes use of minimizer-space de Bruijn graphs to enable long-read genome assembly. mdBG achieves orders-of-magnitude improvement in both speed and memory usage over existing methods without compromising accuracy. A human genome is assembled in under 10 min using 8 cores and 10 GB RAM, and 60 Gbp of metagenome reads are assembled in 4 min using 1 GB RAM. In addition, we constructed a minimizer-space de Bruijn graph-based representation of 661,405 bacterial genomes, comprising 16 million nodes and 45 million edges, and successfully searched it for anti-microbial resistance (AMR) genes in 12 min. We expect our advances to be essential to sequence analysis, given the rise of long-read sequencing in genomics, metagenomics, and pangenomics. Code for constructing mdBGs is freely available for download at https://github.com/ekimb/rust-mdbg/ .
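
The idea of a de Bruijn graph over minimizers rather than nucleotides can be sketched in a few lines. This is a minimal Python illustration, not the rust-mdbg implementation. It assumes one simple selection rule (keep an l-mer when its hash falls below a density threshold), ignores reverse complements and sequencing errors, and every function name, parameter, and the choice of hash are assumptions made for the example.

```python
import hashlib

HASH_MAX = 2 ** 64  # size of the hash space used below (assumption)


def lmer_hash(lmer):
    """Deterministic 64-bit hash of an l-mer (SHA-256 truncated; illustrative choice)."""
    digest = hashlib.sha256(lmer.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def select_minimizers(seq, l, density):
    """Keep, in order, the l-mers whose hash falls in the lowest `density` fraction."""
    lmers = (seq[i:i + l] for i in range(len(seq) - l + 1))
    return [m for m in lmers if lmer_hash(m) < density * HASH_MAX]


def kminmers(minimizers, k):
    """Slide a window of k consecutive minimizers over the ordered minimizer list."""
    return [tuple(minimizers[i:i + k]) for i in range(len(minimizers) - k + 1)]


def build_graph(reads, l, k, density):
    """Nodes are k-min-mers; edges join k-min-mers that are adjacent in a read."""
    nodes, edges = set(), set()
    for read in reads:
        kms = kminmers(select_minimizers(read, l, density), k)
        nodes.update(kms)
        edges.update(zip(kms, kms[1:]))
    return nodes, edges


# With density=1.0 every l-mer is kept, which makes the toy example easy to follow.
nodes, edges = build_graph(["ACGTAC"], l=2, k=3, density=1.0)
print(len(nodes), len(edges))
```

Lowering `density` keeps fewer l-mers, so each read collapses to a much shorter sequence of minimizers. Working on that shorter sequence is the intuition behind the speed and memory gains the abstract reports.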

Journal ArticleDOI
TL;DR: In this article, methodologies using single-cell RNA sequencing (scRNA-seq) analysis were presented to identify early, convergent, and MN-resolved signatures of ALS.
Abstract: Induced pluripotent stem cell (iPSC)-derived neural cultures from amyotrophic lateral sclerosis (ALS) patients can model disease phenotypes. However, heterogeneity arising from genetic and experimental variability limits their utility, impacting reproducibility and the ability to track cellular origins of pathogenesis. Here, we present methodologies using single-cell RNA sequencing (scRNA-seq) analysis to address these limitations. By repeatedly differentiating and applying scRNA-seq to motor neurons (MNs) from healthy, familial ALS, sporadic ALS, and genome-edited iPSC lines across multiple patients, batches, and platforms, we account for genetic and experimental variability toward identifying unified and reproducible ALS signatures. Combining HOX and developmental gene expression with global clustering, we anatomically classified cells into rostrocaudal, progenitor, and postmitotic identities. By relaxing statistical thresholds, we discovered genes in iPSC-MNs that were concordantly dysregulated in postmortem MNs and yielded predictive ALS markers in other human and mouse models. Our approach thus revealed early, convergent, and MN-resolved signatures of ALS.

Journal ArticleDOI
TL;DR: In this paper, a bioinformatics pipeline for integrating multi-omics data into personalized genome-scale flux balance analysis models of 716 radiation-sensitive and 199 radiation-resistant tumors was developed.
Abstract: Redox cofactor production is integral toward antioxidant generation, clearance of reactive oxygen species, and overall tumor response to ionizing radiation treatment. To identify systems-level alterations in redox metabolism that confer resistance to radiation therapy, we developed a bioinformatics pipeline for integrating multi-omics data into personalized genome-scale flux balance analysis models of 716 radiation-sensitive and 199 radiation-resistant tumors. These models collectively predicted that radiation-resistant tumors reroute metabolic flux to increase mitochondrial NADPH stores and reactive oxygen species (ROS) scavenging. Simulated genome-wide knockout screens agreed with experimental siRNA gene knockdowns in matched radiation-sensitive and radiation-resistant cancer cell lines, revealing gene targets involved in mitochondrial NADPH production, central carbon metabolism, and folate metabolism that allow for selective inhibition of glutathione production and H2O2 clearance in radiation-resistant cancers. This systems approach represents a significant advancement in developing quantitative genome-scale models of redox metabolism and identifying personalized metabolic targets for improving radiation sensitivity in individual cancer patients.

Journal ArticleDOI
TL;DR: In this article, the authors describe control systems approaches for achieving context-aware devices that are robust to context effects, and then consider cell fate programing as a case study to explore the potential impact of context-aware devices for regenerative medicine applications.
Abstract: The rise of systems biology has ushered a new paradigm: the view of the cell as a system that processes environmental inputs to drive phenotypic outputs. Synthetic biology provides a complementary approach, allowing us to program cell behavior through the addition of synthetic genetic devices into the cellular processor. These devices, and the complex genetic circuits they compose, are engineered using a design-prototype-test cycle, allowing for predictable device performance to be achieved in a context-dependent manner. Within mammalian cells, context effects impact device performance at multiple scales, including the genetic, cellular, and extracellular levels. In order for synthetic genetic devices to achieve predictable behaviors, approaches to overcome context dependence are necessary. Here, we describe control systems approaches for achieving context-aware devices that are robust to context effects. We then consider cell fate programing as a case study to explore the potential impact of context-aware devices for regenerative medicine applications.

Journal ArticleDOI
TL;DR: In this article, the authors provide an extensible approach to rationally prioritize combination therapies for testing in in vivo mouse models of tuberculosis, and develop classifiers predictive of multidrug treatment outcome in a mouse model of disease relapse.
Abstract: Lengthy multidrug chemotherapy is required to achieve a durable cure in tuberculosis. However, we lack well-validated, high-throughput in vitro models that predict animal outcomes. Here, we provide an extensible approach to rationally prioritize combination therapies for testing in in vivo mouse models of tuberculosis. We systematically measured Mycobacterium tuberculosis response to all two- and three-drug combinations among ten antibiotics in eight conditions that reproduce lesion microenvironments, resulting in >500,000 measurements. Using these in vitro data, we developed classifiers predictive of multidrug treatment outcome in a mouse model of disease relapse and identified ensembles of in vitro models that best describe in vivo treatment outcomes. We identified signatures of potencies and drug interactions in specific in vitro models that distinguish whether drug combinations are better than the standard of care in two important preclinical mouse models. Our framework is generalizable to other difficult-to-treat diseases requiring combination therapies. A record of this paper’s transparent peer review process is included in the supplemental information.

Journal ArticleDOI
TL;DR: The DREAM challenge described in this paper used in vitro experimental intMEMOIR recordings and in silico data for a C. elegans lineage tree of about 1,000 cells and a Mus musculus tree of 10,000 cells.
Abstract: The recent advent of CRISPR and other molecular tools has enabled the reconstruction of cell lineages based on induced DNA mutations and promises to resolve the lineages of more complex organisms. To date, no lineage reconstruction algorithms have been rigorously examined for their performance and robustness across dataset types and numbers of cells. To benchmark such methods, we organized a DREAM challenge using in vitro experimental intMEMOIR recordings and in silico data for a C. elegans lineage tree of about 1,000 cells and a Mus musculus tree of 10,000 cells. Some of the 22 approaches submitted had excellent performance, but structural features of the trees prevented optimal reconstructions. Using smaller sub-trees as training sets proved to be a good approach for tuning algorithms to reconstruct larger trees. The simulation and reconstruction methods generated here delineate a potential way forward for solving larger cell lineage trees such as in mouse.

Journal ArticleDOI
TL;DR: RBM-MHC is shown to be a flexible and easily interpretable method that can be used as a predictor of cancer neoantigens and viral epitopes, as a tool for feature discovery, and to reconstruct peptide motifs presented on specific HLA molecules.
Abstract: The recent increase of immunopeptidomics data, obtained by mass spectrometry or binding assays, opens up possibilities for investigating endogenous antigen presentation by the highly polymorphic human leukocyte antigen class I (HLA-I) protein. State-of-the-art methods predict with high accuracy presentation by HLA alleles that are well represented in databases at the time of release but have a poorer performance for rarer and less characterized alleles. Here, we introduce a method based on Restricted Boltzmann Machines (RBMs) for prediction of antigens presented on the Major Histocompatibility Complex (MHC) encoded by HLA genes: RBM-MHC. RBM-MHC can be trained on custom and newly available samples with no or a small amount of HLA annotations. RBM-MHC ensures improved predictions for rare alleles and matches state-of-the-art performance for well-characterized alleles while being less data demanding. RBM-MHC is shown to be a flexible and easily interpretable method that can be used as a predictor of cancer neoantigens and viral epitopes, as a tool for feature discovery, and to reconstruct peptide motifs presented on specific HLA molecules.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a modular strategy to build molecular quasi-integral feedback controllers, which involves following two design principles: the first principle is to utilize an ultrasensitive response, which determines the gain of the controller and influences the steady-state error.
Abstract: Feedback control has enabled the success of automated technologies by mitigating the effects of variability, unknown disturbances, and noise. While it is known that biological feedback loops reduce the impact of noise and help shape kinetic responses, many questions remain about how to design molecular integral controllers. Here, we propose a modular strategy to build molecular quasi-integral feedback controllers, which involves following two design principles. The first principle is to utilize an ultrasensitive response, which determines the gain of the controller and influences the steady-state error. The second is to use a tunable threshold of the ultrasensitive response, which determines the equilibrium point of the system. We describe a reaction network, named brink controller, that satisfies these conditions by combining molecular sequestration and an activation/deactivation cycle. With computational models, we examine potential biological implementations of brink controllers, and we illustrate different example applications.
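
The link between controller gain and steady-state error can be seen in a generic leaky-integrator model. The sketch below is not the brink controller and does not model sequestration or an activation/deactivation cycle; it is a linear abstraction with hypothetical parameter names (`gain`, `leak`, `decay`, `disturbance`), integrated with a simple forward Euler scheme. The controller state `z` accumulates the error between a reference and the output `y` but also decays, which is what makes the integration "quasi" rather than exact.

```python
def simulate(gain, reference=1.0, disturbance=0.5, decay=1.0, leak=1.0,
             dt=1e-3, t_end=20.0):
    """Forward-Euler simulation of a leaky-integral feedback loop.

    Plant:      dy/dt = z + disturbance - decay * y
    Controller: dz/dt = gain * (reference - y) - leak * z
    Returns the output y at time t_end.
    """
    y, z = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        dy = z + disturbance - decay * y
        dz = gain * (reference - y) - leak * z
        y += dt * dy
        z += dt * dz
    return y


def predicted_steady_state(gain, reference=1.0, disturbance=0.5,
                           decay=1.0, leak=1.0):
    """Closed-form equilibrium obtained by setting both derivatives to zero."""
    ratio = gain / leak
    return (ratio * reference + disturbance) / (decay + ratio)


if __name__ == "__main__":
    for gain in (1.0, 10.0, 100.0):
        y_final = simulate(gain)
        print(f"gain={gain:6.1f}  y={y_final:.4f}  error={1.0 - y_final:.4f}")
```

In this abstraction the output approaches the reference as the ratio of gain to leak grows, but the error never reaches exactly zero for any finite gain, which is the behavior the term quasi-integral describes.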