Showing papers in "Cell systems in 2020"


Journal ArticleDOI
TL;DR: A platform for ultra-high throughput serum and plasma proteomics that builds on ISO13485 standardisation and high-flow liquid chromatography to facilitate implementation in clinical laboratories and identifies 27 potential biomarkers that are differentially expressed depending on the WHO severity grade of COVID-19.
Abstract: The COVID-19 pandemic is an unprecedented global challenge, and point-of-care diagnostic classifiers are urgently required. Here, we present a platform for ultra-high-throughput serum and plasma proteomics that builds on ISO13485 standardization to facilitate simple implementation in regulated clinical laboratories. Our low-cost workflow handles up to 180 samples per day, enables high precision quantification, and reduces batch effects for large-scale and longitudinal studies. We use our platform on samples collected from a cohort of early hospitalized cases of the SARS-CoV-2 pandemic and identify 27 potential biomarkers that are differentially expressed depending on the WHO severity grade of COVID-19. They include complement factors, the coagulation system, inflammation modulators, and pro-inflammatory factors upstream and downstream of interleukin 6. All protocols and software for implementing our approach are freely available. In total, this work supports the development of routine proteomic assays to aid clinical decision making and generate hypotheses about potential COVID-19 therapeutic targets.

407 citations


Journal ArticleDOI
TL;DR: An integrated predictor of MHC class I presentation that combines new models for MHC class I binding and antigen processing is developed; it outperformed the two individual components as well as NetMHCpan 4.0 and MixMHCpred 2.0.2 on held-out mass spectrometry experiments.
Abstract: Computational prediction of the peptides presented on major histocompatibility complex (MHC) class I proteins is an important tool for studying T cell immunity. The data available to develop such predictors have expanded with the use of mass spectrometry to identify naturally presented MHC ligands. In addition to elucidating binding motifs, the identified ligands also reflect the antigen processing steps that occur prior to MHC binding. Here, we developed an integrated predictor of MHC class I presentation that combines new models for MHC class I binding and antigen processing. Considering only peptides first predicted by the binding model to bind strongly to MHC, the antigen processing model is trained to discriminate published mass spectrometry-identified MHC class I ligands from unobserved peptides. The integrated model outperformed the two individual components as well as NetMHCpan 4.0 and MixMHCpred 2.0.2 on held-out mass spectrometry experiments. Our predictors are implemented in the open source MHCflurry package, version 2.0 (github.com/openvax/mhcflurry).

159 citations


Journal ArticleDOI
TL;DR: This work shows that a deep graph neural network, ProteinSolver, can precisely design sequences that fold into a predetermined shape by phrasing this challenge as a constraint satisfaction problem (CSP), akin to Sudoku puzzles.
Abstract: Protein structure and function are determined by the arrangement of the linear sequence of amino acids in 3D space. We show that a deep graph neural network, ProteinSolver, can precisely design sequences that fold into a predetermined shape by phrasing this challenge as a constraint satisfaction problem (CSP), akin to Sudoku puzzles. We trained ProteinSolver on over 70,000,000 real protein sequences corresponding to over 80,000 structures. We show that our method rapidly designs new protein sequences and benchmark them in silico using energy-based scores, molecular dynamics, and structure prediction methods. As a proof-of-principle validation, we use ProteinSolver to generate sequences that match the structure of serum albumin, then synthesize the top-scoring design and validate it in vitro using circular dichroism. ProteinSolver is freely available at http://design.proteinsolver.org and https://gitlab.com/ostrokach/proteinsolver. A record of this paper's transparent peer review process is included in the Supplemental Information.

144 citations


Journal ArticleDOI
TL;DR: The key to the approach is that during training nucleAIzer automatically adapts its nucleus-style model to unseen and unlabeled data using image style transfer to automatically generate augmented training samples, making deep learning for nucleus segmentation fairly simple and labor free for most biological light microscopy experiments.
Abstract: Single-cell segmentation is typically a crucial task of image-based cellular analysis. We present nucleAIzer, a deep-learning approach aiming toward a truly general method for localizing 2D cell nuclei across a diverse range of assays and light microscopy modalities. We outperform the 739 methods submitted to the 2018 Data Science Bowl on images representing a variety of realistic conditions, some of which were not represented in the training data. The key to our approach is that during training nucleAIzer automatically adapts its nucleus-style model to unseen and unlabeled data using image style transfer to automatically generate augmented training samples. This allows the model to recognize nuclei in new and different experiments efficiently without requiring expert annotations, making deep learning for nucleus segmentation fairly simple and labor free for most biological light microscopy experiments. It can also be used online, integrated into CellProfiler and freely downloaded at www.nucleaizer.org. A record of this paper's transparent peer review process is included in the Supplemental Information.

141 citations


Journal ArticleDOI
Kevin Wu, Furqan M. Fazal, Kevin R. Parker, James Zou, Howard Y. Chang
TL;DR: The mitochondrial residency signal is interpreted as an indicator of intracellular RNA trafficking with respect to double-membrane vesicles, a critical stage in the coronavirus life cycle; the analysis serves as a hypothesis generation tool for SARS-CoV-2 biology and informs experimental efforts to combat the virus.
Abstract: SARS-CoV-2 genomic and subgenomic RNA (sgRNA) transcripts hijack the host cell's machinery. Subcellular localization of its viral RNA could, thus, play important roles in viral replication and host antiviral immune response. We perform computational modeling of SARS-CoV-2 viral RNA subcellular residency across eight subcellular neighborhoods. We compare hundreds of SARS-CoV-2 genomes with the human transcriptome and other coronaviruses. We predict the SARS-CoV-2 RNA genome and sgRNAs to be enriched toward the host mitochondrial matrix and nucleolus, and that the 5' and 3' viral untranslated regions contain the strongest, most distinct localization signals. We interpret the mitochondrial residency signal as an indicator of intracellular RNA trafficking with respect to double-membrane vesicles, a critical stage in the coronavirus life cycle. Our computational analysis serves as a hypothesis generation tool to suggest models for SARS-CoV-2 biology and inform experimental efforts to combat the virus. A record of this paper's Transparent Peer Review process is included in the Supplemental Information.

107 citations


Journal ArticleDOI
TL;DR: A benchmark dataset containing the inter-molecular non-covalent interactions for more than 10,000 compound-protein pairs is compiled and the interpretability of neural attentions in existing models is evaluated.
Abstract: Computational approaches for understanding compound-protein interactions (CPIs) can greatly facilitate drug development. Recently, a number of deep-learning-based methods have been proposed to predict binding affinities and attempt to capture local interaction sites in compounds and proteins through neural attentions (i.e., neural network architectures that enable the interpretation of feature importance). Here, we compiled a benchmark dataset containing the inter-molecular non-covalent interactions for more than 10,000 compound-protein pairs and systematically evaluated the interpretability of neural attentions in existing models. We also developed a multi-objective neural network, called MONN, to predict both non-covalent interactions and binding affinities between compounds and proteins. Comprehensive evaluation demonstrated that MONN can successfully predict the non-covalent interactions between compounds and proteins that cannot be effectively captured by neural attentions in previous prediction methods. Moreover, MONN outperforms other state-of-the-art methods in predicting binding affinities. Source code for MONN is freely available for download at https://github.com/lishuya17/MONN.

102 citations


Journal ArticleDOI
TL;DR: The approach shows microbiome-derived short-chain fatty acids (SCFAs) to either improve or worsen UC severity, depending on the involvement of effector CD4 T cells, and paradoxical findings underscore the emerging utility of human physiomimetic technology in combination with systems immunology to study causality and the fundamental entanglement of immunity, metabolism, and tissue homeostasis.
Abstract: Although the association between the microbiome and IBD and liver diseases is known, the cause and effect remain elusive. By connecting human microphysiological systems of the gut, liver, and circulating Treg and Th17 cells, we created a multi-organ model of ulcerative colitis (UC) ex vivo. The approach shows microbiome-derived short-chain fatty acids (SCFAs) to either improve or worsen UC severity, depending on the involvement of effector CD4 T cells. Using multiomics, we found SCFAs increased production of ketone bodies, glycolysis, and lipogenesis, while markedly reducing innate immune activation of the UC gut. However, during acute T cell-mediated inflammation, SCFAs exacerbated CD4+ T cell-effector function, partially through metabolic reprogramming, leading to gut barrier disruption and hepatic injury. These paradoxical findings underscore the emerging utility of human physiomimetic technology in combination with systems immunology to study causality and the fundamental entanglement of immunity, metabolism, and tissue homeostasis.

99 citations


Journal ArticleDOI
TL;DR: Scribe, a toolkit for detecting and visualizing causal regulatory interactions between genes, is presented and used to explore the potential for single-cell experiments to power network reconstruction; it is demonstrated that performing causal inference requires temporal coupling between measurements.
Abstract: Here, we present Scribe (https://github.com/aristoteleo/Scribe-py), a toolkit for detecting and visualizing causal regulatory interactions between genes and explore the potential for single-cell experiments to power network reconstruction. Scribe employs restricted directed information to determine causality by estimating the strength of information transferred from a potential regulator to its downstream target. We apply Scribe and other leading approaches for causal network reconstruction to several types of single-cell measurements and show that there is a dramatic drop in performance for "pseudotime"-ordered single-cell data compared with true time-series data. We demonstrate that performing causal inference requires temporal coupling between measurements. We show that methods such as "RNA velocity" restore some degree of coupling through an analysis of chromaffin cell fate commitment. These analyses highlight a shortcoming in experimental and computational methods for analyzing gene regulation at single-cell resolution and suggest ways of overcoming it.

96 citations


Journal ArticleDOI
TL;DR: Racism and COVID-19 represent a pandemic on a pandemic for Blacks, and the pandemics find themselves synergized to the detriment of Blacks and their health.
Abstract: Racism and COVID-19 represent a pandemic on a pandemic for Blacks. The pandemics find themselves synergized to the detriment of Blacks and their health. The complexity of the combination of these pandemics is evident when examining the interplay between racist policing practices and health.

89 citations


Journal ArticleDOI
TL;DR: This work trains a model on just 72 compounds to make predictions over a 10,833-compound library, identifying and experimentally validating compounds with nanomolar affinity for diverse kinases and whole-cell growth inhibition of Mycobacterium tuberculosis.
Abstract: Machine learning that generates biological hypotheses has transformative potential, but most learning algorithms are susceptible to pathological failure when exploring regimes beyond the training data distribution. A solution to address this issue is to quantify prediction uncertainty so that algorithms can gracefully handle novel phenomena that confound standard methods. Here, we demonstrate the broad utility of robust uncertainty prediction in biological discovery. By leveraging Gaussian process-based uncertainty prediction on modern pre-trained features, we train a model on just 72 compounds to make predictions over a 10,833-compound library, identifying and experimentally validating compounds with nanomolar affinity for diverse kinases and whole-cell growth inhibition of Mycobacterium tuberculosis. Uncertainty facilitates a tight iterative loop between computation and experimentation and generalizes across biological domains as diverse as protein engineering and single-cell transcriptomics. More broadly, our work demonstrates that uncertainty should play a key role in the increasing adoption of machine learning algorithms into the experimental lifecycle.

85 citations
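
The role of predictive uncertainty described in the abstract above can be illustrated with a toy Gaussian process regressor. This is only a minimal sketch under stated assumptions: the RBF kernel, its length scale, the noise term, and the function names (`rbf`, `gp_predict`) are hypothetical choices for illustration, not the authors' model or features. It shows the property the abstract relies on: the predicted standard deviation is small near training points and grows for inputs far from the training data.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two feature vectors (assumed kernel)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(X_train, y_train, x_new, noise=1e-6):
    """Return the GP posterior mean and standard deviation at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(X_train)]
         for i, a in enumerate(X_train)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y_train))
    k_star = [rbf(a, x_new) for a in X_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_new, x_new) - sum(t * t for t in v), 0.0)
    return mean, math.sqrt(var)

# Hypothetical toy data: two training points in a 1D feature space.
X = [[0.0], [1.0]]
y = [0.0, 1.0]
near = gp_predict(X, y, [1.0])   # query at a training point
far = gp_predict(X, y, [10.0])   # query far from all training data
```

A screening loop could use the returned standard deviation to deprioritize candidates whose predictions are confident only in appearance; how the authors combine mean and uncertainty is not specified in the abstract.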


Journal ArticleDOI
TL;DR: Solo is described, a semi-supervised deep learning approach that identifies doublets with greater accuracy than existing methods and can be applied in combination with experimental doublet detection methods to further purify scRNA-seq data to true single cells.
Abstract: Single-cell RNA sequencing (scRNA-seq) measurements of gene expression enable an unprecedented high-resolution view into cellular state. However, current methods often result in two or more cells that share the same cell-identifying barcode; these "doublets" violate the fundamental premise of single-cell technology and can lead to incorrect inferences. Here, we describe Solo, a semi-supervised deep learning approach that identifies doublets with greater accuracy than existing methods. Solo embeds cells unsupervised using a variational autoencoder and then appends a feed-forward neural network layer to the encoder to form a supervised classifier. We train this classifier to distinguish simulated doublets from the observed data. Solo can be applied in combination with experimental doublet detection methods to further purify scRNA-seq data to true single cells. It is freely available from https://github.com/calico/solo. A record of this paper's transparent peer review process is included in the Supplemental Information.
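
The training setup described above, a classifier that separates simulated doublets from observed cells, depends on a way to generate the simulated doublets. The abstract does not say how they are made, so the sketch below assumes one common approach: summing the count vectors of two randomly chosen observed cells. The function names and the toy count matrix are hypothetical, and the variational autoencoder and classifier themselves are not shown.

```python
import random

def simulate_doublets(counts, n_doublets, seed=0):
    """Build synthetic doublets by summing the counts of two distinct random cells.

    Assumption: additive mixing of two cells; the abstract does not specify this.
    """
    rng = random.Random(seed)
    doublets = []
    for _ in range(n_doublets):
        i, j = rng.sample(range(len(counts)), 2)
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
    return doublets

def labeled_training_set(counts, n_doublets, seed=0):
    """Return (X, y): observed cells labeled 0 followed by simulated doublets labeled 1."""
    doublets = simulate_doublets(counts, n_doublets, seed)
    X = [list(row) for row in counts] + doublets
    y = [0] * len(counts) + [1] * len(doublets)
    return X, y

# Hypothetical toy count matrix: 3 cells x 3 genes.
cells = [[1, 0, 2], [0, 3, 1], [4, 0, 0]]
X, y = labeled_training_set(cells, n_doublets=5)
```

Because some observed cells are themselves true doublets, the label 0 is noisy, which is consistent with the abstract calling the approach semi-supervised.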

Journal ArticleDOI
TL;DR: It is concluded that establishing precise protein levels involves both coordinated synthesis and post-translational fine-tuning via protein degradation.
Abstract: How do cells maintain relative proportions of protein complex components? Advances in quantitative, genome-wide measurements have begun to shed light onto the roles of protein synthesis and degradation in establishing the precise proportions in living cells: on the one hand, ribosome profiling studies indicate that proteins are already produced in the correct relative proportions. On the other hand, proteomic studies found that many complexes contain subunits that are made in excess and subsequently degraded. Here, we discuss these seemingly contradictory findings, emerging principles, and remaining open questions. We conclude that establishing precise protein levels involves both coordinated synthesis and post-translational fine-tuning via protein degradation.

Journal ArticleDOI
TL;DR: It is demonstrated that transient optical perturbations generate a persistent and robust potassium-channel-mediated change in the membrane potential of bacteria within the biofilm, which could enable computations within prokaryotic communities and suggests a parallel between neurons and bacteria.
Abstract: Cellular membrane potential plays a key role in the formation and retrieval of memories in the metazoan brain, but it remains unclear whether such memory can also be encoded in simpler organisms like bacteria. Here, we show that single-cell-level memory patterns can be imprinted in bacterial biofilms by light-induced changes in the membrane potential. We demonstrate that transient optical perturbations generate a persistent and robust potassium-channel-mediated change in the membrane potential of bacteria within the biofilm. The light-exposed cells respond in an anti-phase manner, relative to unexposed cells, to both natural and induced oscillations in extracellular ion concentrations. This anti-phase response, which persists for hours following the transient optical stimulus, enables a direct single-cell resolution visualization of spatial memory patterns within the biofilm. The ability to encode robust and persistent membrane-potential-based memory patterns could enable computations within prokaryotic communities and suggests a parallel between neurons and bacteria.

Journal ArticleDOI
TL;DR: Deep exploration networks (DENs), a class of activation-maximizing generative models, are developed; they minimize the cost of a neural network fitness predictor by gradient descent and explicitly maximize sequence diversity by penalizing any two generated patterns on the basis of a similarity metric.
Abstract: Engineering gene and protein sequences with defined functional properties is a major goal of synthetic biology. Deep neural network models, together with gradient ascent-style optimization, show promise for sequence design. The generated sequences can, however, get stuck in local minima and often have low diversity. Here, we develop deep exploration networks (DENs), a class of activation-maximizing generative models, which minimize the cost of a neural network fitness predictor by gradient descent. By penalizing any two generated patterns on the basis of a similarity metric, DENs explicitly maximize sequence diversity. To avoid drifting into low-confidence regions of the predictor, we incorporate variational autoencoders to maintain the likelihood ratio of generated sequences. Using DENs, we engineered polyadenylation signals with more than 10-fold higher selection odds than the best gradient ascent-generated patterns, identified splice regulatory sequences predicted to result in highly differential splicing between cell lines, and improved on state-of-the-art results for protein design tasks.

Journal ArticleDOI
TL;DR: It is argued that TI stimulation cannot work via passive membrane filtering, as previously hypothesized, and instead, TI stimulation requires an ion-channel mediated signal rectification process that is also responsible for high-frequency conduction block in off-target tissues, thus challenging clinical applications of TI.
Abstract: Temporal interference (TI) is a non-invasive neurostimulation technique that utilizes high-frequency external electric fields to stimulate deep neuronal structures without affecting superficial, off-target structures. TI represents a potential breakthrough for treating conditions, such as Parkinson's disease and chronic pain. However, early clinical work on TI stimulation was met with mixed outcomes challenging its fundamental mechanisms and applications. Here, we apply established physics to study the mechanisms of TI with the goal of optimizing it for clinical use. We argue that TI stimulation cannot work via passive membrane filtering, as previously hypothesized. Instead, TI stimulation requires an ion-channel mediated signal rectification process. Unfortunately, this mechanism is also responsible for high-frequency conduction block in off-target tissues, thus challenging clinical applications of TI. In consequence, we propose a set of experimental controls that should be performed in future experiments to refine our understanding and practice of TI stimulation. A record of this paper's transparent peer review process is included in the Supplemental Information.

Journal ArticleDOI
TL;DR: SERGIO, a simulator of single-cell gene expression data that models the stochastic nature of transcription as well as regulation of genes by multiple transcription factors according to a user-provided gene regulatory network, is presented.
Abstract: A common approach to benchmarking of single-cell transcriptomics tools is to generate synthetic datasets that statistically resemble experimental data. However, most existing single-cell simulators do not incorporate transcription factor-gene regulatory interactions that underlie expression dynamics. Here, we present SERGIO, a simulator of single-cell gene expression data that models the stochastic nature of transcription as well as regulation of genes by multiple transcription factors according to a user-provided gene regulatory network. SERGIO can simulate any number of cell types in steady state or cells differentiating to multiple fates. We show that datasets generated by SERGIO are statistically comparable to experimental data generated by Illumina HiSeq2000, Drop-seq, Illumina 10X chromium, and Smart-seq. We use SERGIO to benchmark several single-cell analysis tools, including GRN inference methods, and identify Tcf7, Gata3, and Bcl11b as key drivers of T cell differentiation by performing in silico knockout experiments. SERGIO is freely available for download here: https://github.com/PayamDiba/SERGIO.

Journal ArticleDOI
TL;DR: This study finds that the stress-like subpopulation of cancer cells is present from the early stages of tumorigenesis and provides evidence that this state has higher tumor-seeding capabilities and that its induction leads to increased growth under both MEK and BRAF inhibitors.
Abstract: Transcriptional profiling of tumors has revealed a stress-like state among the cancer cells with the concerted expression of genes such as fos, jun, and heat-shock proteins, though this has been controversial given possible dissociation-effects associated with single-cell RNA sequencing. Here, we validate the existence of this state using a combination of zebrafish melanoma modeling, spatial transcriptomics, and human samples. We found that the stress-like subpopulation of cancer cells is present from the early stages of tumorigenesis. Comparing with previously reported single-cell RNA sequencing datasets from diverse cancer types, including triple-negative breast cancer, oligodendroglioma, and pancreatic adenocarcinoma, indicated the conservation of this state during tumorigenesis. We also provide evidence that this state has higher tumor-seeding capabilities and that its induction leads to increased growth under both MEK and BRAF inhibitors. Collectively, our study supports the stress-like cells as a cancer cell state expressing a coherent set of genes and exhibiting drug-resistance properties.

Journal ArticleDOI
TL;DR: The implementation of an integrated technique to quantify cell-state-specific changes in the physical arrangement of protein complexes concurrently for thousands of proteins and hundreds of complexes is described.
Abstract: Living systems integrate biochemical reactions that determine the functional state of each cell. Reactions are primarily mediated by proteins. In proteomic studies, these have been treated as independent entities, disregarding their higher-level organization into complexes that affects their activity and/or function and is thus of great interest for biological research. Here, we describe the implementation of an integrated technique to quantify cell-state-specific changes in the physical arrangement of protein complexes concurrently for thousands of proteins and hundreds of complexes. Applying this technique to a comparison of human cells in interphase and mitosis, we provide a systematic overview of mitotic proteome reorganization. The results recall key hallmarks of mitotic complex remodeling and suggest a model of nuclear pore complex disassembly, which we validate by orthogonal methods. To support the interpretation of quantitative SEC-SWATH-MS datasets, we extend the software CCprofiler and provide an interactive exploration tool, SECexplorer-cc.

Journal ArticleDOI
TL;DR: A modified CRISPRi system leveraging the predictable reduction in efficacy of imperfectly matched sgRNAs to generate defined levels of CRISPRi activity is developed and its broad applicability is demonstrated.
Abstract: Essential genes are the hubs of cellular networks, but lack of high-throughput methods for titrating gene expression has limited our understanding of the fitness landscapes against which their expression levels are optimized. We developed a modified CRISPRi system leveraging the predictable reduction in efficacy of imperfectly matched sgRNAs to generate defined levels of CRISPRi activity and demonstrated its broad applicability. Using libraries of mismatched sgRNAs predicted to span the full range of knockdown levels, we characterized the expression-fitness relationships of most essential genes in Escherichia coli and Bacillus subtilis. We find that these relationships vary widely from linear to bimodal but are similar within pathways. Notably, despite ∼2 billion years of evolutionary separation between E. coli and B. subtilis, most essential homologs have similar expression-fitness relationships with rare but informative differences. Thus, the expression levels of essential genes may reflect homeostatic or evolutionary constraints shared between the two organisms.

Journal ArticleDOI
TL;DR: Comparison of humans with dogs reveals a nonlinear relationship that translates dog-to-human years and aligns the timing of major physiological milestones between the two species, with extension to mice.
Abstract: All mammals progress through similar physiological stages throughout life, from early development to puberty, aging, and death. Yet, the extent to which this conserved physiology reflects underlying genomic events is unclear. Here, we map the common methylation changes experienced by mammalian genomes as they age, focusing on comparison of humans with dogs, an emerging model of aging. Using oligo-capture sequencing, we characterize methylomes of 104 Labrador retrievers spanning a 16-year age range, achieving >150× coverage within mammalian syntenic blocks. Comparison with human methylomes reveals a nonlinear relationship that translates dog-to-human years and aligns the timing of major physiological milestones between the two species, with extension to mice. Conserved changes center on developmental gene networks, which are sufficient to translate age and the effects of anti-aging interventions across multiple mammals. These results establish methylation not only as a diagnostic age readout but also as a cross-species translator of physiological aging milestones.

Journal ArticleDOI
TL;DR: It is demonstrated that drug-induced changes to Erk dynamics alter the conditions under which cells proliferate, opening the door to high-throughput screens using live-cell biosensors and revealing that cell proliferation integrates information from Erk dynamics as well as additional permissive cues.
Abstract: Complex, time-varying responses have been observed widely in cell signaling, but how specific dynamics are generated or regulated is largely unknown. One major obstacle has been that high-throughput screens are typically incompatible with the live-cell assays used to monitor dynamics. Here, we address this challenge by screening a library of 429 kinase inhibitors and monitoring extracellular-regulated kinase (Erk) activity over 5 h in more than 80,000 single primary mouse keratinocytes. Our screen reveals both known and uncharacterized modulators of Erk dynamics, including inhibitors of non-epidermal growth factor receptor (EGFR) receptor tyrosine kinases (RTKs) that increase Erk pulse frequency and overall activity. Using drug treatment and direct optogenetic control, we demonstrate that drug-induced changes to Erk dynamics alter the conditions under which cells proliferate. Our work opens the door to high-throughput screens using live-cell biosensors and reveals that cell proliferation integrates information from Erk dynamics as well as additional permissive cues.

Journal ArticleDOI
TL;DR: It is proposed that splicing promotes the nuclear export of AU-rich mRNAs and that codon- and splicing-dependent effects on expression are under evolutionary pressure in the human genome.
Abstract: In the human genome, most genes undergo splicing, and patterns of codon usage are splicing dependent: guanine and cytosine (GC) content is the highest within single-exon genes and within first exons of multi-exon genes. However, the effects of codon usage on gene expression are typically characterized in unspliced model genes. Here, we measured the effects of splicing on expression in a panel of synonymous reporter genes that varied in nucleotide composition. We found that high GC content increased protein yield, mRNA yield, cytoplasmic mRNA localization, and translation of unspliced reporters. Splicing did not affect the expression of GC-rich variants. However, splicing promoted the expression of AT-rich variants by increasing their steady-state protein and mRNA levels, in part through promoting cytoplasmic localization of mRNA. We propose that splicing promotes the nuclear export of AU-rich mRNAs and that codon- and splicing-dependent effects on expression are under evolutionary pressure in the human genome.

Journal ArticleDOI
TL;DR: It is shown that persister cells escape drug-induced cell-cycle arrest via brief, sporadic ERK pulses generated by transmembrane receptors and growth factors operating in an autocrine/paracrine manner and generates a persistent population of melanoma cells that rewires MAPK signaling to sustain non-genetic drug resistance.
Abstract: Targeted inhibition of oncogenic pathways can be highly effective in halting the rapid growth of tumors but often leads to the emergence of slowly dividing persister cells, which constitute a reservoir for the selection of drug-resistant clones. In BRAFV600E melanomas, RAF and MEK inhibitors efficiently block oncogenic signaling, but persister cells emerge. Here, we show that persister cells escape drug-induced cell-cycle arrest via brief, sporadic ERK pulses generated by transmembrane receptors and growth factors operating in an autocrine/paracrine manner. Quantitative proteomics and computational modeling show that ERK pulsing is enabled by rewiring of mitogen-activated protein kinase (MAPK) signaling: from an oncogenic BRAFV600E monomer-driven configuration that is drug sensitive to a receptor-driven configuration that involves Ras-GTP and RAF dimers and is highly resistant to RAF and MEK inhibitors. Altogether, this work shows that pulsatile MAPK activation by factors in the microenvironment generates a persistent population of melanoma cells that rewires MAPK signaling to sustain non-genetic drug resistance.

Journal ArticleDOI
TL;DR: Using a single-cell dataset from a patient with colorectal cancer, SCARLET constructs a tumor phylogeny that is consistent with the observed CNAs and suggests an alternate origin for the patient's metastases.
Abstract: Motivation: Single-cell DNA sequencing enables the measurement of somatic mutations in individual tumor cells, and provides data to reconstruct the evolutionary history of the tumor.

Journal ArticleDOI
TL;DR: It is shown that a minimal model of transcriptional bursting and gene interactions can give rise to rare coordinated high expression states, and established principles of gene regulation are sufficient to describe this behavior and argue for its more general existence.
Abstract: Non-genetic transcriptional variability is a potential mechanism for therapy resistance in melanoma. Specifically, rare subpopulations of cells occupy a transient pre-resistant state characterized by coordinated high expression of several genes and survive therapy. How might these rare states arise and disappear within the population? It is unclear whether the canonical models of probabilistic transcriptional pulsing can explain this behavior, or if it requires special, hitherto unidentified mechanisms. We show that a minimal model of transcriptional bursting and gene interactions can give rise to rare coordinated high expression states. These states occur more frequently in networks with low connectivity and depend on three parameters. While entry into these states is initiated by a long transcriptional burst that also triggers entry of other genes, the exit occurs through independent inactivation of individual genes. Together, we demonstrate that established principles of gene regulation are sufficient to describe this behavior and argue for its more general existence. A record of this paper's transparent peer review process is included in the Supplemental Information.

Journal ArticleDOI
TL;DR: A combinatorial machine learning method is presented to evaluate and optimize peptide vaccine formulations for SARS-CoV-2 that optimizes the presentation likelihood of a diverse set of vaccine peptides conditioned on a target human-population HLA haplotype distribution and expected epitope drift.
Abstract: We present a combinatorial machine learning method to evaluate and optimize peptide vaccine formulations for SARS-CoV-2. Our approach optimizes the presentation likelihood of a diverse set of vaccine peptides conditioned on a target human-population HLA haplotype distribution and expected epitope drift. Our proposed SARS-CoV-2 MHC class I vaccine formulations provide 93.21% predicted population coverage with at least five vaccine peptide-HLA average hits per person (≥ 1 peptide: 99.91%) with all vaccine peptides perfectly conserved across 4,690 geographically sampled SARS-CoV-2 genomes. Our proposed MHC class II vaccine formulations provide 97.21% predicted coverage with at least five vaccine peptide-HLA average hits per person with all peptides having an observed mutation probability of ≤ 0.001. We provide an open-source implementation of our design methods (OptiVax), vaccine evaluation tool (EvalVax), as well as the data used in our design efforts here: https://github.com/gifford-lab/optivax.

Journal ArticleDOI
TL;DR: A putative 28-member RNA-binding protein complex associated with amyotrophic lateral sclerosis is validated, suggesting a coordinated function in alternative splicing in disease progression, and a brain interaction map (BraInMap) resource facilitates mechanistic exploration of the unique molecular machinery driving core cellular processes of the central nervous system.
Abstract: Connectivity webs mediate the unique biology of the mammalian brain. Yet, while cell circuit maps are increasingly available, knowledge of their underlying molecular networks remains limited. Here, we applied multi-dimensional biochemical fractionation with mass spectrometry and machine learning to survey endogenous macromolecules across the adult mouse brain. We defined a global "interactome" comprising over one thousand multi-protein complexes. These include hundreds of brain-selective assemblies that have distinct physical and functional attributes, show regional and cell-type specificity, and have links to core neurological processes and disorders. Using reciprocal pull-downs and a transgenic model, we validated a putative 28-member RNA-binding protein complex associated with amyotrophic lateral sclerosis, suggesting a coordinated function in alternative splicing in disease progression. This brain interaction map (BraInMap) resource facilitates mechanistic exploration of the unique molecular machinery driving core cellular processes of the central nervous system. It is publicly available and can be explored here https://www.bu.edu/dbin/cnsb/mousebrain/.

Journal ArticleDOI
TL;DR: Detailed transcriptomic profiling of combinatorial and temporal control mutants identified 81 genes that depend on stimulus-specific NFκB duration for their stimulus-specificity, and the results delineate two gene regulatory strategies that decode stimulus-specific NFκB dynamics and determine distinct biological functions.
Abstract: Pathogen-derived lipopolysaccharide (LPS) and cytokine tumor necrosis factor (TNF) activate NFκB with distinct duration dynamics, but how immune response genes decode NFκB duration to produce stimulus-specific expression remains unclear. Here, detailed transcriptomic profiling of combinatorial and temporal control mutants identified 81 genes that depend on stimulus-specific NFκB duration for their stimulus-specificity. Combining quantitative experimentation with mathematical modeling, we found that for some genes a long mRNA half-life allowed effective decoding, but for many genes this was insufficient to account for the data; instead, we found that chromatin mechanisms, such as a slow transition rate between inactive and RelA-bound enhancer states, could also decode NFκB dynamics. Chromatin-mediated decoding is favored by genes acting as immune effectors (e.g., tissue remodelers and T cell recruiters) rather than immune regulators (e.g., signaling proteins and monocyte recruiters). Overall, our results delineate two gene regulatory strategies that decode stimulus-specific NFκB dynamics and determine distinct biological functions.
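Illustrative sketch (not the authors' model): the half-life argument in this abstract can be made concrete with a hypothetical one-gene model in which mRNA is synthesized at a constant rate while NFκB is active and decays with first-order kinetics. The function names, the square-pulse input, and the pulse durations and half-lives are all assumptions chosen for illustration.

```python
import math


def peak_mrna(duration_h, half_life_h, synthesis_rate=1.0):
    """Peak mRNA level after a square pulse of NFkB activity (hypothetical model).

    Solves dm/dt = k - d*m with m(0) = 0 over the pulse; the peak is
    reached at the moment the pulse ends.
    """
    decay = math.log(2) / half_life_h  # first-order decay constant d
    return synthesis_rate / decay * (1.0 - math.exp(-decay * duration_h))


def duration_specificity(long_h, short_h, half_life_h):
    """Ratio of peak expression under a long versus a short NFkB pulse."""
    return peak_mrna(long_h, half_life_h) / peak_mrna(short_h, half_life_h)


# Assumed pulse durations: 8 h (long) versus 1 h (short).
short_lived = duration_specificity(8.0, 1.0, half_life_h=0.25)
long_lived = duration_specificity(8.0, 1.0, half_life_h=8.0)
print(f"short half-life ratio: {short_lived:.2f}")
print(f"long half-life ratio:  {long_lived:.2f}")
```

Under these assumptions, a short-lived transcript saturates within the short pulse and so barely distinguishes the two durations, whereas a long-lived transcript keeps accumulating and its peak tracks pulse length.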

Journal ArticleDOI
TL;DR: Together, single-cell transcriptomic profiling of CRISPRa-perturbed cells provides both system-level and molecular insights into the mechanisms that orchestrate ZGA.
Abstract: Zygotic genome activation (ZGA) is an essential transcriptional event in embryonic development that coincides with extensive epigenetic reprogramming. Complex manipulation techniques and maternal stores of proteins preclude large-scale functional screens for ZGA regulators within early embryos. Here, we combined pooled CRISPR activation (CRISPRa) with single-cell transcriptomics to identify regulators of ZGA-like transcription in mouse embryonic stem cells, which serve as a tractable, in vitro proxy of early mouse embryos. Using multi-omics factor analysis (MOFA+) applied to ∼200,000 single-cell transcriptomes comprising 230 CRISPRa perturbations, we characterized molecular signatures of ZGA and uncovered 24 factors that promote a ZGA-like response. Follow-up assays validated top screen hits, including the DNA-binding protein Dppa2, the chromatin remodeler Smarca5, and the transcription factor Patz1, and functional experiments revealed that Smarca5’s regulation of ZGA-like transcription is dependent on Dppa2. Together, our single-cell transcriptomic profiling of CRISPRa-perturbed cells provides both system-level and molecular insights into the mechanisms that orchestrate ZGA.

Journal ArticleDOI
TL;DR: The genetic ECG signature for dilated cardiomyopathy is established, the BAG3, HSPB7/CLCNKA, PRKCA, TMEM43, and OBSCN loci are associated with disease risk, and over 300 genetic loci that are statistically associated with the high-dimensional representation of the ECG are identified.
Abstract: The electrocardiogram (ECG) is one of the most useful non-invasive diagnostic tests for a wide array of cardiac disorders. Traditional approaches to analyzing ECGs focus on individual segments. Here, we performed comprehensive deep phenotyping of 77,190 ECGs in the UK Biobank across the complete cycle of cardiac conduction, resulting in 500 spatial-temporal datapoints, across 10 million genetic variants. In addition to characterizing polygenic risk scores for the traditional ECG segments, we identified over 300 genetic loci that are statistically associated with the high-dimensional representation of the ECG. We established the genetic ECG signature for dilated cardiomyopathy, associated the BAG3, HSPB7/CLCNKA, PRKCA, TMEM43, and OBSCN loci with disease risk and confirmed this association in an independent cohort. In total, our work demonstrates that a high-dimensional analysis of the entire ECG provides unique opportunities for studying cardiac biology and disease and furthering drug development. A record of this paper's transparent peer review process is included in the Supplemental Information.