Showing papers in "Molecular Systems Biology in 2021"


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a semi-supervised variant of scVI, called single-cell ANnotation using Variational Inference (scANVI), to leverage existing cell state annotations.
Abstract: As the number of single-cell transcriptomics datasets grows, the natural next step is to integrate the accumulating data to achieve a common ontology of cell types and states. However, it is not straightforward to compare gene expression levels across datasets and to automatically assign cell type labels in a new dataset based on existing annotations. In this manuscript, we demonstrate that our previously developed method, scVI, provides an effective and fully probabilistic approach for joint representation and analysis of scRNA-seq data, while accounting for uncertainty caused by biological and measurement noise. We also introduce single-cell ANnotation using Variational Inference (scANVI), a semi-supervised variant of scVI designed to leverage existing cell state annotations. We demonstrate that scVI and scANVI compare favorably to state-of-the-art methods for data integration and cell state annotation in terms of accuracy, scalability, and adaptability to challenging settings. In contrast to existing methods, scVI and scANVI integrate multiple datasets with a single generative model that can be directly used for downstream tasks, such as differential expression. Both methods are easily accessible through scvi-tools.

126 citations


Journal ArticleDOI
TL;DR: OmniPath provides a single access point to knowledge spanning intra‐ and intercellular processes for data analysis, as it demonstrates in applications studying SARS‐CoV‐2 infection and ulcerative colitis.
Abstract: Molecular knowledge of biological processes is a cornerstone in omics data analysis. Applied to single-cell data, such analyses provide mechanistic insights into individual cells and their interactions. However, knowledge of intercellular communication is scarce, scattered across resources, and not linked to intracellular processes. To address this gap, we combined over 100 resources covering interactions and roles of proteins in inter- and intracellular signaling, as well as transcriptional and post-transcriptional regulation. We added protein complex information and annotations on function, localization, and role in diseases for each protein. The resource is available for human, and via homology translation for mouse and rat. The data are accessible via OmniPath's web service (https://omnipathdb.org/), a Cytoscape plug-in, and packages in R/Bioconductor and Python, providing access options for computational and experimental scientists. We created workflows with tutorials to facilitate the analysis of cell-cell interactions and affected downstream intracellular signaling processes. OmniPath provides a single access point to knowledge spanning intra- and intercellular processes for data analysis, as we demonstrate in applications studying SARS-CoV-2 infection and ulcerative colitis.

116 citations


Journal ArticleDOI
TL;DR: A review of the current state of RNA velocity modeling approaches can be found in this article, where the authors provide guidance on how the ensuing challenges may be addressed and outline future directions on how to generalize the concept of RNA velocities to a wider variety of biological systems and modalities.
Abstract: RNA velocity has enabled the recovery of directed dynamic information from single-cell transcriptomics by connecting measurements to the underlying kinetics of gene expression. This approach has opened up new ways of studying cellular dynamics. Here, we review the current state of RNA velocity modeling approaches, discuss various examples illustrating limitations and potential pitfalls, and provide guidance on how the ensuing challenges may be addressed. We then outline future directions on how to generalize the concept of RNA velocity to a wider variety of biological systems and modalities.

75 citations


Journal ArticleDOI
TL;DR: In this article, the authors performed single-cell transcriptomics of SARS-CoV-2-infected intestinal organoids, identified a subpopulation of enterocytes as the prime target of infection, and found no positive correlation between susceptibility to infection and the expression of ACE2.
Abstract: Exacerbated pro-inflammatory immune response contributes to COVID-19 pathology. However, despite the mounting evidence about SARS-CoV-2 infecting the human gut, little is known about the antiviral programs triggered in this organ. To address this gap, we performed single-cell transcriptomics of SARS-CoV-2-infected intestinal organoids. We identified a subpopulation of enterocytes as the prime target of SARS-CoV-2 and, interestingly, found a lack of positive correlation between susceptibility to infection and the expression of ACE2. Infected cells activated strong pro-inflammatory programs and produced interferon, while expression of interferon-stimulated genes was limited to bystander cells due to SARS-CoV-2 suppressing the autocrine action of interferon. These findings reveal that SARS-CoV-2 curtails the immune response and highlight the gut as a pro-inflammatory reservoir that should be considered to fully understand SARS-CoV-2 pathogenesis.

61 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present COSMOS (Causal Oriented Search of Multi-Omics Space), a method that integrates phosphoproteomics, transcriptomics, and metabolomics datasets.
Abstract: Multi-omics datasets can provide molecular insights beyond the sum of individual omics. Various tools have been recently developed to integrate such datasets, but there are limited strategies to systematically extract mechanistic hypotheses from them. Here, we present COSMOS (Causal Oriented Search of Multi-Omics Space), a method that integrates phosphoproteomics, transcriptomics, and metabolomics datasets. COSMOS combines extensive prior knowledge of signaling, metabolic, and gene regulatory networks with computational methods to estimate activities of transcription factors and kinases as well as network-level causal reasoning. COSMOS provides mechanistic hypotheses for experimental observations across multi-omics datasets. We applied COSMOS to a dataset comprising transcriptomics, phosphoproteomics, and metabolomics data from healthy and cancerous tissue from eleven clear cell renal cell carcinoma (ccRCC) patients. COSMOS was able to capture relevant crosstalks within and between multiple omics layers, such as known ccRCC drug targets. We expect that our freely available method will be broadly useful to extract mechanistic insights from multi-omics studies.

58 citations


Journal ArticleDOI
TL;DR: In this paper, the authors assess the reproducibility of mathematical models and propose a scorecard for improving reproducibility in this field.
Abstract: Reproducibility of scientific results is a key element of science and credibility. The lack of reproducibility across many scientific fields has emerged as an important concern. In this piece, we assess mathematical model reproducibility and propose a scorecard for improving reproducibility in this field.

57 citations


Journal ArticleDOI
TL;DR: A review of MS techniques that have been instrumental for the identification of protein-protein interactions at a system-level can be found in this article, where the challenges associated with these methodologies as well as novel MS advancements that aim to address these challenges are discussed.
Abstract: A better understanding of the molecular mechanisms underlying disease is key for expediting the development of novel therapeutic interventions. Disease mechanisms are often mediated by interactions between proteins. Insights into the physical rewiring of protein-protein interactions in response to mutations, pathological conditions, or pathogen infection can advance our understanding of disease etiology, progression, and pathogenesis and can lead to the identification of potential druggable targets. Advances in quantitative mass spectrometry (MS)-based approaches have allowed unbiased mapping of these disease-mediated changes in protein-protein interactions on a global scale. Here, we review MS techniques that have been instrumental for the identification of protein-protein interactions at a system-level, and we discuss the challenges associated with these methodologies as well as novel MS advancements that aim to address these challenges. An overview of examples from diverse disease contexts illustrates the potential of MS-based protein-protein interaction mapping approaches for revealing disease mechanisms, pinpointing new therapeutic targets, and eventually moving toward personalized applications.

55 citations


Journal ArticleDOI
TL;DR: In this paper, the authors developed a machine learning framework to identify protein complexes in over 15,000 mass spectrometry experiments which resulted in the identification of nearly 7,000 physical assemblies.
Abstract: A general principle of biology is the self-assembly of proteins into functional complexes. Characterizing their composition is, therefore, required for our understanding of cellular functions. Unfortunately, we lack knowledge of the comprehensive set of identities of protein complexes in human cells. To address this gap, we developed a machine learning framework to identify protein complexes in over 15,000 mass spectrometry experiments which resulted in the identification of nearly 7,000 physical assemblies. We show our resource, hu.MAP 2.0, is more accurate and comprehensive than previous state of the art high-throughput protein complex resources and gives rise to many new hypotheses, including for 274 completely uncharacterized proteins. Further, we identify 253 promiscuous proteins that participate in multiple complexes pointing to possible moonlighting roles. We have made hu.MAP 2.0 easily searchable in a web interface (http://humap2.proteincomplexes.org/), which will be a valuable resource for researchers across a broad range of interests including systems biology, structural biology, and molecular explanations of disease.

50 citations


Journal ArticleDOI
TL;DR: In this article, the authors developed a versatile mass spectrometric workflow based on data-independent acquisition proteomics (DIA/SWATH) together with a novel protein inference algorithm (xTop) to accurately quantify absolute protein abundances in Escherichia coli for > 2,000 proteins over > 60 growth conditions, including nutrient limitations, non-metabolic stresses, and nonplanktonic states.
Abstract: Accurate measurements of cellular protein concentrations are invaluable to quantitative studies of gene expression and physiology in living cells. Here, we developed a versatile mass spectrometric workflow based on data-independent acquisition proteomics (DIA/SWATH) together with a novel protein inference algorithm (xTop). We used this workflow to accurately quantify absolute protein abundances in Escherichia coli for > 2,000 proteins over > 60 growth conditions, including nutrient limitations, non-metabolic stresses, and non-planktonic states. The resulting high-quality dataset of protein mass fractions allowed us to characterize proteome responses from a coarse (groups of related proteins) to a fine (individual) protein level. Hereby, a plethora of novel biological findings could be elucidated, including the generic upregulation of low-abundant proteins under various metabolic limitations, the non-specificity of catabolic enzymes upregulated under carbon limitation, the lack of large-scale proteome reallocation under stress compared to nutrient limitations, as well as surprising strain-dependent effects important for biofilm formation. These results present valuable resources for the systems biology community and can be used for future multi-omics studies of gene regulation and metabolic control in E. coli.

47 citations


Journal ArticleDOI
TL;DR: The findings raise the question whether and to which degree these reciprocal drug–microbiome interactions will differ across individuals, and how to take them into account in drug discovery and precision medicine.
Abstract: Broad-spectrum antibiotics target multiple gram-positive and gram-negative bacteria, and can collaterally damage the gut microbiota. Yet, our knowledge of the extent of damage, the antibiotic activity spectra, and the resistance mechanisms of gut microbes is sparse. This limits our ability to mitigate microbiome-facilitated spread of antibiotic resistance. In addition to antibiotics, non-antibiotic drugs affect the human microbiome, as shown by metagenomics as well as in vitro studies. Microbiome-drug interactions are bidirectional, as microbes can also modulate drugs. Chemical modifications of antibiotics mostly function as antimicrobial resistance mechanisms, while metabolism of non-antibiotics can also change the drugs' pharmacodynamic, pharmacokinetic, and toxic properties. Recent studies have started to unravel the extensive capacity of gut microbes to metabolize drugs, the mechanisms, and the relevance of such events for drug treatment. These findings raise the question whether and to which degree these reciprocal drug-microbiome interactions will differ across individuals, and how to take them into account in drug discovery and precision medicine. This review describes recent developments in the field and discusses future study areas that will benefit from systems biology approaches to better understand the mechanistic role of the human gut microbiota in drug actions.

45 citations


Journal ArticleDOI
TL;DR: This paper presents a step-by-step protocol for the assessment, normalization, and batch correction of proteomic data, an accompanying R package (proBatch), and a set of techniques that enable control of batch effect adjustment quality.
Abstract: Advancements in mass spectrometry-based proteomics have enabled experiments encompassing hundreds of samples. While these large sample sets deliver much-needed statistical power, handling them introduces technical variability known as batch effects. Here, we present a step-by-step protocol for the assessment, normalization, and batch correction of proteomic data. We review established methodologies from related fields and describe solutions specific to proteomic challenges, such as ion intensity drift and missing values in quantitative feature matrices. Finally, we compile a set of techniques that enable control of batch effect adjustment quality. We provide an R package, "proBatch", containing functions required for each step of the protocol. We demonstrate the utility of this methodology on five proteomic datasets each encompassing hundreds of samples and consisting of multiple experimental designs. In conclusion, we provide guidelines and tools to make the extraction of true biological signal from large proteomic studies more robust and transparent, ultimately facilitating reliable and reproducible research in clinical proteomics and systems biology.

Journal ArticleDOI
TL;DR: The large‐scale measurement of the genotype‐phenotype landscape for an allosteric protein: the lac repressor from Escherichia coli, LacI is reported, and a new band‐stop phenotype is discovered that challenges conventional models of allostery and that emerges from combinations of nearly silent amino acid substitutions.
Abstract: Allostery is a fundamental biophysical mechanism that underlies cellular sensing, signaling, and metabolism. Yet a quantitative understanding of allosteric genotype-phenotype relationships remains elusive. Here, we report the large-scale measurement of the genotype-phenotype landscape for an allosteric protein: the lac repressor from Escherichia coli, LacI. Using a method that combines long-read and short-read DNA sequencing, we quantitatively measure the dose-response curves for nearly 10^5 variants of the LacI genetic sensor. The resulting data provide a quantitative map of the effect of amino acid substitutions on LacI allostery and reveal systematic sequence-structure-function relationships. We find that in many cases, allosteric phenotypes can be quantitatively predicted with additive or neural-network models, but unpredictable changes also occur. For example, we were surprised to discover a new band-stop phenotype that challenges conventional models of allostery and that emerges from combinations of nearly silent amino acid substitutions.

Journal ArticleDOI
TL;DR: A permutation-based method is designed that empirically evaluates GO terms reported by AMI methods and is used to fashion five novel AMI performance criteria; DOMINO, a novel AMI algorithm, outperformed six other algorithms in extensive testing on GE and GWAS data.
Abstract: Algorithms for active module identification (AMI) are central to analysis of omics data. Such algorithms receive a gene network and nodes' activity scores as input and report subnetworks that show significant over-representation of accrued activity signal ("active modules"), thus representing biological processes that presumably play key roles in the analyzed conditions. Here, we systematically evaluated six popular AMI methods on gene expression and GWAS data. We observed that GO terms enriched in modules detected on the real data were often also enriched on modules found on randomly permuted data. This indicated that AMI methods frequently report modules that are not specific to the biological context measured by the analyzed omics dataset. To tackle this bias, we designed a permutation-based method that empirically evaluates GO terms reported by AMI methods. We used the method to fashion five novel AMI performance criteria. Last, we developed DOMINO, a novel AMI algorithm, that outperformed the other six algorithms in extensive testing on GE and GWAS data. Software is available at https://github.com/Shamir-Lab.
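
The idea of scoring an enrichment result against randomly permuted data can be illustrated with a generic empirical p-value. This is a minimal sketch and not the authors' procedure: the function name, the score values, and the add-one correction are assumptions chosen for illustration only.

```python
def empirical_p_value(observed_score, permuted_scores):
    """Fraction of permuted-data scores at least as large as the observed one.

    The +1 in numerator and denominator is a common correction that keeps
    the estimate away from exactly zero; it is an assumption of this sketch.
    """
    exceed = sum(1 for s in permuted_scores if s >= observed_score)
    return (exceed + 1) / (len(permuted_scores) + 1)


# Hypothetical enrichment scores for one GO term: one value from the real
# data and four values obtained from randomly permuted activity scores.
observed = 5.0
permuted = [1.0, 2.0, 6.0, 3.0]
p_value = empirical_p_value(observed, permuted)
```

A term whose score is routinely matched on permuted data receives a large empirical p-value, which is the kind of non-specific result the abstract describes.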

Journal ArticleDOI
TL;DR: In this paper, the authors evaluate how well the simple Arrhenius equation predicts complex multi-step biological processes, using frog and fruit fly embryogenesis as two canonical models, and find that the Arrhenius equation provides a good approximation for the temperature dependence of embryogenesis, even though individual developmental intervals scale differently with temperature.
Abstract: The famous Arrhenius equation is well suited to describing the temperature dependence of chemical reactions but has also been used for complicated biological processes. Here, we evaluate how well the simple Arrhenius equation predicts complex multi-step biological processes, using frog and fruit fly embryogenesis as two canonical models. We find that the Arrhenius equation provides a good approximation for the temperature dependence of embryogenesis, even though individual developmental intervals scale differently with temperature. At low and high temperatures, however, we observed significant departures from idealized Arrhenius Law behavior. When we model multi-step reactions of idealized chemical networks, we are unable to generate comparable deviations from linearity. In contrast, we find the two enzymes GAPDH and β-galactosidase show non-linearity in the Arrhenius plot similar to our observations of embryonic development. Thus, we find that complex embryonic development can be well approximated by the simple Arrhenius equation regardless of non-uniform developmental scaling and propose that the observed departure from this law likely results more from non-idealized individual steps rather than from the complexity of the system.
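
For reference, the Arrhenius equation is k = A * exp(-Ea / (R * T)), so ln k is linear in 1/T with slope -Ea/R; this is the linearity that the "Arrhenius plot" in the abstract refers to. The sketch below generates ideal rates and recovers the parameters with a straight-line fit. The prefactor, activation energy, and temperatures are hypothetical values chosen for illustration, not data from the paper.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def arrhenius_rate(temp_k, prefactor, activation_energy):
    """Rate k = A * exp(-Ea / (R * T)) at absolute temperature temp_k."""
    return prefactor * math.exp(-activation_energy / (R * temp_k))


def fit_arrhenius(temps_k, rates):
    """Least-squares line through (1/T, ln k); returns (A, Ea)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # equals -Ea / R
    intercept = mean_y - slope * mean_x  # equals ln A
    return math.exp(intercept), -slope * R


# Hypothetical parameters: A = 1e10 per second, Ea = 60 kJ/mol.
temps = [285.0, 290.0, 295.0, 300.0]
rates = [arrhenius_rate(t, 1.0e10, 60_000.0) for t in temps]
a_fit, ea_fit = fit_arrhenius(temps, rates)
```

On ideal data the fit returns the input parameters; measurements that bend away from the fitted line at low or high temperature correspond to the departures from Arrhenius behavior described in the abstract.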

Journal ArticleDOI
TL;DR: In this paper, a computational framework for joint cell segmentation and cell type annotation that utilizes prior knowledge of cell type-specific gene expression was developed, which can be leveraged to improve the accuracy of RNA hybridization-based spatial transcriptomics while providing highly granular cell (sub)type information.
Abstract: RNA hybridization-based spatial transcriptomics provides unparalleled detection sensitivity. However, inaccuracies in segmentation of image volumes into cells cause misassignment of mRNAs which is a major source of errors. Here, we develop JSTA, a computational framework for joint cell segmentation and cell type annotation that utilizes prior knowledge of cell type-specific gene expression. Simulation results show that leveraging existing cell type taxonomy increases RNA assignment accuracy by more than 45%. Using JSTA, we were able to classify cells in the mouse hippocampus into 133 (sub)types revealing the spatial organization of CA1, CA3, and Sst neuron subtypes. Analysis of within cell subtype spatial differential gene expression of 80 candidate genes identified 63 with statistically significant spatial differential gene expression across 61 (sub)types. Overall, our work demonstrates that known cell type expression patterns can be leveraged to improve the accuracy of RNA hybridization-based spatial transcriptomics while providing highly granular cell (sub)type information. The large number of newly discovered spatial gene expression patterns substantiates the need for accurate spatial transcriptomic measurements that can provide information beyond cell (sub)type labels.

Journal ArticleDOI
Marek Ostaszewski, Anna Niarakis, Alexander Mazein, +155 more authors (52 institutions)
TL;DR: The COVID-19 Disease Map (C19DMap) is a large-scale community effort to build an open access, interoperable and computable repository of COVID-19 molecular mechanisms.
Abstract: We need to effectively combine the knowledge from surging literature with complex datasets to propose mechanistic models of SARS-CoV-2 infection, improving data interpretation and predicting key targets of intervention. Here, we describe a large-scale community effort to build an open access, interoperable and computable repository of COVID-19 molecular mechanisms. The COVID-19 Disease Map (C19DMap) is a graphical, interactive representation of disease-relevant molecular mechanisms linking many knowledge sources. Notably, it is a computational resource for graph-based analyses and disease modelling. To this end, we established a framework of tools, platforms and guidelines necessary for a multifaceted community of biocurators, domain experts, bioinformaticians and computational biologists. The diagrams of the C19DMap, curated from the literature, are integrated with relevant interaction and text mining databases. We demonstrate the application of network analysis and modelling approaches by concrete examples to highlight new testable hypotheses. This framework helps to find signatures of SARS-CoV-2 predisposition, treatment response or prioritisation of drug candidates. Such an approach may help deal with new waves of COVID-19 or similar pandemics in the long-term perspective.

Journal ArticleDOI
TL;DR: A comprehensive review of tissue clearing protocols can be found in this article, with an emphasis on DISCO clearing protocols, which have been widely used not only due to their robustness, but also owing to their relatively straightforward application.
Abstract: Histological analysis of biological tissues by mechanical sectioning is significantly time-consuming and error-prone due to loss of important information during sample slicing. In the recent years, the development of tissue clearing methods overcame several of these limitations and allowed exploring intact biological specimens by rendering tissues transparent and subsequently imaging them by laser scanning fluorescence microscopy. In this review, we provide a guide for scientists who would like to perform a clearing protocol from scratch without any prior knowledge, with an emphasis on DISCO clearing protocols, which have been widely used not only due to their robustness, but also owing to their relatively straightforward application. We discuss diverse tissue-clearing options and propose solutions for several possible pitfalls. Moreover, after surveying more than 30 researchers that employ tissue clearing techniques in their laboratories, we compiled the most frequently encountered issues and propose solutions. Overall, this review offers an informative and detailed guide through the growing literature of tissue clearing and can help with finding the easiest way for hands-on implementation.

Journal ArticleDOI
TL;DR: In this paper, a proteome-constrained genome-scale metabolic model of Lactococcus lactis is presented to interpret growth on multiple nutrients, and the model predicts glucose and arginine uptake as dominant constraints at low growth rates.
Abstract: Cells adapt to different conditions via gene expression that tunes metabolism for maximal fitness. Constraints on cellular proteome may limit such expression strategies and introduce trade-offs. Resource allocation under proteome constraints has explained regulatory strategies in bacteria. It is unclear, however, to what extent these constraints can predict evolutionary changes, especially for microorganisms that evolved under nutrient-rich conditions, i.e., multiple available nitrogen sources, such as Lactococcus lactis. Here, we present a proteome-constrained genome-scale metabolic model of L. lactis (pcLactis) to interpret growth on multiple nutrients. Through integration of proteomics and flux data, in glucose-limited chemostats, the model predicted glucose and arginine uptake as dominant constraints at low growth rates. Indeed, glucose and arginine catabolism were found upregulated in evolved mutants. At high growth rates, pcLactis correctly predicted the observed shutdown of arginine catabolism because limited proteome availability favored lactate for ATP production. Thus, our model-based analysis is able to identify and explain the proteome constraints that limit growth rate in nutrient-rich environments and thus form targets of fitness improvement.

Journal ArticleDOI
TL;DR: In this article, the authors optimized the transfer efficiency of conjugative plasmid TP114 using accelerated laboratory evolution, generating a conjugative delivery vehicle for CRISPR-cas9 that can eliminate > 99.9% of targeted antibiotic-resistant Escherichia coli in the mouse gut microbiota using a single dose.
Abstract: Antibiotic resistance threatens our ability to treat infectious diseases, spurring interest in alternative antimicrobial technologies. The use of bacterial conjugation to deliver CRISPR-cas systems programmed to precisely eliminate antibiotic-resistant bacteria represents a promising approach but requires high in situ DNA transfer rates. We have optimized the transfer efficiency of conjugative plasmid TP114 using accelerated laboratory evolution. We hence generated a potent conjugative delivery vehicle for CRISPR-cas9 that can eliminate > 99.9% of targeted antibiotic-resistant Escherichia coli in the mouse gut microbiota using a single dose. We then applied this system to a Citrobacter rodentium infection model, achieving full clearance within four consecutive days of treatment.

Journal ArticleDOI
TL;DR: In this paper, the authors applied both affinity purification mass spectrometry (AP-MS) and the complementary proximity-based labeling method (BioID-MS) to map the interactions relevant to viral processing.
Abstract: Treatment options for COVID-19, caused by SARS-CoV-2, remain limited. Understanding viral pathogenesis at the molecular level is critical to develop effective therapy. Some recent studies have explored SARS-CoV-2-host interactomes and provided great resources for understanding viral replication. However, host proteins that functionally associate with SARS-CoV-2 are localized in the corresponding subnetwork within the comprehensive human interactome. Therefore, constructing a downstream network including all potential viral receptors, host cell proteases, and cofactors is necessary and should be used as an additional criterion for the validation of critical host machineries used for viral processing. This study applied both affinity purification mass spectrometry (AP-MS) and the complementary proximity-based labeling MS method (BioID-MS) on 29 viral ORFs and 18 host proteins with potential roles in viral replication to map the interactions relevant to viral processing. The analysis yields a list of 693 hub proteins sharing interactions with both viral baits and host baits and revealed their biological significance for SARS-CoV-2. Those hub proteins then served as a rational resource for drug repurposing via a virtual screening approach. The overall process resulted in the suggested repurposing of 59 compounds for 15 protein targets. Furthermore, antiviral effects of some candidate drugs were observed in vitro validation using image-based drug screen with infectious SARS-CoV-2. In addition, our results suggest that the antiviral activity of methotrexate could be associated with its inhibitory effect on specific protein-protein interactions.

Journal ArticleDOI
TL;DR: In this paper, the authors found that de novo transconjugants grew significantly slower and/or with overall prolonged lag times compared to lineages that had been replicating for several generations, indicating the presence of a plasmid acquisition cost.
Abstract: Plasmid conjugation is a major mechanism responsible for the spread of antibiotic resistance. Plasmid fitness costs are known to impact long-term growth dynamics of microbial populations by providing plasmid-carrying cells a relative (dis)advantage compared to plasmid-free counterparts. Separately, plasmid acquisition introduces an immediate, but transient, metabolic perturbation. However, the impact of these short-term effects on subsequent growth dynamics has not previously been established. Here, we observed that de novo transconjugants grew significantly slower and/or with overall prolonged lag times, compared to lineages that had been replicating for several generations, indicating the presence of a plasmid acquisition cost. These effects were general to diverse incompatibility groups, well-characterized and clinically captured plasmids, Gram-negative recipient strains and species, and experimental conditions. Modeling revealed that both fitness and acquisition costs modulate overall conjugation dynamics, validated with previously published data. These results suggest that the hours immediately following conjugation may play a critical role in both short- and long-term plasmid prevalence. This time frame is particularly relevant to microbiomes with high plasmid/strain diversity considered to be hot spots for conjugation.

Journal ArticleDOI
TL;DR: Yeast Estradiol strains with Titratable Induction (YETI) is a collection in which > 5,600 yeast genes are engineered for transcriptional inducibility with single-gene precision at their native loci and without plasmids.
Abstract: The ability to switch a gene from off to on and monitor dynamic changes provides a powerful approach for probing gene function and elucidating causal regulatory relationships. Here, we developed and characterized YETI (Yeast Estradiol strains with Titratable Induction), a collection in which > 5,600 yeast genes are engineered for transcriptional inducibility with single-gene precision at their native loci and without plasmids. Each strain contains SGA screening markers and a unique barcode, enabling high-throughput genetics. We characterized YETI using growth phenotyping and BAR-seq screens, and we used a YETI allele to identify the regulon of Rof1, showing that it acts to repress transcription. We observed that strains with inducible essential genes that have low native expression can often grow without inducer. Analysis of data from eukaryotic and prokaryotic systems shows that native expression is a variable that can bias promoter-perturbing screens, including CRISPRi. We engineered a second expression system, Z3 EB42, that gives lower expression than Z3 EV, a feature enabling conditional activation and repression of lowly expressed essential genes that grow without inducer in the YETI library.

Journal ArticleDOI
TL;DR: In this article, live and super-resolution microscopy in zebrafish embryos was used to reveal RNA polymerase II (Pol II) clusters with a large variety of shapes, which can be explained by a theoretical model in which regulatory chromatin regions provide surfaces for liquid-phase condensation at concentrations that are too low for canonical liquid-liquid phase separation.
Abstract: It is essential for cells to control which genes are transcribed into RNA. In eukaryotes, two major control points are recruitment of RNA polymerase II (Pol II) into a paused state, and subsequent pause release toward transcription. Pol II recruitment and pause release occur in association with macromolecular clusters, which were proposed to be formed by a liquid-liquid phase separation mechanism. How such a phase separation mechanism relates to the interaction of Pol II with DNA during recruitment and transcription, however, remains poorly understood. Here, we use live and super-resolution microscopy in zebrafish embryos to reveal Pol II clusters with a large variety of shapes, which can be explained by a theoretical model in which regulatory chromatin regions provide surfaces for liquid-phase condensation at concentrations that are too low for canonical liquid-liquid phase separation. Model simulations and chemical perturbation experiments indicate that recruited Pol II contributes to the formation of these surface-associated condensates, whereas elongating Pol II is excluded from these condensates and thereby drives their unfolding.

Journal ArticleDOI
TL;DR: In this paper, the authors discuss emerging imaging techniques ranging from light microscopy to electron microscopy that enable investigation of genome folding and dynamics at high spatial and temporal resolution, unveiling principles underlying the spatial arrangement of the genome and its potential functional links to diverse biological activities in the nucleus.
Abstract: Probing the architecture, mechanism, and dynamics of genome folding is fundamental to our understanding of genome function in homeostasis and disease. Most chromosome conformation capture studies dissect the genome architecture with population- and time-averaged snapshots and thus have limited capabilities to reveal 3D nuclear organization and dynamics at the single-cell level. Here, we discuss emerging imaging techniques ranging from light microscopy to electron microscopy that enable investigation of genome folding and dynamics at high spatial and temporal resolution. Results from these studies complement genomic data, unveiling principles underlying the spatial arrangement of the genome and its potential functional links to diverse biological activities in the nucleus.

Journal ArticleDOI
TL;DR: Pyhamilton is an open-source Python platform that can execute complex pipetting patterns required for custom high-throughput experiments such as the simulation of metapopulation dynamics.
Abstract: Our understanding of complex living systems is limited by our capacity to perform experiments in high throughput. While robotic systems have automated many traditional hand-pipetting protocols, software limitations have precluded more advanced maneuvers required to manipulate, maintain, and monitor hundreds of experiments in parallel. Here, we present Pyhamilton, an open-source Python platform that can execute complex pipetting patterns required for custom high-throughput experiments such as the simulation of metapopulation dynamics. With an integrated plate reader, we maintain nearly 500 remotely monitored bacterial cultures in log-phase growth for days without user intervention by taking regular density measurements to adjust the robotic method in real-time. Using these capabilities, we systematically optimize bioreactor protein production by monitoring the fluorescent protein expression and growth rates of a hundred different continuous culture conditions in triplicate to comprehensively sample the carbon, nitrogen, and phosphorus fitness landscape. Our results demonstrate that flexible software can empower existing hardware to enable new types and scales of experiments, empowering areas from biomanufacturing to fundamental biology.

Journal ArticleDOI
TL;DR: In this article, a quantitative systems pharmacology approach is used to identify a set of repurposable and investigational drugs as potential therapeutics against COVID-19. The candidates were deduced from the gene expression signature of SARS-CoV-2-infected A549 cells screened against Connectivity Map and prioritized by network proximity analysis with respect to disease modules in the viral-host interactome.
Abstract: Understanding the mechanism of SARS-CoV-2 infection and identifying potential therapeutics are global imperatives. Using a quantitative systems pharmacology approach, we identified a set of repurposable and investigational drugs as potential therapeutics against COVID-19. These were deduced from the gene expression signature of SARS-CoV-2-infected A549 cells screened against Connectivity Map and prioritized by network proximity analysis with respect to disease modules in the viral-host interactome. We also identified immuno-modulating compounds aiming at suppressing hyperinflammatory responses in severe COVID-19 patients, based on the transcriptome of ACE2-overexpressing A549 cells. Experiments with Vero-E6 cells infected by SARS-CoV-2, as well as independent syncytia formation assays for probing ACE2/SARS-CoV-2 spike protein-mediated cell fusion using HEK293T and Calu-3 cells, showed that several predicted compounds had inhibitory activities. Among them, salmeterol, rottlerin, and mTOR inhibitors exhibited antiviral activities in Vero-E6 cells; imipramine, linsitinib, hexylresorcinol, ezetimibe, and brompheniramine impaired viral entry. These novel findings provide new paths for broadening the repertoire of compounds pursued as therapeutics against COVID-19.

Journal ArticleDOI
TL;DR: In this article, the genome-reduced human lung pathogen Mycoplasma pneumoniae was engineered as a live biotherapeutic to treat biofilm-associated bacterial infections.
Abstract: Bacteria present a promising delivery system for treating human diseases. Here, we engineered the genome-reduced human lung pathogen Mycoplasma pneumoniae as a live biotherapeutic to treat biofilm-associated bacterial infections. This strain has a unique genetic code, which hinders gene transfer to most other bacterial genera, and it lacks a cell wall, which allows it to express proteins that target peptidoglycans of pathogenic bacteria. We first determined that removal of the pathogenic factors fully attenuated the chassis strain in vivo. We then designed synthetic promoters and identified an endogenous peptide signal sequence that, when fused to heterologous proteins, promotes efficient secretion. Based on this, we equipped the chassis strain with a genetic platform designed to secrete antibiofilm and bactericidal enzymes, resulting in a strain capable of dissolving Staphylococcus aureus biofilms preformed on catheters in vitro, ex vivo, and in vivo. To our knowledge, this is the first engineered genome-reduced bacterium that can fight against clinically relevant biofilm-associated bacterial infections.

Journal ArticleDOI
TL;DR: In this paper, the authors show that HMGB1 is also a bona fide RNA-binding protein (RBP) that binds hundreds of mRNAs. Its interactome is rich in RBPs implicated in senescence regulation, and the mRNAs of many of these RBPs are directly bound by HMGB1 and regulate availability of SASP-relevant transcripts.
Abstract: Spatial organization and gene expression of mammalian chromosomes are maintained and regulated in conjunction with cell cycle progression. This is perturbed once cells enter senescence and the highly abundant HMGB1 protein is depleted from nuclei to act as an extracellular proinflammatory stimulus. Despite its physiological importance, we know little about the positioning of HMGB1 on chromatin and its nuclear roles. To address this, we mapped HMGB1 binding genome-wide in two primary cell lines. We integrated ChIP-seq and Hi-C with graph theory to uncover clustering of HMGB1-marked topological domains that harbor genes involved in paracrine senescence. Using simplified Cross-Linking and Immuno-Precipitation and functional tests, we show that HMGB1 is also a bona fide RNA-binding protein (RBP) binding hundreds of mRNAs. It presents an interactome rich in RBPs implicated in senescence regulation. The mRNAs of many of these RBPs are directly bound by HMGB1 and regulate availability of SASP-relevant transcripts. Our findings reveal a broader than hitherto assumed role for HMGB1 in coordinating chromatin folding and RNA homeostasis as part of a regulatory loop controlling cell-autonomous and paracrine senescence.

Journal ArticleDOI
TL;DR: In this article, the authors used single-cell RNA sequencing to examine the cell cycle states of expanding human neural stem cells (hNSCs) and constructed a cell cycle classifier that identifies traditional cell cycle phases and a putative quiescent-like state, Neural G0, in neuroepithelial-derived cell types during mammalian neurogenesis and in gliomas.
Abstract: Single-cell RNA sequencing has emerged as a powerful tool for resolving cellular states associated with normal and maligned developmental processes. Here, we used scRNA-seq to examine the cell cycle states of expanding human neural stem cells (hNSCs). From these data, we constructed a cell cycle classifier that identifies traditional cell cycle phases and a putative quiescent-like state in neuroepithelial-derived cell types during mammalian neurogenesis and in gliomas. The Neural G0 markers are enriched with quiescent NSC genes and other neurodevelopmental markers found in non-dividing neural progenitors. Putative glioblastoma stem-like cells were significantly enriched in the Neural G0 cell population. Neural G0 cell populations and gene expression are significantly associated with less aggressive tumors and extended patient survival for gliomas. Genetic screens to identify modulators of Neural G0 revealed that knockout of genes associated with the Hippo/Yap and p53 pathways diminished Neural G0 in vitro, resulting in faster G1 transit, down-regulation of quiescence-associated markers, and loss of Neural G0 gene expression. Thus, Neural G0 represents a dynamic quiescent-like state found in neuroepithelial-derived cells and gliomas.

Journal ArticleDOI
TL;DR: In this article, the consequences of mutation at each position were used to define a mutational landscape, and a data-driven clustering analysis identified 100 functional amino acid subtypes. The subtypes were characterized by their frequencies and chemical properties, such as tolerating polarity or hydrophobicity, or being intolerant of charge or specific amino acids.
Abstract: Amino acids fulfil a diverse range of roles in proteins, each utilising its chemical properties in different ways in different contexts to create required functions. For example, cysteines form disulphide or hydrogen bonds in different circumstances and charged amino acids do not always make use of their charge. The repertoire of amino acid functions and the frequency at which they occur in proteins remains understudied. Measuring large numbers of mutational consequences, which can elucidate the role an amino acid plays, was prohibitively time-consuming until recent developments in deep mutational scanning. In this study, we gathered data from 28 deep mutational scanning studies, covering 6,291 positions in 30 proteins, and used the consequences of mutation at each position to define a mutational landscape. We demonstrated rich relationships between this landscape and biophysical or evolutionary properties. Finally, we identified 100 functional amino acid subtypes with a data-driven clustering analysis and studied their features, including their frequencies and chemical properties such as tolerating polarity, hydrophobicity or being intolerant of charge or specific amino acids. The mutational landscape and amino acid subtypes provide a foundational catalogue of amino acid functional diversity, which will be refined as the number of studied protein positions increases.