Showing papers by "Mikael Bodén" published in 2022


Journal ArticleDOI
TL;DR: Graphical Representation of Ancestral Sequence Predictions (GRASP) is an ASR tool that maps indel evolution throughout a reconstruction and enables the evaluation of indel variants.
Abstract: Analyzing the natural evolution of proteins by ancestral sequence reconstruction (ASR) can provide valuable information about the changes in sequence and structure that drive the development of novel protein functions. However, ASR has also been used as a protein engineering tool, as it often generates thermostable proteins which can serve as robust and evolvable templates for enzyme engineering. Importantly, ASR has the potential to provide an insight into the history of insertions and deletions that have occurred in the evolution of a protein family. Indels are strongly associated with functional change during enzyme evolution and represent a largely unexplored source of genetic diversity for designing proteins with novel or improved properties. Current ASR methods differ in the way they handle indels; inclusion or exclusion of indels is often managed subjectively, based on assumptions the user makes about the likelihood of each recombination event, yet most currently available ASR tools provide limited, if any, opportunities for evaluating indel placement in a reconstructed sequence. Graphical Representation of Ancestral Sequence Predictions (GRASP) is an ASR tool that maps indel evolution throughout a reconstruction and enables the evaluation of indel variants. This chapter provides a general protocol for performing a reconstruction using GRASP and using the results to create indel variants. The method addresses protein template selection, sequence curation, alignment refinement, tree building, ancestor reconstruction, evaluation of indel variants and approaches to library development.

8 citations


Journal ArticleDOI
TL;DR:
Abstract: The cytochrome P450 family 1 enzymes (CYP1s) are a diverse family of hemoprotein monooxygenases, which metabolize many xenobiotics including numerous environmental carcinogens. However, their historical function and evolution remain largely unstudied. Here we investigate CYP1 evolution via the reconstruction and characterization of the vertebrate CYP1 ancestors. Younger ancestors and extant forms generally demonstrated higher activity toward typical CYP1 xenobiotic and steroid substrates than older ancestors, suggesting significant diversification away from the original CYP1 function. Caffeine metabolism appears to be a recently evolved trait of the CYP1A subfamily, observed in the mammalian CYP1A lineage, and may parallel the recent evolution of caffeine synthesis in multiple separate plant species. Likewise, the aryl hydrocarbon receptor agonist, 6-formylindolo[3,2-b]carbazole (FICZ) was metabolized to a greater extent by certain younger ancestors and extant forms, suggesting that activity toward FICZ increased in specific CYP1 evolutionary branches, a process that may have occurred in parallel to the exploitation of land where UV-exposure was higher than in aquatic environments. As observed with previous reconstructions of P450 enzymes, thermostability correlated with evolutionary age; the oldest ancestor was up to 35 °C more thermostable than the extant forms, with a 10T50 (temperature at which 50% of the hemoprotein remains intact after 10 min) of 71 °C. This robustness may have facilitated evolutionary diversification of the CYP1s by buffering the destabilizing effects of mutations that conferred novel functions, a phenomenon which may also be useful in exploiting the catalytic versatility of these ancestral enzymes for commercial application as biocatalysts.
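Note: the 10T50 defined in the abstract above (the temperature at which 50% of the hemoprotein remains intact after a 10 min incubation) can be estimated from a set of measurements by interpolation. The following is only a minimal sketch of that idea, not the authors' procedure: the function name `t50`, the use of linear interpolation, and the data points are all assumptions made for illustration, and the numbers are invented rather than taken from the paper.

```python
from typing import Sequence


def t50(temps_c: Sequence[float],
        fraction_intact: Sequence[float],
        threshold: float = 0.5) -> float:
    """Estimate the temperature at which the intact fraction falls to `threshold`.

    Hypothetical helper: linearly interpolates between the two measured
    temperatures that bracket the threshold crossing.
    """
    if len(temps_c) != len(fraction_intact):
        raise ValueError("temps_c and fraction_intact must have the same length")
    # Sort measurements by temperature so neighbouring points are adjacent.
    points = sorted(zip(temps_c, fraction_intact))
    for (t_lo, f_lo), (t_hi, f_hi) in zip(points, points[1:]):
        # Look for the interval where the intact fraction drops through the threshold.
        if f_lo >= threshold >= f_hi and f_lo != f_hi:
            return t_lo + (f_lo - threshold) * (t_hi - t_lo) / (f_lo - f_hi)
    raise ValueError("intact fraction never crosses the threshold")


# Invented example data: fraction of hemoprotein intact after 10 min at each temperature.
temps = [40.0, 45.0, 50.0, 55.0, 60.0]
intact = [1.00, 0.95, 0.60, 0.20, 0.02]
print(f"estimated 10T50: {t50(temps, intact):.2f} °C")
```

In practice a sigmoidal fit to the full melting curve may be preferred over linear interpolation; the sketch only makes the definition concrete.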

6 citations


Posted ContentDOI
27 Oct 2022-bioRxiv
TL;DR: A versatile cell multiplexing and data analysis platform to accelerate knowledge gain into mechanisms of cell differentiation and a new computational analysis pipeline to study cell differentiation applicable to diverse fields of developmental biology, drug discovery, and disease modelling are developed.
Abstract: This study develops a versatile cell multiplexing and data analysis platform to accelerate knowledge gain into mechanisms of cell differentiation. We engineer a cell barcoding system in human cells enabling multiplexed single-cell RNA sequencing for high throughput perturbation of customisable and diverse experimental conditions. This is coupled with a new computational analysis pipeline that overcomes the limitations of conventional algorithms by using an unsupervised, genome-wide, orthogonal biological reference point to reveal the cell diversity and regulatory networks in the input scRNA-seq data set. We implement this pipeline by engineering transcribed barcodes into induced pluripotent stem cells and multiplex 62 independent experimental conditions comprising eight differentiation time points and nine developmental signalling perturbations in duplicates. We identify and deconstruct the temporal, signalling, and gene regulatory imperatives of iPSC differentiation into cell types of ectoderm, mesoderm, and endoderm lineages. This study provides a cellular and computational pipeline to study cell differentiation applicable to studies in developmental biology, drug discovery, and disease modelling.

2 citations


Journal ArticleDOI
TL;DR: Many aspects of the hypothalamic gene regulatory flow can proceed without the key H3K27me3 epigenetic repressor mark, but particular neuronal subtypes show a unique sensitivity to a disrupted epigenomic landscape.
Abstract: The hypothalamus displays staggering cellular diversity, chiefly established during embryogenesis by the interplay of several signalling pathways and a battery of transcription factors. However, the contribution of epigenetic cues to hypothalamus development remains unclear. We mutated the polycomb repressor complex 2 gene Eed in the developing mouse hypothalamus, which resulted in the loss of H3K27me3, a fundamental epigenetic repressor mark. This triggered ectopic expression of posteriorly expressed regulators (e.g. Hox homeotic genes), upregulation of cell cycle inhibitors and reduced proliferation. Surprisingly, despite these effects, single cell transcriptomic analysis revealed that most neuronal subtypes were still generated in Eed mutants. However, we observed an increase in glutamatergic/GABAergic double-positive cells, as well as loss/reduction of dopamine, hypocretin and Tac2-Pax6 neurons. These findings indicate that many aspects of the hypothalamic gene regulatory flow can proceed without the key H3K27me3 epigenetic repressor mark, but point to a unique sensitivity of particular neuronal subtypes to a disrupted epigenomic landscape.

1 citation


Journal ArticleDOI
TL;DR: The analysis uncovers a differential logic for the role of PRC2 upon functionally distinct gene cohorts that drive CNS anterior expansion, and a novel bioinformatics package is provided that integrates transcriptomic and epigenetic datasets to identify regulatory underpinnings of heterogeneous biological processes.
Abstract: A prominent aspect of most, if not all, central nervous systems (CNSs) is that anterior regions (brain) are larger than posterior ones (spinal cord). Studies in Drosophila and mouse have revealed that Polycomb Repressor Complex 2 (PRC2), a protein complex responsible for applying key repressive histone modifications, acts by several mechanisms to promote anterior CNS expansion. However, it is unclear what the full spectrum of PRC2 action is during embryonic CNS development and how PRC2 intersects with the epigenetic landscape. We removed PRC2 function from the developing mouse CNS, by mutating the key gene Eed, and generated spatio-temporal transcriptomic data. To decode the role of PRC2, we developed a method that incorporates standard statistical analyses with probabilistic deep learning to integrate the transcriptomic response to PRC2 inactivation with epigenetic data. This multi-variate analysis corroborates the central involvement of PRC2 in anterior CNS expansion, and also identifies several unanticipated cohorts of genes, such as proliferation and immune response genes. Furthermore, the analysis reveals specific profiles of regulation via PRC2 upon these gene cohorts. These findings uncover a differential logic for the role of PRC2 upon functionally distinct gene cohorts that drive CNS anterior expansion. To support the analysis of emerging multi-modal datasets, we provide a novel bioinformatics package that integrates transcriptomic and epigenetic datasets to identify regulatory underpinnings of heterogeneous biological processes.

1 citation


Posted ContentDOI
24 Oct 2022-bioRxiv
TL;DR: Applications to normal tissue, development, disease, and large-scale atlas data reveal the broad applicability and power of Cytocipher to generate biological insights in numerous contexts, including the identification of cell types not previously described in the datasets analyzed.
Abstract: Identification of cell types using single cell RNA-seq (scRNA-seq) is revolutionising the study of multicellular organisms. However, typical scRNA-seq analysis often involves post hoc manual curation to ensure clusters are transcriptionally distinct, which is time-consuming, error-prone, and irreproducible. To overcome these obstacles, we developed Cytocipher, a bioinformatics method and scverse compatible software package that statistically determines significant clusters. Application of Cytocipher to normal tissue, development, disease, and large-scale atlas data reveals the broad applicability and power of Cytocipher to generate biological insights in numerous contexts. This included the identification of cell types not previously described in the datasets analyzed, such as CD8+ T cell subtypes in human peripheral blood mononuclear cells; cell lineage intermediate states during mouse pancreas development; and subpopulations of luminal epithelial cells over-represented in prostate cancer. Cytocipher also scales to large datasets with high test performance, as shown by application to the Tabula Sapiens Atlas representing >480,000 cells. Cytocipher is a novel and generalisable method that statistically determines transcriptionally distinct and programmatically reproducible clusters from single cell data. Cytocipher is available at https://github.com/BradBalderson/Cytocipher.

Proceedings ArticleDOI
31 Dec 2022
TL;DR: In this paper, the authors used the TRIAGE model (a predictive method based on a score calculated using the inverse relationship of H3K27me3 and gene expression), modified here to analyze Nelore muscle samples contrasting for calcium (Ca) content.
Abstract: The histone modification H3K27me3 is linked to the regulation of different cell states. However, no study has focused on the influence of H3K27me3 over gene expression in Nelore cattle phenotypes. We used the TRIAGE model (a predictive method based on a score calculated using the inverse relationship of H3K27me3 and gene expression), modified here to analyze Nelore muscle samples contrasting for calcium (Ca) content. The aim was to identify putatively regulated genes, called Discordantly Ranked Genes (DRGs), related to this phenotype. We identified 209 DRGs, of which 10 (e.g. COMP, MAFB and ITGA11) were also previously reported as differentially expressed genes between animals contrasting for Ca content and 44 (e.g. KCNJ5, IGF2 and HOXA10) underlie enriched pathways related to signaling, synaptic, neural, regulatory and addiction events, all pathways related to Ca content. Our approach can identify candidate genes to be regulated in bovine muscle according to Ca content.

Journal ArticleDOI
TL;DR: A new method called TRIAGE-Cluster is described which uses genome-wide repressive epigenetic data from diverse bio-samples to identify genes demarcating cell diversity in any scRNA-seq data set, and integrates patterns of H3K27me3 domains deposited across hundreds of cell types with weighted density estimation.
Abstract: Methods for cell clustering and gene expression from single-cell RNA sequencing (scRNA-seq) data are essential for biological interpretation of cell processes. Here we present TRIAGE-Cluster which uses genome-wide epigenetic data from diverse bio-samples to identify genes demarcating cell diversity in scRNA-seq data. TRIAGE-Cluster integrates patterns of repressive chromatin deposited across diverse cell types with weighted density estimation to determine cell type clusters in a 2D UMAP space. We then present TRIAGE-ParseR, a machine learning method that evaluates gene expression rank lists to define gene groups governing the identity and function of cell types. We demonstrate the utility of this two-step approach using atlases of in vivo and in vitro cell diversification and organogenesis. We also provide a web accessible dashboard for analysis and download of data and software. Collectively, genome-wide epigenetic repression provides a versatile strategy to define cell diversity and study gene regulation of scRNA-seq data.

Journal ArticleDOI
TL;DR: The cryo-EM structure of a homododecameric Class I KARI is solved and it is demonstrated how a triad of amino acid side chains plays a crucial role in promoting the oligomerization of this enzyme.
Abstract: The branched-chain amino acids (BCAAs) leucine, isoleucine and valine are synthesized via a common biosynthetic pathway. Ketol-acid reductoisomerase (KARI) is the second enzyme in this pathway. In addition to its role in BCAA biosynthesis, KARI catalyzes two rate-limiting steps that are key components of a cell-free biofuel biosynthesis route. For industrial applications, reaction temperature and enzyme stability are key factors that affect process robustness and product yield. Here, we have solved the cryo-EM structure (2.94 Å resolution) of a homododecameric Class I KARI (from Campylobacter jejuni) and demonstrated how a triad of amino acid side chains plays a crucial role in promoting the oligomerization of this enzyme. Importantly, both its thermal and solvent stability are greatly enhanced in the dodecameric state when compared to its dimeric counterpart (apparent melting temperatures (Tm) of 83.1 °C and 51.5 °C, respectively). We also employed protein design (PROSS) for a tetrameric Class II KARI (from Escherichia coli) to generate a variant with improved thermal and solvent stabilities. In total, 34 mutations were introduced, which did not affect the oligomeric state of this enzyme but resulted in a fully functional catalyst with a significantly elevated Tm (58.5 °C vs. 47.9 °C for the native version).

Posted ContentDOI
04 Jul 2022-bioRxiv
TL;DR: SiRCle (Signature Regulatory Clustering), a novel method to integrate DNA methylation, RNA-seq and proteomics data, is presented; it allows the identification of metabolic enzymes and cell-type-specific markers associated with survival, along with the likely molecular driver behind the gene's perturbations.
Abstract: Clear cell renal cell carcinoma (ccRCC) tumours develop and progress via complex remodelling of the kidney epigenome, transcriptome, proteome, and metabolome. Given the subsequent tumour and inter-patient heterogeneity, drug-based treatments report limited success, calling for multi-omics studies to extract regulatory relationships, and ultimately, to develop targeted therapies. However, current methods are unable to extract nonlinear multi-omics perturbations. Here, we present SiRCle (Signature Regulatory Clustering), a novel method to integrate DNA methylation, RNA-seq and proteomics data. Applying SiRCle to a case study of ccRCC, we disentangle the layer (DNA methylation, transcription and/or translation) where dys-regulation first occurs and find the primary biological processes altered. Next, we detect regulatory differences between patient subsets by using a variational autoencoder to integrate omics data followed by statistical comparisons on the integrated space. In ccRCC patients, SiRCle allows the identification of metabolic enzymes and cell-type-specific markers associated with survival, along with the likely molecular driver behind the gene’s perturbations.

Journal ArticleDOI
TL;DR: In this article, catalytic parameters and the crystal structure of the dehydratase from Paralcaligenes ureilyticus (PuDHT, both in the presence of Mg2+ and Mn2+) were investigated.
Abstract: Enzyme‐catalyzed reaction cascades play an increasingly important role for the sustainable manufacture of diverse chemicals from renewable feedstocks. For instance, dehydratases from the ilvD/EDD superfamily have been embedded into a cascade to convert glucose via pyruvate to isobutanol, a platform chemical for the production of aviation fuels and other valuable materials. These dehydratases depend on the presence of both a Fe−S cluster and a divalent metal ion for their function. However, they also represent the rate‐limiting step in the cascade. Here, catalytic parameters and the crystal structure of the dehydratase from Paralcaligenes ureilyticus (PuDHT, both in the presence of Mg2+ and Mn2+) were investigated. Rate measurements demonstrate that the presence of stoichiometric concentrations of Mn2+ promotes higher activity than Mg2+, but at high concentrations the former inhibits the activity of PuDHT. Molecular dynamics simulations identify the position of a second binding site for the divalent metal ion. Only binding of Mn2+ (not Mg2+) to this site affects the ligand environment of the catalytically essential divalent metal binding site, thus providing insight into an inhibitory mechanism of Mn2+ at higher concentrations. Furthermore, in silico docking identified residues that play a role in determining substrate binding and selectivity. The combined data inform engineering approaches to design an optimal dehydratase for the cascade.