Showing papers by "Jens Lagergren" published in 2022


Posted ContentDOI
16 Feb 2022-bioRxiv
TL;DR: Studying clonal human lymphocytes and mouse brain cells, a vast diversity of heritable transcriptional states among different clones of cells of the same type in vivo is uncovered, and it is shown that this diversity is coupled to clone specific chromatin accessibility, resulting in distinct expression of genes by different clones.
Abstract: Cell types can be classified based on shared patterns of transcription. Variability in gene expression between individual cells of the same type has been ascribed to stochastic transcriptional bursting and transient cell states. We asked whether long-term, heritable differences in transcription can impart diversity within a cell type. Studying clonal human lymphocytes and mouse brain cells, we uncover a vast diversity of heritable transcriptional states among different clones of cells of the same type in vivo. In lymphocytes we show that this diversity is coupled to clone specific chromatin accessibility, resulting in distinct expression of genes by different clones. Our findings identify a source of cellular diversity, which may have important implications for how cellular populations are shaped by selective processes in development, aging and disease.

10 citations


Proceedings Article
22 Feb 2022
TL;DR: This work proposes the multiple importance sampling ELBO (MISELBO), a versatile yet simple framework that reveals connections between VI and recent advances in the importance sampling literature, paving the way for further methodological advances.
Abstract: In variational inference (VI), the marginal log-likelihood is estimated using the standard evidence lower bound (ELBO), or improved versions such as the importance weighted ELBO (IWELBO). We propose the multiple importance sampling ELBO (MISELBO), a versatile yet simple framework. MISELBO is applicable in both amortized and classical VI, and it uses ensembles, e.g., deep ensembles, of independently inferred variational approximations. As far as we are aware, the concept of deep ensembles in amortized VI has not previously been established. We prove that MISELBO provides a tighter bound than the average of standard ELBOs, and demonstrate empirically that it gives tighter bounds than the average of IWELBOs. MISELBO is evaluated in density-estimation experiments that include MNIST and several real-data phylogenetic tree inference problems. First, on the MNIST dataset, MISELBO boosts the density-estimation performance of a state-of-the-art model, nouveau VAE. Second, in the phylogenetic tree inference setting, our framework enhances a state-of-the-art VI algorithm that uses normalizing flows. Beyond its technical benefits, MISELBO reveals connections between VI and recent advances in the importance sampling literature, paving the way for further methodological advances. We provide our code at https://github.com/Lagergren-Lab/MISELBO.
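
The statement that MISELBO is at least as tight as the average of standard ELBOs can be illustrated on a toy problem. The sketch below is not the authors' implementation (their code is in the repository linked in the abstract). It assumes a hypothetical one-dimensional model, z ~ N(0, 1) and x | z ~ N(z, 1), two hand-picked Gaussian approximations instead of inferred ones, and a single sample per importance weight. All function names are illustrative.

```python
import math
import random


def log_normal(value, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (value - mean) ** 2 / (2 * std ** 2)


def log_joint(x, z):
    """Toy model (an assumption): z ~ N(0, 1), x | z ~ N(z, 1)."""
    return log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)


def log_mixture(z, components):
    """Log density of the uniform mixture of the ensemble members."""
    logs = [log_normal(z, mean, std) for mean, std in components]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs)) - math.log(len(components))


def estimate_bounds(x, components, n_samples=5000, seed=0):
    """Monte Carlo estimates of (average ELBO, MISELBO) from the same samples."""
    rng = random.Random(seed)
    avg_elbo = 0.0
    miselbo = 0.0
    for mean, std in components:
        for _ in range(n_samples):
            z = rng.gauss(mean, std)
            lj = log_joint(x, z)
            # Standard ELBO term: the weight uses only the proposal that drew z.
            avg_elbo += lj - log_normal(z, mean, std)
            # MISELBO term: the weight uses the whole ensemble as a mixture.
            miselbo += lj - log_mixture(z, components)
    total = n_samples * len(components)
    return avg_elbo / total, miselbo / total


if __name__ == "__main__":
    ensemble = [(-0.5, 0.8), (1.5, 0.8)]  # hypothetical (mean, std) pairs
    avg_elbo, miselbo = estimate_bounds(1.0, ensemble)
    log_evidence = log_normal(1.0, 0.0, math.sqrt(2.0))  # exact for this toy model
    print(avg_elbo, miselbo, log_evidence)
```

In this setting the gap between the two estimates is the average KL divergence from each ensemble member to the mixture, so it grows as the members become more diverse.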

7 citations


Journal ArticleDOI
TL;DR: This paper proposes Celloscope, a probabilistic model for cell type deconvolution from spatial transcriptomics data that uses established prior knowledge on marker genes.
Abstract: Spatial transcriptomics maps gene expression across tissues, posing the challenge of determining the spatial arrangement of different cell types. However, spatial transcriptomics spots contain multiple cells. Therefore, the observed signal comes from mixtures of cells of different types. Here, we propose an innovative probabilistic model, Celloscope, that utilizes established prior knowledge on marker genes for cell type deconvolution from spatial transcriptomics data. Celloscope outperforms other methods on simulated data, successfully indicates known brain structures and spatially distinguishes between inhibitory and excitatory neuron types in mouse brain tissue, and dissects large heterogeneity of immune infiltrate composition in prostate gland tissue.

4 citations


Proceedings ArticleDOI
01 Mar 2022
TL;DR: VaiPhy is proposed, a remarkably fast VI based algorithm for approximate posterior inference in an augmented tree space that produces marginal log-likelihood estimates on par with the state-of-the-art methods on real data and is considerably faster since it does not require auto-differentiation.
Abstract: Phylogenetics is a classical methodology in computational biology that today has become highly relevant for medical investigation of single-cell data, e.g., in the context of cancer development. The exponential size of the tree space is, unfortunately, a substantial obstacle for Bayesian phylogenetic inference using Markov chain Monte Carlo based methods since these rely on local operations. And although more recent variational inference (VI) based methods offer speed improvements, they rely on expensive auto-differentiation operations for learning the variational parameters. We propose VaiPhy, a remarkably fast VI based algorithm for approximate posterior inference in an augmented tree space. VaiPhy produces marginal log-likelihood estimates on par with the state-of-the-art methods on real data and is considerably faster since it does not require auto-differentiation. Instead, VaiPhy combines coordinate ascent update equations with two novel sampling schemes: (i) SLANTIS, a proposal distribution for tree topologies in the augmented tree space, and (ii) the JC sampler, to the best of our knowledge, the first-ever scheme for sampling branch lengths directly from the popular Jukes-Cantor model. We compare VaiPhy in terms of density estimation and runtime. Additionally, we evaluate the reproducibility of the baselines. We provide our code on GitHub: https://github.com/Lagergren-Lab/VaiPhy.
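
As background for the Jukes-Cantor model named in the abstract, the sketch below gives the standard JC69 transition probabilities for a branch of length t (expected substitutions per site) and the corresponding pairwise distance correction. It is not the JC sampler or any other part of VaiPhy; the function names are illustrative assumptions.

```python
import math


def jc_transition_probs(branch_length):
    """Return (P[same base], P[one specific different base]) under JC69."""
    decay = math.exp(-4.0 * branch_length / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same, p_diff


def jc_distance(p_mismatch):
    """Branch length implied by an observed fraction of differing sites."""
    if not 0.0 <= p_mismatch < 0.75:
        raise ValueError("JC69 distance is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p_mismatch / 3.0)


if __name__ == "__main__":
    p_same, p_diff = jc_transition_probs(0.1)
    # The four outcomes (one identical base, three different bases) sum to one.
    print(p_same + 3 * p_diff)
    # The expected mismatch fraction maps back to the branch length.
    print(jc_distance(3 * p_diff))
```

Because all substitutions are equally likely under JC69, the model has a single free parameter per branch, which is what makes closed-form expressions like these possible.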

3 citations


Posted ContentDOI
24 Nov 2022-bioRxiv
TL;DR: Spatial Transcriptomics of VDJ sequences (Spatial VDJ) maps immunoglobulin and TR antigen receptors in human tissue sections and captures lymphocyte spatial clonal architecture across tissues, which could have important therapeutic implications.
Abstract: The spatial distribution of lymphocyte clones within tissues is critical to their development, selection, and expansion. We have developed Spatial Transcriptomics of VDJ sequences (Spatial VDJ), which maps immunoglobulin and TR antigen receptors in human tissue sections. Spatial VDJ captures lymphocyte clones matching canonical T, B, and plasma cell distributions in tissues and amplifies clonal sequences confirmed by orthogonal methods. We confirm spatial congruency between paired receptor chains, develop a computational framework to predict receptor pairs, and link the expansion of distinct B cell clones to different tumor-associated gene expression programs. Spatial VDJ delineates B cell clonal diversity, class switch recombination, and lineage trajectories within their spatial context. Taken together, Spatial VDJ captures lymphocyte spatial clonal architecture across tissues, which could have important therapeutic implications. One-Sentence Summary: Spatial transcriptomics-based technology co-captures T and B cell receptors within their anatomical niche in human tissue.

2 citations


Journal ArticleDOI
TL;DR: ToMExO models the mutation accumulation process using a tree in which each node includes a driver gene or a set of driver genes; the nodes explain the mutual exclusivity patterns observed in mutations in specific cancer genes, and the edges explain the temporal order of events.
Abstract: Identifying the interrelations among cancer driver genes and the patterns in which the driver genes get mutated is critical for understanding cancer. In this paper, we study cross-sectional data from cohorts of tumors to identify the cancer-type (or subtype) specific process in which the cancer driver genes accumulate critical mutations. We model this mutation accumulation process using a tree, where each node includes a driver gene or a set of driver genes. A mutation in each node enables its children to have a chance of mutating. This model simultaneously explains the mutual exclusivity patterns observed in mutations in specific cancer genes (by its nodes) and the temporal order of events (by its edges). We introduce a computationally efficient dynamic programming procedure for calculating the likelihood of our noisy datasets and use it to build our Markov Chain Monte Carlo (MCMC) inference algorithm, ToMExO. Together with a set of engineered MCMC moves, our fast likelihood calculations enable us to work with datasets with hundreds of genes and thousands of tumors, which cannot be dealt with using available cancer progression analysis methods. We demonstrate our method’s performance on several synthetic datasets covering various scenarios for cancer progression dynamics. Then, a comparison against two state-of-the-art methods on a moderate-size biological dataset shows the merits of our algorithm in identifying significant and valid patterns. Finally, we present our analyses of several large biological datasets, including colorectal cancer, glioblastoma, and pancreatic cancer. In all the analyses, we validate the results using a set of method-independent metrics testing the causality and significance of the relations identified by ToMExO or competing methods.
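
The tree model described in the abstract can be pictured with a toy generative simulation. The sketch below is an illustrative assumption, not ToMExO: it omits the noise model, the dynamic programming likelihood, and the MCMC inference, and the tree, gene labels, and firing probabilities are all hypothetical. A node can fire only if its parent fired, and a firing node mutates exactly one of its genes.

```python
import random

# Hypothetical tree: node -> (parent, genes in the node, P[node fires | parent fired]).
# Parents are listed before their children.
TOY_TREE = {
    "root": (None, [], 1.0),
    "n1": ("root", ["g1"], 0.8),
    "n2": ("n1", ["g2", "g3"], 0.6),
    "n3": ("n1", ["g4"], 0.5),
}


def simulate_tumor(tree, rng):
    """Draw the set of mutated genes for one tumor."""
    fired = set()
    mutated = set()
    for node, (parent, genes, prob) in tree.items():
        parent_fired = parent is None or parent in fired
        if parent_fired and rng.random() < prob:
            fired.add(node)
            if genes:
                # One gene per firing node: genes sharing a node are mutually exclusive.
                mutated.add(rng.choice(genes))
    return mutated


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(sorted(simulate_tumor(TOY_TREE, rng)))
```

In every simulated tumor, g2 and g3 never co-occur (a mutual exclusivity pattern within a node), and g2, g3, or g4 appear only together with g1 (a temporal order along an edge).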

1 citation


Journal ArticleDOI
TL;DR: This work proposes the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network, and explains this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
Abstract: Mixture models in variational inference (VI) are an active field of research. Recent works have established their connection to multiple importance sampling (MIS) through the MISELBO and advanced the use of ensemble approximations for large-scale problems. However, as we show here, an independent learning of the ensemble components can lead to suboptimal diversity. Hence, we study the effect of instead using MISELBO as an objective function for learning mixtures, and we propose the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network. Two major insights led to the construction of this novel composite model. First, mixture models have potential to be off-the-shelf tools for practitioners to obtain more flexible posterior approximations in VAEs. Therefore, we make them more accessible by demonstrating how to apply them to four popular architectures. Second, the mixture components cooperate in order to cover the target distribution while trying to maximize their diversity when MISELBO is the objective function. We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling. Finally, we demonstrate the superiority of the Mixture VAEs’ learned feature representations on both image and single-cell transcriptome data, and obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets. Code available here: https://github.com/Lagergren-Lab/MixtureVAEs.

1 citation


DOI
10 Feb 2022-bioRxiv
TL;DR: This paper introduces ToMExO, a probabilistic method to infer cancer driver genes and how they affect each other, using cross-sectional data from cohorts of tumors, and models cancer progression dynamics using a tree with sets of driver genes in the nodes.
Abstract: Identifying cancer driver genes and their interrelations is critical in understanding cancer progression mechanisms. In this paper, we introduce ToMExO, a probabilistic method to infer cancer driver genes and how they affect each other, using cross-sectional data from cohorts of tumors. We model cancer progression dynamics using a tree with sets of driver genes in the nodes. This model explains the temporal orders among driver mutations and their mutual exclusivity patterns. We introduce a dynamic programming procedure for the likelihood calculation and build an MCMC inference algorithm. Together with our engineered MCMC moves, our efficient likelihood calculations enable us to work with datasets having hundreds of genes and thousands of tumors. We demonstrate our method’s performance on several synthetic datasets covering various scenarios for cancer progression dynamics. We then present the analyses of several biological datasets using the ToMExO method and validate the results using a set of method-independent metrics.

Posted ContentDOI
23 Sep 2022-bioRxiv
TL;DR: Tumoroscope is proposed, the first probabilistic model that accurately infers cancer clones and their high-resolution localization by integrating pathological images, whole exome sequencing, and spatial transcriptomics data and enables an integrated study of the spatial, genomic, and phenotypic organization of tumors.
Abstract: Spatial and genomic heterogeneity of tumors is key to cancer progression, treatment, and survival. However, a technology for directly mapping the clones in the tumor tissue based on point mutations is lacking. Here, we propose Tumoroscope, the first probabilistic model that accurately infers cancer clones and their high-resolution localization by integrating pathological images, whole exome sequencing, and spatial transcriptomics data. In contrast to previous methods, Tumoroscope explicitly addresses the problem of deconvoluting the proportions of clones in spatial transcriptomics spots. Applied to a reference prostate cancer dataset and a newly generated breast cancer dataset, Tumoroscope reveals spatial patterns of clone colocalization and mutual exclusion in sub-areas of the tumor tissue. We further infer clone-specific gene expression levels and the most highly expressed genes for each clone. In summary, Tumoroscope enables an integrated study of the spatial, genomic, and phenotypic organization of tumors.

Posted ContentDOI
23 Dec 2022-bioRxiv
TL;DR: This paper proposes a Markov Chain Monte Carlo (MCMC) based approach to infer cancer progression models from the clone-level data gathered from a cohort of tumors.
Abstract: Cancer is an evolutionary process involving the accumulation of somatic mutations in the genome. The tumor’s evolution is known to be highly influenced by specific somatic mutations in so-called cancer driver genes. Cancer progression models are computational tools used to infer the interactions among cancer driver genes by analyzing the pattern of absence/presence of mutations in different tumors of a cohort. In an abundance of subclonal mutations, discarding the heterogeneity of tumors and investigating the interrelations among the driver genes solely based on tumor-level data can result in misleading interpretations. In this paper, we introduce a computational approach to infer cancer progression models from the clone-level data gathered from a cohort of tumors. Our method leverages the rich clone-level data to identify the patterns of interactions among cancer driver genes and produce significantly more robust and reliable cancer progression models. Using a novel efficient Markov Chain Monte Carlo inference algorithm, our method provides outstanding scalability to the rapidly increasing size of available datasets. Using an extensive set of synthetic data experiments, we demonstrate the performance of our inference method in recovering the generative progression models. Finally, we present our analysis of two sub-types of lung cancer using biological multi-regional bulk data.

Journal ArticleDOI
TL;DR: In this paper, a broad range of clinically relevant drugs were tested on whole-tumor cell cultures (WTCs), with a high success rate (±90%) for all subtypes of breast tumors.

30 Sep 2022
TL;DR: This work shows that the mixture components cooperate when they jointly adapt to maximize the ELBO, and that increasing the number of mixture components improves the latent-representation capabilities of the VAE on both image and single-cell datasets.
Abstract: In this paper, we show how the mixture components cooperate when they jointly adapt to maximize the ELBO. We build upon recent advances in the multiple and adaptive importance sampling literature. We then model the mixture components using separate encoder networks and show empirically that the ELBO is monotonically non-decreasing as a function of the number of mixture components. These results hold for a range of different VAE architectures on the MNIST, FashionMNIST, and CIFAR-10 datasets. In this work, we also demonstrate that increasing the number of mixture components improves the latent-representation capabilities of the VAE on both image and single-cell datasets. This cooperative behavior motivates that using Mixture VAEs should be considered a standard approach for obtaining more flexible variational approximations. Finally, Mixture VAEs are here, for the first time, compared and combined with normalizing flows, hierarchical models and/or the VampPrior in an extensive ablation study. Multiple of our Mixture VAEs achieve state-of-the-art log-likelihood results for VAE architectures on the MNIST and FashionMNIST datasets. The experiments are reproducible using our code, provided here: https://github.com/lagergren-lab/mixturevaes.