Showing papers by "Jens Lagergren" published in 2013

Journal Article

Learning oncogenetic networks by reducing to mixed integer linear programming.

Hossein Farahani¹, Jens Lagergren¹

14 Jun 2013-PLOS ONE

TL;DR: This work proposes Progression Networks, a special case of Bayesian networks tailored to modeling disease progression and similar to Conjunctive Bayesian Networks (CBNs), and reduces the hard problem of learning Bayesian and progression networks to Mixed Integer Linear Programming (MILP).

Abstract: Cancer can result from the accumulation of different types of genetic mutations, such as copy number aberrations. Tumor data are cross-sectional and do not contain the temporal order of the genetic events. Finding the order in which the genetic events occurred, and the progression pathways they follow, is of vital importance for understanding the disease. To model cancer progression, we propose Progression Networks, a special case of Bayesian networks tailored to modeling disease progression. Progression networks are similar to Conjunctive Bayesian Networks (CBNs) [1], a variation of Bayesian networks also proposed for modeling disease progression. We also describe an algorithm for learning Bayesian networks in general and progression networks in particular. We reduce the hard problem of learning Bayesian and progression networks to Mixed Integer Linear Programming (MILP). MILP is a Non-deterministic Polynomial-time complete (NP-complete) problem for which very good heuristics exist. We tested our algorithm on synthetic data and on real cytogenetic data from renal cell carcinoma. We also compared our learned progression networks with the networks proposed in earlier publications. The software is available at https://bitbucket.org/farahani/diprog.
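The ordering constraint that CBNs impose can be illustrated with a small sketch. In a conjunctive model, an event may be present in a tumor only if all of its parent events are also present; the Python below checks that condition for cross-sectional samples. This is an assumption-laden illustration, not DiProg's code: the network, the event names `A`, `B`, `C`, and the helpers `is_compatible` and `compatible_fraction` are hypothetical, progression networks as defined in the paper may relax the conjunctive rule, and the MILP learning step itself is not shown.

```python
from typing import Dict, FrozenSet, List, Set

def is_compatible(genotype: Set[str], parents: Dict[str, FrozenSet[str]]) -> bool:
    """True if every observed event has all of its parent events observed.

    This is the conjunctive (CBN-style) constraint: an event can only occur
    after all events it depends on have occurred.
    """
    return all(parents.get(event, frozenset()) <= genotype for event in genotype)

def compatible_fraction(tumors: List[Set[str]],
                        parents: Dict[str, FrozenSet[str]]) -> float:
    """Fraction of cross-sectional samples that respect the ordering constraints."""
    if not tumors:
        return 0.0
    return sum(is_compatible(t, parents) for t in tumors) / len(tumors)

# Hypothetical toy network: B requires A; C requires both A and B.
parents = {
    "A": frozenset(),
    "B": frozenset({"A"}),
    "C": frozenset({"A", "B"}),
}
tumors = [set(), {"A"}, {"A", "B"}, {"B"}, {"A", "B", "C"}]
print(compatible_fraction(tumors, parents))  # 0.8: only {"B"} violates the order
```

A real learner would score candidate parent sets against such data; in the paper that search is what gets encoded as a MILP.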

50 citations

Journal Article

GenPhyloData: realistic simulation of gene family evolution

Joel Sjöstrand¹,², Lars Arvestad¹,², Jens Lagergren¹,³, Bengt Sennblad¹,⁴

Science for Life Laboratory¹, Stockholm University², Royal Institute of Technology³, Karolinska Institutet⁴

27 Jun 2013-BMC Bioinformatics

TL;DR: Simulated data created with PrIME-GenPhyloData can be used for benchmarking phylogenetic approaches, or for characterizing models or model parameters with respect to biological data.

Abstract: PrIME-GenPhyloData is a suite of tools for creating realistic simulated phylogenetic trees, in particular for families of homologous genes. It supports generating trees with a birth-death process and, perhaps more interestingly, also supports generating gene family trees guided by a known (synthetic or biological) species tree while accounting for events such as gene duplication, gene loss, and lateral gene transfer (LGT). The suite also supports a wide range of branch rate models that relax the molecular clock. Simulated data created with PrIME-GenPhyloData can be used for benchmarking phylogenetic approaches, or for characterizing models or model parameters with respect to biological data. The concept of tree-in-tree evolution can also be used to model, for instance, biogeography or host-parasite co-evolution.
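A minimal sketch of the constant-rate birth-death process mentioned above, assuming a Gillespie-style simulation that tracks only the number of living lineages rather than the full tree with branch lengths. The function name and its parameters are hypothetical and do not reflect PrIME-GenPhyloData's interface; species-tree-guided simulation with duplication, loss, and LGT is not covered.

```python
import random

def simulate_birth_death(birth: float, death: float, t_max: float, seed: int = 0) -> int:
    """Number of lineages alive at t_max, starting from a single lineage.

    With n living lineages, the next event happens after an exponential
    waiting time with rate n * (birth + death); it is a birth (split) with
    probability birth / (birth + death), otherwise a death (extinction).
    """
    rng = random.Random(seed)
    n, t = 1, 0.0
    total = birth + death
    if total == 0:
        return n  # nothing ever happens
    while n > 0:
        t += rng.expovariate(n * total)
        if t > t_max:
            break  # next event falls after the observation time
        if rng.random() < birth / total:
            n += 1
        else:
            n -= 1
    return n

counts = [simulate_birth_death(1.0, 0.5, 3.0, seed=s) for s in range(1000)]
print(sum(counts) / len(counts))
```

Averaging over many seeds should approach the expected lineage count exp((birth - death) * t_max) for a process started from one lineage. Recording the time and parent of each split would turn this counter into a tree generator.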

48 citations

Journal Article

Fastphylo: Fast tools for phylogenetics

Mehmood Alam Khan¹,², Isaac Elias¹, Erik Sjölund³, Kristina Nylander¹, Roman Valls Guimera³, Richard Schobesberger¹, Peter Schmitzberger¹, Jens Lagergren¹, Lars Arvestad¹,⁴

Royal Institute of Technology¹, University of Engineering and Technology, Peshawar², Science for Life Laboratory³, Stockholm University⁴

20 Nov 2013-BMC Bioinformatics

TL;DR: Fastphylo is a fast, memory-efficient, and easy-to-use software suite containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix.

Abstract: Distance methods are ubiquitous tools in phylogenetics. Their primary purpose may be to reconstruct evolutionary history, but they are also used as components in bioinformatic pipelines. However, poor computational efficiency has limited the applicability of distance methods to very large problem instances. We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. We compare fastphylo with other neighbor-joining-based methods and report the results in terms of speed and memory efficiency. Fastphylo is a fast, memory-efficient, and easy-to-use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.
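To make the two problems concrete, here is a small Python sketch: a pairwise distance under the Jukes-Cantor (JC69) model, chosen here only as an example substitution model, and the classic neighbor-joining algorithm of Saitou and Nei, returning a Newick string. It is a textbook version written for clarity (roughly cubic time), not fastphylo's implementation, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def jukes_cantor(a: str, b: str) -> float:
    """JC69 distance between two aligned DNA sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and equally long")
    p = sum(x != y for x, y in zip(a, b)) / len(a)  # fraction of mismatches
    if p >= 0.75:
        return math.inf  # saturated: distance is undefined under JC69
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def neighbor_joining(names: List[str], d: List[List[float]]) -> str:
    """Classic neighbor joining on a symmetric distance matrix; returns Newick."""
    nodes = list(names)
    dist: Dict[Tuple[str, str], float] = {}
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            dist[(a, b)] = d[i][j]
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dist[(a, b)] for b in nodes) for a in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(
            ((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]),
            key=lambda pair: (n - 2) * dist[pair] - r[pair[0]] - r[pair[1]],
        )
        li = 0.5 * dist[(i, j)] + (r[i] - r[j]) / (2 * (n - 2))
        lj = dist[(i, j)] - li
        u = f"({i}:{li:.4g},{j}:{lj:.4g})"  # new internal node, named by its subtree
        for k in nodes:
            if k not in (i, j):
                duk = 0.5 * (dist[(i, k)] + dist[(j, k)] - dist[(i, j)])
                dist[(u, k)] = dist[(k, u)] = duk
        dist[(u, u)] = 0.0
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a},{b}:{dist[(a, b)]:.4g});"

# Five-taxon additive matrix (a textbook example).
m = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
print(neighbor_joining(list("abcde"), m))
# prints ((c:4,(a:2,b:3):3),(d:2,e:1):2);
```

Because the example matrix is additive, neighbor joining recovers its tree exactly; on real sequence distances the result is an estimate.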

20 citations

Journal Article

Genome-wide probabilistic reconciliation analysis across vertebrates.

Owais Mahmudi¹, Joel Sjöstrand², Bengt Sennblad², Jens Lagergren¹

Royal Institute of Technology¹, Science for Life Laboratory²

15 Oct 2013

TL;DR: It is concluded that for many gene families, the most parsimonious reconciliation (MPR) - a reconciliation that minimizes the number of duplications - is far from the correct explanation of the evolutionary history.

Abstract: Gene duplication is considered a major driving force in evolution that enables the genome of a species to acquire new functions. A reconciliation - a mapping of gene tree vertices to the edges or vertices of a species tree - explains where gene duplications have occurred on the species tree. In this study, we sample reconciliations from a posterior over reconciliations, gene trees, edge lengths and other parameters, given a species tree and gene sequences. To obtain this posterior, we employ a Bayesian analysis tool based on the probabilistic model DLRS, which integrates gene duplication, gene loss and sequence evolution under a relaxed molecular clock for substitution rates. By applying these methods, we perform a genome-wide analysis of a nine-species dataset, OPTIC, and conclude that for many gene families, the most parsimonious reconciliation (MPR) - a reconciliation that minimizes the number of duplications - is far from the correct explanation of the evolutionary history. For the given dataset, we observe that approximately 19% of the sampled reconciliations differ from MPR. This is in clear contrast with previous estimates, based on simpler models and less realistic assumptions, according to which 98% of the reconciliations can be expected to be identical to MPR. We also generate heatmaps showing where in the species tree duplications have been most frequent during the evolution of these species.
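For comparison with the Bayesian sampling described above, the most parsimonious reconciliation can be computed with the classic LCA mapping: each gene-tree leaf maps to its species, each internal gene node maps to the lowest common ancestor (in the species tree) of its children's images, and a node is a duplication when it maps to the same species node as one of its children. The Python below is a hypothetical illustration of that textbook procedure, representing each species-tree node by its set of leaf names; it does not implement DLRS or the sampler used in the paper, and it ignores losses.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# A rooted binary tree: a leaf name, or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

def clades(tree: Tree) -> List[FrozenSet[str]]:
    """All clades (leaf sets) of a tree; the last entry is the root clade."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    lc, rc = clades(left), clades(right)
    return lc + rc + [lc[-1] | rc[-1]]

def lca_reconcile(gene: Tree, species_of: Dict[str, str],
                  species_clades: List[FrozenSet[str]]) -> Tuple[FrozenSet[str], int]:
    """Return (species clade this gene node maps to, duplications in its subtree)."""
    if isinstance(gene, str):
        return frozenset([species_of[gene]]), 0
    left, right = gene
    m_left, d_left = lca_reconcile(left, species_of, species_clades)
    m_right, d_right = lca_reconcile(right, species_of, species_clades)
    # LCA = smallest species clade containing both children's images.
    union = m_left | m_right
    image = min((c for c in species_clades if union <= c), key=len)
    is_dup = image == m_left or image == m_right
    return image, d_left + d_right + int(is_dup)

species = (("human", "mouse"), "chicken")
species_of = {"h1": "human", "h2": "human", "m1": "mouse",
              "m2": "mouse", "c1": "chicken"}
cl = clades(species)
print(lca_reconcile((("h1", "m1"), ("h2", "m2")), species_of, cl)[1])  # 1
print(lca_reconcile((("h1", "m1"), "c1"), species_of, cl)[1])          # 0
```

The first gene tree needs one duplication at its root because both subtrees map to the human-mouse ancestor; the second is fully explained by speciations.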

20 citations

Journal Article

Learning oncogenetic networks by reducing to MILP

Hossein Farahani, Jens Lagergren

01 Jan 2013-PLOS ONE

TL;DR: Two models for cancer progression and algorithms for learning them are presented, including Progression Networks, a special class of Bayesian networks, together with a method for determining editing levels in mature microRNAs from high-throughput RNA sequencing data from the mouse brain.

Abstract: Cancer is a multi-stage process resulting from the accumulation of genetic mutations. Data obtained from assaying a tumor contain only the set of mutations in the tumor and lack information about their temporal order. Learning the chronological order of the genetic mutations is an important step towards understanding the disease. The probability that a mutation is introduced into a tumor increases if certain mutations that promote it have already occurred. Such dependencies induce what we call the monotonicity property in cancer progression. A realistic model of cancer progression should take this property into account.
In this thesis, we present two models for cancer progression and algorithms for learning them. In the first model, we propose Progression Networks (PNs), which are a special class of Bayesian networks. Learning PNs takes the monotonicity property into consideration. The problem of learning PNs is reduced to Mixed Integer Linear Programming (MILP), an NP-hard problem for which very good heuristics exist. We also developed a program, DiProg, for learning PNs.
In the second model, the problem of noise in biological experiments is addressed by introducing a hidden variable. We call this model a Hidden variable Oncogenetic Network (HON). In a HON, two variables are assigned to each node: a hidden variable that represents the progression of cancer to the node, and an observable random variable that represents the observation of the mutation corresponding to the node. We devised a structural Expectation Maximization (EM) algorithm for learning HONs. In the M-step of the structural EM algorithm, a considerable number of inference tasks must be performed. Because exact inference is tractable only on Bayesian networks with bounded treewidth, we also developed an algorithm for learning bounded-treewidth Bayesian networks by reducing the problem to MILP.
Our algorithms performed well on synthetic data. We also tested them on cytogenetic data from renal cell carcinoma. The progression networks learned by both algorithms agree with previously published results.
MicroRNAs are short non-coding RNAs involved in post-transcriptional regulation. A-to-I editing of microRNAs converts adenosine to inosine in double-stranded RNA. We developed a method for determining editing levels in mature microRNAs from high-throughput RNA sequencing data from the mouse brain. Here, for the first time, we showed that the level of editing increases with development.
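The role of the hidden variables, and why inference cost matters, can be sketched with a toy noisy model. The Python below assumes, for illustration only, a conjunctive hidden layer (a hidden event occurs with probability `theta[v]` once all of its parents have occurred, and never otherwise) and independent observation errors with hypothetical false-positive and false-negative rates `eps_fp` and `eps_fn`; this need not match the thesis's exact HON parameterization. The likelihood of an observed tumor is a sum over all hidden states, so this brute-force version grows exponentially with the number of nodes, which is why exact inference in larger networks relies on structural restrictions such as bounded treewidth.

```python
from itertools import product
from typing import Sequence, Tuple

def hidden_prior(h: Sequence[int], parents: Sequence[Tuple[int, ...]],
                 theta: Sequence[float]) -> float:
    """P(h) for a conjunctive hidden layer: node v can be 1 only if all parents are 1."""
    p = 1.0
    for v, hv in enumerate(h):
        if all(h[u] for u in parents[v]):
            p *= theta[v] if hv else 1.0 - theta[v]
        elif hv:
            return 0.0  # violates monotonicity: v present without all of its parents
    return p

def emission(o: Sequence[int], h: Sequence[int],
             eps_fp: float, eps_fn: float) -> float:
    """P(o | h) with independent false-positive / false-negative errors per node."""
    p = 1.0
    for ov, hv in zip(o, h):
        if hv:
            p *= eps_fn if ov == 0 else 1.0 - eps_fn
        else:
            p *= eps_fp if ov == 1 else 1.0 - eps_fp
    return p

def likelihood(o: Sequence[int], parents: Sequence[Tuple[int, ...]],
               theta: Sequence[float], eps_fp: float, eps_fn: float) -> float:
    """P(o) = sum over all hidden states h of P(h) * P(o | h), by enumeration."""
    return sum(hidden_prior(h, parents, theta) * emission(o, h, eps_fp, eps_fn)
               for h in product((0, 1), repeat=len(o)))

# Hypothetical 3-node network: node 1 needs node 0; node 2 needs nodes 0 and 1.
parents = [(), (0,), (0, 1)]
theta = [0.6, 0.5, 0.3]
print(likelihood((0, 1, 0), parents, theta, 0.0, 0.0))       # 0.0: impossible without noise
print(likelihood((0, 1, 0), parents, theta, 0.05, 0.1) > 0)  # True: explained by noise
```

Without the observation layer, a single out-of-order observation would make the whole dataset impossible; the hidden variables let the model attribute it to measurement error.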

3 citations