Author

Jere Koskela

Other affiliations: Technical University of Berlin
Bio: Jere Koskela is an academic researcher from the University of Warwick. The author has contributed to research in topics: Coalescent theory & Particle filter. The author has an h-index of 6 and has co-authored 24 publications receiving 119 citations. Previous affiliations of Jere Koskela include Technical University of Berlin.

Papers
Journal ArticleDOI
TL;DR: A low dimensional function of the site frequency spectrum is introduced that is tailor-made for distinguishing coalescent models with multiple mergers from Kingman coalescent models with population growth, and this function is used to construct a hypothesis test between these model classes.
Abstract: We introduce a low dimensional function of the site frequency spectrum that is tailor-made for distinguishing coalescent models with multiple mergers from Kingman coalescent models with population growth, and use this function to construct a hypothesis test between these model classes. The null and alternative sampling distributions of the statistic are intractable, but its low dimensionality renders them amenable to Monte Carlo estimation. We construct kernel density estimates of the sampling distributions based on simulated data, and show that the resulting hypothesis test dramatically improves on the statistical power of a current state-of-the-art method. A key reason for this improvement is the use of multi-locus data, in particular averaging observed site frequency spectra across unlinked loci to reduce sampling variance. We also demonstrate the robustness of our method to nuisance and tuning parameters. Finally we show that the same kernel density estimates can be used to conduct parameter estimation, and argue that our method is readily generalisable for applications in model selection, parameter inference and experimental design.

21 citations
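The multi-locus averaging described in the abstract above can be illustrated with a minimal sketch. It assumes haplotypes are coded as 0/1 rows (1 marking the derived allele) and that every locus has the same sample size. The function names, the `tail_start` cutoff, and the particular two-number summary (singleton fraction and lumped tail fraction) are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def site_frequency_spectrum(haplotypes: Sequence[Sequence[int]]) -> List[int]:
    """Unfolded SFS: entry i-1 counts sites whose derived allele is carried by i samples."""
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for column in zip(*haplotypes):  # iterate over sites
        derived = sum(column)
        if 0 < derived < n:  # ignore sites that are not segregating
            sfs[derived - 1] += 1
    return sfs


def mean_sfs(spectra: Sequence[Sequence[int]]) -> List[float]:
    """Entry-wise average of spectra observed at unlinked loci."""
    return [sum(entries) / len(spectra) for entries in zip(*spectra)]


def singleton_and_tail(sfs: Sequence[float], tail_start: int) -> Tuple[float, float]:
    """Hypothetical low dimensional summary: singleton fraction and lumped tail fraction.

    tail_start is an assumed tuning parameter: classes tail_start, tail_start + 1, ...
    are lumped together.
    """
    total = sum(sfs)
    if total == 0:
        raise ValueError("no segregating sites")
    return sfs[0] / total, sum(sfs[tail_start - 1:]) / total


# Two toy loci, four sampled haplotypes each.
locus_a = [[1, 0, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
locus_b = [[1, 1], [1, 0], [0, 0], [0, 0]]
averaged = mean_sfs([site_frequency_spectrum(locus_a), site_frequency_spectrum(locus_b)])
summary = singleton_and_tail(averaged, tail_start=3)
```

Averaging before normalising is what reduces the sampling variance of the summary; the hypothesis test itself would then compare such summaries against simulated null and alternative distributions.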

Journal ArticleDOI
TL;DR: It is shown that cryptic recombination and selection do not diminish the power of the test, but that misspecifying population structure does; with moderate power, the singleton-tail statistic can also solve the more challenging model selection problem between multiple mergers due to selective sweeps and multiple mergers due to high fecundity.
Abstract: We study the effect of biological confounders on the model selection problem between Kingman coalescents with population growth, and Ξ-coalescents involving simultaneous multiple mergers. We use a low dimensional, computationally tractable summary statistic, dubbed the singleton-tail statistic, to carry out approximate likelihood ratio tests between these model classes. The singleton-tail statistic has been shown to distinguish between them with high power in the simple setting of neutrally evolving, panmictic populations without recombination. We extend this work by showing that cryptic recombination and selection do not diminish the power of the test, but that misspecifying population structure does. Furthermore, we demonstrate that the singleton-tail statistic can also solve the more challenging model selection problem between multiple mergers due to selective sweeps, and multiple mergers due to high fecundity with moderate power of up to 30%.

20 citations

Journal ArticleDOI
TL;DR: The tractable n-coalescent can be used to predict the shape and size of SMC genealogies, as is illustrated by characterising the limiting mean and variance of the tree height.
Abstract: We study weighted particle systems in which new generations are resampled from current particles with probabilities proportional to their weights. This covers a broad class of sequential Monte Carlo (SMC) methods, widely-used in applied statistics and cognate disciplines. We consider the genealogical tree embedded into such particle systems, and identify conditions, as well as an appropriate time-scaling, under which they converge to the Kingman $n$-coalescent in the infinite system size limit in the sense of finite-dimensional distributions. Thus, the tractable $n$-coalescent can be used to predict the shape and size of SMC genealogies, as we illustrate by characterising the limiting mean and variance of the tree height. SMC genealogies are known to be connected to algorithm performance, so that our results are likely to have applications in the design of new methods as well. Our conditions for convergence are strong, but we show by simulation that they do not appear to be necessary.

15 citations
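For reference, the mean and variance of the Kingman $n$-coalescent tree height mentioned in the abstract above follow from a standard fact: while $k$ lineages remain, the waiting time to the next merger is exponential with rate $\binom{k}{2}$, independently across $k$. The sketch below sums those contributions exactly. It assumes the usual coalescent time units in which each pair of lineages merges at rate 1; the function name is illustrative and the code is not taken from the paper.

```python
from fractions import Fraction
from math import comb
from typing import Tuple


def kingman_tree_height_moments(n: int) -> Tuple[Fraction, Fraction]:
    """Exact mean and variance of the Kingman n-coalescent tree height.

    The height is a sum of independent Exp(C(k, 2)) holding times for k = n, ..., 2,
    so the mean is the sum of 1 / C(k, 2) and the variance the sum of 1 / C(k, 2)^2.
    """
    if n < 2:
        raise ValueError("need at least two lineages")
    mean = sum(Fraction(1, comb(k, 2)) for k in range(2, n + 1))
    variance = sum(Fraction(1, comb(k, 2) ** 2) for k in range(2, n + 1))
    return mean, variance


mean_4, var_4 = kingman_tree_height_moments(4)
```

The mean telescopes to $2(1 - 1/n)$, so it stays below 2 however large the sample is.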

Journal ArticleDOI
TL;DR: The authors derive families of approximate conditional sampling distributions (CSDs) for finite sites Λ- and Ξ-coalescents, and use them to obtain 'approximately optimal' importance sampling (IS) and product of approximate conditionals (PAC) algorithms for Λ-coalescents.
Abstract: Full likelihood inference under Kingman's coalescent is a computationally challenging problem to which importance sampling (IS) and the product of approximate conditionals (PAC) methods have been applied successfully. Both methods can be expressed in terms of families of intractable conditional sampling distributions (CSDs), and rely on principled approximations for accurate inference. Recently, more general Λ- and Ξ-coalescents have been observed to provide better modelling fits to some genetic data sets. We derive families of approximate CSDs for finite sites Λ- and Ξ-coalescents, and use them to obtain 'approximately optimal' IS and PAC algorithms for Λ-coalescents, yielding substantial gains in efficiency over existing methods.

14 citations

Journal ArticleDOI
TL;DR: Families of approximate CSDs for finite sites $\Lambda$- and $\Xi$-coalescents are derived and used to obtain "approximately optimal" IS and PAC algorithms for $\Lambda$-coalescents, yielding substantial gains in efficiency over existing methods.
Abstract: Full likelihood inference under Kingman's coalescent is a computationally challenging problem to which importance sampling (IS) and the product of approximate conditionals (PAC) methods have been applied successfully. Both methods can be expressed in terms of families of intractable conditional sampling distributions (CSDs), and rely on principled approximations for accurate inference. Recently, more general $\Lambda$- and $\Xi$-coalescents have been observed to provide better modelling fits to some genetic data sets. We derive families of approximate CSDs for finite sites $\Lambda$- and $\Xi$-coalescents, and use them to obtain "approximately optimal" IS and PAC algorithms for $\Lambda$-coalescents, yielding substantial gains in efficiency over existing methods.

14 citations


Cited by
Journal ArticleDOI

3,734 citations

Book
01 Jan 2013
TL;DR: This book treats Lévy processes, covering their characterization and existence, stable processes, the Lévy-Itô decomposition of sample functions, distributional properties, subordination, recurrence and transience, potential theory, and Wiener-Hopf factorizations.
Abstract: Preface to the revised edition; Remarks on notation; 1. Basic examples; 2. Characterization and existence; 3. Stable processes and their extensions; 4. The Lévy-Itô decomposition of sample functions; 5. Distributional properties of Lévy processes; 6. Subordination and density transformation; 7. Recurrence and transience; 8. Potential theory for Lévy processes; 9. Wiener-Hopf factorizations; 10. More distributional properties; Supplement; Solutions to exercises; References and author index; Subject index.

1,957 citations

Journal ArticleDOI
TL;DR: The book presents a wide variety of inequalities established for either stochastic processes or sequences of random variables.
Abstract: Lévy Processes” (eight papers), “III. Empirical Processes” (four papers), and “IV. Stochastic Differential Equations” (four papers). Here are some comments about the individual papers: In I.2 (paper 2 of Part I) the covariance representation method, which relies on Clark’s formula on path spaces, is used to obtain concentration inequalities for functionals of Brownian motion on a manifold, allowing one to obtain tail estimates for this Brownian motion. In I.4 a transportation inequality for the canonical Gaussian measure in R is obtained and applied to Khintchine–Kahane inequalities for norms of random series with nonsymmetric Bernoulli coefficients. In II.1 exponential inequalities for U-statistics of order two are presented; these rely upon the Talagrand inequality for empirical processes but also use martingale type inequalities. In II.2 the unconditional convergence of a Gaussian [and, more generally, independent, identically distributed (iid)] series in a Banach space is studied. Applications to Karhunen–Loève representations of Gaussian processes are given. In II.3 estimates of tail properties and moments of multidimensional chaos generated by positive random variables with log concave tails are given. In II.4 a quantitative technique for studying the asymptotic distribution of sequences of Markov processes in infinite dimensions is proposed. The proof relies on the properties of an associated sequence of exponential martingales. In II.5 it is shown that a moving average process driven by a symmetric Lévy process and with a kernel with finite total 2-variation admits an almost surely bounded version. In II.6 a Markovian approach to the entropic convergence in the central limit theorem is presented. The emphasis is on the speed of convergence, as well as relaxing a spectral gap assumption. In II.7 a new version of the Khintchine–Kahane inequality for general Bernoulli random variables is presented with the help of hypercontractive methods.
In III.1 necessary and sufficient conditions for the moderate deviations of empirical processes and sums of iid random vectors on a separable Banach space are given. In III.2 exponential concentration inequalities for subadditive functions of independent random variables are obtained. As a consequence, Talagrand’s inequality for empirical processes is refined thanks to further developments of the entropy method introduced by M. Ledoux. In III.3 ratio limit theorems for empirical processes are obtained with the help of concentration inequalities. In III.4 asymptotic distributions of trimmed Wasserstein distances between the true and the empirical distribution function are obtained via weighted approximation results for uniform empirical processes. In IV.1 sharp rates of convergence for splitting-up approximations of stochastic partial differential equations are obtained. The error is estimated in terms of Sobolev’s norm. In IV.4 the existence and uniqueness of a strong solution for a stochastic differential equation driven by a fractional Brownian motion with Hurst index H < 1/2 and with a possibly time-dependent drift which satisfies a suitable integrability condition is obtained. In short, the book presents a wide variety of inequalities which are established for either stochastic processes or sequences of random variables. In 2006 this is still an active domain of research in transportation problems.

141 citations

Journal ArticleDOI
TL;DR: Average-case error was proposed in the applied mathematics literature over forty years ago as an alternative to worst-case error for assessing numerical methods.
Abstract: Over forty years ago average-case error was proposed in the applied mathematics literature as an alternative criterion with which to assess numerical methods. In contrast to worst-case error, this ...

134 citations

Journal ArticleDOI

117 citations