Author

Kenneth Lange

Bio: Kenneth Lange is an academic researcher from the University of California, Los Angeles. The author has contributed to research in the topics of Population and MM algorithm. The author has an h-index of 63 and has co-authored 341 publications receiving 29,050 citations. Previous affiliations of Kenneth Lange include the University of Southern California and the University of North Carolina at Chapel Hill.


Papers
Journal ArticleDOI
TL;DR: The results show that ADMIXTURE's computational speed opens up the possibility of using a much larger set of markers in model-based ancestry estimation and that its estimates are suitable for use in correcting for population stratification in association studies.
Abstract: Population stratification has long been recognized as a confounding factor in genetic association studies. Estimated ancestries, derived from multi-locus genotype data, can be used to perform a statistical correction for population stratification. One popular technique for estimation of ancestry is the model-based approach embodied by the widely applied program structure. Another approach, implemented in the program EIGENSTRAT, relies on Principal Component Analysis rather than model-based estimation and does not directly deliver admixture fractions. EIGENSTRAT has gained in popularity in part owing to its remarkable speed in comparison to structure. We present a new algorithm and a program, ADMIXTURE, for model-based estimation of ancestry in unrelated individuals. ADMIXTURE adopts the likelihood model embedded in structure. However, ADMIXTURE runs considerably faster, solving problems in minutes that take structure hours. In many of our experiments, we have found that ADMIXTURE is almost as fast as EIGENSTRAT. The runtime improvements of ADMIXTURE rely on a fast block relaxation scheme using sequential quadratic programming for block updates, coupled with a novel quasi-Newton acceleration of convergence. Our algorithm also runs faster and with greater accuracy than the implementation of an Expectation-Maximization (EM) algorithm incorporated in the program FRAPPE. Our simulations show that ADMIXTURE's maximum likelihood estimates of the underlying admixture coefficients and ancestral allele frequencies are as accurate as structure's Bayesian estimates. On real-world data sets, ADMIXTURE's estimates are directly comparable to those from structure and EIGENSTRAT. Taken together, our results show that ADMIXTURE's computational speed opens up the possibility of using a much larger set of markers in model-based ancestry estimation and that its estimates are suitable for use in correcting for population stratification in association studies.
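
The likelihood model shared by structure and ADMIXTURE treats each genotype as a binomial draw whose success probability mixes ancestral allele frequencies according to the individual's ancestry fractions. A minimal sketch of that log-likelihood is given below; the variable names are illustrative and are not taken from the ADMIXTURE code.

```python
# Sketch of the admixture log-likelihood, assuming genotypes G[i, j] in {0, 1, 2},
# ancestry fractions Q[i, k] (rows sum to 1), and ancestral allele frequencies
# F[k, j] in (0, 1). Names are illustrative, not from the ADMIXTURE source.
import numpy as np

def admixture_loglik(G, Q, F):
    """Binomial log-likelihood sum_ij [g ln p + (2 - g) ln(1 - p)],
    where p[i, j] = sum_k Q[i, k] * F[k, j]."""
    P = Q @ F                               # mixed allele frequency per genotype
    P = np.clip(P, 1e-12, 1 - 1e-12)        # guard the logarithms
    return np.sum(G * np.log(P) + (2 - G) * np.log(1 - P))

# Tiny synthetic example: 3 individuals, 4 SNPs, K = 2 ancestral populations.
rng = np.random.default_rng(0)
Q = rng.dirichlet([1, 1], size=3)           # admixture fractions
F = rng.uniform(0.05, 0.95, size=(2, 4))    # ancestral allele frequencies
G = rng.binomial(2, Q @ F)                  # simulated genotypes
print(admixture_loglik(G, Q, F))
```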

5,846 citations

Journal Article
TL;DR: The general principles behind all EM algorithms are discussed, the specific algorithms for emission and transmission tomography are derived in detail, and the specification of necessary physical features such as source and detector geometries is discussed.
Abstract: Two proposed likelihood models for emission and transmission image reconstruction accurately incorporate the Poisson nature of photon counting noise and a number of other relevant physical features. As in most algebraic schemes, the region to be reconstructed is divided into small pixels. For each pixel a concentration or attenuation coefficient must be estimated. In the maximum likelihood approach these parameters are estimated by maximizing the likelihood (probability of the observations). EM algorithms are iterative techniques for finding maximum likelihood estimates. In this paper we discuss the general principles behind all EM algorithms and derive in detail the specific algorithms for emission and transmission tomography. The virtues of the EM algorithms include (a) accurate incorporation of a good physical model, (b) automatic inclusion of non-negativity constraints on all parameters, (c) an excellent measure of the quality of a reconstruction, and (d) global convergence to a single vector of parameter estimates. We discuss the specification of necessary physical features such as source and detector geometries. Actual reconstructions are deferred to a later time.
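
For the emission case, the E- and M-steps combine into the well-known multiplicative MLEM update, which keeps every pixel estimate non-negative automatically. The following is a minimal sketch of that textbook update, not the paper's specific implementation.

```python
# Sketch of the classical EM (MLEM) update for emission tomography under the
# Poisson model y_i ~ Poisson(sum_j c_ij * lam_j), where c_ij is the probability
# that a photon emitted in pixel j is recorded by detector i.
import numpy as np

def em_emission(C, y, n_iter=50):
    """Multiplicative update:
    lam_j <- lam_j / sum_i c_ij * sum_i c_ij * y_i / (C @ lam)_i.
    Starting from positive lam, every iterate stays non-negative."""
    n_detectors, n_pixels = C.shape
    lam = np.ones(n_pixels)
    col_sums = C.sum(axis=0)
    for _ in range(n_iter):
        expected = C @ lam                       # E[y_i] under current estimate
        lam = lam / col_sums * (C.T @ (y / expected))
    return lam

# Toy example: 5 detectors, 3 pixels.
rng = np.random.default_rng(1)
C = rng.uniform(0.1, 1.0, size=(5, 3))
true_lam = np.array([2.0, 5.0, 1.0])
y = rng.poisson(C @ true_lam)
print(em_emission(C, y))
```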

1,921 citations

Journal ArticleDOI
TL;DR: The principle behind MM algorithms is explained, some methods for constructing them are suggested, and some of their attractive features are discussed.
Abstract: Most problems in frequentist statistics involve optimization of a function such as a likelihood or a sum of squares. EM algorithms are among the most effective algorithms for maximum likelihood estimation because they consistently drive the likelihood uphill by maximizing a simple surrogate function for the log-likelihood. Iterative optimization of a surrogate function as exemplified by an EM algorithm does not necessarily require missing data. Indeed, every EM algorithm is a special case of the more general class of MM optimization algorithms, which typically exploit convexity rather than missing data in majorizing or minorizing an objective function. In our opinion, MM algorithms deserve to be part of the standard toolkit of professional statisticians. This article explains the principle behind MM algorithms, suggests some methods for constructing them, and discusses some of their attractive features. We include numerous examples throughout the article to illustrate the concepts described.
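
A standard illustration of the MM principle is least-absolute-deviation regression: majorizing each absolute residual by a quadratic turns every iteration into a weighted least-squares solve that is guaranteed to drive the objective downhill. The sketch below is illustrative only, not code from the article.

```python
# MM for least-absolute-deviation regression: at current residuals r_m,
# |r| <= r^2 / (2|r_m|) + |r_m| / 2, so each MM step is weighted least squares
# with weights proportional to 1 / |residual|.
import numpy as np

def lad_mm(X, y, n_iter=100, eps=1e-8):
    """Minimize sum_i |y_i - x_i' beta| by repeatedly minimizing the
    quadratic majorizer (iteratively reweighted least squares)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary LS start
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)      # majorizer weights
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return beta

# Toy example with a gross outlier; LAD is far less sensitive to it than LS.
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=50)
y[0] += 20.0
print(lad_mm(X, y))
```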

1,756 citations

Journal ArticleDOI
TL;DR: In this paper, an analytical strategy based on maximum likelihood for a general model with multivariate t errors is suggested and applied to a variety of problems, including linear and nonlinear regression, robust estimation of the mean and covariance matrix with missing data, unbalanced multivariate repeated-measures data, multivariate modeling of pedigree data, and multivariate non-linear regression.
Abstract: The t distribution provides a useful extension of the normal for statistical modeling of data sets involving errors with longer-than-normal tails. An analytical strategy based on maximum likelihood for a general model with multivariate t errors is suggested and applied to a variety of problems, including linear and nonlinear regression, robust estimation of the mean and covariance matrix with missing data, unbalanced multivariate repeated-measures data, multivariate modeling of pedigree data, and multivariate nonlinear regression. The degrees of freedom parameter of the t distribution provides a convenient dimension for achieving robust statistical inference, with moderate increases in computational complexity for many models. Estimation of precision from asymptotic theory and the bootstrap is discussed, and graphical methods for checking the appropriateness of the t distribution are presented.
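
The usual EM scheme for the multivariate t alternates between computing case weights from Mahalanobis distances and re-estimating the location and scatter by weighted averages; outlying points receive small weights, which is the source of the robustness. Below is a minimal sketch with the degrees of freedom held fixed; the paper's treatment is broader, also estimating the degrees of freedom and covering regression and missing data.

```python
# Sketch of EM-type estimation of the location and scatter of a multivariate t
# distribution with fixed, known degrees of freedom nu.
import numpy as np

def t_mle(X, nu=4.0, n_iter=100):
    """E-step: w_i = (nu + p) / (nu + Mahalanobis_i^2).
    M-step: weighted mean and weighted covariance."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        diff = X - mu
        maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
        w = (nu + p) / (nu + maha)                # small weight for outliers
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        Sigma = (w[:, None] * diff).T @ diff / n
    return mu, Sigma

# Toy example: heavy-tailed data centered at the origin.
rng = np.random.default_rng(3)
X = rng.standard_t(df=4, size=(500, 2))
mu_hat, Sigma_hat = t_mle(X)
print(mu_hat)
```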

1,336 citations

Journal Article
TL;DR: Algorithms for implementing Thompson's suggestion for codominant markers in the context of automatic haplotyping, estimating location scores, and computing gene-clustering statistics for robust linkage analysis are explored.
Abstract: The introduction of stochastic methods in pedigree analysis has enabled geneticists to tackle computations intractable by standard deterministic methods. Until now these stochastic techniques have worked by running a Markov chain on the set of genetic descent states of a pedigree. Each descent state specifies the paths of gene flow in the pedigree and the founder alleles dropped down each path. The current paper follows up on a suggestion by Elizabeth Thompson that genetic descent graphs offer a more appropriate space for executing a Markov chain. A descent graph specifies the paths of gene flow but not the particular founder alleles traveling down the paths. This paper explores algorithms for implementing Thompson's suggestion for codominant markers in the context of automatic haplotyping, estimating location scores, and computing gene-clustering statistics for robust linkage analysis. Realistic numerical examples demonstrate the feasibility of the algorithms.
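
The computational engine behind such methods is a Markov chain run over a discrete space of latent configurations, with proposals accepted or rejected Metropolis-style. The generic sketch below shows only that machinery over an abstract state space; the pedigree-specific descent-graph states, proposal moves, and likelihood are placeholders, not the paper's algorithm.

```python
# Generic Metropolis sampler over a discrete state space. 'neighbors' and
# 'log_weight' are placeholders standing in for descent-graph proposal moves
# and the pedigree likelihood, which are not reproduced here.
import math
import random

def metropolis(start, neighbors, log_weight, n_steps=10_000, seed=0):
    """Propose a uniformly chosen neighbor and accept with probability
    min(1, weight(new) / weight(old)) (symmetric proposal assumed)."""
    rng = random.Random(seed)
    state = start
    samples = []
    for _ in range(n_steps):
        proposal = rng.choice(neighbors(state))
        log_ratio = log_weight(proposal) - log_weight(state)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            state = proposal
        samples.append(state)
    return samples

# Toy example: sample integers 0..9 on a ring with weights proportional to k + 1.
nb = lambda k: [(k - 1) % 10, (k + 1) % 10]
samples = metropolis(0, nb, lambda k: math.log(k + 1), n_steps=50_000)
print(sum(samples) / len(samples))   # roughly the weighted mean, about 6.0
```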

1,215 citations


Cited by
Book
D.L. Donoho
01 Jan 2004
TL;DR: It is possible to design n = O(N log(m)) nonadaptive measurements allowing reconstruction with accuracy comparable to that attainable with direct knowledge of the N most important coefficients, and a good approximation to those N important coefficients can be extracted from the n measurements by solving a linear program (Basis Pursuit in signal processing).
Abstract: Suppose x is an unknown vector in R^m (a digital image or signal); we plan to measure n general linear functionals of x and then reconstruct. If x is known to be compressible by transform coding with a known transform, and we reconstruct via the nonlinear procedure defined here, the number of measurements n can be dramatically smaller than the size m. Thus, certain natural classes of images with m pixels need only n = O(m^(1/4) log^(5/2)(m)) nonadaptive nonpixel samples for faithful recovery, as opposed to the usual m pixel samples. More specifically, suppose x has a sparse representation in some orthonormal basis (e.g., wavelet, Fourier) or tight frame (e.g., curvelet, Gabor), so the coefficients belong to an ℓp ball for 0 < p ≤ 1.
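
Basis Pursuit recovers the sparse coefficient vector by minimizing its ℓ1 norm subject to the measurement constraints, which becomes a linear program after splitting the unknown into positive and negative parts. The sketch below, using scipy's LP solver, is illustrative only and is not Donoho's implementation.

```python
# Basis Pursuit as a linear program: recover a sparse x from n < m measurements
# y = A x by minimizing ||x||_1 subject to A x = y, with x = u - v, u, v >= 0.
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    n, m = A.shape
    c = np.ones(2 * m)                      # minimize sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])               # enforce A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method='highs')
    u, v = res.x[:m], res.x[m:]
    return u - v

# Toy example: a 3-sparse vector recovered from 40 random measurements.
rng = np.random.default_rng(4)
m, n, k = 100, 40, 3
x_true = np.zeros(m)
x_true[rng.choice(m, k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(n, m)) / np.sqrt(n)
x_hat = basis_pursuit(A, A @ x_true)
print(np.max(np.abs(x_hat - x_true)))       # should be near zero
```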

18,609 citations

Journal ArticleDOI
TL;DR: Arlequin ver 3.0, as discussed by the authors, is a software package integrating several basic and advanced methods for population genetics data analysis, such as the computation of standard genetic diversity indices, the estimation of allele and haplotype frequencies, tests of departure from linkage equilibrium, departure from selective neutrality and demographic equilibrium, estimation of parameters from past population expansions, and thorough analyses of population subdivision under the AMOVA framework.
Abstract: Arlequin ver 3.0 is a software package integrating several basic and advanced methods for population genetics data analysis, like the computation of standard genetic diversity indices, the estimation of allele and haplotype frequencies, tests of departure from linkage equilibrium, departure from selective neutrality and demographic equilibrium, estimation of parameters from past population expansions, and thorough analyses of population subdivision under the AMOVA framework. Arlequin 3 introduces a completely new graphical interface written in C++, a more robust semantic analysis of input files, and two new methods: a Bayesian estimation of gametic phase from multi-locus genotypes, and an estimation of the parameters of an instantaneous spatial expansion from DNA sequence polymorphism. Arlequin can handle several data types like DNA sequences, microsatellite data, or standard multi-locus genotypes. A Windows version of the software is freely available on http://cmpg.unibe.ch/software/arlequin3.

14,271 citations

Journal ArticleDOI
TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods; they can handle large problems and also deal efficiently with sparse features.
Abstract: We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems, while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
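
The core of cyclical coordinate descent for the lasso is a one-dimensional soft-thresholding update applied to each coefficient in turn against the current residual. The sketch below shows that update for standardized predictors; it illustrates the idea only and is not the glmnet implementation.

```python
# Cyclical coordinate descent for the lasso,
# (1/2n)||y - X b||^2 + lam * ||b||_1, assuming columns of X are standardized
# so that (1/n) sum_i x_ij^2 = 1.
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()                                 # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r_partial = r + X[:, j] * beta[j]    # residual with feature j removed
            z = X[:, j] @ r_partial / n
            beta_j_new = soft_threshold(z, lam)
            r = r_partial - X[:, j] * beta_j_new
            beta[j] = beta_j_new
    return beta

# Toy example with standardized columns.
rng = np.random.default_rng(5)
n, p = 200, 10
X = rng.normal(size=(n, p))
X = (X - X.mean(0)) / X.std(0)
y = X[:, 0] * 3.0 - X[:, 1] * 2.0 + rng.normal(size=n)
print(np.round(lasso_cd(X, y, lam=0.1), 2))
```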

13,656 citations

Journal ArticleDOI
Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, +514 more · Institutions (90)
01 Oct 2015-Nature
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations

Journal Article
Fumio Tajima
30 Oct 1989-Genomics
TL;DR: It is suggested that natural selection against large insertions/deletions is so weak that a large amount of variation is maintained in a population.

11,521 citations