Author

Anthony D. Long

Bio: Anthony D. Long is an academic researcher from University of California, Irvine. The author has contributed to research in topics: Population & Quantitative trait locus. The author has an h-index of 40 and has co-authored 101 publications receiving 9438 citations. Previous affiliations of Anthony D. Long include University of California & University of California, Berkeley.


Papers
Journal Article
TL;DR: A Bayesian probabilistic framework for microarray data analysis is developed that derives point estimates for both parameters and hyperparameters, and regularized expressions for the variance of each gene by combining the empirical variance with a local background variance associated with neighboring genes.
Abstract: Motivation: DNA microarrays are now capable of providing genome-wide patterns of gene expression across many different conditions. The first level of analysis of these patterns requires determining whether observed differences in expression are significant or not. Current methods are unsatisfactory due to the lack of a systematic framework that can accommodate noise, variability, and low replication often typical of microarray data. Results: We develop a Bayesian probabilistic framework for microarray data analysis. At the simplest level, we model log-expression values by independent normal distributions, parameterized by corresponding means and variances with hierarchical prior distributions. We derive point estimates for both parameters and hyperparameters, and regularized expressions for the variance of each gene by combining the empirical variance with a local background variance associated with neighboring genes. An additional hyperparameter, inversely related to the number of empirical observations, determines the strength of the background variance. Simulations show that these point estimates, combined with a t-test, provide a systematic inference approach that compares favorably with simple t-test or fold methods, and partly compensate for the lack of replication.

1,763 citations
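
The variance regularization described in the abstract above can be illustrated with a minimal sketch. It assumes a simple pseudo-count weighted average of the empirical and background variances; this form, and the names `regularized_variance`, `regularized_t`, `background_var`, and `prior_weight`, are illustrative assumptions and not necessarily the paper's exact estimator.

```python
import math
from statistics import mean, variance


def regularized_variance(values, background_var, prior_weight):
    """Blend a gene's empirical variance with a background variance.

    prior_weight acts like a count of pseudo-observations: the fewer real
    replicates there are, the more the background variance dominates.
    (Assumed weighting scheme, for illustration only.)
    """
    n = len(values)
    empirical_var = variance(values)  # unbiased sample variance, needs n >= 2
    return (prior_weight * background_var + (n - 1) * empirical_var) / (
        prior_weight + n - 1
    )


def regularized_t(control, treatment, background_var, prior_weight):
    """Two-sample t-like statistic using regularized variances."""
    var_c = regularized_variance(control, background_var, prior_weight)
    var_t = regularized_variance(treatment, background_var, prior_weight)
    standard_error = math.sqrt(var_c / len(control) + var_t / len(treatment))
    return (mean(treatment) - mean(control)) / standard_error


if __name__ == "__main__":
    # Hypothetical log-expression values for one gene in two conditions.
    control = [1.0, 2.0, 3.0]
    treatment = [4.0, 5.0, 6.0]
    print(regularized_t(control, treatment, background_var=0.5, prior_weight=2.0))
```

With `prior_weight` set to 0 the function returns the plain sample variance, which makes the effect of the background term easy to inspect.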

Journal Article
27 Jan 2012-Science
TL;DR: The pervasive presence of epistasis among beneficial mutations was inferred, which shaped adaptive trajectories into at least two distinct pathways involving mutations either in the RNA polymerase complex or the termination factor rho.
Abstract: To estimate the number and diversity of beneficial mutations, we experimentally evolved 115 populations of Escherichia coli to 42.2°C for 2000 generations and sequenced one genome from each population. We identified 1331 total mutations, affecting more than 600 different sites. Few mutations were shared among replicates, but a strong pattern of convergence emerged at the level of genes, operons, and functional complexes. Our experiment uncovered a set of primary functional targets of high temperature, but we estimate that many other beneficial mutations could contribute to similar adaptive outcomes. We inferred the pervasive presence of epistasis among beneficial mutations, which shaped adaptive trajectories into at least two distinct pathways involving mutations either in the RNA polymerase complex or the termination factor rho.

735 citations

Journal Article
TL;DR: A comparison of genetic diversity between landrace and inbred maize samples showed that inbreds retained 77% of the level of diversity of landraces, on average; intragenic linkage disequilibrium declines within 100–200 bp, suggesting that genome-wide surveys for association analyses require SNPs every 100–200 bp.
Abstract: We measured sequence diversity in 21 loci distributed along chromosome 1 of maize (Zea mays ssp. mays L.). For each locus, we sequenced a common sample of 25 individuals representing 16 exotic landraces and nine U.S. inbred lines. The data indicated that maize has an average of one single nucleotide polymorphism (SNP) every 104 bp between two randomly sampled sequences, a level of diversity higher than that of either humans or Drosophila melanogaster. A comparison of genetic diversity between the landrace and inbred samples showed that inbreds retained 77% of the level of diversity of landraces, on average. In addition, Tajima's D values suggest that the frequency distribution of polymorphisms in inbreds was skewed toward fewer rare variants. Tests for selection were applied to all loci, and deviations from neutrality were detected in three loci. Sequence diversity was heterogeneous among loci, but there was no pattern of diversity along the genetic map of chromosome 1. Nonetheless, diversity was correlated (r = 0.65) with sequence-based estimates of the recombination rate. Recombination in our sample was sufficient to break down linkage disequilibrium among SNPs. Intragenic linkage disequilibrium declines within 100-200 bp on average, suggesting that genome-wide surveys for association analyses require SNPs every 100-200 bp.

692 citations
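
The "one SNP every 104 bp between two randomly sampled sequences" figure in the abstract above is a statement about average pairwise diversity. A minimal sketch of that calculation follows, assuming pre-aligned, equal-length sequences with no gaps or missing data; the function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_diversity(sequences):
    """Average proportion of differing sites over all pairs of sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return total_differences / (len(pairs) * length)


def bp_per_snp(sequences):
    """Expected number of base pairs per difference between two random sequences."""
    return 1.0 / pairwise_diversity(sequences)


if __name__ == "__main__":
    # Toy alignment, not real maize data.
    toy_alignment = ["AAAA", "AAAT", "AATT"]
    print(pairwise_diversity(toy_alignment))
    print(bp_per_snp(toy_alignment))
```

Under this definition, a diversity of roughly 1/104 per site corresponds to the abstract's one SNP every 104 bp.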

Journal Article
TL;DR: Estimates of 4Nc for a number of gene regions and human populations will be of use in determining the density of SNPs that are likely to be required for successful association studies.
Abstract: The statistical power of five association study test statistics (two haplotype-based tests, two marker-based tests, and the Transmission Disequilibrium Test-Q5) to detect single nucleotide polymorphism (SNP)/phenotype associations in a linkage-disequilibrium-based candidate gene scan employing a number of SNPs is examined. Power is estimated as a function of realistic parameters expected to affect the likelihood of detecting a significant association: the number of SNPs examined, the scaled recombination size of the region examined, the proportion of variance in the trait attributable to a hidden causative polymorphism within the region, and the number of individuals or families examined. For the different combinations of parameter values, power is estimated from a large number of realizations of a simulated coalescent describing a single random mating population with mutation, random genetic drift, and recombination. This explicit population genetics model results in a distribution of DNA marker heterozygosities and linkage disequilibria that are likely to resemble those expected in actual population samples. The study concludes that (1) marker-based permutation tests are more powerful than simple haplotype-based tests, (2) there is sufficient power to detect the presence of causative polymorphisms of small effect if on the order of 500 individuals are sampled, (3) greater power is achieved by increasing the sample size than by increasing the number of polymorphisms, (4) association studies are generally more powerful than transmission disequilibrium-based tests, and (5) for the range of parameters considered association studies have a low repeatability unless sample sizes are on the order of 500 individuals. Estimates of 4Nc for a number of gene regions and human populations will be of use in determining the density of SNPs that are likely to be required for successful association studies.

455 citations

Journal Article
30 Sep 2010-Nature
TL;DR: In this article, the authors present whole-genome resequencing data from Drosophila melanogaster populations that have experienced over 600 generations of laboratory selection for accelerated development.
Abstract: Experimental evolution systems allow the genomic study of adaptation, and so far this has been done primarily in asexual systems with small genomes, such as bacteria and yeast. Here we present whole-genome resequencing data from Drosophila melanogaster populations that have experienced over 600 generations of laboratory selection for accelerated development. Flies in these selected populations develop from egg to adult ∼20% faster than flies of ancestral control populations, and have evolved a number of other correlated phenotypes. On the basis of 688,520 intermediate-frequency, high-quality single nucleotide polymorphisms, we identify several dozen genomic regions that show strong allele frequency differentiation between a pooled sample of five replicate populations selected for accelerated development and pooled controls. On the basis of resequencing data from a single replicate population with accelerated development, as well as single nucleotide polymorphism data from individual flies from each replicate population, we infer little allele frequency differentiation between replicate populations within a selection treatment. Signatures of selection are qualitatively different than what has been observed in asexual species; in our sexual populations, adaptation is not associated with 'classic' sweeps whereby newly arising, unconditionally advantageous mutations become fixed. More parsimonious explanations include 'incomplete' sweep models, in which mutations have not had enough time to fix, and 'soft' sweep models, in which selection acts on pre-existing, common genetic variants. We conclude that, at least for life history characters such as development time, unconditionally advantageous alleles rarely arise, are associated with small net fitness gains or cannot fix because selection coefficients change over time.

450 citations


Cited by
Journal Article
TL;DR: The hierarchical model of Lonnstedt and Speed (2002) is developed into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples and the moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom.
Abstract: The problem of identifying differentially expressed genes in designed microarray experiments is considered. Lonnstedt and Speed (2002) derived an expression for the posterior odds of differential expression in a replicated two-color experiment using a simple hierarchical parametric model. The purpose of this paper is to develop the hierarchical model of Lonnstedt and Speed (2002) into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples. The model is reset in the context of general linear models with arbitrary coefficients and contrasts of interest. The approach applies equally well to both single channel and two color microarray experiments. Consistent, closed form estimators are derived for the hyperparameters in the model. The estimators proposed have robust behavior even for small numbers of arrays and allow for incomplete data arising from spot filtering or spot quality weights. The posterior odds statistic is reformulated in terms of a moderated t-statistic in which posterior residual standard deviations are used in place of ordinary standard deviations. The empirical Bayes approach is equivalent to shrinkage of the estimated sample variances towards a pooled estimate, resulting in far more stable inference when the number of arrays is small. The use of moderated t-statistics has the advantage over the posterior odds that the number of hyperparameters which need to be estimated is reduced; in particular, knowledge of the non-null prior for the fold changes is not required. The moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom. The moderated t inferential approach extends to accommodate tests of composite null hypotheses through the use of moderated F-statistics. The performance of the methods is demonstrated in a simulation study. Results are presented for two publicly available data sets.

11,864 citations
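
The shrinkage and "augmented degrees of freedom" described in the abstract above can be sketched as follows. The sketch assumes the prior variance and prior degrees of freedom are already known and passed in by hand, whereas the paper estimates these hyperparameters from the data; the function and parameter names are hypothetical.

```python
import math


def moderated_t(effect, sample_var, residual_df, prior_var, prior_df, unscaled_var):
    """Return (moderated t-statistic, augmented degrees of freedom).

    The sample variance is shrunk towards prior_var, weighted by the
    respective degrees of freedom, and the result replaces the ordinary
    variance in the t-statistic. unscaled_var is the variance multiplier
    of the effect estimate (for example 1/n1 + 1/n2 in a two-group design).
    """
    posterior_var = (prior_df * prior_var + residual_df * sample_var) / (
        prior_df + residual_df
    )
    t_stat = effect / math.sqrt(posterior_var * unscaled_var)
    return t_stat, prior_df + residual_df


if __name__ == "__main__":
    # Hand-picked illustrative numbers, not fitted to any data set.
    t_stat, df = moderated_t(
        effect=2.0, sample_var=4.0, residual_df=2,
        prior_var=1.0, prior_df=4, unscaled_var=0.5,
    )
    print(t_stat, df)
```

Setting `prior_df` to 0 recovers the ordinary t-statistic, so the prior degrees of freedom control how strongly each gene borrows information from the pooled estimate.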

Journal Article
Fumio Tajima
30 Oct 1989-Genetics
TL;DR: It is suggested that the natural selection against large insertion/deletion is so weak that a large amount of variation is maintained in a population.

11,521 citations

Journal Article
TL;DR: For the next few weeks the course is going to be exploring a field that’s actually older than classical population genetics, although the approach it’ll be taking to it involves the use of population genetic machinery.
Abstract: So far in this course we have dealt entirely with the evolution of characters that are controlled by simple Mendelian inheritance at a single locus. There are notes on the course website about gametic disequilibrium and how allele frequencies change at two loci simultaneously, but we didn’t discuss them. In every example we’ve considered we’ve imagined that we could understand something about evolution by examining the evolution of a single gene. That’s the domain of classical population genetics. For the next few weeks we’re going to be exploring a field that’s actually older than classical population genetics, although the approach we’ll be taking to it involves the use of population genetic machinery. If you know a little about the history of evolutionary biology, you may know that after the rediscovery of Mendel’s work in 1900 there was a heated debate between the “biometricians” (e.g., Galton and Pearson) and the “Mendelians” (e.g., de Vries, Correns, Bateson, and Morgan). Biometricians asserted that the really important variation in evolution didn’t follow Mendelian rules. Height, weight, skin color, and similar traits seemed to

9,847 citations

Journal Article
TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts are illustrated.
Abstract: Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.

5,744 citations

Journal Article
04 May 2011-PLOS ONE
TL;DR: A procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs) is reported, which is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches.
Abstract: Advances in next generation technologies have driven the costs of DNA sequencing down to the point that genotyping-by-sequencing (GBS) is now feasible for high diversity, large genome species. Here, we report a procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs). This approach is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches. By using methylation-sensitive REs, repetitive regions of genomes can be avoided and lower copy regions targeted with two to three fold higher efficiency. This tremendously simplifies computationally challenging alignment problems in species with high levels of genetic diversity. The GBS procedure is demonstrated with maize (IBM) and barley (Oregon Wolfe Barley) recombinant inbred populations where roughly 200,000 and 25,000 sequence tags were mapped, respectively. An advantage in species like barley that lack a complete genome sequence is that a reference map need only be developed around the restriction sites, and this can be done in the process of sample genotyping. In such cases, the consensus of the read clusters across the sequence tagged sites becomes the reference. Alternatively, for kinship analyses in the absence of a reference genome, the sequence tags can simply be treated as dominant markers. Future application of GBS to breeding, conservation, and global species and population surveys may allow plant breeders to conduct genomic selection on a novel germplasm or species without first having to develop any prior molecular tools, or conservation biologists to determine population structure without prior knowledge of the genome or diversity in the species.

5,163 citations