Journal ArticleDOI

Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows

01 May 2010-Molecular Ecology Resources (Mol Ecol Resour)-Vol. 10, Iss: 3, pp 564-567
TL;DR: The main innovations of the new version of the Arlequin program include enhanced outputs in XML format, the possibility to embed graphics displaying computation results directly into output files, and the implementation of a new method to detect loci under selection from genome scans.
Abstract: We present here a new version of the Arlequin program available under three different forms: a Windows graphical version (Winarl35), a console version of Arlequin (arlecore), and a specific console version to compute summary statistics (arlsumstat). The command-line versions run under both Linux and Windows. The main innovations of the new version include enhanced outputs in XML format, the possibility to embed graphics displaying computation results directly into output files, and the implementation of a new method to detect loci under selection from genome scans. Command-line versions are designed to handle large series of files, and arlsumstat can be used to generate summary statistics from simulated data sets within an Approximate Bayesian Computation framework.
Citations
Journal ArticleDOI
TL;DR: GenAlEx: Genetic Analysis in Excel is a cross-platform package for population genetic analyses that runs within Microsoft Excel that offers analysis of diploid codominant, haploid and binary genetic loci and DNA sequences.
Abstract: Summary: GenAlEx: Genetic Analysis in Excel is a cross-platform package for population genetic analyses that runs within Microsoft Excel. GenAlEx offers analysis of diploid codominant, haploid and binary genetic loci and DNA sequences. Both frequency-based (F-statistics, heterozygosity, HWE, population assignment, relatedness) and distance-based (AMOVA, PCoA, Mantel tests, multivariate spatial autocorrelation) analyses are provided. New features include calculation of new estimators of population structure: G′ST, G′′ST, Jost’s Dest and F′ST through AMOVA, Shannon Information analysis, linkage disequilibrium analysis for biallelic data and novel heterogeneity tests for spatial autocorrelation analysis. Export to more than 30 other data formats is provided. Teaching tutorials and expanded step-by-step output options are included. The comprehensive guide has been fully revised. Availability and implementation: GenAlEx is written in VBA and provided as a Microsoft Excel Add-in (compatible with Excel 2003, 2007, 2010 on PC; Excel 2004, 2011 on Macintosh). GenAlEx, and supporting documentation and tutorials are freely available at: http://biology.anu.edu.au/GenAlEx. Contact: rod.peakall@anu.edu.au

9,564 citations

Journal ArticleDOI
TL;DR: Version 6 of the DNA Sequence Polymorphism (DnaSP) software, a popular tool for exhaustive population genetic analyses of multiple sequence alignments, adds functionality for large data sets and modules for single- and multi-locus coalescent simulations under a wide range of demographic scenarios.
Abstract: We present version 6 of the DNA Sequence Polymorphism (DnaSP) software, a new version of the popular tool for performing exhaustive population genetic analyses on multiple sequence alignments. This major upgrade incorporates novel functionalities to analyze large data sets, such as those generated by high-throughput sequencing technologies. Among other features, DnaSP 6 implements: 1) modules for reading and analyzing data from genomic partitioning methods, such as RADseq or hybrid enrichment approaches, 2) faster methods scalable for high-throughput sequencing data, and 3) summary statistics for the analysis of multi-locus population genetics data. Furthermore, DnaSP 6 includes novel modules to perform single- and multi-locus coalescent simulations under a wide range of demographic scenarios. The DnaSP 6 program, with extensive documentation, is freely available at http://www.ub.edu/dnasp.

3,277 citations

Journal ArticleDOI
TL;DR: The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics.
Abstract: Massively parallel short-read sequencing technologies, coupled with powerful software platforms, are enabling investigators to analyse tens of thousands of genetic markers. This wealth of data is rapidly expanding and allowing biological questions to be addressed with unprecedented scope and precision. The sizes of the data sets are now posing significant data processing and analysis challenges. Here we describe an extension of the Stacks software package to efficiently use genotype-by-sequencing data for studies of populations of organisms. Stacks now produces core population genomic summary statistics and SNP-by-SNP statistical tests. These statistics can be analysed across a reference genome using a smoothed sliding window. Stacks also now provides several output formats for several commonly used downstream analysis packages. The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics.

2,958 citations

Journal ArticleDOI
TL;DR: PGDSpider, a Java program that can read 27 different file formats and export data into 29, partially overlapping, other file formats, is introduced, allowing its integration in complex data analysis pipelines.
Abstract: Summary: The analysis of genetic data often requires a combination of several approaches using different and sometimes incompatible programs. In order to facilitate data exchange and file conversions between population genetics programs, we introduce PGDSpider, a Java program that can read 27 different file formats and export data into 29, partially overlapping, other file formats. The PGDSpider package includes both an intuitive graphical user interface and a command-line version allowing its integration in complex data analysis pipelines. Availability: PGDSpider is freely available under the BSD 3-Clause license on http://cmpg.unibe.ch/software/PGDSpider/

960 citations

Journal ArticleDOI
TL;DR: In this paper, the relationship between the three classes of statistics (F(ST), F'(ST) and D), their estimation and their properties is discussed, and the authors illustrate the relationships between the statistics using a data set of estimates from 84 species taken from the last 4 years of Molecular Ecology.
Abstract: Although F(ST) is widely used as a measure of population structure, it has been criticized recently because of its dependency on within-population diversity. This dependency can lead to difficulties in interpretation and in the comparison of estimates among species or among loci and has led to the development of two replacement statistics, F'(ST) and D. F'(ST) is the normal F(ST) standardized by the maximum value it can obtain, given the observed within-population diversity. D uses a multiplicative partitioning of diversity, based on the effective number of alleles rather than on the expected heterozygosity. In this study, we review the relationships between the three classes of statistics (F(ST), F'(ST) and D), their estimation and their properties. We illustrate the relationships between the statistics using a data set of estimates from 84 species taken from the last 4 years of Molecular Ecology. As with F(ST), unbiased estimators are available for the two new statistics D and F'(ST). Here, we develop a new unbiased F'(ST) estimator based on G(ST), which we call G''(ST). However, F'(ST) can be calculated using any F(ST) estimator for which the maximum value can be obtained. As all three statistics have their advantages and their drawbacks, we recommend continued use of F(ST) in combination with either F'(ST) or D. In most cases, F'(ST) would be the best choice among the latter two as it is most suited for inferences of the influence of demographic processes such as genetic drift and migration on genetic population structure.
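The standardization described in this abstract can be illustrated with a minimal sketch. It assumes the basic, non-bias-corrected forms of GST, Hedrick's G'ST (GST divided by its maximum given the within-population diversity) and Jost's D as they are commonly written in the literature; these formulas, the function names and the toy frequencies are not taken from this page, and the unbiased estimators discussed in the paper (including G''ST) are not implemented here.

```python
def expected_het(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def differentiation(pop_freqs):
    """Return (GST, G'ST, D) for one locus.

    pop_freqs: list of per-population allele-frequency lists (same allele order).
    Assumes equal weighting of populations and no sample-size correction.
    """
    k = len(pop_freqs)                      # number of populations
    n_alleles = len(pop_freqs[0])
    hs = sum(expected_het(f) for f in pop_freqs) / k          # mean within-pop diversity
    mean_freqs = [sum(f[a] for f in pop_freqs) / k for a in range(n_alleles)]
    ht = expected_het(mean_freqs)                              # total diversity
    gst = (ht - hs) / ht
    gst_max = (k - 1) * (1 - hs) / (k - 1 + hs)                # largest GST possible at this HS
    g_prime = gst / gst_max                                    # standardized statistic
    d = (ht - hs) / (1 - hs) * k / (k - 1)                     # Jost's D
    return gst, g_prime, d


# Two populations sharing no alleles, each with two equally frequent alleles.
gst, g_prime, d = differentiation([[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]])
print(round(gst, 3), round(g_prime, 3), round(d, 3))  # 0.333 1.0 1.0
```

In this toy case the populations are completely differentiated, yet GST is only about 0.33 because within-population diversity is high; the standardized statistic and D both reach 1, which is the dependency the abstract refers to.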

943 citations

References
Journal ArticleDOI
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation, with a focus on the estimation and use of identity-by-state and identity-by-descent information in population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations


"Arlequin suite ver 3.5: a new serie..." refers methods in this paper

  • ...Some software packages (e.g. plink Purcell et al. 2007) have been specifically developed to both handle such huge data sets and to directly perform statistical analyses on the data....


Journal ArticleDOI
TL;DR: Arlequin ver 3.0 is a software package integrating several basic and advanced methods for population genetics data analysis, like the computation of standard genetic diversity indices, the estimation of allele and haplotype frequencies, tests of departure from linkage equilibrium, departure from selective neutrality and demographic equilibrium, estimation of parameters from past population expansions, and thorough analyses of population subdivision under the AMOVA framework.
Abstract: Arlequin ver 3.0 is a software package integrating several basic and advanced methods for population genetics data analysis, like the computation of standard genetic diversity indices, the estimation of allele and haplotype frequencies, tests of departure from linkage equilibrium, departure from selective neutrality and demographic equilibrium, estimation of parameters from past population expansions, and thorough analyses of population subdivision under the AMOVA framework. Arlequin 3 introduces a completely new graphical interface written in C++, a more robust semantic analysis of input files, and two new methods: a Bayesian estimation of gametic phase from multi-locus genotypes, and an estimation of the parameters of an instantaneous spatial expansion from DNA sequence polymorphism. Arlequin can handle several data types like DNA sequences, microsatellite data, or standard multi-locus genotypes. A Windows version of the software is freely available on http://cmpg.unibe.ch/software/arlequin3.

14,271 citations

Journal ArticleDOI
TL;DR: A technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments is presented.
Abstract: Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information. This challenge has catalysed the development of next-generation sequencing (NGS) technologies. The inexpensive production of large volumes of sequence data is the primary advantage over conventional methods. Here, I present a technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments. I also outline the broad range of applications for NGS technologies, in addition to providing guidelines for platform selection to address biological questions of interest.

7,023 citations


"Arlequin suite ver 3.5: a new serie..." refers methods in this paper

  • ...Some software packages (e.g. plink Purcell et al. 2007) have been specifically developed to both handle such huge data sets and to directly perform statistical analyses on the data....


Journal ArticleDOI
TL;DR: It is suggested that genetic variation at a discrepant locus, identified under these conditions, is likely to have been influenced by natural selection, either acting on the locus itself or at a closely linked locus.
Abstract: Loci that show unusually low or high levels of genetic differentiation are often assumed to be subject to natural selection. We propose a method for the identification of loci showing such disparities. The differentiation can be quantified using the statistic FST. For a range of population structures and demographic histories, the distribution of FST is strongly related to the heterozygosity at a locus. Outlying values of FST can be identified in a plot of FST vs. heterozygosity using a null distribution generated by a simple genetic model. We use published data sets to illustrate the importance of the relationship with heterozygosity. We investigate a number of models of population structure, and demonstrate that the null distribution is robust to a wide range of conditions. In particular, the distribution is robust to differing mutation rates, and therefore different molecular markers, such as allozymes, restriction fragment length polymorphisms (RFLPs) and single strand conformation polymorphisms (SSCPs), can be compared together. We suggest that genetic variation at a discrepant locus, identified under these conditions, is likely to have been influenced by natural selection, either acting on the locus itself or at a closely linked locus.

1,832 citations


"Arlequin suite ver 3.5: a new serie..." refers methods in this paper

  • ...…ver 3.5 a procedure to detect loci under selection from genome scans that contrast patterns of genetic diversity within and between populations, extending the FDIST approach of Beaumont & Nichols (1996) to the case where populations are hierarchically structured (see Excoffier et al. 2009)....


Journal ArticleDOI
TL;DR: It is demonstrated that the mean ratio of the number of alleles to the range in allele size, which is calculated from a population sample of microsatellite loci, can be used to detect reductions in population size and that the value of M consistently predicts the reported demographic history for these populations.
Abstract: We demonstrate that the mean ratio of the number of alleles to the range in allele size, which we term M, calculated from a population sample of microsatellite loci, can be used to detect reductions in population size. Using simulations, we show that, for a general class of mutation models, the value of M decreases when a population is reduced in size. The magnitude of the decrease is positively correlated with the severity and duration of the reduction in size. We also find that the rate of recovery of M following a reduction in size is positively correlated with post-reduction population size, but that recovery occurs in both small and large populations. This indicates that M can distinguish between populations that have been recently reduced in size and those which have been small for a long time. We employ M to develop a statistical test for recent reductions in population size that can detect such changes for more than 100 generations with the post-reduction demographic scenarios we examine. We also compute M for a variety of populations and species using microsatellite data collected from the literature. We find that the value of M consistently predicts the reported demographic history for these populations. This method, and others like it, promises to be an important tool for the conservation and management of populations that are in need of intervention or recovery.
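As a rough illustration of the statistic described in this abstract, the sketch below computes the ratio of the number of alleles to the allelic size range for each microsatellite locus and averages it over loci. It assumes the range is counted in repeat units plus one (so a locus with no gaps between its smallest and largest allele gives a ratio of 1); that convention, the function names and the toy allele sizes are assumptions for illustration, not details taken from this page.

```python
from statistics import mean


def m_ratio(allele_sizes, repeat_length):
    """Ratio of allele count to allelic range for one locus.

    allele_sizes: observed allele sizes in base pairs (duplicates allowed).
    repeat_length: length of the repeat motif in base pairs.
    """
    distinct = sorted(set(allele_sizes))
    k = len(distinct)                                        # number of distinct alleles
    span = (distinct[-1] - distinct[0]) // repeat_length     # range in repeat units
    return k / (span + 1)                                    # assumed "+ 1" convention


def mean_m(loci):
    """Average the per-locus ratio over (allele_sizes, repeat_length) pairs."""
    return mean(m_ratio(sizes, rep) for sizes, rep in loci)


loci = [
    ([100, 102, 104, 106], 2),  # 4 alleles filling a 4-step range
    ([150, 153, 165], 3),       # 3 alleles spread over a 6-step range
]
print(mean_m(loci))  # 0.75
```

The second locus has gaps in its allele size distribution, which is the pattern the abstract associates with a past reduction in population size: rare alleles are lost faster than the overall range shrinks, so the ratio drops below 1.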

1,537 citations


"Arlequin suite ver 3.5: a new serie..." refers background in this paper

  • ...…K, H, GW, R, and FIS, which stand for the average number of alleles per locus, the average heterozygosity, the average Garza-Williamson statistic (Garza & Williamson 2001), the average microsatellite allelic range, and the inbreeding coefficient, respectively....
