Journal ArticleDOI

pophelper: an R package and web app to analyse and visualize population structure.

01 Jan 2017-Molecular Ecology Resources (Mol Ecol Resour)-Vol. 17, Iss: 1, pp 27-32
TL;DR: The pophelper R package and web app are software tools to aid population structure analyses; they can be used to analyse and visualize output generated by population assignment programs such as ADMIXTURE, STRUCTURE and TESS.
Abstract: The pophelper R package and web app are software tools to aid in population structure analyses. They can be used to analyse and visualize output generated by population assignment programs such as ADMIXTURE, STRUCTURE and TESS. The functions include parsing output run files to tabulate data, estimating K using the Evanno method, generating input files for CLUMPP and creating barplots. These functions can be integrated into standard R analysis workflows. The latest version of the package is available on GitHub (https://github.com/royfrancis/pophelper). An interactive web version of the pophelper package is also available; it covers the same functionality as the R package, with features such as interactive plots, cluster alignment during plotting, sorting of individuals and ordering of population groups. The interactive version is available at http://pophelper.com/.
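
To illustrate the "parsing output run files to tabulate data" step described in the abstract, here is a minimal Python sketch. It is not pophelper's own API: the function names `parse_q_matrix` and `tabulate_runs` and the file names are hypothetical, and it assumes ADMIXTURE-style Q files, i.e. whitespace-delimited text with one row per individual and one column per cluster.

```python
def parse_q_matrix(text):
    """Parse a whitespace-delimited Q matrix (rows = individuals, columns = clusters)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows.append([float(x) for x in fields])
    if not rows:
        raise ValueError("empty Q matrix")
    k = len(rows[0])
    if any(len(r) != k for r in rows):
        raise ValueError("rows have differing numbers of clusters")
    return rows


def tabulate_runs(named_runs):
    """Summarise each run as its file name, number of clusters (k) and individuals (ind)."""
    table = []
    for name, text in named_runs.items():
        q = parse_q_matrix(text)
        table.append({"file": name, "k": len(q[0]), "ind": len(q)})
    # Sort by K, then by file name, so replicate runs of the same K sit together.
    return sorted(table, key=lambda r: (r["k"], r["file"]))


# Hypothetical run files, given inline as strings for the sake of the example.
runs = {
    "run_k3.Q": "0.2 0.3 0.5\n0.1 0.1 0.8\n",
    "run_k2.Q": "0.9 0.1\n0.4 0.6\n",
}
table = tabulate_runs(runs)
for row in table:
    print(row["file"], row["k"], row["ind"])
```

A table of this kind is the usual starting point for grouping replicate runs by K before estimating K or aligning clusters.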
Citations
Journal ArticleDOI
03 Sep 2020-Cell
TL;DR: The genetic basis common to ticks is explored, including heme and hemoglobin digestion, iron metabolism, and reactive oxygen species, and it is unveiled for the first time that genetic structure and pathogen composition in different tick species are mainly shaped by ecological and geographic factors.

124 citations

Journal ArticleDOI
03 Dec 2019-eLife
TL;DR: It is suggested that the deep diversity observed in Hawaii might represent patterns of ancestral genetic diversity in the C. elegans species before human influence.
Abstract: Hawaiian isolates of the nematode species Caenorhabditis elegans have long been known to harbor genetic diversity greater than the rest of the worldwide population, but this observation was supported by only a small number of wild strains. To better characterize the niche and genetic diversity of Hawaiian C. elegans and other Caenorhabditis species, we sampled different substrates and niches across the Hawaiian islands. We identified hundreds of new Caenorhabditis strains from known species and a new species, Caenorhabditis oiwi. Hawaiian C. elegans are found in cooler climates at high elevations but are not associated with any specific substrate, as compared to other Caenorhabditis species. Surprisingly, admixture analysis revealed evidence of shared ancestry between some Hawaiian and non-Hawaiian C. elegans strains. We suggest that the deep diversity we observed in Hawaii might represent patterns of ancestral genetic diversity in the C. elegans species before human influence.

79 citations

Journal ArticleDOI
TL;DR: Comparative analyses between scenarios modelling two and three domestication events consistently favour a model with only two episodes and suggest that the additional genetic variation component usually detected in African taurine cattle may be explained by hybridization with local aurochs in Africa after the domestication of taurine cattle in the Fertile Crescent.
Abstract: Cattle have been invaluable for the transition of human society from nomadic hunter-gatherers to sedentary farming communities throughout much of Europe, Asia and Africa since the earliest domestication of cattle more than 10,000 years ago. Although current understanding of relationships among ancestral populations remains limited, domestication of cattle is thought to have occurred on two or three occasions, giving rise to the taurine (Bos taurus) and indicine (Bos indicus) species that share the aurochs (Bos primigenius) as common ancestor ~250,000 years ago. Indicine and taurine cattle were domesticated in the Indus Valley and Fertile Crescent, respectively; however, an additional domestication event for taurine in the Western Desert of Egypt has also been proposed. We analysed medium density Illumina Bovine SNP array (~54,000 loci) data across 3,196 individuals, representing 180 taurine and indicine populations to investigate population structure within and between populations, and domestication and demographic dynamics using approximate Bayesian computation (ABC). Comparative analyses between scenarios modelling two and three domestication events consistently favour a model with only two episodes and suggest that the additional genetic variation component usually detected in African taurine cattle may be explained by hybridization with local aurochs in Africa after the domestication of taurine cattle in the Fertile Crescent. African indicine cattle exhibit high levels of shared genetic variation with Asian indicine cattle due to their recent divergence and with African taurine cattle through relatively recent gene flow. Scenarios with unidirectional or bidirectional migratory events between European taurine and Asian indicine cattle are also plausible, although further studies are needed to disentangle the complex human-mediated dispersion patterns of domestic cattle. 
This study therefore helps to clarify the effect of past demographic history on the genetic variation of modern cattle, providing a basis for further analyses exploring alternative migratory routes for early domestic populations.

71 citations


Cites methods from "pophelper: an R package and web app..."

  • ...The partition solutions were visualized using the POPHELPER R package (Francis, 2017)....


Journal ArticleDOI
TL;DR: The first genome-wide association study in Populus deltoides, a genetically diverse keystone forest species in North America and an important short rotation woody crop for the bioenergy industry, suggests both common and low-frequency variants need to be considered for a comprehensive understanding of the genetic regulation of complex traits.
Abstract: Genome-wide association studies (GWAS) have been used extensively to dissect the genetic regulation of complex traits in plants. These studies have focused largely on the analysis of common genetic variants despite the abundance of rare polymorphisms in several species, and their potential role in trait variation. Here, we conducted the first GWAS in Populus deltoides, a genetically diverse keystone forest species in North America and an important short rotation woody crop for the bioenergy industry. We searched for associations between eight growth and wood composition traits, and common and low-frequency single-nucleotide polymorphisms detected by targeted resequencing of 18 153 genes in a population of 391 unrelated individuals. To increase power to detect associations with low-frequency variants, multiple-marker association tests were used in combination with single-marker association tests. Significant associations were discovered for all phenotypes and are indicative that low-frequency polymorphisms contribute to phenotypic variance of several bioenergy traits. Our results suggest that both common and low-frequency variants need to be considered for a comprehensive understanding of the genetic regulation of complex traits, particularly in species that carry large numbers of rare polymorphisms. These polymorphisms may be critical for the development of specialized plant feedstocks for bioenergy.

66 citations

Journal ArticleDOI
TL;DR: It is found that toads form three genetic clusters: 1) native range toads, 2) toads from the source population in Hawaii and long-established areas near introduction sites in Australia, and 3) toads from more recently established northern Australian sites.
Abstract: Invasive species often evolve rapidly following introduction despite genetic bottlenecks that may result from small numbers of founders; however, some invasions may not fit this “genetic paradox.” The invasive cane toad (Rhinella marina) displays high phenotypic variation across its introduced Australian range. Here, we used three genome-wide datasets to characterize their population structure and genetic diversity. We found that toads form three genetic clusters: (1) native range toads, (2) toads from the source population in Hawai’i and long-established areas near introduction sites in Australia, and (3) toads from more recently established northern Australian sites. Although we find an overall reduction in genetic diversity following introduction, we do not see this reduction in loci putatively under selection, suggesting that genetic diversity may have been maintained at ecologically relevant traits, or that mutation rates were high enough to maintain adaptive potential. Nonetheless, toads encounter novel environmental challenges in Australia, and the transition between genetic clusters occurs at a point along the invasion transect where temperature rises and rainfall decreases. We identify environmentally-associated loci known to be involved in resistance to heat and dehydration. This study highlights that natural selection occurs rapidly and plays a vital role in shaping the structure of invasive populations.

64 citations


Cites methods from "pophelper: an R package and web app..."

  • ...We then took the resulting meanQ files from fastStructure and plotted them using the pophelper package (Francis, 2016) in R (Team, 2016)....


References
Journal Article
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Abstract: Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

272,030 citations

Book
13 Aug 2009
TL;DR: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics.
Abstract: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics. With ggplot2, it's easy to: produce handsome, publication-quality plots, with automatic legends created from the plot specification; superpose multiple layers (points, lines, maps, tiles, box plots to name a few) from different data sources, with automatically adjusted common scales; add customisable smoothers that use the powerful modelling capabilities of R, such as loess, linear models, generalised additive models and robust regression; save any ggplot2 plot (or part thereof) for later modification or reuse; create custom themes that capture in-house or journal style requirements, and that can easily be applied to multiple plots; approach your graph from a visual perspective, thinking about how each component of the data is represented on the final plot. This book will be useful to everyone who has struggled with displaying their data in an informative and attractive way. You will need some basic knowledge of R (i.e. you should be able to get your data into R), but ggplot2 is a mini-language specifically tailored for producing graphics, and you'll learn everything you need in the book. After reading this book you'll be able to produce graphics customized precisely for your problems, and you'll find it easy to get graphics out of your head and on to the screen or page.

29,504 citations


"pophelper: an R package and web app..." refers methods in this paper

  • ...The POPHELPER package employs the ggplot (Wickham 2009) graphing package to create high-quality graphics....


Journal ArticleDOI
01 Jun 2000-Genetics
TL;DR: Pritchard et al. proposed a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations, which can be applied to most of the commonly used genetic markers, provided that they are not closely linked.
Abstract: We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci, e.g., seven microsatellite loci in an example using genotype data from an endangered bird species. The software used for this article is available from http://www.stats.ox.ac.uk/~pritch/home.html.

27,454 citations

Journal ArticleDOI
TL;DR: It is found that in most cases the estimated ‘log probability of data’ does not provide a correct estimation of the number of clusters, K, and using an ad hoc statistic ΔK based on the rate of change in the log probability between successive K values, STRUCTURE accurately detects the uppermost hierarchical level of structure for the scenarios the authors tested.
Abstract: The identification of genetically homogeneous groups of individuals is a long standing issue in population genetics. A recent Bayesian algorithm implemented in the software STRUCTURE allows the identification of such groups. However, the ability of this algorithm to detect the true number of clusters (K) in a sample of individuals when patterns of dispersal among populations are not homogeneous has not been tested. The goal of this study is to carry out such tests, using various dispersal scenarios from data generated with an individual-based model. We found that in most cases the estimated 'log probability of data' does not provide a correct estimation of the number of clusters, K. However, using an ad hoc statistic ΔK based on the rate of change in the log probability of data between successive K values, we found that STRUCTURE accurately detects the uppermost hierarchical level of structure for the scenarios we tested. As might be expected, the results are sensitive to the type of genetic marker used (AFLP vs. microsatellite), the number of loci scored, the number of populations sampled, and the number of individuals typed in each sample.
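
The ΔK statistic mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration and not the code of STRUCTURE, STRUCTURE HARVESTER or pophelper. It assumes the commonly used form ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)], where L(K) is the mean log probability of data over replicate runs at a given K; the use of the sample standard deviation is also an assumption. The function name `evanno_delta_k` and the input values are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute ΔK from a dict mapping K to a list of ln P(D) values, one per replicate run."""
    ks = sorted(lnp_by_k)
    mean_l = {k: mean(lnp_by_k[k]) for k in ks}
    # Sample standard deviation across replicates (an assumption of this sketch).
    sd_l = {k: stdev(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks:
        # ΔK needs both neighbours, so it is undefined at the smallest and largest K.
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue
        if sd_l[k] == 0:
            continue  # avoid division by zero when all replicates are identical
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        delta[k] = second_diff / sd_l[k]
    return delta


# Made-up log probabilities for K = 1..4, three replicate runs each.
runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4501.0, -4499.0],
    3: [-4490.0, -4494.0, -4486.0],
    4: [-4485.0, -4495.0, -4475.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

In this made-up example the log probability rises sharply from K = 1 to K = 2 and then flattens, so ΔK peaks at K = 2. At least two replicates per K are required, because the standard deviation is otherwise undefined.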

18,572 citations


"pophelper: an R package and web app..." refers methods in this paper

  • ...The approach by Evanno et al. (2005) is one method to estimate the value of K and has been widely cited (Morgan et al. 2007; Blackburn & Maddison 2014; Diez et al. 2015)....


Journal ArticleDOI
TL;DR: STRUCTURE HARVESTER is presented, a web-based program for collating results generated by the program STRUCTURE, which provides a fast way to assess and visualize likelihood values across multiple values of K and hundreds of iterations for easier detection of the number of genetic groups that best fit the data.
Abstract: We present STRUCTURE HARVESTER (available at http://taylor0.biology.ucla.edu/structureHarvester/ ), a web-based program for collating results generated by the program STRUCTURE. The program provides a fast way to assess and visualize likelihood values across multiple values of K and hundreds of iterations for easier detection of the number of genetic groups that best fit the data. In addition, STRUCTURE HARVESTER will reformat data for use in downstream programs, such as CLUMPP.

9,960 citations


"pophelper: an R package and web app..." refers background in this paper

  • ...The most popular of these ‘helper’ programs was STRUCTURE HARVESTER (Earl 2012)....


  • ...STRUCTURE HARVESTER is a web-based utility with a graphical user interface that can accept STRUCTURE runs to generate Evanno plots and input files for CLUMPP, but does not produce plots or allow to work with CLUMPP output....
