Journal ISSN: 1471-2156

BMC Genetics 

Springer Nature
About: BMC Genetics is an open-access academic journal published by Springer Nature (ISSN 1471-2156). It publishes mainly in the areas of Population and Quantitative trait locus. Over its lifetime, the journal has published 2206 papers, which have received 71535 citations.


Papers
Journal Article
TL;DR: The Discriminant Analysis of Principal Components (DAPC) is introduced, a multivariate method designed to identify and describe clusters of genetically related individuals that performs generally better than STRUCTURE at characterizing population subdivision.
Abstract: The dramatic progress in sequencing technologies offers unprecedented prospects for deciphering the organization of natural populations in space and time. However, the size of the datasets generated also poses some daunting challenges. In particular, Bayesian clustering algorithms based on pre-defined population genetics models such as the STRUCTURE or BAPS software may not be able to cope with this unprecedented amount of data. Thus, there is a need for less computer-intensive approaches. Multivariate analyses seem particularly appealing as they are specifically devoted to extracting information from large datasets. Unfortunately, currently available multivariate methods still lack some essential features needed to study the genetic structure of natural populations. We introduce the Discriminant Analysis of Principal Components (DAPC), a multivariate method designed to identify and describe clusters of genetically related individuals. When group priors are lacking, DAPC uses sequential K-means and model selection to infer genetic clusters. Our approach allows extracting rich information from genetic data, providing assignment of individuals to groups, a visual assessment of between-population differentiation, and contribution of individual alleles to population structuring. We evaluate the performance of our method using simulated data, which were also analyzed using STRUCTURE as a benchmark. Additionally, we illustrate the method by analyzing microsatellite polymorphism in worldwide human populations and hemagglutinin gene sequence variation in seasonal influenza. Analysis of simulated data revealed that our approach performs generally better than STRUCTURE at characterizing population subdivision. The tools implemented in DAPC for the identification of clusters and graphical representation of between-group structures make it possible to unravel complex population structures. Our approach is also faster than Bayesian clustering algorithms by several orders of magnitude, and may be applicable to a wider range of datasets.

3,770 citations

Journal Article
TL;DR: Isolation by Distance Web Service (IBDWS) is a user-friendly web interface for determining patterns of isolation by distance; the software is hosted at http://phage.sdsu.edu/~jensen/ and its source code is available at http://sourceforge.net/projects/ibdws/.
Abstract: The population genetic pattern known as "isolation by distance" results from spatially limited gene flow and is a commonly observed phenomenon in natural populations. However, few software programs exist for estimating the degree of isolation by distance among populations, and they tend not to be user-friendly. We have created Isolation by Distance Web Service (IBDWS), a user-friendly web interface for determining patterns of isolation by distance. Using this site, population geneticists can perform a variety of powerful statistical tests, including Mantel tests and Reduced Major Axis (RMA) regression analysis, calculate F_ST between all pairs of populations, and compute basic summary statistics (e.g., heterozygosity). All statistical results, including publication-quality scatter plots in PostScript format, are returned rapidly to the user and can be easily downloaded. IBDWS population genetics analysis software is hosted at http://phage.sdsu.edu/~jensen/ and documentation is available at http://www.bio.sdsu.edu/pub/andy/IBD.html. The source code has been made available on SourceForge at http://sourceforge.net/projects/ibdws/.

1,477 citations
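
The Mantel test named in the IBDWS abstract above can be illustrated with a small permutation test. This is a minimal sketch and not the IBDWS implementation: the function names (`mantel_test`, `upper_triangle`, `pearson`), the toy population positions, and the choice of a one-sided test with Pearson correlation are all assumptions made for illustration.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """One-sided Mantel test: is genetic distance positively
    correlated with geographic distance?

    Rows and columns of the genetic matrix are permuted together,
    which keeps the matrix a valid distance matrix under the null.
    """
    rng = random.Random(seed)
    n = len(genetic)
    geo_vec = upper_triangle(geographic)
    observed = pearson(upper_triangle(genetic), geo_vec)
    hits = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[genetic[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(upper_triangle(permuted), geo_vec) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Toy data (hypothetical): five populations along a transect, with
# genetic distance roughly proportional to geographic distance.
positions = [0.0, 1.0, 2.0, 4.0, 7.0]
geographic = [[abs(a - b) for b in positions] for a in positions]
genetic = [[0.0 if i == j else 0.02 * geographic[i][j] + 0.001 * ((i + j) % 2)
            for j in range(5)] for i in range(5)]

if __name__ == "__main__":
    r, p = mantel_test(genetic, geographic)
    print(f"Mantel r = {r:.4f}, one-sided p = {p:.4f}")
```

Permuting whole populations, rather than individual matrix cells, is the point of the test: the pairwise distances are not independent, so an ordinary significance test on the correlation would be invalid.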

Journal Article
TL;DR: No decline of linkage disequilibrium within a few hundred base pairs was found in the elite maize germplasm, consistent with the effects of breeding-induced bottlenecks and selection on the elite germplasm pool.
Abstract: Recent studies of ancestral maize populations indicate that linkage disequilibrium tends to dissipate rapidly, sometimes within 100 bp. We set out to examine the linkage disequilibrium and diversity in maize elite inbred lines, which have been subject to population bottlenecks and intense selection by breeders. Such population events are expected to increase the amount of linkage disequilibrium, but reduce diversity. The results of this study will inform the design of genetic association studies. We examined the frequency and distribution of DNA polymorphisms at 18 maize genes in 36 maize inbreds, chosen to represent most of the genetic diversity in the U.S. elite maize breeding pool. The frequency of nucleotide changes is high, on average one polymorphism per 31 bp in non-coding regions and one polymorphism per 124 bp in coding regions. Insertions and deletions are frequent in non-coding regions (1 per 85 bp), but rare in coding regions. A small number (2–8) of distinct and highly diverse haplotypes can be distinguished at all loci examined. Within genes, SNP loci comprising the haplotypes are in linkage disequilibrium with each other. No decline of linkage disequilibrium within a few hundred base pairs was found in the elite maize germplasm. This finding, as well as the small number of haplotypes, relative to neutral expectation, is consistent with the effects of breeding-induced bottlenecks and selection on the elite germplasm pool. The genetic distance between haplotypes is large, indicative of an ancient gene pool and of possible interspecific hybridization events in maize ancestry.

508 citations

Journal Article
TL;DR: The combined BXD strain set is the largest mouse RI mapping panel and is a powerful tool for collaborative analysis of quantitative traits and gene function that will be especially useful to study variation in transcriptome and proteome data sets under multiple environments.
Abstract: Recombinant inbred (RI) strains are an important resource for mapping complex traits in many species. While large RI panels are available for Arabidopsis, maize, C. elegans, and Drosophila, mouse RI panels typically consist of fewer than 30 lines. This is a severe constraint on the power and precision of mapping efforts and greatly hampers analysis of epistatic interactions. In order to address these limitations and to provide the community with a more effective collaborative RI mapping panel we generated new BXD RI strains from two independent advanced intercrosses (AI) between C57BL/6J (B6) and DBA/2J (D2) progenitor strains. Progeny were intercrossed for 9 to 14 generations before initiating inbreeding, which is still ongoing for some strains. Since this AI base population is highly recombinant, the 46 advanced recombinant inbred (ARI) strains incorporate approximately twice as many recombinations as standard RI strains, a fraction of which are inevitably shared by descent. When combined with the existing BXD RI strains, the merged BXD strain set triples the number of previously available unique recombinations and quadruples the total number of recombinations in the BXD background. The combined BXD strain set is the largest mouse RI mapping panel. It is a powerful tool for collaborative analysis of quantitative traits and gene function that will be especially useful to study variation in transcriptome and proteome data sets under multiple environments. Additional strains also extend the value of the extensive phenotypic characterization of the previously available strains. A final advantage of expanding the BXD strain set is that both progenitors have been sequenced, and approximately 1.8 million SNPs have been characterized. This provides unprecedented power in screening candidate genes and can reduce the effective length of QTL intervals. It also makes it possible to reverse standard mapping strategies and to explore downstream effects of known sequence variants.

485 citations

Journal Article
TL;DR: In the context of large-scale genetic association studies where unknown interactions exist among true risk-associated SNPs or SNPs and environmental covariates, screening SNPs using random forest analyses can significantly reduce the number of SNPs that need to be retained for further study compared to standard univariate screening methods.
Abstract: Genome-wide association studies for complex diseases will produce genotypes on hundreds of thousands of single nucleotide polymorphisms (SNPs). A logical first approach to dealing with massive numbers of SNPs is to use some test to screen the SNPs, retaining only those that meet some criterion for further study. For example, SNPs can be ranked by p-value, and those with the lowest p-values retained. When SNPs have large interaction effects but small marginal effects in a population, they are unlikely to be retained when univariate tests are used for screening. However, model-based screens that pre-specify interactions are impractical for data sets with thousands of SNPs. Random forest analysis is an alternative method that produces a single measure of importance for each predictor variable that takes into account interactions among variables without requiring model specification. Interactions increase the importance for the individual interacting variables, making them more likely to be given high importance relative to other variables. We test the performance of random forests as a screening procedure to identify small numbers of risk-associated SNPs from among large numbers of unassociated SNPs using complex disease models with up to 32 loci, incorporating both genetic heterogeneity and multi-locus interaction. Keeping other factors constant, if risk SNPs interact, the random forest importance measure significantly outperforms the Fisher Exact test as a screening tool. As the number of interacting SNPs increases, the improvement in performance of random forest analysis relative to Fisher Exact test for screening also increases. Random forests perform similarly to the univariate Fisher Exact test as a screening tool when SNPs in the analysis do not interact. In the context of large-scale genetic association studies where unknown interactions exist among true risk-associated SNPs or SNPs and environmental covariates, screening SNPs using random forest analyses can significantly reduce the number of SNPs that need to be retained for further study compared to standard univariate screening methods.

468 citations
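
The univariate baseline described in the abstract above (rank SNPs by p-value and keep the lowest) can be sketched with a two-sided Fisher exact test on 2x2 allele-count tables. This illustrates only the baseline screen, not the random forest method the paper evaluates. The function names (`fisher_exact_two_sided`, `screen_snps`), the SNP labels, and the allele counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))  # tolerance for float ties
    return min(1.0, total)


def screen_snps(allele_counts, keep):
    """Rank SNPs by Fisher exact p-value and retain the `keep` lowest.

    `allele_counts` maps a SNP label to
    (case_minor, case_major, control_minor, control_major).
    """
    ranked = sorted(
        ((name, fisher_exact_two_sided(*counts))
         for name, counts in allele_counts.items()),
        key=lambda item: item[1],
    )
    return ranked[:keep]


# Hypothetical allele counts for three SNPs (100 case and 100 control alleles).
example_counts = {
    "snp_a": (30, 70, 10, 90),  # strong marginal difference
    "snp_b": (20, 80, 20, 80),  # no difference
    "snp_c": (25, 75, 18, 82),  # weak difference
}

if __name__ == "__main__":
    for name, p_value in screen_snps(example_counts, keep=2):
        print(f"{name}: p = {p_value:.4g}")
```

Because each SNP is tested on its own, a screen like this sees only marginal effects, which is why the abstract notes that SNPs with large interaction effects but small marginal effects are unlikely to be retained.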

Performance Metrics

No. of papers from the journal in previous years:

Year    Papers
2023    14
2022    29
2021    3
2020    147
2019    102
2018    113