Journal Article

Efficient Methods to Compute Genomic Predictions

01 Nov 2008-Journal of Dairy Science (Elsevier)-Vol. 91, Iss: 11, pp 4414-4423
TL;DR: Efficient methods for processing genomic data were developed to increase reliability of estimated breeding values and to estimate thousands of marker effects simultaneously, and a blend of first- and second-order Jacobi iteration using 2 separate relaxation factors converged well for allele frequencies and effects.
About: This article was published in the Journal of Dairy Science on 2008-11-01 and is currently open access. It has received 4196 citations to date. The article focuses on the topics: Best linear unbiased prediction & Allele frequency.
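The summary above mentions Jacobi iteration with relaxation factors. As a rough illustration only, the sketch below shows a plain first-order Jacobi iteration with a single under-relaxation factor on a toy linear system. It does not reproduce the article's blend of first- and second-order iteration with 2 separate factors; the function name `jacobi_relaxed`, the relaxation value 0.8, and the 3 x 3 system are all assumptions made for this example.

```python
def jacobi_relaxed(a, b, relax=0.8, tol=1e-10, max_iter=10_000):
    """Solve a x = b by first-order Jacobi iteration with under-relaxation.

    `a` is a square matrix (list of rows) with a nonzero diagonal,
    `b` is the right-hand side, `relax` is the relaxation factor.
    """
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        # Residual is computed from the previous iterate only (Jacobi, not Gauss-Seidel).
        residual = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Scale each correction by the diagonal and damp it with the relaxation factor.
        step = [relax * residual[i] / a[i][i] for i in range(n)]
        x = [x[i] + step[i] for i in range(n)]
        if max(abs(s) for s in step) < tol:
            return x, iteration
    raise RuntimeError("Jacobi iteration did not converge")


# Toy, strictly diagonally dominant system whose exact solution is [1, 2, 3].
a = [[4.0, 1.0, 0.0],
     [1.0, 5.0, 2.0],
     [0.0, 2.0, 6.0]]
b = [6.0, 17.0, 22.0]
solution, n_iter = jacobi_relaxed(a, b)
```

Because each update needs only the previous iterate, the coefficient matrix never has to be inverted or even stored explicitly, which is the usual motivation for iterating on systems with thousands of marker effects.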
Citations
Journal Article
TL;DR: The GCTA software is a versatile tool to estimate and partition complex trait variation with large GWAS data sets and focuses on the function of estimating the variance explained by all the SNPs on the X chromosome and testing the hypotheses of dosage compensation.
Abstract: For most human complex diseases and traits, SNPs identified by genome-wide association studies (GWAS) explain only a small fraction of the heritability. Here we report a user-friendly software tool called genome-wide complex trait analysis (GCTA), which was developed based on a method we recently developed to address the “missing heritability” problem. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait rather than testing the association of any particular SNP to the trait. We introduce GCTA's five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. We focus on the function of estimating the variance explained by all the SNPs on the X chromosome and testing the hypotheses of dosage compensation. The GCTA software is a versatile tool to estimate and partition complex trait variation with large GWAS data sets.

5,867 citations

Journal Article
TL;DR: An R package called GAPIT is developed that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection and can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time.
Abstract: Summary: Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret

1,583 citations


Cites background or methods from "Efficient Methods to Compute Genomi..."

  • ...When the kinship matrix is not provided, it will be calculated with the methods of VanRaden (VanRaden, 2008), Loiselle (Loiselle et al., 1995) or EMMA (Kang et al., 2008)....

    [...]

Journal Article
TL;DR: BOLT-LMM is presented, which requires only a small number of O(MN) time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes.
Abstract: Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts and may not optimize power. All existing methods require time cost O(MN^2) (where N is the number of samples and M is the number of SNPs) and implicitly assume an infinitesimal genetic architecture in which effect sizes are normally distributed, which can limit power. Here we present a far more efficient mixed-model association method, BOLT-LMM, which requires only a small number of O(MN) time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes. We applied BOLT-LMM to 9 quantitative traits in 23,294 samples from the Women's Genome Health Study (WGHS) and observed significant increases in power, consistent with simulations. Theory and simulations show that the boost in power increases with cohort size, making BOLT-LMM appealing for genome-wide association studies in large cohorts.

1,232 citations

Journal Article
TL;DR: Genotypes for 38,416 markers and August 2003 genetic evaluations for 3,576 Holstein bulls born before 1999 were used to predict January 2008 daughter deviations and genomic prediction improves reliability by tracing the inheritance of genes even with small effects.

1,166 citations


Cites background or methods from "Efficient Methods to Compute Genomi..."

  • ...Equations of VanRaden (2008) allowed distinction between known and missing genotypes, but the alternative of regressing on probabilities for all genotypes could increase accuracy and should be examined....

    [...]

  • ...Predictions were computed using linear and nonlinear genomic models (VanRaden, 2007, 2008)....

    [...]

  • ...Gains in R^2 averaged 3% with simulated data (VanRaden, 2008) but generally were smaller with real data, which indicated that most traits are influenced by more loci than the 100 QTL used in simulation....

    [...]

Journal Article
TL;DR: A national single-step genetic evaluation with the pedigree relationship matrix augmented with genomic information provided genomic predictions with accuracy and bias comparable to multiple-step procedures and could account for any population or data structure.

1,095 citations


Cites background or methods from "Efficient Methods to Compute Genomi..."

  • ...Genomic evaluations are currently calculated with a multiple-step procedure (VanRaden, 2008; Hayes et al., 2009)....

    [...]

  • ...To facilitate inversion, final analyses used a weighted G as proposed by VanRaden (2008): $G = 0.95G_b + 0.05A_{22}$....

    [...]

  • ...The scaling parameter k was defined as $k = 2\sum_j p_j(1 - p_j)$ (VanRaden, 2008), which assumes a priori independence of SNP effects (Gianola et al., 2009)....

    [...]

  • ...For example, estimation of genomic effects has several options (Meuwissen et al., 2001; Gianola et al., 2006; VanRaden, 2008; de los Campos et al., 2009)....

    [...]
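The excerpts above mention a scaling parameter k built from allele frequencies and a weighted G that blends genomic and pedigree relationships. A minimal sketch of how those pieces might fit together follows. It assumes genotypes coded 0, 1, 2 (copies of one allele), centering by twice the allele frequency, G computed as ZZ'/k, and allele frequencies taken from the sample itself; none of these details are stated in the excerpts, and the function names and toy data are hypothetical.

```python
def allele_frequencies(genotypes):
    """Sample frequency of the counted allele per marker (genotypes coded 0, 1, 2)."""
    n_animals = len(genotypes)
    n_markers = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2 * n_animals) for j in range(n_markers)]


def genomic_relationship(genotypes, freqs):
    """Assumed form G = Z Z' / k, with k = 2 * sum_j p_j (1 - p_j)."""
    k = 2 * sum(p * (1 - p) for p in freqs)
    # Center each genotype by its expectation 2p so that columns of Z average to zero.
    z = [[g - 2 * p for g, p in zip(row, freqs)] for row in genotypes]
    n = len(z)
    return [[sum(zi * zj for zi, zj in zip(z[i], z[j])) / k for j in range(n)]
            for i in range(n)]


def blend(g, a22, weight=0.95):
    """Weighted matrix weight * G + (1 - weight) * A22, as in the quoted 0.95/0.05 blend."""
    n = len(g)
    return [[weight * g[i][j] + (1 - weight) * a22[i][j] for j in range(n)]
            for i in range(n)]


# Toy data: 3 animals x 4 markers, with an identity matrix standing in for A22.
genotypes = [[0, 1, 2, 1],
             [1, 1, 0, 2],
             [2, 0, 1, 1]]
freqs = allele_frequencies(genotypes)
g = genomic_relationship(genotypes, freqs)
a22 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
g_weighted = blend(g, a22)
```

With sample-based frequencies, every row of G sums to zero, so G is singular; adding a small share of the pedigree matrix is one way to make the blended matrix invertible, consistent with the "to facilitate inversion" remark in the excerpt.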

References
Journal Article
01 Apr 2001-Genetics
TL;DR: It was concluded that selection on genetic values predicted from markers could substantially increase the rate of genetic gain in animals and plants, especially if combined with reproductive techniques to shorten the generation interval.
Abstract: Recent advances in molecular genetic techniques will make dense marker maps available and genotyping many individuals for these markers feasible. Here we attempted to estimate the effects of ∼50,000 marker haplotypes simultaneously from a limited number of phenotypic records. A genome of 1000 cM was simulated with a marker spacing of 1 cM. The markers surrounding every 1-cM region were combined into marker haplotypes. Due to finite population size (Ne = 100), the marker haplotypes were in linkage disequilibrium with the QTL located between the markers. Using least squares, all haplotype effects could not be estimated simultaneously. When only the biggest effects were included, they were overestimated and the accuracy of predicting genetic values of the offspring of the recorded animals was only 0.32. Best linear unbiased prediction of haplotype effects assumed equal variances associated to each 1-cM chromosomal segment, which yielded an accuracy of 0.73, although this assumption was far from true. Bayesian methods that assumed a prior distribution of the variance associated with each chromosome segment increased this accuracy to 0.85, even when the prior was not correct. It was concluded that selection on genetic values predicted from markers could substantially increase the rate of genetic gain in animals and plants, especially if combined with reproductive techniques to shorten the generation interval.

6,036 citations


"Efficient Methods to Compute Genomi..." refers background or methods in this paper

  • ...Genomic selection increases the rate of genetic improvement and reduces cost of progeny testing by allowing breeders to preselect animals that inherited chromosome segments of greater merit (Meuwissen et al., 2001; Schaeffer, 2006)....

    [...]

  • ...Nonlinear predictions A and B are analogous but not identical to Bayesian A and B methods of Meuwissen et al. (2001), and other prior distributions could fit actual data better....

    [...]

  • ...That procedure reduced computing costs from quadratic with number of markers (Meuwissen et al., 2001) to linear....

    [...]

Journal Article
TL;DR: The importance of having a coefficient by means of which the degree of inbreeding may be expressed has been brought out by Pearl in a number of papers published between 1913 and 1917.
Abstract: In the breeding of domestic animals consanguineous matings are frequently made. Occasionally matings are made between very close relatives (sire and daughter, brother and sister, etc.), but as a rule such close inbreeding is avoided and there is instead an attempt to concentrate the blood of some noteworthy individual by what is known as line breeding. No regular system of mating such as might be followed with laboratory animals is practicable as a rule. The importance of having a coefficient by means of which the degree of inbreeding may be expressed has been brought out by Pearl in a number of papers published between 1913 and 1917. His coefficient is based on the smaller number of ancestors in each generation back of an inbred individual, as compared with the maximum possible number. A separate coefficient is obtained for each generation by the formula

1,928 citations

Journal Article
TL;DR: Genome-wide selection may become a popular tool for genetic improvement in livestock after a strategy that utilizes these advantages was compared with a traditional progeny testing strategy under a typical Canadian-like dairy cattle situation.
Abstract: Animals can be genotyped for thousands of single nucleotide polymorphisms (SNPs) at one time, where the SNPs are located at roughly 1-cM intervals throughout the genome. For each contiguous pair of SNPs there are four possible haplotypes that could be inherited from the sire. The effects of each interval on a trait can be estimated for all intervals simultaneously in a model where interval effects are random factors. Given the estimated effects of each haplotype for every interval in the genome, and given an animal's genotype, a 'genomic' estimated breeding value is obtained by summing the estimated effects for that genotype. The accuracy of that estimator of breeding values is around 80%. Because the genomic estimated breeding values can be calculated at birth, and because it has a high accuracy, a strategy that utilizes these advantages was compared with a traditional progeny testing strategy under a typical Canadian-like dairy cattle situation. Costs of proving bulls were reduced by 92% and genetic change was increased by a factor of 2. Genome-wide selection may become a popular tool for genetic improvement in livestock.

785 citations


"Efficient Methods to Compute Genomi..." refers background in this paper

  • ...Genomic selection increases the rate of genetic improvement and reduces cost of progeny testing by allowing breeders to preselect animals that inherited chromosome segments of greater merit (Meuwissen et al., 2001; Schaeffer, 2006)....

    [...]

Journal Article
TL;DR: New terms and definitions were developed to explain national USDA genetic evaluations computed by an animal model, where reliability is the squared correlation of predicted and true transmitting ability.

403 citations


"Efficient Methods to Compute Genomi..." refers methods in this paper

  • ...When DYD was the dependent variable, regressions were weighted by reliability from daughters, which was computed as total daughter equivalents minus daughter equivalents from parent average (VanRaden and Wiggans, 1991)....

    [...]

Journal Article
01 Jan 2008-Genetics
TL;DR: It was concluded that genomic selection is considerably more accurate than traditional selection, especially for a low-heritability trait.
Abstract: Genomic selection uses total breeding values for juvenile animals, predicted from a large number of estimated marker haplotype effects across the whole genome. In this study the accuracy of predicting breeding values is compared for four different models including a large number of markers, at different marker densities for traits with heritabilities of 50 and 10%. The models estimated the effect of (1) each single-marker allele [single-nucleotide polymorphism (SNP)1], (2) haplotypes constructed from two adjacent marker alleles (SNP2), and (3) haplotypes constructed from 2 or 10 markers, including the covariance between haplotypes by combining linkage disequilibrium and linkage analysis (HAP_IBD2 and HAP_IBD10). Between 119 and 2343 polymorphic SNPs were simulated on a 3-M genome. For the trait with a heritability of 10%, the differences between models were small and none of them yielded the highest accuracies across all marker densities. For the trait with a heritability of 50%, the HAP_IBD10 model yielded the highest accuracies of estimated total breeding values for juvenile and phenotyped animals at all marker densities. It was concluded that genomic selection is considerably more accurate than traditional selection, especially for a low-heritability trait.

381 citations