Author

M.S. Lund

Bio: M.S. Lund is an academic researcher from Aarhus University. The author has contributed to research in topics: Population & Imputation (genetics). The author has an h-index of 12 and has co-authored 12 publications receiving 1329 citations.

Papers
Journal ArticleDOI
TL;DR: The 1000 bull genomes project supports the goal of accelerating the rates of genetic gain in domestic cattle while at the same time considering animal health and welfare by providing the annotated sequence variants and genotypes of key ancestor bulls.
Abstract: The 1000 bull genomes project supports the goal of accelerating the rates of genetic gain in domestic cattle while at the same time considering animal health and welfare by providing the annotated sequence variants and genotypes of key ancestor bulls. In the first phase of the 1000 bull genomes project, we sequenced the whole genomes of 234 cattle to an average of 8.3-fold coverage. This sequencing includes data for 129 individuals from the global Holstein-Friesian population, 43 individuals from the Fleckvieh breed and 15 individuals from the Jersey breed. We identified a total of 28.3 million variants, with an average of 1.44 heterozygous sites per kilobase for each individual. We demonstrate the use of this database in identifying a recessive mutation underlying embryonic death and a dominant mutation underlying lethal chondrodysplasia. We also performed genome-wide association studies for milk production and curly coat, using imputed sequence variants, and identified variants associated with these traits in cattle.

690 citations

Journal ArticleDOI
TL;DR: This paper assesses the increase in reliability achieved when combining four Holstein reference populations of 4000 bulls each from European breeding organizations, i.e. UNCEIA, VikingGenetics, DHV-VIT and CRV, into a single large reference population.
Abstract: Size of the reference population and reliability of phenotypes are crucial factors influencing the reliability of genomic predictions. It is therefore useful to combine closely related populations. Increased accuracies of genomic predictions depend on the number of individuals added to the reference population, the reliability of their phenotypes, and the relatedness of the populations that are combined. This paper assesses the increase in reliability achieved when combining four Holstein reference populations of 4000 bulls each, from European breeding organizations, i.e. UNCEIA (France), VikingGenetics (Denmark, Sweden, Finland), DHV-VIT (Germany) and CRV (The Netherlands, Flanders). Each partner validated its own bulls using their national reference data and the combined data, respectively. Combining the data significantly increased the reliability of genomic predictions for bulls in all four populations. Reliabilities increased by 10%, compared to reliabilities obtained with national reference populations alone, when they were averaged over countries and the traits evaluated. For different traits and countries, the increase in reliability ranged from 2% to 19%. Genomic selection programs benefit greatly from combining data from several closely related populations into a single large reference population.

209 citations

Journal ArticleDOI
R.F. Brøndum, Bernt Guldbrandtsen, Goutam Sahana, M.S. Lund, Guosheng Su
TL;DR: Using BEAGLE for pre-phasing and IMPUTE2 for imputation is a fast and accurate strategy to increase the size of the reference data and in turn the accuracy of imputation when only few animals are available.
Abstract: The advent of low cost next generation sequencing has made it possible to sequence a large number of dairy and beef bulls which can be used as a reference for imputation of whole genome sequence data. The aim of this study was to investigate the accuracy and speed of imputation from a high density SNP marker panel to whole genome sequence level. Data contained 132 Holstein, 42 Jersey, 52 Nordic Red and 16 Brown Swiss bulls with whole genome sequence data; 16 Holstein, 27 Jersey and 29 Nordic Reds had previously been typed with the bovine high density SNP panel and were used for validation. We investigated the effect of enlarging the reference population by combining data across breeds on the accuracy of imputation, and the accuracy and speed of both IMPUTE2 and BEAGLE using either genotype probability reference data or pre-phased reference data. All analyses were done on Bovine autosome 29 using 387,436 bi-allelic variants and 13,612 SNP markers from the bovine HD panel. A combined breed reference population led to higher imputation accuracies than did a single breed reference. The highest accuracy of imputation for all three test breeds was achieved when using BEAGLE with un-phased reference data (mean genotype correlations of 0.90, 0.89 and 0.87 for Holstein, Jersey and Nordic Red respectively) but IMPUTE2 with un-phased reference data gave similar accuracies for Holsteins and Nordic Red. Pre-phasing the reference data only led to a minor decrease in the imputation accuracy, but gave a large improvement in computation time. Pre-phasing with BEAGLE was substantially faster than pre-phasing with SHAPEIT2 (2.5 hours vs. 52 hours for 242 individuals), and imputation with pre-phased data was faster in IMPUTE2 than in BEAGLE (5 minutes vs. 50 minutes per individual). Combining reference populations across breeds is a good option to increase the size of the reference data and in turn the accuracy of imputation when only few animals are available. Pre-phasing the reference data only slightly decreases the accuracy but gives substantial improvements in speed. Using BEAGLE for pre-phasing and IMPUTE2 for imputation is a fast and accurate strategy.

100 citations

Journal ArticleDOI
TL;DR: The identification of fertility trait-associated SNPs and mapping of the corresponding QTL in small chromosomal regions reported here will facilitate searches for candidate genes and candidate polymorphisms.
Abstract: A genome-wide association study was conducted using a mixed model analysis for QTL for fertility traits in Danish and Swedish Holstein cattle. The analysis incorporated 2,531 progeny tested bulls, and a total of 36,387 SNP markers on 29 bovine autosomes were used. Eleven fertility traits were analyzed for SNP association. Furthermore, mixed model analysis was used for association analyses where a polygenic effect was fitted as a random effect, and genotypes at single SNPs were successively included as a fixed effect in the model. The Bonferroni correction for multiple testing was applied to adjust the significance threshold. Seventy-four SNP-trait combinations showed chromosome-wide significance, and five of these were significant genome-wide. Twenty-four QTL regions on 14 chromosomes were detected. Strong evidence for the presence of QTL that affect fertility traits was observed on chromosomes 3, 5, 10, 13, 19, 20, and 24. The QTL intervals were generally smaller than those described in earlier linkage studies. The identification of fertility trait-associated SNPs and mapping of the corresponding QTL in small chromosomal regions reported here will facilitate searches for candidate genes and candidate polymorphisms.

89 citations

Journal ArticleDOI
TL;DR: The results indicate that Beagle and IMPUTE2 give the most robust and accurate imputation, but, considering computing time and memory usage, FImpute is a viable alternative.

86 citations


Cited by
Journal ArticleDOI
TL;DR: The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.
Abstract: The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.

4,658 citations

Journal ArticleDOI
TL;DR: The proposed method efficiently makes use of information from close and distant relatives for accurate genotype imputation and is fast, owing to its deterministic nature and, therefore, it can easily be used in large data sets where the use of other methods is impractical.
Abstract: Genotype imputation can help reduce genotyping costs particularly for implementation of genomic selection. In applications entailing large populations, recovering the genotypes of untyped loci using information from reference individuals that were genotyped with a higher density panel is computationally challenging. Popular imputation methods are based upon the Hidden Markov model and have computational constraints due to an intensive sampling process. A fast, deterministic approach, which makes use of both family and population information, is presented here. All individuals are related and, therefore, share haplotypes which may differ in length and frequency based on their relationships. The method starts with family imputation if pedigree information is available, and then exploits close relationships by searching for long haplotype matches in the reference group using overlapping sliding windows. The search continues as the window size is shrunk in each chromosome sweep in order to capture more distant relationships. The proposed method gave higher or similar imputation accuracy than Beagle and Impute2 in cattle data sets when all available information was used. When close relatives of target individuals were present in the reference group, the method resulted in higher accuracy compared to the other two methods even when the pedigree was not used. Rare variants were also imputed with higher accuracy. Finally, computing requirements were considerably lower than those of Beagle and Impute2. The presented method took 28 minutes to impute from 6 k to 50 k genotypes for 2,000 individuals with a reference size of 64,429 individuals. The proposed method efficiently makes use of information from close and distant relatives for accurate genotype imputation. In addition to its high imputation accuracy, the method is fast, owing to its deterministic nature and, therefore, it can easily be used in large data sets where the use of other methods is impractical.

766 citations

Journal ArticleDOI
TL;DR: This Review demonstrates the breadth of questions that are being addressed by Pool-seq but also discusses its limitations and provides guidelines for users.
Abstract: The analysis of polymorphism data is becoming increasingly important as a complementary tool to classical genetic analyses. Nevertheless, despite plunging sequencing costs, genomic sequencing of individuals at the population scale is still restricted to a few model species. Whole-genome sequencing of pools of individuals (Pool-seq) provides a cost-effective alternative to sequencing individuals separately. With the availability of custom-tailored software tools, Pool-seq is being increasingly used for population genomic research on both model and non-model organisms. In this Review, we not only demonstrate the breadth of questions that are being addressed by Pool-seq but also discuss its limitations and provide guidelines for users.

642 citations

Journal ArticleDOI
01 Feb 2013-Genetics
TL;DR: This article reviews simulation procedures, discusses validation and reporting of results, and applies benchmark procedures for a variety of genomic prediction methods in simulated and real example data, concluding that no single method can serve as a benchmark for genomic prediction.
Abstract: The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals.

358 citations

Journal ArticleDOI
TL;DR: A genome-wide analysis of predicted transmitting ability (PTA) of 31 production, health, reproduction and body conformation traits in contemporary Holstein cows provides useful information for annotating phenotypic effects on the dairy genome and for building consensus of dairy QTL effects.
Abstract: Genome-wide association analysis is a powerful tool for annotating phenotypic effects on the genome and knowledge of genes and chromosomal regions associated with dairy phenotypes is useful for genome and gene-based selection. Here, we report results of a genome-wide analysis of predicted transmitting ability (PTA) of 31 production, health, reproduction and body conformation traits in contemporary Holstein cows.

331 citations