Journal Article

Development and Characterization of a High Density SNP Genotyping Assay for Cattle

TL;DR: The BovineSNP50 assay described in this paper is a custom genotyping assay for cattle that interrogates 54,001 SNP loci to support genome-wide association (GWA) applications.
Abstract: The success of genome-wide association (GWA) studies in detecting sequence variation that affects complex traits in humans has spurred interest in using large-scale, high-density single nucleotide polymorphism (SNP) genotyping to identify quantitative trait loci (QTL) and for marker-assisted selection in model and agricultural species. A cost-effective and efficient approach for developing a custom genotyping assay that interrogates 54,001 SNP loci to support GWA applications in cattle is described. A novel algorithm for achieving a compressed inter-marker interval distribution proved remarkably successful, with a median interval of 37 kb and a maximum predicted gap of <350 kb. The assay was tested on a panel of 576 animals from 21 cattle breeds and six outgroup species, and revealed that between 39,765 and 46,492 SNPs are polymorphic within individual breeds (average minor allele frequency (MAF) ranging from 0.24 to 0.27). The assay also identified 79 putative copy number variants in cattle. Utility for GWA was demonstrated by localizing known variation for coat color and the presence/absence of horns to their correct genomic locations. The combination of SNP selection and the novel spacing algorithm provides an efficient approach to developing high-density genotyping platforms in species with full or even moderate-quality draft sequence. Aspects of the approach can also be exploited in species that lack an available genome sequence. The BovineSNP50 assay described here is commercially available from Illumina and provides a robust platform for mapping disease genes and QTL in cattle.
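The abstract reports the outcome of the spacing algorithm (median interval 37 kb, maximum predicted gap <350 kb) but not how it works. As a hedged illustration only, the sketch below uses a simple greedy rule (keep the candidate SNP nearest to each evenly spaced target position) and then computes the two summary statistics quoted above. The function names `pick_evenly_spaced` and `interval_stats` and the greedy rule itself are assumptions, not the published algorithm.

```python
from bisect import bisect_left
from statistics import median


def pick_evenly_spaced(candidates, spacing):
    """Greedy sketch (hypothetical, not the published algorithm): step along the
    chromosome in increments of `spacing` bp and keep the candidate SNP nearest
    to each target position."""
    cands = sorted(candidates)
    chosen = set()
    target = cands[0]
    while target <= cands[-1]:
        i = bisect_left(cands, target)
        # The nearest candidate lies either just before the target or at/after it.
        nearby = cands[max(i - 1, 0):i + 1]
        chosen.add(min(nearby, key=lambda p: abs(p - target)))
        target += spacing
    return sorted(chosen)


def interval_stats(positions):
    """Return (median inter-marker interval, maximum gap) for positions in bp."""
    pos = sorted(positions)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    return median(gaps), max(gaps)


markers = pick_evenly_spaced([0, 9, 21, 28, 41, 55, 60], spacing=20)
print(markers, interval_stats(markers))
```

A real design would also weigh SNP quality and expected MAF when choosing among nearby candidates; this toy version considers position alone.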


Citations
Journal Article
TL;DR: The proposed method makes efficient use of information from close and distant relatives for accurate genotype imputation. Because it is deterministic, it is fast and can therefore easily be used on large data sets where other methods are impractical.
Abstract: Genotype imputation can help reduce genotyping costs, particularly for the implementation of genomic selection. In applications entailing large populations, recovering the genotypes of untyped loci using information from reference individuals that were genotyped with a higher-density panel is computationally challenging. Popular imputation methods are based on hidden Markov models and have computational constraints due to an intensive sampling process. A fast, deterministic approach, which makes use of both family and population information, is presented here. All individuals are related and therefore share haplotypes, which may differ in length and frequency depending on their relationships. The method starts with family imputation if pedigree information is available, and then exploits close relationships by searching for long haplotype matches in the reference group using overlapping sliding windows. The search continues as the window size is shrunk in each chromosome sweep in order to capture more distant relationships. When all available information was used, the proposed method gave imputation accuracy similar to or higher than that of Beagle and Impute2 in cattle data sets. When close relatives of target individuals were present in the reference group, the method resulted in higher accuracy than the other two methods, even when the pedigree was not used. Rare variants were also imputed with higher accuracy. Finally, computing requirements were considerably lower than those of Beagle and Impute2. The presented method took 28 minutes to impute from 6k to 50k genotypes for 2,000 individuals with a reference size of 64,429 individuals. The proposed method efficiently makes use of information from close and distant relatives for accurate genotype imputation. In addition to its high imputation accuracy, the method is fast owing to its deterministic nature and can therefore easily be used in large data sets where the use of other methods is impractical.
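The population step described above (long haplotype matches in overlapping sliding windows, with the window shrunk on each sweep) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it skips the pedigree-based family step, works on a single phased haplotype coded 0/1 with `None` for untyped loci, and fills a locus only when every matching reference haplotype agrees. The function name `impute_by_long_matches` is hypothetical.

```python
def impute_by_long_matches(target, reference, window_sizes):
    """Fill missing alleles (None) in a target haplotype from reference haplotypes.

    For each window size (largest first), overlapping windows slide along the
    haplotype; reference haplotypes that agree with every observed target allele
    in a window count as matches, and a missing allele is filled only when all
    matches carry the same allele there.
    """
    n = len(target)
    hap = list(target)
    for size in window_sizes:
        size = min(size, n)
        step = max(size // 2, 1)  # 50% overlap between consecutive windows
        starts = list(range(0, n - size + 1, step))
        if starts[-1] + size < n:  # make sure the chromosome end is covered
            starts.append(n - size)
        for start in starts:
            window = range(start, start + size)
            # Match only on genotyped positions, not on previously imputed ones.
            observed = [j for j in window if target[j] is not None]
            if not observed:
                continue
            matches = [r for r in reference if all(r[j] == target[j] for j in observed)]
            if not matches:
                continue
            for j in window:
                if hap[j] is None:
                    alleles = {r[j] for r in matches}
                    if len(alleles) == 1:  # all matching haplotypes agree
                        hap[j] = alleles.pop()
    return hap


ref = [[1, 1, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]]
print(impute_by_long_matches([1, None, None, 1, 1, None], ref, window_sizes=[6, 3]))
```

In this example no reference haplotype matches over the full 6-locus window, so the gaps are filled only once the window shrinks to 3 loci; the last locus stays missing because no short window yields a consistent match.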

766 citations

Journal Article
05 Aug 2009, PLOS ONE
TL;DR: The results of this study indicate the utility of using next generation sequencing technologies to identify large numbers of reliable SNPs and demonstrate that the PorcineSNP60 Beadchip is an excellent tool that will likely be used in a variety of future studies in pigs.
Abstract: Background: The dissection of complex traits of economic importance to the pig industry requires the availability of a significant number of genetic markers, such as single nucleotide polymorphisms (SNPs). This study was conducted to discover several hundred thousand porcine SNPs using next generation sequencing technologies and to use these SNPs, as well as others from different public sources, to design a high-density SNP genotyping assay. Methodology/Principal Findings: A total of 19 reduced representation libraries, derived from four swine breeds (Duroc, Landrace, Large White, Pietrain) and a Wild Boar population using three restriction enzymes (AluI, HaeIII and MspI), were sequenced using Illumina’s Genome Analyzer (GA). The SNP discovery effort resulted in the de novo identification of over 372K SNPs. More than 549K SNPs were used to design the Illumina Porcine 60K+SNP iSelect Beadchip, now commercially available as the PorcineSNP60. A total of 64,232 SNPs were included on the Beadchip. Results from genotyping the 158 individuals used for sequencing showed a high overall SNP call rate (97.5%). Of the 62,621 loci that could be reliably scored, 58,994 were polymorphic, yielding a SNP conversion success rate of 94%. The average minor allele frequency (MAF) for all scorable SNPs was 0.274. Conclusions/Significance: Overall, the results of this study indicate the utility of using next generation sequencing technologies to identify large numbers of reliable SNPs. In addition, the validation of the PorcineSNP60 Beadchip demonstrated that the assay is an excellent tool that will likely be used in a variety of future studies in pigs.
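The validation figures above (call rate, number of polymorphic loci, conversion rate, mean MAF) are simple summaries of a genotype matrix. A minimal sketch, assuming genotypes coded as allele counts (0/1/2) with `None` for a failed call, and treating any SNP with at least one call as "scorable" (the paper's actual scoring criterion is not given here, so this is a simplification). As a check on the reported arithmetic, 58,994 / 62,621 ≈ 0.942, consistent with the stated 94%.

```python
def snp_qc(genotypes_by_snp):
    """Summarise call rate, polymorphism and MAF for a genotyping run.

    genotypes_by_snp maps a SNP id to its calls across individuals, coded as the
    number of copies of one allele (0, 1, 2) or None for a failed call. A SNP is
    treated as scorable here if it has at least one call (a simplifying assumption).
    """
    total = called = polymorphic = 0
    mafs = []
    for calls in genotypes_by_snp.values():
        observed = [g for g in calls if g is not None]
        total += len(calls)
        called += len(observed)
        if not observed:
            continue
        p = sum(observed) / (2 * len(observed))  # allele frequency
        maf = min(p, 1 - p)
        mafs.append(maf)
        if maf > 0:
            polymorphic += 1
    return {
        "call_rate": called / total,
        "scorable": len(mafs),
        "polymorphic": polymorphic,
        "conversion_rate": polymorphic / len(mafs),
        "mean_maf": sum(mafs) / len(mafs),
    }


print(snp_qc({"s1": [0, 1, 2, None], "s2": [0, 0, 0, 0], "s3": [2, 2, 1, 1]}))
```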

751 citations

Journal Article
08 Dec 2011, PLOS ONE
TL;DR: A large maize SNP array, with markers selected from more than 800,000 SNPs, was established, and its use for diversity analysis, high-density linkage mapping and independent validation of the B73 sequence assembly is reported.
Abstract: SNP genotyping arrays have been useful for many applications that require a large number of molecular markers such as high-density genetic mapping, genome-wide association studies (GWAS), and genomic selection. We report the establishment of a large maize SNP array and its use for diversity analysis and high density linkage mapping. The markers, taken from more than 800,000 SNPs, were selected to be preferentially located in genes and evenly distributed across the genome. The array was tested with a set of maize germplasm including North American and European inbred lines, parent/F1 combinations, and distantly related teosinte material. A total of 49,585 markers, including 33,417 within 17,520 different genes and 16,168 outside genes, were of good quality for genotyping, with an average failure rate of 4% and rates up to 8% in specific germplasm. To demonstrate this array's use in genetic mapping and for the independent validation of the B73 sequence assembly, two intermated maize recombinant inbred line populations – IBM (B73×Mo17) and LHRF (F2×F252) – were genotyped to establish two high density linkage maps with 20,913 and 14,524 markers respectively. 172 mapped markers were absent in the current B73 assembly and their placement can be used for future improvements of the B73 reference sequence. Colinearity of the genetic and physical maps was mostly conserved with some exceptions that suggest errors in the B73 assembly. Five major regions containing non-colinearities were identified on chromosomes 2, 3, 6, 7 and 9, and are supported by both independent genetic maps. Four additional non-colinear regions were found on the LHRF map only; they may be due to a lower density of IBM markers in those regions or to true structural rearrangements between lines. Given the array's high quality, it will be a valuable resource for maize genetics and many aspects of maize breeding.
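Non-colinearity between a genetic map and a physical assembly shows up as markers whose genetic position runs backwards relative to their physical order. The rule below is a deliberately simple, hypothetical illustration of that idea (it is not the analysis used in the paper): sort markers by physical position and flag any marker whose genetic position falls more than a tolerance below the largest genetic position seen upstream. The function name and the 1 cM default tolerance are assumptions.

```python
def non_colinear_markers(markers, tolerance_cm=1.0):
    """markers: list of (name, physical_bp, genetic_cM) on one chromosome.

    Returns the names of markers whose genetic position is more than
    `tolerance_cm` below the largest genetic position among markers that lie
    physically upstream of them.
    """
    flagged = []
    running_max = float("-inf")
    for name, _, cm in sorted(markers, key=lambda m: m[1]):
        if cm < running_max - tolerance_cm:
            flagged.append(name)
        running_max = max(running_max, cm)
    return flagged


chrom = [("m1", 100, 0.0), ("m2", 200, 5.0), ("m3", 300, 2.0), ("m4", 400, 6.0)]
print(non_colinear_markers(chrom))
```

A single flagged marker may simply be a genotyping or mapping error; the regions reported in the abstract were more convincing because they were supported by both independent genetic maps.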

565 citations

Journal Article
TL;DR: This work assessed the gain in accuracy of genomic estimated breeding values (GEBV) in Jersey cattle resulting from the use of a combined Holstein and Jersey reference population, with either 39,745 or 624,213 single nucleotide polymorphism (SNP) markers.

551 citations

Journal Article
TL;DR: Compared with the previously published C. hircus assembly, this assembly represents a ∼400-fold improvement in contiguity owing to properly assembled gaps, and it better resolves repetitive structures longer than 1 kb, representing the largest repeat family and immune gene complex yet produced for an individual of a ruminant species.
Abstract: Adam Phillippy, Curtis Van Tassell, Timothy Smith and colleagues present a new reference genome assembly for the domestic goat using a pipeline that improves contiguity of the assembly by more than 250-fold. The pipeline uses a combination of short- and long-read sequencing, optical mapping, and chromatin interaction mapping.
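The abstract does not say which statistic underlies the reported contiguity improvement; contig N50 is one widely used measure, so the sketch below assumes it. N50 is the length L such that contigs of length at least L together cover half of the assembly, and a fold improvement would then be the ratio of two N50 values.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total (longest first) reaches
    half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("contig_lengths must be non-empty")


print(n50([2, 3, 4, 5, 6]))
```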

512 citations

References
Journal Article
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation, with a focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
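Identity-by-state (IBS) sharing, mentioned above, can be computed per pair of individuals by counting how many alleles they share at each SNP. A minimal sketch, not PLINK's code: genotypes are assumed to be coded 0/1/2 with `None` for missing, and the returned proportion (IBS2 + 0.5 × IBS1) / N is, to our understanding, the IBS distance PLINK reports as DST in its `--genome` output.

```python
def ibs_distance(g1, g2):
    """Proportion of alleles shared identical by state (IBS) by two individuals.

    Genotypes are coded 0/1/2 (copies of one allele); None marks a missing call,
    and only SNPs typed in both individuals are used. Returns the IBS0/IBS1/IBS2
    counts and (IBS2 + 0.5 * IBS1) / N.
    """
    counts = [0, 0, 0]  # number of SNPs sharing 0, 1 or 2 alleles IBS
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        counts[2 - abs(a - b)] += 1
    n = sum(counts)
    return counts, (counts[2] + 0.5 * counts[1]) / n


print(ibs_distance([0, 1, 2, 2, None], [0, 2, 0, 2, 1]))
```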

26,280 citations

Journal Article
01 Apr 2001, Genetics
TL;DR: It was concluded that selection on genetic values predicted from markers could substantially increase the rate of genetic gain in animals and plants, especially if combined with reproductive techniques to shorten the generation interval.
Abstract: Recent advances in molecular genetic techniques will make dense marker maps available and genotyping many individuals for these markers feasible. Here we attempted to estimate the effects of ∼50,000 marker haplotypes simultaneously from a limited number of phenotypic records. A genome of 1000 cM was simulated with a marker spacing of 1 cM. The markers surrounding every 1-cM region were combined into marker haplotypes. Due to finite population size (Ne = 100), the marker haplotypes were in linkage disequilibrium with the QTL located between the markers. Using least squares, it was not possible to estimate all haplotype effects simultaneously. When only the biggest effects were included, they were overestimated and the accuracy of predicting genetic values of the offspring of the recorded animals was only 0.32. Best linear unbiased prediction of haplotype effects assumed equal variances associated with each 1-cM chromosomal segment, which yielded an accuracy of 0.73, although this assumption was far from true. Bayesian methods that assumed a prior distribution of the variance associated with each chromosome segment increased this accuracy to 0.85, even when the prior was not correct. It was concluded that selection on genetic values predicted from markers could substantially increase the rate of genetic gain in animals and plants, especially if combined with reproductive techniques to shorten the generation interval.
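The BLUP approach above, with an equal variance for every segment, amounts to solving the ridge-type system (X'X + λI)b = X'y with λ = σ²e / σ²b. The sketch below solves it by Gauss-Seidel iteration, a common choice in animal breeding software. Assumptions: it uses single-marker covariates rather than the paper's haplotype effects, X and y are already centred, and λ is known; the Bayesian variants are not shown. The function name is hypothetical.

```python
def snp_blup_gauss_seidel(X, y, lam, iterations=200):
    """Estimate marker effects b by solving (X'X + lam*I) b = X'y with Gauss-Seidel.

    X: list of rows (individuals) of centred marker codes; y: centred phenotypes;
    lam: residual variance divided by the per-marker effect variance (assumed
    known, and > 0 unless every column has non-zero variance).
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    xtx = [sum(v * v for v in col) for col in cols]
    b = [0.0] * p
    resid = list(y)  # y - X b, starting from b = 0
    for _ in range(iterations):
        for j in range(p):
            col = cols[j]
            # x_j'(y - X b) + x_j'x_j b_j equals x_j'y minus the other markers' fit.
            rhs = sum(c * r for c, r in zip(col, resid)) + xtx[j] * b[j]
            new = rhs / (xtx[j] + lam)
            delta = new - b[j]
            if delta:
                resid = [r - c * delta for c, r in zip(col, resid)]
            b[j] = new
    return b


X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2, 4, -4, -2]  # generated from effects (3, -1)
print(snp_blup_gauss_seidel(X, y, lam=0.0), snp_blup_gauss_seidel(X, y, lam=4.0))
```

With λ = 0 the true effects are recovered exactly; a larger λ shrinks every effect toward zero by the same factor here, because the two columns are orthogonal.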

6,036 citations

Journal Article
TL;DR: This review describes recent empirical and theoretical work on the extent of linkage disequilibrium in the human genome, compares the predictions of simple population-genetic models with available data, and outlines some implications of the emerging patterns of LD for association-mapping strategies.
Abstract: In this review, we describe recent empirical and theoretical work on the extent of linkage disequilibrium (LD) in the human genome, comparing the predictions of simple population-genetic models to available data. Several studies report significant LD over distances longer than those predicted by standard models, whereas some data from short, intergenic regions show less LD than would be expected. The apparent discrepancies between theory and data present a challenge—both to modelers and to human geneticists—to identify which important features are missing from our understanding of the biological processes that give rise to LD. Salient features may include demographic complications such as recent admixture, as well as genetic factors such as local variation in recombination rates, gene conversion, and the potential segregation of inversions. We also outline some implications that the emerging patterns of LD have for association-mapping strategies. In particular, we discuss what marker densities might be necessary for genomewide association scans.
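Required marker density depends on how quickly LD decays with distance, and LD between two biallelic loci is most often summarised by r². With allele frequencies pA and pB and haplotype frequency pAB, D = pAB − pA·pB and r² = D² / (pA(1 − pA)pB(1 − pB)). A minimal sketch, assuming phased haplotypes with alleles coded 0/1:

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    haplotypes: list of (allele_at_locus_A, allele_at_locus_B), alleles coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom


print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]), ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))
```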

1,280 citations

Journal Article
Richard A. Gibbs, Jeremy F. Taylor, Curtis P. Van Tassell, William Barendse, Kellye Eversole, Clare A. Gill, Ronnie D. Green, Debora L. Hamernik, Steven M. Kappes, Sigbjørn Lien, Lakshmi K. Matukumalli, John C. McEwan, Lynne V. Nazareth, Robert D. Schnabel, George M. Weinstock, David A. Wheeler, Paolo Ajmone-Marsan, Paul Boettcher, Alexandre Rodrigues Caetano, José Fernando Garcia, Olivier Hanotte, Paola Mariani, Loren C. Skow, Tad S. Sonstegard, John L. Williams, Boubacar Diallo, Lemecha Hailemariam, Mário Luiz Martinez, C. A. Morris, Luiz Otávio Campos da Silva, Richard J. Spelman, Woudyalew Mulatu, Keyan Zhao, Colette A. Abbey, Morris Agaba, Flábio R. Araújo, Rowan J. Bunch, James O. Burton, C. Gorni, Hanotte Olivier, Blair E. Harrison, Bill Luff, Marco Antonio Machado, Joel Mwakaya, Graham Plastow, Warren Sim, Timothy P. L. Smith, Merle B Thomas, Alessio Valentini, Paul D. Williams, James E. Womack, John Woolliams, Yue Liu, Xiang Qin, Kim C. Worley, Chuan Gao, Huaiyang Jiang, Stephen S. Moore, Yanru Ren, Xingzhi Song, Carlos Bustamante, Ryan D. Hernandez, Donna M. Muzny, Shobha Patil, Anthony San Lucas, Qing Fu, Matthew Peter Kent, Richard Vega, Aruna Matukumalli, Sean McWilliam, Gert Sclep, Katarzyna Bryc, Jung-Woo Choi, Hong Gao, John J. Grefenstette, Brenda M. Murdoch, Alessandra Stella, Rafael Villa-Angulo, Mark G. Wright, Jan Aerts, Oliver C. Jann, Riccardo Negrini, Michael E. Goddard, Ben J. Hayes, Daniel G. Bradley, Marcos V.B. da Silva, Lilian P.L. Lau, George E. Liu, David J. Lynn, Francesca Panzitta, Ken G. Dodds
24 Apr 2009, Science
TL;DR: Data show that cattle have undergone a rapid recent decrease in effective population size from a very large ancestral population, possibly due to bottlenecks associated with domestication, selection, and breed formation.
Abstract: The imprints of domestication and breed development on the genomes of livestock likely differ from those of companion animals. A deep draft sequence assembly of shotgun reads from a single Hereford female and comparative sequences sampled from six additional breeds were used to develop probes to interrogate 37,470 single-nucleotide polymorphisms (SNPs) in 497 cattle from 19 geographically and biologically diverse breeds. These data show that cattle have undergone a rapid recent decrease in effective population size from a very large ancestral population, possibly due to bottlenecks associated with domestication, selection, and breed formation. Domestication and artificial selection appear to have left detectable signatures of selection within the cattle genome, yet the current levels of diversity within breeds are at least as great as exists within humans.
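The abstract does not state how the change in effective population size (Ne) was inferred. One widely used approach links LD to Ne through Sved's (1971) approximation E[r²] ≈ 1 / (1 + 4·Ne·c), where c is the recombination distance in Morgans; the estimate is commonly interpreted as Ne about 1/(2c) generations ago, so short-range LD reflects the distant past and long-range LD the recent past. The sketch below simply inverts that approximation; treating it as the method used here is an assumption.

```python
def ne_from_r2(mean_r2, c_morgans):
    """Invert Sved's approximation E[r^2] ~ 1 / (1 + 4 Ne c) for Ne.

    Returns (Ne, generations_ago), using the common interpretation that the
    estimate refers to roughly 1 / (2c) generations in the past.
    """
    ne = (1.0 / mean_r2 - 1.0) / (4.0 * c_morgans)
    generations_ago = 1.0 / (2.0 * c_morgans)
    return ne, generations_ago


# Mean r^2 of 0.2 for marker pairs 1 cM (0.01 M) apart.
print(ne_from_r2(0.2, 0.01))
```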

769 citations

Journal Article
28 Sep 2000, Nature
TL;DR: A simple but powerful method, called reduced representation shotgun (RRS) sequencing, for creating SNP maps, which facilitates the rapid, inexpensive construction of SNP maps in biomedically and agriculturally important species.
Abstract: Most genomic variation is attributable to single nucleotide polymorphisms (SNPs), which therefore offer the highest resolution for tracking disease genes and population history. It has been proposed that a dense map of 30,000-500,000 SNPs can be used to scan the human genome for haplotypes associated with common diseases. Here we describe a simple but powerful method, called reduced representation shotgun (RRS) sequencing, for creating SNP maps. RRS re-samples specific subsets of the genome from several individuals, and compares the resulting sequences using a highly accurate SNP detection algorithm. The method can be extended by alignment to available genome sequence, increasing the yield of SNPs and providing map positions. These methods are being used by The SNP Consortium, an international collaboration of academic centres, pharmaceutical companies and a private foundation, to discover and release at least 300,000 human SNPs. We have discovered 47,172 human SNPs by RRS, and in total the Consortium has identified 148,459 SNPs. More broadly, RRS facilitates the rapid, inexpensive construction of SNP maps in biomedically and agriculturally important species. SNPs discovered by RRS also offer unique advantages for large-scale genotyping.
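RRS reaches the same subset of the genome in every individual by reducing the DNA reproducibly, typically by restriction digestion followed by size selection of the fragments (the porcine libraries listed above, for example, used AluI, HaeIII and MspI). A toy in silico version, assuming a single enzyme given as a recognition sequence plus a cut offset (HaeIII cuts GG^CC, so site "GGCC" with offset 2); the function names are hypothetical and the sketch ignores the wet-lab protocol.

```python
import re


def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of the recognition `site`, `cut_offset`
    bases into the site (for example HaeIII GG^CC: site "GGCC", offset 2)."""
    # The lookahead lets overlapping recognition sites all be found.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={re.escape(site)})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def reduced_representation(sequence, site, cut_offset, min_len, max_len):
    """Keep only fragments inside a size window, mimicking gel size selection."""
    return [f for f in digest(sequence, site, cut_offset) if min_len <= len(f) <= max_len]


seq = "AAGGCCTTTTGGCCAA"
print(digest(seq, "GGCC", 2), reduced_representation(seq, "GGCC", 2, 5, 10))
```

Because the cut sites and the size window are the same for every individual, reads from different animals pile up on the same fragments, which is what allows their sequences to be compared for SNP detection.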

749 citations