An open resource of structural variation for medical and population genetics
Citations
Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes
The genetic architecture of type 2 diabetes
Structural variation in the sequencing era.
A robust benchmark for detection of germline large deletions and insertions.
Initial whole-genome sequencing and analysis of the host genetic contribution to COVID-19 severity and susceptibility
References
The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data
A framework for variation discovery and genotyping using next-generation DNA sequencing data
Analysis of protein-coding genetic variation in 60,706 humans
UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age
The UK Biobank resource with deep phenotyping and genomic data
Related Papers (5)
Analysis of protein-coding genetic variation in 60,706 humans
An integrated map of structural variation in 2,504 human genomes
Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes
A global reference for human genetic variation.
Frequently Asked Questions (10)
Q2. Why is a reference for SVs across global populations needed?
As short-read WGS is rapidly becoming the predominant technology in large-scale human disease studies, and will probably displace conventional methods for diagnostic screening, there is a mounting need for comparable references of SVs across global populations.
Q3. How much of the protein-coding genome is accessible to SV discovery with short-read WGS?
92.7% of all known autosomal protein-coding nucleotides are not localized to simple- or low-copy repeats, and therefore the authors expect that the catalogues of SVs accessible to short-read WGS across large populations like gnomAD-SV will capture a majority of the most interpretable gene-disrupting SVs in humans.
Q4. Why have mutation rate estimates for SVs remained elusive?
Mutation rate estimates for SVs have remained elusive owing to limited sample sizes, poor resolution of conventional technologies, technical challenges of SV discovery, and use of cell line-derived DNA in population studies [1,25].
Q5. How can SVs affect protein-coding genes?
Owing to their size and mutational diversity, SVs can have varied consequences on protein-coding genes [12] (Fig. 4a, Supplementary Fig. 17).
Q6. How many SVs were retained for subsequent analyses?
After excluding low-quality SVs, which were predominantly (61.6%) composed of incompletely resolved breakpoint junctions (that is, ‘breakends’) that lack interpretable alternative allele structures for functional annotation and produce high false-discovery rates [20] (Extended Data Fig. 2a), the authors retained 335,470 high-quality SVs for subsequent analyses (Supplementary Table 3).
Q7. What is the mutational diversity of gnomAD-SV?
The mutational diversity of gnomAD-SV was extensive: the authors completely resolved 5,295 complex SVs across 11 mutational subclasses, of which 3,901 (73.7%) involved inverted segments (Fig. 2), confirming that inversion variation is predominantly composed of complex SVs rather than canonical inversions [1,24].
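The proportion quoted above can be checked with a short calculation. This is a minimal sketch that uses only the two counts stated in the answer; the helper name `percent_share` is hypothetical and not part of any gnomAD tooling.

```python
def percent_share(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total, 1)


complex_svs = 5295      # completely resolved complex SVs (from the text)
with_inversion = 3901   # complex SVs involving inverted segments (from the text)

inverted_share = percent_share(with_inversion, complex_svs)
print(f"{inverted_share}% of complex SVs involve inverted segments")
```

The result agrees with the 73.7% reported in the answer.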
Q8. How does the number of SVs per genome compare with the 1000 Genomes Project?
This final set of high-quality SVs corresponded to a median of 7,439 SVs per genome, or more than twice the number of variants per genome captured by previous WGS-based SV studies such as the 1000 Genomes Project (3,441 SVs per genome from approximately 7× coverage WGS), which underscores the benefits of high-coverage WGS and improved multi-algorithm ensemble methods for SV discovery.
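The "more than twice" comparison follows directly from the two per-genome figures quoted above. The sketch below only restates that arithmetic; the variable names are illustrative assumptions.

```python
# Per-genome SV counts quoted in the answer above.
gnomad_sv_median_per_genome = 7439   # median SVs per genome in gnomAD-SV
kgp_per_genome = 3441                # SVs per genome, 1000 Genomes Project (~7x coverage)

# Fold difference between the two callsets.
fold_increase = gnomad_sv_median_per_genome / kgp_per_genome
print(f"gnomAD-SV captures about {fold_increase:.2f}x as many SVs per genome")
```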
Q9. What fraction of samples carried a very rare SV that would be classified as pathogenic or likely pathogenic?
0.32% of samples carried a very rare (allele frequency < 0.1%) SV resulting in pLoF of a gene for which incidental findings are clinically actionable, nearly half of which (that is, 0.13% of all samples) would meet diagnostic criteria as pathogenic or likely pathogenic based upon the American College of Medical Genetics (ACMG) recommendations [7] (Fig. 6c).
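To make these percentages concrete, the sketch below converts them into expected carrier counts. The cohort size of 10,000 is a hypothetical round number chosen for illustration, not the gnomAD-SV sample size, and `expected_carriers` is an assumed helper name.

```python
def expected_carriers(cohort_size: int, carrier_percent: float) -> int:
    """Expected number of carriers, rounded to the nearest whole sample."""
    return round(cohort_size * carrier_percent / 100)


cohort_size = 10_000   # hypothetical cohort size, for illustration only

# Percentages quoted in the answer above.
actionable = expected_carriers(cohort_size, 0.32)   # very rare pLoF SV in an actionable gene
pathogenic = expected_carriers(cohort_size, 0.13)   # would meet pathogenic / likely pathogenic criteria

print(f"Per {cohort_size:,} samples: {actionable} actionable carriers, {pathogenic} pathogenic or likely pathogenic")
```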
Q10. What do the gnomAD-SV data contribute, and what remains out of reach?
Although these data remain insufficient to derive accurate estimates of gene-level constraint, sequence-specific mutation rates, and intolerance to noncoding SVs, they provide a step towards these goals and reinforce the value of data sharing and harmonized analyses of aggregated genomic data sets.