Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program
References
Gene Ontology: tool for the unification of biology
The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data
A global reference for human genetic variation.
Analysis of protein-coding genetic variation in 60,706 humans
Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM
Frequently Asked Questions (15)
Q2. What is the significance of the clustering of SNVs?
A hallmark of human genetic variation is that SNVs tend to cluster together throughout the genome (refs. 3, 28).
Q3. What fraction of simulated singletons were less than 100 bp apart?
In coalescent simulations (see Methods), only 0.16% of the simulated singletons within an individual were less than 100 bp apart (Supplementary Figs. 19, 20).
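As a rough illustration of the statistic quoted above, the fraction of singletons lying within 100 bp of another singleton in the same individual can be computed from sorted positions. This is a minimal sketch, not the authors' pipeline: the function name `fraction_within`, the single-chromosome input, and the reading of "less than 100 bp apart" as "nearest neighbour closer than 100 bp" are all assumptions.

```python
from typing import Sequence


def fraction_within(positions: Sequence[int], max_gap: int = 100) -> float:
    """Fraction of sites whose nearest neighbour is < max_gap bp away.

    `positions` are singleton coordinates for one individual on one
    chromosome (a hypothetical input format chosen for illustration).
    """
    pos = sorted(positions)
    if len(pos) < 2:
        return 0.0
    close = 0
    for i, p in enumerate(pos):
        # Distance to the previous and next site, where they exist.
        gaps = []
        if i > 0:
            gaps.append(p - pos[i - 1])
        if i < len(pos) - 1:
            gaps.append(pos[i + 1] - p)
        if min(gaps) < max_gap:
            close += 1
    return close / len(pos)


# Toy data: two of the five sites sit 40 bp apart.
toy = [1_000, 1_040, 50_000, 120_000, 300_000]
print(fraction_within(toy))  # 0.4
```

Comparing this fraction between observed and simulated singletons is one way to see whether clustering exceeds what the simulation predicts.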
Q4. What are the main resources of the TOPMed program?
Members of the broader scientific community are using TOPMed resources through the WGS and phenotype data available on dbGaP, the BRAVO variant server and the imputation reference panel on the TOPMed imputation server.
Q5. How can TOPMed enhance the analysis of genotyped samples?
In addition to enabling detailed analysis of TOPMed sequenced samples, TOPMed can enhance the analysis of any genotyped samples (ref. 72).
Q6. What are the main uses of TOPMed data?
The authors expect that TOPMed data will improve nearly all ongoing studies of common and rare disorders by providing both a deep catalogue of variation in healthy individuals and an imputation resource that enables array-based studies to achieve a completeness that was previously attainable only through direct sequencing.
Q7. How many variants were identified in the sample?
Sequence analysis identified 410,323,831 genetic variants (381,343,078 SNVs and 28,980,753 indels), corresponding to an average of one variant per 7 bp (Extended Data Table 4).
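The counts quoted above can be checked with simple arithmetic. The sketch below confirms that the SNV and indel counts sum to the stated total and back-calculates the genome span implied by "one variant per 7 bp". The implied span is only a back-of-envelope figure derived here, not a number reported in the paper.

```python
# Counts as quoted in the text.
snvs = 381_343_078
indels = 28_980_753

total = snvs + indels
print(total)  # 410323831

# Span implied by "one variant per 7 bp" (back-of-envelope only).
bp_per_variant = 7
implied_span_bp = total * bp_per_variant
print(f"{implied_span_bp / 1e9:.2f} Gb")

# Share of indels among all variants.
print(f"{indels / total:.1%}")
```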
Q8. What fraction of rare variants can be recovered by imputation with the TOPMed panel?
Through genotype imputation using the TOPMed panel, 89% of the approximately 80,000 rare variants with MAF < 0.5% in an average genome of African ancestry can be recovered.
Q9. Which population groups have the greatest heterozygosity?
As expected, African American and Caribbean population groups have the greatest heterozygosity (refs. 7, 47), followed by Hispanic/Latino, European American, Amish, East Asian and Samoan groups.
Q10. How were additional previously undescribed variants detected?
Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci.
Q11. How many SNVs were identified in a subset of 3,000 individuals?
To dissect the spatial clustering of SNVs, the authors analysed a collection of 50,264,223 singleton SNVs ascertained in a subset of 3,000 unrelated individuals selected to have low levels of genetically estimated admixture: 1,000 each of African, East Asian and European ancestry (ref. 32; see Methods).
Q12. What approach complements de novo genome assembly for identifying and classifying haplotypes?
A complementary approach to de novo genome assembly is to develop approaches that combine multiple types of information—including previously observed haplotype variation, SNVs, indels, copy number and homology information—to identify and classify haplotypes in interesting regions of the genome.
Q13. Why were more pLOF variants identified per individual than in exome-based surveys?
The authors identified more pLOF variants per individual than in previous surveys based on exome sequencing—an increase that was mainly driven by the identification of additional frameshift variants (Supplementary Table 6) and by a more uniform and complete coverage of protein-coding regions (Supplementary Figs. 13, 14).
Q14. How many variants were not described in dbSNP build 149?
78.7% of these variants had not been described in dbSNP build 149; TOPMed variants now account for the majority of variants in dbSNP.
Q15. What was the mean alternative allele concordance for SNVs and indels?
The authors compared genotypes for samples sequenced in duplicate: the mean alternative allele concordance was 0.9995 for single-nucleotide variants (SNVs) and 0.9930 for insertions or deletions (indels).
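The excerpt does not define alternative allele concordance. One common reading, assumed here purely for illustration, is: among sites where at least one of the two replicate genotypes carries an alternative allele, the fraction at which the two genotypes agree. The function name `alt_allele_concordance` and the 0/1/2 alternative-allele-count encoding are hypothetical choices, not taken from the paper.

```python
from typing import Sequence


def alt_allele_concordance(rep1: Sequence[int], rep2: Sequence[int]) -> float:
    """Concordance over sites where either replicate is non-reference.

    Genotypes are encoded as alternative allele counts (0, 1 or 2).
    This definition is an assumption; the paper's may differ.
    """
    if len(rep1) != len(rep2):
        raise ValueError("replicates must cover the same sites")
    # Keep only sites where at least one replicate carries an alt allele.
    informative = [(a, b) for a, b in zip(rep1, rep2) if a > 0 or b > 0]
    if not informative:
        return float("nan")
    matches = sum(1 for a, b in informative if a == b)
    return matches / len(informative)


# Toy duplicate pair: 4 informative sites, 2 of them concordant.
rep1 = [0, 1, 2, 1, 0, 0]
rep2 = [0, 1, 2, 0, 0, 1]
print(alt_allele_concordance(rep1, rep2))  # 0.5
```

Excluding sites where both replicates are homozygous reference keeps the statistic from being inflated by the overwhelming majority of invariant positions.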