Journal Article

Expectations and blind spots for structural variation detection from long-read assemblies and short-read genome sequencing technologies.

TLDR
These analyses highlight the considerable added value of assembly-based lrWGS to create new catalogs of insertions and transposable elements, as well as disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment.
Abstract
Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives are reliant upon short-read whole-genome sequencing (srWGS), which presents challenges for the detection of structural variants (SVs) relative to emerging long-read WGS (lrWGS) technologies. Given this ubiquity of srWGS in large-scale genomics initiatives, we sought to establish expectations for routine SV detection from this data type by comparison with lrWGS assembly, as well as to quantify the genomic properties and added value of SVs uniquely accessible to each technology. Analyses from the Human Genome Structural Variation Consortium (HGSVC) of three families captured ~11,000 SVs per genome from srWGS and ~25,000 SVs per genome from lrWGS assembly. Detection power and precision for SV discovery varied dramatically by genomic context and variant class: 9.7% of the current GRCh38 reference is defined by segmental duplication (SD) and simple repeat (SR), yet 91.4% of deletions that were specifically discovered by lrWGS localized to these regions. Across the remaining 90.3% of reference sequence, we observed extremely high (93.8%) concordance between technologies for deletions in these datasets. In contrast, lrWGS was superior for detection of insertions across all genomic contexts. Given that non-SD/SR sequences encompass 95.9% of currently annotated disease-associated exons, improved sensitivity from lrWGS to discover novel pathogenic deletions in these currently interpretable genomic regions is likely to be incremental. However, these analyses highlight the considerable added value of assembly-based lrWGS to create new catalogs of insertions and transposable elements, as well as disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment.


Citations
Journal Article

High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios

TL;DR: This paper presents a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina.
Journal Article

Rare coding variation provides insight into the genetic architecture and phenotypic context of autism

TL;DR: The authors explored the genes disrupted by these variants from joint analysis of protein-truncating variants (PTVs), missense variants and copy number variants (CNVs) in a cohort of 63,237 individuals.
Journal Article

Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes

TL;DR: PanGenie uses a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation.
Trending Questions (1)
Is short-read sequencing creating the impression that SNVs and short indels are the dominant pathogenic burden?

The answer to the query is not mentioned in the paper. The paper discusses the challenges and differences in detecting structural variants (SVs) using short-read whole-genome sequencing (srWGS) and long-read whole-genome sequencing (lrWGS) technologies. It does not specifically address the dominance of SNVs and short indels as the pathogenic burden.