Journal Article
Expectations and blind spots for structural variation detection from long-read assemblies and short-read genome sequencing technologies.
Xuefang Zhao,Ryan L. Collins,Wan-Ping Lee,Alexandra M Weber,Yukyung Jun,Qihui Zhu,Ben Weisburd,Yongqing Huang,Peter A. Audano,Harold Z. Wang,Mark Walker,Chelsea Lowther,Jack Fu,Mark Gerstein,Scott E. Devine,Tobias Marschall,Jan O. Korbel,Evan E. Eichler,Mark Chaisson,Charles Lee,Ryan E. Mills,Harrison Brand,Michael E. Talkowski +22 more
TLDR
These analyses highlight the considerable added value of assembly-based lrWGS to create new catalogs of insertions and transposable elements, as well as disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment.
Abstract:
Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives are reliant upon short-read whole-genome sequencing (srWGS), which presents challenges for the detection of structural variants (SVs) relative to emerging long-read WGS (lrWGS) technologies. Given this ubiquity of srWGS in large-scale genomics initiatives, we sought to establish expectations for routine SV detection from this data type by comparison with lrWGS assembly, as well as to quantify the genomic properties and added value of SVs uniquely accessible to each technology. Analyses from the Human Genome Structural Variation Consortium (HGSVC) of three families captured ~11,000 SVs per genome from srWGS and ~25,000 SVs per genome from lrWGS assembly. Detection power and precision for SV discovery varied dramatically by genomic context and variant class: 9.7% of the current GRCh38 reference is defined by segmental duplication (SD) and simple repeat (SR), yet 91.4% of deletions that were specifically discovered by lrWGS localized to these regions. Across the remaining 90.3% of reference sequence, we observed extremely high (93.8%) concordance between technologies for deletions in these datasets. In contrast, lrWGS was superior for detection of insertions across all genomic contexts. Given that non-SD/SR sequences encompass 95.9% of currently annotated disease-associated exons, improved sensitivity from lrWGS to discover novel pathogenic deletions in these currently interpretable genomic regions is likely to be incremental. However, these analyses highlight the considerable added value of assembly-based lrWGS to create new catalogs of insertions and transposable elements, as well as disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment.
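As a rough illustration of the arithmetic behind the genomic-context result, the sketch below takes the SD/SR share of the reference as 9.7% and the share of lrWGS-specific deletions falling in those regions as 91.4%, and derives the complementary shares and a simple fold-enrichment ratio. The `enrichment` helper and the variable names are hypothetical and introduced only for this example. Dividing an event share by a sequence share is an assumption of this sketch, not a statistic reported by the authors.

```python
# Illustrative arithmetic only. The two percentages are the figures quoted in
# the abstract; everything derived from them here is an assumption of this sketch.
SD_SR_SHARE_OF_REFERENCE = 9.7    # % of GRCh38 annotated as SD or SR
LR_SPECIFIC_DEL_IN_SD_SR = 91.4   # % of lrWGS-specific deletions in SD/SR


def enrichment(event_share_pct: float, sequence_share_pct: float) -> float:
    """Return event share divided by sequence share (1.0 means no enrichment)."""
    if not 0 < sequence_share_pct <= 100:
        raise ValueError("sequence share must be in (0, 100]")
    return event_share_pct / sequence_share_pct


# Complementary shares for the non-SD/SR portion of the reference.
rest_share = 100.0 - SD_SR_SHARE_OF_REFERENCE    # remaining reference sequence
rest_events = 100.0 - LR_SPECIFIC_DEL_IN_SD_SR   # lrWGS-specific deletions outside SD/SR

fold_in_sd_sr = enrichment(LR_SPECIFIC_DEL_IN_SD_SR, SD_SR_SHARE_OF_REFERENCE)
fold_elsewhere = enrichment(rest_events, rest_share)

print(f"non-SD/SR share of reference: {rest_share:.1f}%")
print(f"fold enrichment inside SD/SR: {fold_in_sd_sr:.2f}")
print(f"fold enrichment elsewhere:    {fold_elsewhere:.3f}")
```

Under these assumptions, lrWGS-specific deletions are heavily over-represented in SD/SR sequence relative to its share of the reference and correspondingly depleted elsewhere, which is consistent with the high cross-technology concordance reported for deletions outside SD/SR.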
Citations
Journal Article
High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios
TL;DR: This paper presents a high-coverage, 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina.
Journal Article
Rare coding variation provides insight into the genetic architecture and phenotypic context of autism
Jack Fu,F. Kyle Satterstrom,Minshi Peng,Harrison Brand,Ryan L. Collins,Sha-Sha Dong,Brie Wamsley,Lambertus Klei,Lily Wang,Stephanie P Hao,Christine Stevens,Caroline N. Cusick,Mehrtash Babadi,Eric Banks,Brett Collins,Sheila Dodge,Stacey Gabriel,Laura D. Gauthier,Samuel K. Lee,Lindsay Liang,Alicia Ljungdahl,Behrang Mahjani,Laura G. Sloofman,Andrey Smirnov,Mafalda Barbosa,Catalina Betancur,Alfredo Brusco,Brian H.Y. Chung,Edwin H. Cook,Michael L. Cuccaro,Enrico Domenici,Giovanni Battista Ferrero,J. Jay Gargus,Gail E. Herman,Irva Hertz-Picciotto,Patrícia Maciel,Dara S. Manoach,Maria Rita Passos-Bueno,Antonio M. Persico,Alessandra Renieri,James S. Sutcliffe,Flora Tassone,Elisabetta Trabetti,Gabriele Campos,Simona Cardaropoli,Diana Carli,Marcus C. Y. Chan,Chiara Fallerini,Elisa Giorgio,Ana Cristina Girardi,Emily Hansen‐Kiss,So Lun Lee,Carla Lintas,Yunin Ludena,Rachel Nguyen,Lisa Pavinato,Margaret A. Pericak-Vance,Isaac N. Pessah,Rebecca J. Schmidt,Moyra Smith,Claudia Ismania Samogy Costa,Slavica Trajkova,Jaqueline Wang,Mullin H.C. Yu,David M. Cutler,Silvia De Rubeis,Joseph D. Buxbaum,Mark J. Daly,Bernie Devlin,Kathryn Roeder,Stephen Sanders,Michael E. Talkowski +71 more
TL;DR: The authors explored the genes disrupted by rare coding variation through joint analysis of protein-truncating variants (PTVs), missense variants and copy number variants (CNVs) in a cohort of 63,237 individuals.
Journal Article
A draft human pangenome reference
Wen-Wei Liao,Mobin Asri,Jana Ebler,Daniel Doerr,Marina Haukness,Glenn Hickey,Shuangjia Lu,Julian K. Lucas,Jean Marcel Maurice Monlong,Haley J. Abel,Silvia Buonaiuto,Xian Chang,Haoyu Cheng,Justin Jang Hann Chu,Vincenza Colonna,Jordan M. Eizenga,Xiaowen Feng,Christian Fischer,Robert S. Fulton,Shilpa Garg,Cristian Groza,Andrea Guarracino,William T. Harvey,Simon Heumos,Kerstin Howe,Miten Jain,Tsung-Yu Lu,Charles Markello,Fergal J. Martin,Matthew Mitchell,Katherine M. Munson,Moses N. Mwaniki,Adam M. Novak,Hugh E. Olsen,Trevor Pesout,David Porubsky,Pjotr Prins,Jonas Andreas Sibbesen,Chad Tomlinson,Flavia Villani,Mitchell R. Vollger,Guillaume Bourque,Mark Chaisson,Paul Flicek,Adam M. Phillippy,Justin M. Zook,Evan E. Eichler,David Haussler,Erich D. Jarvis,Karen H. Miga,Ting Wang,Erik Garrison,Tobias Marschall,Ira M. Hall,Heng Li,Benedict Paten +55 more
TL;DR: The pangenome reference contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals and is more than 99% accurate at the structural and base pair levels.
Journal Article
Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes
Jana Ebler,Peter J.R. Ebert,Wayne E. Clarke,Tobias Rausch,Peter A. Audano,Torsten Houwaart,Yafei Mao,Jan O. Korbel,Evan E. Eichler,Michael C. Zody,Alexander T. Dilthey,Tobias Marschall +11 more
TL;DR: PanGenie uses a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation.
Related Papers (5)
Mapping And Phasing Of Structural Variation In Patient Genomes Using Nanopore Sequencing
Mircea Cretu Stancu,Markus J. van Roosmalen,Ivo Renkens,Marleen M. Nieboer,Sjors Middelkamp,Joep de Ligt,Giulia Pregno,Daniela Giachino,Giorgia Mandrile,Jose Espejo Valle-Inclan,Jerome Korzelius,Ewart de Bruijn,Edwin Cuppen,Michael E. Talkowski,Tobias Marschall,Jeroen de Ridder,Wigard P. Kloosterman +16 more
Long span DNA paired-end-tag (DNA-PET) sequencing strategy for the interrogation of genomic structural mutations and fusion-point-guided reconstruction of amplicons.
Fei Yao,Pramila N. Ariyaratne,Axel M. Hillmer,Wah Heng Lee,Guoliang Li,Audrey S.M. Teo,Xing Yi Woo,Zhenshui Zhang,Jieqi P. Chen,Wan Ting Poh,Kelson F B Zawack,Chee Seng Chan,See Ting Leong,Say Chuan Neo,Poh Sum D Choi,Song Gao,Niranjan Nagarajan,Hervé Thoreau,Atif Shahab,Xiaoan Ruan,Valere Cacheux-Rataboul,Chia-Lin Wei,Guillaume Bourque,Wing-Kin Sung,Edison T. Liu,Yijun Ruan +27 more