scispace - formally typeset
Search or ask a question
Author

Valerie A. Schneider

Bio: Valerie A. Schneider is an academic researcher from National Institutes of Health. The author has contributed to research in topics: Reference genome & Genome. The author has an hindex of 17, co-authored 25 publications receiving 13117 citations.

Papers
More filters
Journal ArticleDOI
Adam Auton1, Gonçalo R. Abecasis2, David Altshuler3, Richard Durbin4  +514 moreInstitutions (90)
01 Oct 2015-Nature
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-generation sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations

01 Oct 2015
TL;DR: The 1000 Genomes Project as mentioned in this paper provided a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and reported the completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole genome sequencing, deep exome sequencing and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

3,247 citations

Journal ArticleDOI
TL;DR: It is asserted that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote the understanding of human biology and advance the efforts to improve health.
Abstract: The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.

643 citations

Journal ArticleDOI
03 Sep 2020-Nature
TL;DR: High-coverage, ultra-long-read nanopore sequencing is used to create a new human genome assembly that improves on the coverage and accuracy of the current reference (GRCh38) and includes the gap-free, telomere-to-telomere sequence of the X chromosome.
Abstract: After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist1,2. Here we present a human genome assembly that surpasses the continuity of GRCh382, along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome3, we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes. High-coverage, ultra-long-read nanopore sequencing is used to create a new human genome assembly that improves on the coverage and accuracy of the current reference (GRCh38) and includes the gap-free, telomere-to-telomere sequence of the X chromosome.

502 citations

Journal ArticleDOI
TL;DR: Support for this work came from the Intramural Research Program of the NIH, The National Library of Medicine, the European Molecular Biology Laboratory, the Wellcome Trust, and the Howard Hughes Medical Institute.
Abstract: I have read the journal's policy and have the following conflicts: Paul Flicek is married to the deputy editor of PLoS Medicine, Melissa Norton. Evan Eichler is on the board of Pacific Biosciences. Support for this work came from the Intramural Research Program of the NIH, The National Library of Medicine, the European Molecular Biology Laboratory, the Wellcome Trust (grant number 077198), and the Howard Hughes Medical Institute (EEE). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

451 citations


Cited by
More filters
Journal ArticleDOI
Adam Auton1, Gonçalo R. Abecasis2, David Altshuler3, Richard Durbin4  +514 moreInstitutions (90)
01 Oct 2015-Nature
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-generation sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations

01 Jun 2012
TL;DR: SPAdes as mentioned in this paper is a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler and on popular assemblers Velvet and SoapDeNovo (for multicell data).
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.

10,124 citations

Journal ArticleDOI
Monkol Lek, Konrad J. Karczewski1, Konrad J. Karczewski2, Eric Vallabh Minikel2, Eric Vallabh Minikel1, Kaitlin E. Samocha, Eric Banks1, Timothy Fennell1, Anne H. O’Donnell-Luria2, Anne H. O’Donnell-Luria1, Anne H. O’Donnell-Luria3, James S. Ware, Andrew J. Hill1, Andrew J. Hill4, Andrew J. Hill2, Beryl B. Cummings1, Beryl B. Cummings2, Taru Tukiainen1, Taru Tukiainen2, Daniel P. Birnbaum1, Jack A. Kosmicki, Laramie E. Duncan2, Laramie E. Duncan1, Karol Estrada2, Karol Estrada1, Fengmei Zhao1, Fengmei Zhao2, James Zou1, Emma Pierce-Hoffman1, Emma Pierce-Hoffman2, Joanne Berghout5, David Neil Cooper6, Nicole A. Deflaux7, Mark A. DePristo1, Ron Do, Jason Flannick1, Jason Flannick2, Menachem Fromer, Laura D. Gauthier1, Jackie Goldstein1, Jackie Goldstein2, Namrata Gupta1, Daniel P. Howrigan2, Daniel P. Howrigan1, Adam Kiezun1, Mitja I. Kurki2, Mitja I. Kurki1, Ami Levy Moonshine1, Pradeep Natarajan, Lorena Orozco, Gina M. Peloso2, Gina M. Peloso1, Ryan Poplin1, Manuel A. Rivas1, Valentin Ruano-Rubio1, Samuel A. Rose1, Douglas M. Ruderfer8, Khalid Shakir1, Peter D. Stenson6, Christine Stevens1, Brett Thomas1, Brett Thomas2, Grace Tiao1, María Teresa Tusié-Luna, Ben Weisburd1, Hong-Hee Won9, Dongmei Yu, David Altshuler1, David Altshuler10, Diego Ardissino, Michael Boehnke11, John Danesh12, Stacey Donnelly1, Roberto Elosua, Jose C. Florez2, Jose C. Florez1, Stacey Gabriel1, Gad Getz1, Gad Getz2, Stephen J. Glatt13, Christina M. Hultman14, Sekar Kathiresan, Markku Laakso15, Steven A. McCarroll2, Steven A. McCarroll1, Mark I. McCarthy16, Mark I. McCarthy17, Dermot P.B. McGovern18, Ruth McPherson19, Benjamin M. Neale1, Benjamin M. Neale2, Aarno Palotie, Shaun Purcell8, Danish Saleheen20, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan14, Patrick F. Sullivan21, Jaakko Tuomilehto22, Ming T. Tsuang23, Hugh Watkins17, Hugh Watkins16, James G. Wilson24, Mark J. Daly2, Mark J. Daly1, Daniel G. MacArthur2, Daniel G. MacArthur1 
18 Aug 2016-Nature
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations

Journal ArticleDOI
TL;DR: Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences, is presented, demonstrating that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences or Oxford Nanopore technologies.
Abstract: Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences (PacBio) or Oxford Nanopore technologies and achieves a contig NG50 of >21 Mbp on both human and Drosophila melanogaster PacBio data sets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.

4,806 citations

Journal ArticleDOI
11 Oct 2018-Nature
TL;DR: Deep phenotype and genome-wide genetic data from 500,000 individuals from the UK Biobank is described, describing population structure and relatedness in the cohort, and imputation to increase the number of testable variants to 96 million.
Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

4,489 citations