Author

Valerie A. Schneider

Bio: Valerie A. Schneider is an academic researcher at the National Institutes of Health. The author has contributed to research on the topics Reference genome and Genome. The author has an h-index of 17 and has co-authored 25 publications that have received 13,117 citations.

Topics: Reference genome, Genome, Human genome, Sequence assembly, Medicine

Papers

Journal Article

A global reference for human genetic variation.

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations

A global reference for human genetic variation

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin +476 more

01 Oct 2015

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations; this paper reports completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

3,247 citations

Journal Article

Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly

Valerie A. Schneider¹, Tina A. Graves-Lindsay², Kerstin Howe³, Nathan Bouk¹, Hsiu-Chuan Chen¹, Paul Kitts¹, Terence Murphy¹, Kim D. Pruitt¹, Françoise Thibaud-Nissen¹, Derek Albracht², Robert S. Fulton², Milinn Kremitzki², Vincent Magrini², Chris Markovic², Sean McGrath², Karyn Meltz Steinberg², Kate Auger³, William Chow³, Joanna Collins³, Glenn Harden³, Tim Hubbard³, Sarah Pelan³, Jared T. Simpson³, Glen Threadgold³, James Torrance³, Jonathan Wood³, Laura Clarke⁴, Sergey Koren¹, Matthew Boitano⁵, Paul Peluso⁵, Heng Li⁶, Chen-Shan Chin⁵, Adam M. Phillippy¹, Richard Durbin³, Richard K. Wilson², Paul Flicek⁴, Evan E. Eichler⁷, Deanna M. Church¹

National Institutes of Health¹, Washington University in St. Louis², Wellcome Trust Sanger Institute³, European Bioinformatics Institute⁴, Pacific Biosciences⁵, Broad Institute⁶, University of Washington⁷

01 May 2017-Genome Research

TL;DR: It is asserted that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote the understanding of human biology and advance the efforts to improve health.

Abstract: The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.

643 citations

Journal Article

Telomere-to-telomere assembly of a complete human X chromosome

Karen H. Miga¹, Sergey Koren², Arang Rhie², Mitchell R. Vollger³, Ariel Gershman⁴, Andrey Bzikadze⁵, Shelise Brooks², Edmund Howe⁶, David Porubsky³, Glennis A. Logsdon³, Valerie A. Schneider², Tamara A. Potapova⁶, Jonathan Wood⁷, William Chow⁷, Joel Armstrong¹, Jeanne Fredrickson³, Evgenia Pak², Kristof Tigyi¹, Milinn Kremitzki⁸, Christopher Markovic⁸, Valerie Maduro², Amalia Dutra², Gerard G. Bouffard², Alexander M. Chang², Nancy F. Hansen², Amy B. Wilfert³, Françoise Thibaud-Nissen², Anthony D. Schmitt, Jon Matthew Belton, Siddarth Selvaraj, Megan Y. Dennis⁹, Daniela C. Soto⁹, Ruta Sahasrabudhe⁹, Gulhan Kaya⁹, Josh Quick¹⁰, Nicholas J. Loman¹⁰, Nadine Holmes¹¹, Matthew Loose¹¹, Urvashi Surti¹², Rosa Ana Risques³, Tina A. Graves Lindsay⁸, Robert S. Fulton⁸, Ira M. Hall⁸, Benedict Paten¹, Kerstin Howe⁷, Winston Timp⁴, Alice Young², James C. Mullikin², Pavel A. Pevzner⁵, Jennifer L. Gerton⁶, Beth A. Sullivan¹³, Evan E. Eichler³, Adam M. Phillippy²

University of California, Santa Cruz¹, National Institutes of Health², University of Washington³, Johns Hopkins University⁴, University of California, San Diego⁵, Stowers Institute for Medical Research⁶, Wellcome Trust Sanger Institute⁷, Washington University in St. Louis⁸, University of California, Davis⁹, University of Birmingham¹⁰, University of Nottingham¹¹, University of Pittsburgh¹², Duke University¹³

03 Sep 2020-Nature

TL;DR: High-coverage, ultra-long-read nanopore sequencing is used to create a new human genome assembly that improves on the coverage and accuracy of the current reference (GRCh38) and includes the gap-free, telomere-to-telomere sequence of the X chromosome.

Abstract: After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist¹,². Here we present a human genome assembly that surpasses the continuity of GRCh38², along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome³, we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.

502 citations

Journal Article

Modernizing Reference Genome Assemblies

Deanna M. Church¹, Valerie A. Schneider¹, Tina Graves², Katherine Auger³, Fiona Cunningham, Nathan Bouk¹, Hsiu Chuan Chen¹, Richa Agarwala¹, William M. McLaren, Graham R. S. Ritchie, Derek Albracht², Milinn Kremitzki², Susan M. Rock², Holland Kotkiewicz², Colin Kremitzki², Aye Wollam², Lee Trani², Lucinda Fulton², Robert S. Fulton², Lucy Matthews³, S. Whitehead³, William Chow³, James Torrance³, Matthew Dunn³, Glenn Harden³, Glen Threadgold³, Jonathan Wood³, Joanna Collins³, Paul Heath³, Guy Griffiths³, Sarah Pelan³, Darren Grafham³, Evan E. Eichler⁴,⁵, George M. Weinstock², Elaine R. Mardis², Richard K. Wilson², Kerstin Howe³, Paul Flicek, Tim Hubbard³

National Institutes of Health¹, Washington University in St. Louis², Wellcome Trust Sanger Institute³, Howard Hughes Medical Institute⁴, University of Washington⁵

05 Jul 2011-PLOS Biology

TL;DR: Support for this work came from the Intramural Research Program of the NIH, The National Library of Medicine, the European Molecular Biology Laboratory, the Wellcome Trust, and the Howard Hughes Medical Institute.

Abstract: Competing interests: Paul Flicek is married to the deputy editor of PLoS Medicine, Melissa Norton. Evan Eichler is on the board of Pacific Biosciences. Funding: Support for this work came from the Intramural Research Program of the NIH, The National Library of Medicine, the European Molecular Biology Laboratory, the Wellcome Trust (grant number 077198), and the Howard Hughes Medical Institute (EEE). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

451 citations

Cited by

Journal Article

A global reference for human genetic variation.

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature

12,661 citations

SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)

Glenn Tesler

01 Jun 2012

TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly that improves on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).

Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

10,124 citations

Journal Article

Analysis of protein-coding genetic variation in 60,706 humans

Monkol Lek, Konrad J. Karczewski¹,², Eric Vallabh Minikel¹,², Kaitlin E. Samocha, Eric Banks¹, Timothy Fennell¹, Anne H. O’Donnell-Luria¹,²,³, James S. Ware, Andrew J. Hill¹,²,⁴, Beryl B. Cummings¹,², Taru Tukiainen¹,², Daniel P. Birnbaum¹, Jack A. Kosmicki, Laramie E. Duncan¹,², Karol Estrada¹,², Fengmei Zhao¹,², James Zou¹, Emma Pierce-Hoffman¹,², Joanne Berghout⁵, David Neil Cooper⁶, Nicole A. Deflaux⁷, Mark A. DePristo¹, Ron Do, Jason Flannick¹,², Menachem Fromer, Laura D. Gauthier¹, Jackie Goldstein¹,², Namrata Gupta¹, Daniel P. Howrigan¹,², Adam Kiezun¹, Mitja I. Kurki¹,², Ami Levy Moonshine¹, Pradeep Natarajan, Lorena Orozco, Gina M. Peloso¹,², Ryan Poplin¹, Manuel A. Rivas¹, Valentin Ruano-Rubio¹, Samuel A. Rose¹, Douglas M. Ruderfer⁸, Khalid Shakir¹, Peter D. Stenson⁶, Christine Stevens¹, Brett Thomas¹,², Grace Tiao¹, María Teresa Tusié-Luna, Ben Weisburd¹, Hong-Hee Won⁹, Dongmei Yu, David Altshuler¹,¹⁰, Diego Ardissino, Michael Boehnke¹¹, John Danesh¹², Stacey Donnelly¹, Roberto Elosua, Jose C. Florez¹,², Stacey Gabriel¹, Gad Getz¹,², Stephen J. Glatt¹³, Christina M. Hultman¹⁴, Sekar Kathiresan, Markku Laakso¹⁵, Steven A. McCarroll¹,², Mark I. McCarthy¹⁶,¹⁷, Dermot P.B. McGovern¹⁸, Ruth McPherson¹⁹, Benjamin M. Neale¹,², Aarno Palotie, Shaun Purcell⁸, Danish Saleheen²⁰, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan¹⁴,²¹, Jaakko Tuomilehto²², Ming T. Tsuang²³, Hugh Watkins¹⁶,¹⁷, James G. Wilson²⁴, Mark J. Daly¹,², Daniel G. MacArthur¹,²

Broad Institute¹, Harvard University², Boston Children's Hospital³, University of Washington⁴, University of Arizona⁵, Cardiff University⁶, Google⁷, Icahn School of Medicine at Mount Sinai⁸, Samsung Medical Center⁹, Vertex Pharmaceuticals¹⁰, University of Michigan¹¹, University of Cambridge¹², State University of New York Upstate Medical University¹³, Karolinska Institutet¹⁴, University of Eastern Finland¹⁵, Wellcome Trust Centre for Human Genetics¹⁶, University of Oxford¹⁷, Cedars-Sinai Medical Center¹⁸, University of Ottawa¹⁹, University of Pennsylvania²⁰, University of North Carolina at Chapel Hill²¹, University of Helsinki²², University of California, San Diego²³, University of Mississippi Medical Center²⁴

18 Aug 2016-Nature

TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.

Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations

Journal Article

Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.

Sergey Koren¹, Brian P. Walenz¹, Konstantin Berlin², Jason R. Miller³, Nicholas H. Bergman, Adam M. Phillippy¹

National Institutes of Health¹, Invincea², J. Craig Venter Institute³

15 Mar 2017-Genome Research

TL;DR: Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences, is presented, demonstrating that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences or Oxford Nanopore technologies.

Abstract: Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences (PacBio) or Oxford Nanopore technologies and achieves a contig NG50 of >21 Mbp on both human and Drosophila melanogaster PacBio data sets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.

4,806 citations

Journal Article

The UK Biobank resource with deep phenotyping and genomic data

Clare Bycroft¹, Colin Freeman¹, Desislava Petkova¹,², Gavin Band¹, Lloyd T. Elliott¹, Kevin Sharp¹, Allan Motyer³, Damjan Vukcevic³, Olivier Delaneau⁴,⁵, Jared O'Connell⁶, Adrian Cortes¹,⁷, Samantha Welsh, Alan Young¹, Mark Effingham, Gil McVean¹, Stephen Leslie³, Naomi E. Allen¹, Peter Donnelly¹, Jonathan Marchini¹

University of Oxford¹, Procter & Gamble², University of Melbourne³, Swiss Institute of Bioinformatics⁴, University of Geneva⁵, Illumina⁶, John Radcliffe Hospital⁷

11 Oct 2018-Nature

TL;DR: Deep phenotype and genome-wide genetic data from approximately 500,000 UK Biobank participants are described, including population structure and relatedness in the cohort and imputation that increases the number of testable variants to around 96 million.

Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

4,489 citations
