A global reference for human genetic variation.

doi:10.1038/NATURE15393

Journal Article•DOI•

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
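
The variant totals quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: it uses the rounded counts as they appear above, and the variable and function names are assumptions introduced here, not anything from the paper.

```python
# Rounded counts as quoted in the abstract above.
snps = 84_700_000
indels = 3_600_000
structural_variants = 60_000

total = snps + indels + structural_variants


def share(count: int, whole: int) -> float:
    """Return count as a percentage of whole."""
    return 100.0 * count / whole


print(f"total variants: {total:,}")
for name, count in [("SNPs", snps), ("indels", indels), ("SVs", structural_variants)]:
    print(f"{name}: {share(count, total):.2f}% of all variants")
```

The sum comes to 88.36 million, which agrees with the abstract's "over 88 million variants"; SNPs account for the large majority of that total.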

Citations

Journal Article•DOI•

Mutational signatures in esophageal adenocarcinoma define etiologically distinct subgroups with therapeutic relevance

Maria Secrier¹, Xiaodun Li¹, Nadeera de Silva¹, Matthew D. Eldridge¹, Gianmarco Contino¹, Jan Bornschein¹, Shona MacRae¹, Nicola Grehan¹, Maria O'Donovan¹, Ahmad Miremadi¹, Tsun-Po Yang¹, Lawrence Bower¹, Hamza Chettouh¹, Jason Crawte¹, Nuria Galeano-Dalmau¹, Anna M. Grabowska², John H. Saunders³, Timothy J. Underwood⁴,⁵, Nicola Waddell⁶, Andrew Barbour⁷, Barbara Nutzinger¹, Achilleas Achilleos¹, Paul A.W. Edwards¹, Andy G. Lynch¹, Simon Tavaré¹, Rebecca C. Fitzgerald¹•Institutions (7)

University of Cambridge¹, University of Nottingham², Nottingham University Hospitals NHS Trust³, University Hospital Southampton NHS Foundation Trust⁴, University of Southampton⁵, QIMR Berghofer Medical Research Institute⁶, Princess Alexandra Hospital⁷

05 Sep 2016-Nature Genetics

TL;DR: WGS analysis of 129 cases of esophageal adenocarcinoma demonstrated that this is a heterogeneous cancer dominated by copy number alterations with frequent large-scale rearrangements, and mutational signatures showed three distinct molecular subtypes with potential therapeutic relevance.

Abstract: Esophageal adenocarcinoma (EAC) has a poor outcome, and targeted therapy trials have thus far been disappointing owing to a lack of robust stratification methods. Whole-genome sequencing (WGS) analysis of 129 cases demonstrated that this is a heterogeneous cancer dominated by copy number alterations with frequent large-scale rearrangements. Co-amplification of receptor tyrosine kinases (RTKs) and/or downstream mitogenic activation is almost ubiquitous; thus tailored combination RTK inhibitor (RTKi) therapy might be required, as we demonstrate in vitro. However, mutational signatures showed three distinct molecular subtypes with potential therapeutic relevance, which we verified in an independent cohort (n = 87): (i) enrichment for BRCA signature with prevalent defects in the homologous recombination pathway; (ii) dominant T>G mutational pattern associated with a high mutational load and neoantigen burden; and (iii) C>A/T mutational pattern with evidence of an aging imprint. These subtypes could be ascertained using a clinically applicable sequencing strategy (low coverage) as a basis for therapy selection.

292 citations

Cites methods from "A global reference for human geneti..."

...Before running the software, common variants in the 1000 genomes database [75] appearing in at least 0....

Journal Article•DOI•

Haplotype-resolved diverse human genomes and integrated analysis of structural variation.

Peter Ebert¹, Peter A. Audano², Qihui Zhu, Bernardo Rodriguez-Martin³, David Porubsky², Marc Jan Bonder³,⁴, Arvis Sulovari², Jana Ebler¹, Weichen Zhou⁵, Rebecca Serra Mari¹, Feyza Yilmaz, Xuefang Zhao⁶,⁷, PingHsun Hsieh², Joyce V. Lee, Sushant Kumar⁸, Jiadong Lin⁹, Tobias Rausch³, Yu Chen¹⁰, Jingwen Ren¹¹, Martin Santamarina¹², Wolfram Höps³, Hufsah Ashraf¹, Nelson T. Chuang¹³, Xiaofei Yang⁹, Katherine M. Munson², Alexandra P. Lewis², Susan Fairley³, Luke J. Tallon¹³, Wayne E. Clarke, Anna O. Basile, Marta Byrska-Bishop, André Corvelo, Uday S. Evani, Tsung Yu Lu¹¹, Mark Chaisson¹¹, Junjie Chen¹⁴, Chong Li¹⁴, Harrison Brand⁶,⁷, Aaron M. Wenger¹⁵, Maryam Ghareghani¹,¹⁶,¹⁷, William T. Harvey², Benjamin Raeder³, Patrick Hasenfeld³, Allison A. Regier¹⁸, Haley J. Abel¹⁸, Ira M. Hall⁸, Paul Flicek³, Oliver Stegle³,⁴, Mark Gerstein⁸, Jose M. C. Tubio¹², Zepeng Mu¹⁹, Yang I. Li¹⁹, Xinghua Shi¹⁴, Alex Hastie, Kai Ye⁵,⁹, Zechen Chong¹⁰, Ashley D. Sanders³, Michael C. Zody, Michael E. Talkowski⁶,⁷, Ryan E. Mills⁵, Scott E. Devine¹³, Charles Lee⁹,²⁰, Jan O. Korbel³, Tobias Marschall¹, Evan E. Eichler²•Institutions (20)

University of Düsseldorf¹, University of Washington², European Bioinformatics Institute³, German Cancer Research Center⁴, University of Michigan⁵, Broad Institute⁶, Harvard University⁷, Yale University⁸, Xi'an Jiaotong University⁹, University of Alabama at Birmingham¹⁰, University of Southern California¹¹, University of Santiago de Compostela¹², University of Maryland, Baltimore¹³, Temple University¹⁴, Pacific Biosciences¹⁵, Saarland University¹⁶, Max Planck Society¹⁷, Washington University in St. Louis¹⁸, University of Chicago¹⁹, Ewha Womans University²⁰

02 Apr 2021-Science

TL;DR: In this article, the authors present 64 assembled haplotypes from 32 diverse human genomes, which integrate all forms of genetic variation, even across complex loci, and identify 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing.

Abstract: Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
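
The parenthetical in the abstract ("minimum contig length needed to cover 50% of the genome") describes the contiguity statistic commonly called N50. As a minimal sketch of how such a figure is computed, assuming only a list of contig lengths: the function name `n50`, the optional `genome_size` parameter, and the toy lengths below are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Iterable, Optional


def n50(lengths: Iterable[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Smallest contig length such that contigs at least that long
    together cover half of the target size.

    If genome_size is omitted, the target is half the summed contig length.
    Returns None when the contigs cannot cover half of the target.
    """
    contigs = sorted(lengths, reverse=True)
    target = (genome_size if genome_size is not None else sum(contigs)) / 2
    covered = 0
    for length in contigs:
        covered += length
        if covered >= target:
            return length
    return None


# Toy contig lengths (arbitrary units), purely for illustration.
toy_contigs = [10, 8, 6, 4, 2]
print(n50(toy_contigs))                  # half of 30 is reached by the 8-unit contig
print(n50(toy_contigs, genome_size=40))  # half of 40 is reached by the 6-unit contig
```

Passing an explicit genome size measures coverage against the genome rather than against the assembly itself, which is closer to the wording used in the abstract.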

289 citations

Journal Article•DOI•

Mapping and phasing of structural variation in patient genomes using nanopore sequencing

Mircea Cretu Stancu¹, Markus J. van Roosmalen¹, Ivo Renkens¹, Marleen M. Nieboer¹, Sjors Middelkamp¹, Joep de Ligt¹, Giulia Pregno², Daniela Giachino², Giorgia Mandrile², Jose Espejo Valle-Inclan¹, Jerome Korzelius¹, Ewart de Bruijn¹, Edwin Cuppen¹, Michael E. Talkowski³,⁴, Tobias Marschall⁵,⁶, Jeroen de Ridder¹, Wigard P. Kloosterman¹•Institutions (6)

Utrecht University¹, University of Turin², Broad Institute³, Harvard University⁴, Max Planck Society⁵, Saarland University⁶

06 Nov 2017-Nature Communications

TL;DR: Nanopore long reads are shown to be superior to short reads for detecting de novo chromothripsis rearrangements, demonstrating the value of long-read sequencing in mapping and phasing SVs for both clinical and research applications.

Abstract: Despite improvements in genomics technology, the detection of structural variants (SVs) from short-read sequencing still poses challenges, particularly for complex variation. Here we analyse the genomes of two patients with congenital abnormalities using the MinION nanopore sequencer and a novel computational pipeline, NanoSV. We demonstrate that nanopore long reads are superior to short reads with regard to detection of de novo chromothripsis rearrangements. The long reads also enable efficient phasing of genetic variations, which we leveraged to determine the parental origin of all de novo chromothripsis breakpoints and to resolve the structure of these complex rearrangements. Additionally, genome-wide surveillance of inherited SVs reveals novel variants, missed in short-read data sets, a large proportion of which are retrotransposon insertions. We provide a first exploration of patient genome sequencing with a nanopore sequencer and demonstrate the value of long-read sequencing in mapping and phasing of SVs for both clinical and research applications.

289 citations

Journal Article•DOI•

Deep sequencing of 10,000 human genomes.

Amalio Telenti¹, Levi C. T. Pierce, William H. Biggs, Julia di Iulio¹, Emily H. M. Wong, Martin M. Fabani, Ewen F. Kirkness, Ahmed A. Moustafa, Naisha Shah, Chao Xie, Suzanne Brewerton, Nadeem Bulsara, Chad Garner, Gary Metzker, Efren Sandoval, Brad A. Perkins, Franz Josef Och, Yaron Turpaz, J. Craig Venter¹•Institutions (1)

J. Craig Venter Institute¹

18 Oct 2016-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: This work reports on the sequencing of 10,545 human genomes at 30×–40× coverage with an emphasis on quality metrics and novel variant and sequence discovery and concludes that high-coverage genome sequencing provides accurate detail on human variation for discovery and clinical applications.

Abstract: We report on the sequencing of 10,545 human genomes at 30×-40× coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high-confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single-nucleotide variants in the coding and noncoding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries on average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high-resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing is of the quality necessary for clinical use.

288 citations

Journal Article•DOI•

The genomic history of the Iberian Peninsula over the past 8000 years.

Iñigo Olalde¹, Swapan Mallick¹,²,³, Nick Patterson³, Nadin Rohland¹, Vanessa Villalba-Mouco⁴,⁵, Marina Silva⁶, Katharina Dulias⁶, Ceiridwen J. Edwards⁶, Francesca Gandini⁶, Maria Pala⁶, Pedro Soares⁷, Manuel Ferrando-Bernal⁸, Nicole Adamski¹,², Nasreen Broomandkhoshbacht¹,², Olivia Cheronet⁹, Brendan J. Culleton¹⁰, Daniel Fernandes⁹,¹¹, Ann Marie Lawson¹,², Matthew Mah¹,²,³, Jonas Oppenheimer¹,², Kristin Stewardson¹,², Zhao Zhang¹, Juan Manuel Jiménez Arenas¹²,¹³, Isidro Jorge Toro Moyano, Domingo C. Salazar-García¹⁴, Pere Castanyer, Marta Santos, Joaquim Tremoleda, Marina Lozano¹⁵, Pablo García Borja¹⁶, Javier Fernández-Eraso¹⁴, José Antonio Mujika-Alustiza¹⁴, Cecilio Barroso, Francisco J. Bermúdez, Enrique Viguera Mínguez¹⁷, Josep Burch, Neus Coromina, David Vivó, Artur Cebrià¹⁸, Josep Maria Fullola¹⁸, Oreto García-Puchol¹⁹, Juan Ignacio Morales¹⁸, F. Xavier Oms¹⁸, Tona Majó²⁰, Josep Maria Vergès¹⁵, Antonia Díaz-Carvajal¹⁸, Imma Ollich-Castanyer¹⁸, F. Javier López-Cachero¹⁸, Ana Maria Silva¹¹,²¹, Carmen Alonso-Fernández, Germán Delibes de Castro²², Javier Jiménez Echevarría, Adolfo Moreno-Márquez²³,²⁴, Guillermo Pascual Berlanga¹³, Pablo Ramos-García¹³, José Ramos-Muñoz²³, Eduardo Vijande Vila²³, Gustau Aguilella Arzo, Ángel Esparza Arroyo²⁵, Katina T. Lillios²⁶, Jennifer E. Mack²⁶, Javier Velasco-Vázquez²⁷, Anna J. Waterman²⁸, Luis Benítez de Lugo Enrich¹⁶,²⁹, María Benito Sánchez³⁰, Bibiana Agustí, Ferran Codina, Gabriel de Prado, Almudena Estalrrich³¹, Álvaro Fernández Flores, Clive Finlayson, Geraldine Finlayson³²,³³, Stewart Finlayson³³,³⁴, Francisco Giles-Guzmán³³, Antonio Rosas³⁵, Virginia Barciela González²², Gabriel García Atiénzar²², Mauro S. Hernández Pérez²², Armando Llanos, Yolanda Carrión Marco¹⁹, Isabel Collado Beneyto, David López-Serrano, Mario Sanz Tormo³⁶, António Carlos Valera, Concepción Blasco²⁹, Corina Liesau²⁹, Patricia Ríos²⁹, Joan Daura¹⁸, María Jesús de Pedro Michó, Agustín Diez Castillo¹⁹, Raúl Flores Fernández³⁷,³⁸, Joan Francès Farré, Rafael Garrido-Pena²⁹, Victor S. Gonçalves²¹, Elisa Guerra-Doce²², Ana Mercedes Herrero-Corral³⁰, Joaquim Juan-Cabanilles, Daniel López-Reyes, Sarah B. McClure³⁶, Marta Pérez¹⁸, Arturo Oliver Foix, Montserrat Sanz Borràs¹⁸, Ana Catarina Sousa²¹, Julio Manuel Vidal Encinas, Douglas J. Kennett¹⁰,³⁶, Martin B. Richards⁶, Kurt W. Alt³⁷,³⁸, Wolfgang Haak⁵,³⁹, Ron Pinhasi⁹, Carles Lalueza-Fox⁸, David Reich¹,²,³•Institutions (39)

Harvard University¹, Howard Hughes Medical Institute², Broad Institute³, University of Zaragoza⁴, Max Planck Society⁵, University of Huddersfield⁶, University of Minho⁷, Pompeu Fabra University⁸, University of Vienna⁹, Pennsylvania State University¹⁰, University of Coimbra¹¹, University of Zurich¹², University of Granada¹³, University of the Basque Country¹⁴, Rovira i Virgili University¹⁵, National University of Distance Education¹⁶, University of Málaga¹⁷, University of Barcelona¹⁸, University of Valencia¹⁹, Autonomous University of Barcelona²⁰, University of Lisbon²¹, Facultad de Filosofía y Letras²², University of Cádiz²³, University of Almería²⁴, University of Salamanca²⁵, University of Iowa²⁶, University of Las Palmas de Gran Canaria²⁷, Mount Mercy University²⁸, Autonomous University of Madrid²⁹, Complutense University of Madrid³⁰, University of Cantabria³¹, Liverpool John Moores University³², Gibraltar Hardware³³, Anglia Ruskin University³⁴, Spanish National Research Council³⁵, University of California, Santa Barbara³⁶, Danube Private University³⁷, University of Basel³⁸, University of Adelaide³⁹

15 Mar 2019-Science

TL;DR: It is revealed that present-day Basques are best described as a typical Iron Age population without the admixture events that later affected the rest of Iberia, and it is documented how the ancestry of the peninsula was transformed by gene flow from North Africa and the eastern Mediterranean.

Abstract: J.M.F., F.J.L.-C., J.I.M., F.X.O., J.D., and M.S.B. were supported by HAR2017-86509-P, HAR2017-87695-P, and SGR2017-11 from the Generalitat de Catalunya, AGAUR agency. C.L.-F. was supported by Obra Social La Caixa and by FEDER-MINECO (BFU2015-64699-P). L.B.d.L.E. was supported by REDISCO-HAR2017-88035-P (Plan Nacional I+D+I, MINECO). C.L., P.R., and C.Bl. were supported by MINECO (HAR2016-77600-P). A.Esp., J.V.-V., G.D., and D.C.S.-G. were supported by MINECO (HAR2009-10105 and HAR2013-43851-P). D.J.K. and B.J.C. were supported by NSF BCS-1460367. K.T.L., A.W., and J.M. were supported by NSF BCS-1153568. J.F.-E. and J.A.M.-A. were supported by IT622-13 Gobierno Vasco, Diputacion Foral de Alava, and Diputacion Foral de Gipuzkoa. We acknowledge support from the Portuguese Foundation for Science and Technology (PTDC/EPH-ARQ/4164/2014) and the FEDER-COMPETE 2020 project 016899. P.S. was supported by the FCT Investigator Program (IF/01641/2013), FCT IP, and ERDF (COMPETE2020 – POCI). M.Si. and K.D. were supported by a Leverhulme Trust Doctoral Scholarship awarded to M.B.R. and M.P. D.R. was supported by an Allen Discovery Center grant from the Paul Allen Foundation, NIH grant GM100233, and the Howard Hughes Medical Institute. V.V.-M. and W.H. were supported by the Max Planck Society.

287 citations

References

Journal Article•DOI•

Basic Local Alignment Search Tool

Stephen F. Altschul¹, Warren Gish¹, Webb Miller², Eugene W. Myers³, David J. Lipman¹•Institutions (3)

National Institutes of Health¹, Pennsylvania State University², University of Arizona³

01 Oct 1990-Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal Article•DOI•

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan¹, Ira M. Hall¹•Institutions (1)

University of Virginia¹

15 Mar 2010-Bioinformatics

TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.

Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

The variant call format and VCFtools

Petr Danecek¹, Adam Auton², Gonçalo R. Abecasis³, Cornelis A. Albers¹, Eric Banks⁴, Mark A. DePristo⁴, Robert E. Handsaker⁴, Gerton Lunter², Gabor T. Marth⁵, Stephen T. Sherry⁶, Gilean McVean², Richard Durbin¹•Institutions (6)

Wellcome Trust¹, University of Oxford², University of Michigan³, Broad Institute⁴, Boston College⁵, National Institutes of Health⁶

01 Aug 2011-Bioinformatics

TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net
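
To make the idea of retrieving variants "from a range of positions" concrete, here is a minimal sketch in plain Python. It is not VCFtools and does not use its Perl API: it reads the first five tab-separated VCF columns (CHROM, POS, ID, REF, ALT) from uncompressed text and filters by a linear scan, whereas real tools rely on a compressed, indexed file. The names `Variant`, `parse_vcf` and `in_region`, and the records in `EXAMPLE_VCF`, are made up for illustration.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int          # 1-based position on the reference
    ident: str
    ref: str
    alts: List[str]   # ALT may hold several comma-separated alleles


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per data line, skipping '#' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos, ident, ref, alt = line.split("\t")[:5]
        yield Variant(chrom, int(pos), ident, ref, alt.split(","))


def in_region(variants: Iterable[Variant], chrom: str, start: int, end: int) -> List[Variant]:
    """Return variants on chrom with start <= pos <= end (linear scan, no index)."""
    return [v for v in variants if v.chrom == chrom and start <= v.pos <= end]


# Hypothetical records, invented for this example.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\tvarA\tA\tG\t.\tPASS\t.",
    "1\t2000\tvarB\tC\tT,G\t.\tPASS\t.",
    "1\t3000\tvarC\tGA\tG\t.\tPASS\t.",
    "2\t2000\tvarD\tT\tC\t.\tPASS\t.",
]

hits = in_region(parse_vcf(EXAMPLE_VCF), "1", 1500, 2500)
print([(v.ident, v.pos, v.ref, v.alts) for v in hits])
```

A linear scan is fine for a toy input; for genome-scale files the indexed access described in the abstract is what keeps range queries fast.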

10,164 citations