BCFtools/csq: haplotype-aware variant consequences.

doi:10.1093/BIOINFORMATICS/BTX100

Petr Danecek¹, Shane A. McCarthy¹

Wellcome Trust Sanger Institute¹

01 Jul 2017-Bioinformatics (Oxford University Press)-Vol. 33, Iss: 13, pp 2037-2039

TL;DR: BCFtools/csq is a fast program for haplotype-aware consequence calling that can take known phase into account. Its predictions match existing tools when run in localized mode, but the program is an order of magnitude faster and requires an order of magnitude less memory.

Abstract: Motivation: Prediction of functional variant consequences is an important part of sequencing pipelines, allowing the categorization and prioritization of genetic variants for follow up analysis. However, current predictors analyze variants as isolated events, which can lead to incorrect predictions when adjacent variants alter the same codon, or when a frame-shifting indel is followed by a frame-restoring indel. Exploiting known haplotype information when making consequence predictions can resolve these issues. Results: BCFtools/csq is a fast program for haplotype-aware consequence calling which can take into account known phase. Consequence predictions are changed for 501 of 5019 compound variants found in the 81.7M variants in the 1000 Genomes Project data, with an average of 139 compound variants per haplotype. Predictions match existing tools when run in localized mode, but the program is an order of magnitude faster and requires an order of magnitude less memory. Availability and implementation: The program is freely available for commercial and non-commercial use in the BCFtools package which is available for download from http://samtools.github.io/bcftools. Contact: pd3@sanger.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
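
The problem the abstract describes, adjacent variants altering the same codon, can be illustrated with a minimal Python sketch. It is not the BCFtools/csq implementation; the function names (`apply_snvs`, `classify`), the example codon, and the simplified consequence labels are assumptions made for illustration only. The sketch annotates two SNVs in one codon first as isolated events, then jointly as they would occur on a single phased haplotype.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def apply_snvs(codon, snvs):
    """Return the codon after applying (offset, alt_base) substitutions."""
    seq = list(codon)
    for offset, alt in snvs:
        seq[offset] = alt
    return "".join(seq)


def classify(ref_codon, alt_codon):
    """Assign a simplified consequence label (hypothetical label set)."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


# Hypothetical example: reference codon CAA (Gln) with two SNVs
# carried on the same haplotype.
ref = "CAA"
snvs = [(0, "T"), (1, "C")]

# Isolated view: each variant is annotated as if the other were absent.
isolated = [classify(ref, apply_snvs(ref, [v])) for v in snvs]

# Haplotype-aware view: both variants are applied before translation.
joint = classify(ref, apply_snvs(ref, snvs))

print(isolated)  # ['stop_gained', 'missense']
print(joint)     # missense
```

Taken alone, the first SNV turns CAA into the stop codon TAA, so an isolated annotation reports a stop gain. Applied together, the two SNVs yield TCA (Ser), a missense change with no premature stop, which is the kind of corrected prediction that known phase makes possible.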

Citations

Twelve years of SAMtools and BCFtools.

Petr Danecek¹, James K. Bonfield¹, Jennifer Liddle¹, John Marshall², Valeriu Ohan¹, Martin O. Pollard¹, Andrew Whitwham¹, Thomas M. Keane³, Shane A. McCarthy¹, Robert L. Davies¹, Heng Li⁴

Wellcome Trust Sanger Institute¹, University of Glasgow², European Bioinformatics Institute³, Harvard University⁴

01 Feb 2021-GigaScience

TL;DR: The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines and are freely available on GitHub under the permissive MIT licence, free for both noncommercial and commercial use.

Abstract: Background: SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. Findings: The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. Conclusion: Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed >1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.

2,448 citations

Genomic Patterns of De Novo Mutation in Simplex Autism

Tychele N. Turner¹, Bradley P. Coe¹, Diane E. Dickel², Kendra Hoekzema¹, Bradley J. Nelson¹, Michael C. Zody, Zev N. Kronenberg¹, Fereydoun Hormozdiari³, Archana Raja¹, Len A. Pennacchio⁴, Robert B. Darnell⁵, Evan E. Eichler¹

University of Washington¹, Lawrence Berkeley National Laboratory², University of California, Davis³, United States Department of Energy⁴, Howard Hughes Medical Institute⁵

19 Oct 2017-Cell

TL;DR: Patients are more likely to carry multiple coding and noncoding DNMs in different genes, which are enriched for expression in striatal neurons, suggesting a path forward for genetically characterizing more complex cases of autism.

283 citations

Additional excerpts

… version 1.0.1 https://github.com/ekg/freebayes BCFtools version 1.3.1 Danecek and McCarthy, 2017 https://samtools.github.io/bcftools/bcftools.html mrsFAST-ultra 3.3.8 Hach et al., 2010 …

Exome sequencing of 457 autism families recruited online provides evidence for autism risk genes.

Pamela Feliciano, Xueya Zhou¹, Irina Astrovskaya, Tychele N. Turner², Tianyun Wang², Leo Brueggeman³, Rebecca A. Barnard⁴, Alexander Hsieh¹, LeeAnne Green Snyder, Donna M. Muzny⁵, Aniko Sabo⁵, Richard A. Gibbs⁵, Evan E. Eichler², Brian J. O'Roak⁴, Jacob J. Michaelson³, Natalia Volfovsky, Yufeng Shen¹, Wendy K. Chung⁶

Columbia University¹, University of Washington², Roy J. and Lucille A. Carver College of Medicine³, Oregon Health & Science University⁴, Baylor College of Medicine⁵, Columbia University Medical Center⁶

23 Aug 2019-npj Genomic Medicine

TL;DR: A pilot study for SPARK identified variants in genes and loci that are clinically recognized causes or significant contributors to ASD in 10.4% of families without previous genetic findings, and BRSK2 has the strongest statistical support and reaches genome-wide significance as a risk gene for ASD.

Abstract: Autism spectrum disorder (ASD) is a genetically heterogeneous condition, caused by a combination of rare de novo and inherited variants as well as common variants in at least several hundred genes. However, significantly larger sample sizes are needed to identify the complete set of genetic risk factors. We conducted a pilot study for SPARK (SPARKForAutism.org) of 457 families with ASD, all consented online. Whole exome sequencing (WES) and genotyping data were generated for each family using DNA from saliva. We identified variants in genes and loci that are clinically recognized causes or significant contributors to ASD in 10.4% of families without previous genetic findings. In addition, we identified variants that are possibly associated with ASD in an additional 3.4% of families. A meta-analysis using the TADA framework at a false discovery rate (FDR) of 0.1 provides statistical support for 26 ASD risk genes. While most of these genes are already known ASD risk genes, BRSK2 has the strongest statistical support and reaches genome-wide significance as a risk gene for ASD (p-value = 2.3e−06). Future studies leveraging the thousands of individuals with ASD who have enrolled in SPARK are likely to further clarify the genetic risk factors associated with ASD as well as accelerate ASD research that incorporates genetic etiology.

154 citations

A chromosome-level genome assembly of Cydia pomonella provides insights into chemical ecology and insecticide resistance

Fanghao Wan, Chuanlin Yin¹, Rui Tang², Rui Tang³, Maohua Chen⁴, Qiang Wu, Cong Huang⁵, Wanqiang Qian, Omar Rota-Stabelli, Nianwan Yang, Wang Shuping, Guirong Wang, Guifen Zhang, Jianyang Guo, Liuqi Aloy Gu⁶, Longfei Chen¹, Longsheng Xing, Yu Xi, Feiling Liu¹, Kejian Lin, Mengbo Guo, Wei Liu, Kang He¹, Ruizheng Tian⁴, Emmanuelle Jacquin-Joly⁷, Pierre Franck⁷, Myriam Siegwart⁷, Lino Ometto⁸, Gianfranco Anfora⁹, Mark Blaxter¹⁰, Camille Meslin⁷, Petr Nguyen¹¹, Petr Nguyen¹², Martina Dalíková¹¹, Martina Dalíková¹², František Marec¹¹, Jérôme Olivares⁷, Sandrine Maugin⁷, Jianru Shen, Jinding Liu¹³, Jinmeng Guo¹³, Jiapeng Luo¹, Bo Liu, Wei Fan, Likai Feng, Xianxin Zhao¹, Xiong Peng⁴, Kang Wang⁴, Lang Liu⁴, Hai-Xia Zhan³, Wanxue Liu, Guoliang Shi¹⁴, Chunyan Jiang¹⁴, Jisu Jin⁵, Xiaoqing Xian, Sha Lu¹⁴, Mingli Ye, Meizhen Li¹, Minglu Yang¹⁵, Renci Xiong¹⁵, James R. Walters⁶, Fei Li¹

Zhejiang University¹, Chinese Academy of Sciences², CABI³, Northwest A&F University⁴, Hunan Agricultural University⁵, University of Kansas⁶, Institut national de la recherche agronomique⁷, University of Pavia⁸, University of Trento⁹, University of Edinburgh¹⁰, Academy of Sciences of the Czech Republic¹¹, Sewanee: The University of the South¹², Nanjing Agricultural University¹³, Qingdao Agricultural University¹⁴, Xinjiang Production and Construction Corps¹⁵

17 Sep 2019-Nature Communications

TL;DR: The high-quality genome assembly of C. pomonella informs the genetic basis of its invasiveness, suggesting the codling moth has distinctive capabilities and adaptive potential that may explain its worldwide expansion.

Abstract: The codling moth Cydia pomonella, a major invasive pest of pome fruit, has spread around the globe in the last half century. We generated a chromosome-level scaffold assembly including the Z chromosome and a portion of the W chromosome. This assembly reveals the duplication of an olfactory receptor gene (OR3), which we demonstrate enhances the ability of C. pomonella to exploit kairomones and pheromones in locating both host plants and mates. Genome-wide association studies contrasting insecticide-resistant and susceptible strains identify hundreds of single nucleotide polymorphisms (SNPs) potentially associated with insecticide resistance, including three SNPs found in the promoter of CYP6B2. RNAi knockdown of CYP6B2 increases C. pomonella sensitivity to two insecticides, deltamethrin and azinphos methyl. The high-quality genome assembly of C. pomonella informs the genetic basis of its invasiveness, suggesting the codling moth has distinctive capabilities and adaptive potential that may explain its worldwide expansion.

87 citations

Landscape of multi-nucleotide variants in 125,748 human exomes and 15,708 genomes

Qingbo Wang¹, Qingbo Wang², Emma Pierce-Hoffman¹, Beryl B. Cummings², Beryl B. Cummings¹, Jessica Alföldi², Jessica Alföldi¹, Laurent C. Francioli¹, Laurent C. Francioli², Laura D. Gauthier¹, Andrew J. Hill³, Andrew J. Hill¹, Anne H. O’Donnell-Luria¹, Anne H. O’Donnell-Luria², Genome Aggregation Database Production Team¹, Genome Aggregation Database Production Team², Konrad J. Karczewski², Konrad J. Karczewski¹, Daniel G. MacArthur

Broad Institute¹, Harvard University², University of Washington³

27 May 2020-Nature Communications

TL;DR: The gnomAD dataset is used to assemble a catalogue of MNVs, and the relative impact of known mutational mechanisms (CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions) is estimated.

Abstract: Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools typically do not accurately classify MNVs, and understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,792,248 MNVs across the genome with constituent variants falling within 2 bp distance of one another, including 18,756 variants with a novel combined effect on protein sequence. Finally, we estimate the relative impact of known mutational mechanisms - CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions - on the generation of MNVs. Our results demonstrate the value of haplotype-aware variant annotation, and refine our understanding of genome-wide mutational mechanisms of MNVs. Multi-nucleotide variants (MNV) are genetic variants in close proximity of each other on the same haplotype whose functional impact is difficult to predict if they reside in the same codon. Here, Wang et al. use the gnomAD dataset to assemble a catalogue of MNVs and estimate their global mutation rate.

84 citations

References

ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data

Kai Wang¹, Mingyao Li¹, Hakon Hakonarson¹

Children's Hospital of Philadelphia¹

01 Sep 2010-Nucleic Acids Research

TL;DR: The ANNOVAR tool is developed to annotate single nucleotide variants and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP.

Abstract: High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.

10,461 citations

Analysis of protein-coding genetic variation in 60,706 humans

Monkol Lek, Konrad J. Karczewski¹, Konrad J. Karczewski², Eric Vallabh Minikel¹, Eric Vallabh Minikel², Kaitlin E. Samocha, Eric Banks², Timothy Fennell², Anne H. O’Donnell-Luria³, Anne H. O’Donnell-Luria¹, Anne H. O’Donnell-Luria², James S. Ware, Andrew J. Hill⁴, Andrew J. Hill¹, Andrew J. Hill², Beryl B. Cummings¹, Beryl B. Cummings², Taru Tukiainen², Taru Tukiainen¹, Daniel P. Birnbaum², Jack A. Kosmicki, Laramie E. Duncan², Laramie E. Duncan¹, Karol Estrada², Karol Estrada¹, Fengmei Zhao¹, Fengmei Zhao², James Zou², Emma Pierce-Hoffman¹, Emma Pierce-Hoffman², Joanne Berghout⁵, David Neil Cooper⁶, Nicole A. Deflaux⁷, Mark A. DePristo², Ron Do, Jason Flannick¹, Jason Flannick², Menachem Fromer, Laura D. Gauthier², Jackie Goldstein¹, Jackie Goldstein², Namrata Gupta², Daniel P. Howrigan², Daniel P. Howrigan¹, Adam Kiezun², Mitja I. Kurki², Mitja I. Kurki¹, Ami Levy Moonshine², Pradeep Natarajan, Lorena Orozco, Gina M. Peloso¹, Gina M. Peloso², Ryan Poplin², Manuel A. Rivas², Valentin Ruano-Rubio², Samuel A. Rose², Douglas M. Ruderfer⁸, Khalid Shakir², Peter D. Stenson⁶, Christine Stevens², Brett Thomas¹, Brett Thomas², Grace Tiao², María Teresa Tusié-Luna, Ben Weisburd², Hong-Hee Won⁹, Dongmei Yu, David Altshuler¹⁰, David Altshuler², Diego Ardissino, Michael Boehnke¹¹, John Danesh¹², Stacey Donnelly², Roberto Elosua, Jose C. Florez¹, Jose C. Florez², Stacey Gabriel², Gad Getz¹, Gad Getz², Stephen J. Glatt¹³, Christina M. Hultman¹⁴, Sekar Kathiresan, Markku Laakso¹⁵, Steven A. McCarroll², Steven A. McCarroll¹, Mark I. McCarthy¹⁶, Mark I. McCarthy¹⁷, Dermot P.B. McGovern¹⁸, Ruth McPherson¹⁹, Benjamin M. Neale¹, Benjamin M. Neale², Aarno Palotie, Shaun Purcell⁸, Danish Saleheen²⁰, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan¹⁴, Patrick F. Sullivan²¹, Jaakko Tuomilehto²², Ming T. Tsuang²³, Hugh Watkins¹⁷, Hugh Watkins¹⁶, James G. Wilson²⁴, Mark J. Daly², Mark J. Daly¹, Daniel G. MacArthur¹, Daniel G. MacArthur²

Harvard University¹, Broad Institute², Boston Children's Hospital³, University of Washington⁴, University of Arizona⁵, Cardiff University⁶, Google⁷, Icahn School of Medicine at Mount Sinai⁸, Samsung Medical Center⁹, Vertex Pharmaceuticals¹⁰, University of Michigan¹¹, University of Cambridge¹², State University of New York Upstate Medical University¹³, Karolinska Institutet¹⁴, University of Eastern Finland¹⁵, Wellcome Trust Centre for Human Genetics¹⁶, University of Oxford¹⁷, Cedars-Sinai Medical Center¹⁸, University of Ottawa¹⁹, University of Pennsylvania²⁰, University of North Carolina at Chapel Hill²¹, University of Helsinki²², University of California, San Diego²³, University of Mississippi Medical Center²⁴

18 Aug 2016-Nature

TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.

Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations

A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3

Pablo Cingolani¹, Adrian E. Platts², Le Lily Wang¹, M. Coon¹, Tung T. Nguyen¹, Luan Wang¹, Susan Land¹, Xiangyi Lu¹, Douglas M. Ruden¹

Wayne State University¹, McGill University²

01 Apr 2012-Fly

TL;DR: It appears that the 5′ and 3′ UTRs are reservoirs for genetic variations that change the termini of proteins during evolution of the Drosophila genus.

Abstract: We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w1118; iso-2; iso-3 strain and the reference y1; cn1 bw1 sp1 strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in...

8,017 citations

The Ensembl Variant Effect Predictor.

William M. McLaren¹, Laurent Gil¹, Sarah E. Hunt¹, Harpreet Singh Riat¹, Graham R. S. Ritchie¹, Anja Thormann¹, Paul Flicek¹, Fiona Cunningham¹

European Bioinformatics Institute¹

06 Jun 2016-Genome Biology

TL;DR: The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.

Abstract: The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.

4,658 citations

A reference panel of 64,976 haplotypes for genotype imputation

Shane A. McCarthy¹, Sayantan Das², Warren W. Kretzschmar³, Olivier Delaneau⁴, Andrew R. Wood⁵, Alexander Teumer⁶, Hyun Min Kang², Christian Fuchsberger², Petr Danecek¹, Kevin Sharp³, Yang Luo¹, C Sidore⁷, Alan Kwong², Nicholas J. Timpson⁸, Seppo Koskinen, Scott I. Vrieze⁹, Laura J. Scott², He Zhang², Anubha Mahajan³, Jan H. Veldink, Ulrike Peters¹⁰, Ulrike Peters¹¹, Carlos N. Pato¹², Cornelia M. van Duijn¹³, Christopher E. Gillies², Ilaria Gandin¹⁴, Massimo Mezzavilla, Arthur Gilly¹, Massimiliano Cocca¹⁴, Michela Traglia, Andrea Angius⁷, Jeffrey C. Barrett¹, D.I. Boomsma¹⁵, Kari Branham², Gerome Breen¹⁶, Gerome Breen¹⁷, Chad M. Brummett², Fabio Busonero⁷, Harry Campbell¹⁸, Andrew T. Chan¹⁹, Sai Chen², Emily Y. Chew²⁰, Francis S. Collins²⁰, Laura J Corbin⁸, George Davey Smith⁸, George Dedoussis²¹, Marcus Dörr⁶, Aliki-Eleni Farmaki²¹, Luigi Ferrucci²⁰, Lukas Forer²², Ross M. Fraser², Stacey Gabriel²³, Shawn Levy, Leif Groop²⁴, Leif Groop²⁵, Tabitha A. Harrison¹¹, Andrew T. Hattersley⁵, Oddgeir L. Holmen²⁶, Kristian Hveem²⁶, Matthias Kretzler², James Lee²⁷, Matt McGue²⁸, Thomas Meitinger²⁹, David Melzer⁵, Josine L. Min⁸, Karen L. Mohlke³⁰, John B. Vincent³¹, Matthias Nauck⁶, Deborah A. Nickerson¹⁰, Aarno Palotie¹⁹, Aarno Palotie²³, Michele T. Pato¹², Nicola Pirastu¹⁴, Melvin G. McInnis², J. Brent Richards³², J. Brent Richards¹⁶, Cinzia Sala, Veikko Salomaa, David Schlessinger²⁰, Sebastian Schoenherr²², P. Eline Slagboom³³, Kerrin S. Small¹⁶, Tim D. Spector¹⁶, Dwight Stambolian³⁴, Marcus A. Tuke⁵, Jaakko Tuomilehto, Leonard H. van den Berg, Wouter van Rheenen, Uwe Völker⁶, Cisca Wijmenga³⁵, Daniela Toniolo, Eleftheria Zeggini¹, Paolo Gasparini¹⁴, Matthew G. Sampson², James F. Wilson¹⁸, Timothy M. Frayling⁵, Paul I.W. de Bakker³⁶, Morris A. Swertz³⁵, Steven A. McCarroll¹⁹, Charles Kooperberg¹¹, Annelot M. Dekker, David Altshuler, Cristen J. Willer², William G. Iacono²⁸, Samuli Ripatti²⁵, Nicole Soranzo²⁷, Nicole Soranzo¹, Klaudia Walter¹, Anand Swaroop²⁰, Francesco Cucca⁷, Carl A. Anderson¹, Richard M. Myers, Michael Boehnke², Mark I. McCarthy³, Mark I. McCarthy³⁷, Richard Durbin¹, Gonçalo R. Abecasis², Jonathan Marchini³

Wellcome Trust Sanger Institute¹, University of Michigan², University of Oxford³, University of Geneva⁴, University of Exeter⁵, Greifswald University Hospital⁶, National Research Council⁷, University of Bristol⁸, University of Colorado Boulder⁹, University of Washington¹⁰, Fred Hutchinson Cancer Research Center¹¹, SUNY Downstate Medical Center¹², Erasmus University Rotterdam¹³, University of Trieste¹⁴, VU University Amsterdam¹⁵, King's College London¹⁶, South London and Maudsley NHS Foundation Trust¹⁷, University of Edinburgh¹⁸, Harvard University¹⁹, National Institutes of Health²⁰, Harokopio University²¹, Innsbruck Medical University²², Broad Institute²³, Lund University²⁴, University of Helsinki²⁵, Norwegian University of Science and Technology²⁶, University of Cambridge²⁷, University of Minnesota²⁸, Technische Universität München²⁹, University of North Carolina at Chapel Hill³⁰, University of Toronto³¹, McGill University³², Leiden University³³, University of Pennsylvania³⁴, University of Groningen³⁵, Utrecht University³⁶, Churchill Hospital³⁷

22 Aug 2016-Nature Genetics

TL;DR: A reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies.

Abstract: We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.

2,149 citations