Journal ArticleDOI

A global reference for human genetic variation.

Adam Auton1, Gonçalo R. Abecasis2, David Altshuler3, Richard Durbin4, +514 more; Institutions (90)
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
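The variant totals quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures given in the abstract (84.7 million SNPs, 3.6 million indels, 60,000 structural variants); the dictionary and function names are illustrative and do not come from the paper.

```python
# Rounded variant counts as quoted in the abstract (not exact tallies).
VARIANT_COUNTS = {
    "SNPs": 84_700_000,
    "short indels": 3_600_000,
    "structural variants": 60_000,
}


def summarize(counts):
    """Return the total count and each class's percentage share of it."""
    total = sum(counts.values())
    shares = {name: 100.0 * n / total for name, n in counts.items()}
    return total, shares


total, shares = summarize(VARIANT_COUNTS)
print(f"total variants: {total:,}")
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}% of total")
```

The rounded figures sum to slightly more than 88 million, consistent with the abstract's "over 88 million variants".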
Citations
Journal ArticleDOI
TL;DR: Peddy facilitates both automated and interactive, visual detection of sample swaps, poor sequencing quality, and other indicators of sample problems that, if left undetected, would inhibit discovery.
Abstract: The potential for genetic discovery in human DNA sequencing studies is greatly diminished if DNA samples from a cohort are mislabeled, swapped, or contaminated or if they include unintended individuals. Unfortunately, the potential for such errors is significant since DNA samples are often manipulated by several protocols, labs, or scientists in the process of sequencing. We have developed a software package, peddy, to identify and facilitate the remediation of such errors via interactive visualizations and reports comparing the stated sex, relatedness, and ancestry to what is inferred from the individual genotypes derived from whole-genome (WGS) or whole-exome (WES) sequencing. Peddy predicts a sample's ancestry using a machine learning model trained on individuals of diverse ancestries from the 1000 Genomes Project reference panel. Peddy facilitates both automated and interactive, visual detection of sample swaps, poor sequencing quality, and other indicators of sample problems that, if left undetected, would inhibit discovery.

101 citations

Journal ArticleDOI
TL;DR: The NeuroChip has a more comprehensive and improved content, which makes it a reliable, high-throughput, cost-effective screening tool for genetic research and molecular diagnostics in neurodegenerative diseases.

101 citations


Cites methods from "A global reference for human geneti..."

  • ...%) and European ancestry in all individuals (based on 1000Genomes clustering) (Genomes Project et al., 2015), we performed imputation using the Michigan imputation server, according to established guidelines (https://imputationserver.sph.umich.edu) (Das et al., 2016)....

    [...]

Journal ArticleDOI
TL;DR: It is concluded that pharmacogenomic information for patient stratification is of value to tailor optimized treatment regimens particularly in oncology.
Abstract: Much of the inter-individual variability in drug efficacy and risk of adverse reactions is due to polymorphisms in genes encoding proteins involved in drug pharmacokinetics and pharmacodynamics or immunological responses. Pharmacogenetic research has identified a multitude of gene-drug response associations, which have resulted in genetically guided treatment and dosing decisions to yield a higher success rate of pharmacological treatment. The rapid technological developments for genetic analyses reveal that the number of genetic variants with importance for drug action is much higher than previously thought and that a true personalized prediction of drug response requires attention to millions of rare mutations. Here, we review the evolutionary background of genetic polymorphisms in drug-metabolizing enzymes, provide some important examples of current use of pharmacogenomic biomarkers, and give an update of germline and somatic genome biomarkers that are in use in drug development and clinical practice. We also discuss the current technology development with emphasis on complex genetic loci, review current initiatives for validation of pharmacogenomic biomarkers, and present scenarios for the future taking rare genetic variants into account for a true personalized genetically guided drug prescription. We conclude that pharmacogenomic information for patient stratification is of value to tailor optimized treatment regimens particularly in oncology. However, the routine use of pharmacogenomic biomarkers in clinical practice in other therapeutic areas is currently sparse and the prospects of its future implementation are being scrutinized by different international consortia.

101 citations

Journal ArticleDOI
25 Jul 2018-Neuron
TL;DR: Exome sequencing of 125 CH trios and 52 additional probands identified three genes with a significant burden of rare damaging de novo or transmitted mutations and four genes that are required for neural tube development and regulate ventricular zone neural stem cell fate, implicating impaired neurogenesis in the pathogenesis of a subset of CH patients.

100 citations


Additional excerpts

  • ...Single nucleotide variants and small insertions and deletion were called using GATK Haplotype Caller and annotated using ANNOVAR (Wang et al., 2010b), NHLBI exome variant server (Fu et al., 2013), 1000 Genomes (Auton et al., 2015), DbSNP (Sherry et al., 2001), and gnomAD and ExAC databases (Lek et al., 2016)....

    [...]

  • ...Candidate de novo variants were filtered based on the following criteria: (1) minor allele frequency (MAF) ≤ 5 × 10⁻³ in ExAC, 1000 Genomes, and EVS, (2) GATK variant quality score recalibration (VQSR) of ‘pass’, (3) minimum sequencing depth of 8 reads in the proband and each parent, (4) genotype quality (GQ) score ≥ 20 and alternate allele ratio ≥ 40%, (5) TrioDeNovo data quality (DQ) score ≥ 7, and (6) exonic or splice-site variant....

    [...]

  • ...REAGENT or RESOURCE | SOURCE | IDENTIFIER
    Deposited Data
    Whole-exome sequencing data from CH trios (n = 125) | This paper | dbGaP: phs000744; https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000744
    Whole-exome sequencing data from SSC control trios | Iossifov et al., 2014 | https://ndar.nih.gov/study.html?id=352
    Software and Algorithms
    Genome Analysis Tool Kit (GATK) | DePristo et al., 2011; McKenna et al., 2010; Van der Auwera et al., 2013 | https://software.broadinstitute.org/gatk/
    BWA-mem | Li and Durbin, 2009 | http://bio-bwa.sourceforge.net/
    Annovar | Wang et al., 2010b | http://annovar.openbioinformatics.org/en/latest/
    PLINK/SEQ | Fromer et al., 2014 | https://atgu.mgh.harvard.edu/plinkseq/
    Other
    1000 Genomes GRCh37 h19 genome build | 1000 Genomes Project | http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz
    RefSeq hg19 gene annotation | UCSC Genome Browser | http://genome.ucsc.edu/cgi-bin/hgTables?command=start
    Intervals file for IDT xGen v.1.0 | Integrated DNA Technologies | http://www.idtdna.com/pages/products/next-generation-sequencing/hybridization-capture/lockdown-panels/xgen-exome-research-panel...

    [...]

  • ...After filtering out common CNVs present at allele frequencies greater than 0.1% in 1000 Genomes (Sudmant et al., 2015) and 10% in the cohort, high quality CNVs (SQ > 60 where SQ indicates the phred-scaled quality score for the presence of a CNV event within the interval) were subjected to visual inspection....


    [...]
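The six de novo filtering criteria quoted in the excerpts above can be sketched as a single predicate. This is a minimal illustration, not the cited authors' pipeline: the `Candidate` record and all of its field names are hypothetical, and the comparison operators in the excerpt are read here as "at most" for the MAF threshold and "at least" for the GQ, allele-ratio, and DQ thresholds, which is an assumption about the garbled source text.

```python
from dataclasses import dataclass, replace

# Thresholds as read from the excerpt (operator direction is an assumption).
MAX_MAF = 5e-3          # criterion 1: MAF in ExAC, 1000 Genomes, and EVS
MIN_DEPTH = 8           # criterion 3: reads in proband and each parent
MIN_GQ = 20             # criterion 4: genotype quality
MIN_ALT_RATIO = 0.40    # criterion 4: alternate allele ratio
MIN_DQ = 7              # criterion 5: TrioDeNovo data quality
CODING_CONSEQUENCES = {"exonic", "splice-site"}  # criterion 6


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate de novo variant."""
    maf_exac: float
    maf_1000g: float
    maf_evs: float
    vqsr_filter: str
    depth_proband: int
    depth_father: int
    depth_mother: int
    genotype_quality: float
    alt_allele_ratio: float
    dq_score: float
    consequence: str


def passes_de_novo_filters(c: Candidate) -> bool:
    """Return True only if all six criteria hold for the candidate."""
    rare = max(c.maf_exac, c.maf_1000g, c.maf_evs) <= MAX_MAF
    vqsr_ok = c.vqsr_filter.upper() == "PASS"
    depth_ok = min(c.depth_proband, c.depth_father, c.depth_mother) >= MIN_DEPTH
    genotype_ok = (c.genotype_quality >= MIN_GQ
                   and c.alt_allele_ratio >= MIN_ALT_RATIO)
    dq_ok = c.dq_score >= MIN_DQ
    coding = c.consequence in CODING_CONSEQUENCES
    return rare and vqsr_ok and depth_ok and genotype_ok and dq_ok and coding


# Made-up example values, chosen only to exercise the predicate.
example = Candidate(
    maf_exac=0.0, maf_1000g=0.0, maf_evs=0.0, vqsr_filter="PASS",
    depth_proband=30, depth_father=25, depth_mother=28,
    genotype_quality=99, alt_allele_ratio=0.48, dq_score=12,
    consequence="exonic",
)
shallow = replace(example, depth_mother=7)  # fails the depth criterion
print(passes_de_novo_filters(example), passes_de_novo_filters(shallow))
```

Keeping each criterion as its own named boolean makes it easy to report which filter rejected a variant if that is needed later.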

Journal ArticleDOI
Mathias Gorski1, Peter J. van der Most2, Alexander Teumer3, Audrey Y. Chu4,5, Man Li6,7, Vladan Mijatovic8, Ilja M. Nolte2, Massimiliano Cocca9, Daniel Taliun, Felicia Gomez10, Yong Li11, Bamidele O. Tayo12, Adrienne Tin6, Mary F. Feitosa10, Thor Aspelund13, John Attia14, Reiner Biffar3, Murielle Bochud15, Eric Boerwinkle16, Ingrid B. Borecki17, Erwin P. Bottinger18, Ming-Huei Chen4, Vincent Chouraki19, Marina Ciullo20, Josef Coresh6, Marilyn C. Cornelis21, Gary C. Curhan5, Adamo Pio D'Adamo9, Abbas Dehghan22, Laura Dengler1, Jingzhong Ding23, Gudny Eiriksdottir, Karlhans Endlich3, Stefan Enroth24, Tõnu Esko25, Oscar H. Franco22, Paolo Gasparini9, Christian Gieger, Giorgia Girotto9, Omri Gottesman18, Vilmundur Gudnason13, Ulf Gyllensten24, Stephen Hancock14, Tamara B. Harris4, Catherine Helmer26,27, Simon Höllerer1, Edith Hofer28, Albert Hofman22, Elizabeth G. Holliday, Georg Homuth3, Frank B. Hu5, Cornelia Huth, Nina Hutri-Kähönen, Shih-Jen Hwang4, Medea Imboden29,30, Åsa Johansson24, Mika Kähönen, Wolfgang König31,32, Holly Kramer12, Bernhard K. Krämer33, Ashok Kumar29,34,30, Zoltán Kutalik15, Jean-Charles Lambert19, Lenore J. Launer4, Terho Lehtimäki, Martin H. de Borst2, Gerjan Navis2, Morris A. Swertz2, Yongmei Liu23, Kurt Lohman23, Ruth J. F. Loos18,35, Yingchang Lu18, Leo-Pekka Lyytikäinen, Mark McEvoy14, Christa Meisinger, Thomas Meitinger32, Andres Metspalu25, Marie Metzger36, Evelin Mihailov25, Paul Mitchell37, Matthias Nauck38, Albertine J. Oldehinkel2, Matthias Olden1,4, Brenda W.J.H. Penninx39, Giorgio Pistis, Peter P. Pramstaller, Nicole Probst-Hensch29,30, Olli T. Raitakari40, Rainer Rettig3, Paul M. Ridker5, Fernando Rivadeneira22, Antonietta Robino, Sylvia E. Rosas5, Douglas M. Ruderfer18, Daniela Ruggiero, Yasaman Saba28, Cinzia Sala, Helena Schmidt28, Reinhold Schmidt28, Rodney J. Scott14, Sanaz Sedaghat22, Albert V. Smith13, Rossella Sorice20, Bénédicte Stengel36, Sylvia Stracke3, Konstantin Strauch41, Daniela Toniolo, André G. Uitterlinden22, Sheila Ulivi, Jorma Viikari40, Uwe Völker38, Peter Vollenweider15, Henry Völzke38, Dragana Vuckovic9, Melanie Waldenberger, Jie Jin Wang37, Qiong Yang42, Daniel I. Chasman43,5, Gerard Tromp44, Harold Snieder2, Iris M. Heid1, Caroline S. Fox4, Anna Köttgen11,6, Cristian Pattaro, Carsten A. Böger1, Christian Fuchsberger
TL;DR: A GWAS meta-analysis of kidney function based on the estimated glomerular filtration rate in 110,517 European ancestry participants using 1000 Genomes imputed data identified 10 novel loci with p-value < 5 × 10⁻⁸ previously missed by HapMap-based GWAS, which highlight the utility of re-imputing from denser reference panels, until whole-genome sequencing becomes feasible in large samples.
Abstract: HapMap imputed genome-wide association studies (GWAS) have revealed >50 loci at which common variants with minor allele frequency >5% are associated with kidney function. GWAS using more complete reference sets for imputation, such as those from The 1000 Genomes project, promise to identify novel loci that have been missed by previous efforts. To investigate the value of such a more complete variant catalog, we conducted a GWAS meta-analysis of kidney function based on the estimated glomerular filtration rate (eGFR) in 110,517 European ancestry participants using 1000 Genomes imputed data. We identified 10 novel loci with p-value < 5 × 10⁻⁸ previously missed by HapMap-based GWAS. Six of these loci (HOXD8, ARL15, PIK3R1, EYA4, ASTN2, and EPB41L3) are tagged by common SNPs unique to the 1000 Genomes reference panel. Using pathway analysis, we identified 39 significant (FDR < 0.05) genes and 127 significantly (FDR < 0.05) enriched gene sets, which were missed by our previous analyses. Among those, the 10 identified novel genes are part of pathways of kidney development, carbohydrate metabolism, cardiac septum development and glucose metabolism. These results highlight the utility of re-imputing from denser reference panels, until whole-genome sequencing becomes feasible in large samples.

100 citations


Cites background from "A global reference for human geneti..."

  • ...Recent technological advances resulted in large collections of whole-genome sequence data, such as those from The 1000 Genomes project14,15....

    [...]

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal ArticleDOI
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller, and an alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal ArticleDOI
TL;DR: BEDTools is a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format; it allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations
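The core question the BEDTools abstract describes, whether two genomic features overlap, can be illustrated in a few lines. The sketch below is a naive all-pairs comparison and is not BEDTools's implementation or API; the `Interval` type and function names are hypothetical. It assumes BED-style coordinates, where the start is 0-based and the end is exclusive.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """A genomic feature in BED-style coordinates (hypothetical helper type)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of bases shared by a and b; 0 if they do not overlap."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def intersect(features: List[Interval],
              annotations: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (feature, annotation) pair sharing at least one base.

    Quadratic in the input sizes; fine for illustration, too slow for
    genome-scale data, where sorted or indexed approaches are used instead.
    """
    return [(f, a) for f in features for a in annotations
            if overlap_length(f, a) > 0]


# Made-up coordinates for illustration only.
reads = [Interval("chr1", 100, 200), Interval("chr2", 50, 80)]
genes = [Interval("chr1", 150, 250), Interval("chr1", 200, 300)]
pairs = intersect(reads, genes)
print(pairs)
```

Because the end coordinate is exclusive, the read ending at 200 and the gene starting at 200 are adjacent rather than overlapping.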

Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

10,164 citations
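To make the VCF description concrete, the sketch below parses the eight fixed, tab-separated columns of a single VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) using only the Python standard library. It is a simplified illustration rather than the VCFtools API: the `VcfRecord` type and `parse_vcf_line` function are hypothetical names, per-sample genotype columns are ignored, and the example line is made up.

```python
from typing import Dict, List, NamedTuple, Optional, Union


class VcfRecord(NamedTuple):
    """The eight fixed columns of a VCF data line (hypothetical helper type)."""
    chrom: str
    pos: int                      # 1-based position on the reference
    variant_id: str
    ref: str
    alt: List[str]                # comma-separated alternate alleles
    qual: Optional[float]         # None when the column is "."
    filter_status: str
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one non-header VCF line; columns after INFO are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, variant_id, ref, alt, qual, filter_status, info = fields[:8]

    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            # Keys without "=" are flags; store them as True.
            info_dict[key] = value if sep else True

    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        variant_id=variant_id,
        ref=ref,
        alt=alt.split(","),
        qual=None if qual == "." else float(qual),
        filter_status=filter_status,
        info=info_dict,
    )


# Made-up data line for illustration; not taken from any released call set.
example_line = "1\t12345\trs_example\tA\tAC,AT\t100\tPASS\tAC=5;AF=0.25;DB\n"
record = parse_vcf_line(example_line)
print(record)
```

For real files, header lines (those starting with `#`) would need to be skipped first, and compressed, indexed VCFs are better handled by a dedicated library.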