Journal Article

A global reference for human genetic variation.

Adam Auton1, Gonçalo R. Abecasis2, David Altshuler3, Richard Durbin4, +514 more; Institutions (90)
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Citations
Journal Article
TL;DR: Recent advances in the understanding of intrahepatic cholestasis of pregnancy are described with a particular emphasis on how aspects of genetic and reproductive hormone involvement in pathophysiology have been elucidated.

121 citations

Journal Article
TL;DR: Analyzing optical genome maps of 154 individuals from the 26 populations sequenced in the 1000 Genomes Project, the authors find that phylogenetic population patterns of large SVs are similar to those of single nucleotide variations in 86% of the human genome, while ~2% of the genome has high structural complexity.
Abstract: Large structural variants (SVs) in the human genome are difficult to detect and study by conventional sequencing technologies. With long-range genome analysis platforms, such as optical mapping, one can identify large SVs (>2 kb) across the genome in one experiment. Analyzing optical genome maps of 154 individuals from the 26 populations sequenced in the 1000 Genomes Project, we find that phylogenetic population patterns of large SVs are similar to those of single nucleotide variations in 86% of the human genome, while ~2% of the genome has high structural complexity. We are able to characterize SVs in many intractable regions of the genome, including segmental duplications and subtelomeric, pericentromeric, and acrocentric areas. In addition, we discover ~60 Mb of non-redundant genome content missing in the reference genome sequence assembly. Our results highlight the need for a comprehensive set of alternate haplotypes from different populations to represent SV patterns in the genome.

120 citations

Journal Article
01 Feb 2017-Brain
TL;DR: In a Bulgarian family with three siblings affected by complicated hereditary spastic paraplegia, whole exome sequencing and homozygosity mapping identified a homozygous p.Thr512Ile (c.1535C > T) mutation in ATP13A2.
Abstract: Hereditary spastic paraplegias are heterogeneous neurodegenerative disorders characterized by progressive spasticity of the lower limbs due to degeneration of the corticospinal motor neurons. In a Bulgarian family with three siblings affected by complicated hereditary spastic paraplegia, we performed whole exome sequencing and homozygosity mapping and identified a homozygous p.Thr512Ile (c.1535C > T) mutation in ATP13A2. Molecular defects in this gene have been causally associated with Kufor-Rakeb syndrome (#606693), an autosomal recessive form of juvenile-onset parkinsonism, and neuronal ceroid lipofuscinosis (#606693), a neurodegenerative disorder characterized by the intracellular accumulation of autofluorescent lipopigments. Further analysis of 795 index cases with hereditary spastic paraplegia and related disorders revealed two additional families carrying truncating biallelic mutations in ATP13A2. ATP13A2 is a lysosomal P5-type transport ATPase, the activity of which critically depends on catalytic autophosphorylation. Our biochemical and immunocytochemical experiments in COS-1 and HeLa cells and patient-derived fibroblasts demonstrated that the hereditary spastic paraplegia-associated mutations, similarly to the ones causing Kufor-Rakeb syndrome and neuronal ceroid lipofuscinosis, cause loss of ATP13A2 function due to transcript or protein instability and abnormal intracellular localization of the mutant proteins, ultimately impairing the lysosomal and mitochondrial function. Moreover, we provide the first biochemical evidence that disease-causing mutations can affect the catalytic autophosphorylation activity of ATP13A2. Our study adds complicated hereditary spastic paraplegia (SPG78) to the clinical continuum of ATP13A2-associated neurological disorders, which are commonly hallmarked by lysosomal and mitochondrial dysfunction. 
The disease presentation in our patients with hereditary spastic paraplegia was dominated by an adult-onset lower-limb predominant spastic paraparesis. Cognitive impairment was present in most of the cases and ranged from very mild deficits to advanced dementia with fronto-temporal characteristics. Nerve conduction studies revealed involvement of the peripheral motor and sensory nerves. Only one of five patients with hereditary spastic paraplegia showed clinical indication of extrapyramidal involvement in the form of subtle bradykinesia and slight resting tremor. Neuroimaging cranial investigations revealed pronounced vermian and hemispheric cerebellar atrophy. Notably, reduced striatal dopamine was apparent in the brain of one of the patients, who had no clinical signs or symptoms of extrapyramidal involvement.

120 citations


Additional excerpts

  • ...and 45% frequency in dbSNP135 (Sherry et al., 2001), EVS6500 [Exome Variant Server, NHLBI Exome Sequencing Project (ESP), Seattle, WA (URL: http://evs.gs.washington.edu/EVS/) (August 2014)], 1000 Genomes (Genomes Project et al., 2015), and in-house exome/genome databases (Gonzalez et al., 2015)...


Journal Article
Céline Bellenguez1,2,3, Camille Charbonnier1, Benjamin Grenier-Boley1,2,3, Olivier Quenez1, Kilan Le Guennec1, Gaël Nicolas1, Ganesh Chauhan4, David Wallon1, Stéphane Rousseau1, Anne Claire Richard1, Anne Boland, Guillaume Bourque5, Hans Markus Munter5, Robert Olaso, Vincent Meyer, Adeline Rollin-Sillaire6, Florence Pasquier6, Luc Letenneur4, Richard Redon7, Jean-François Dartigues4, Christophe Tzourio4, Thierry Frebourg1, Mark Lathrop5, Jean-François Deleuze, Didier Hannequin1, Emmanuelle Génin1, Philippe Amouyel, Stéphanie Debette4, Jean-Charles Lambert1,2,3, Dominique Campion1, Olivier Martinaud8, Aline Zarea, Stéphanie Bombois, Marie-Anne Mackowiak, Vincent Deramecourt, A Michon, Isabelle Le Ber, Bruno Dubois, Olivier Godefroy, Frédérique Etcharry-Bouyx, Valérie Chauviré, Ludivine Chamard, Eric Berger, Eloi Magnin, Sophie Auriacombe, François Tison, Vincent de la Sayette, Dominique Castan, Elsa Dionet, François Sellal, Olivier Rouaud, Christel Thauvin, Olivier Moreaud, Mathilde Sauvée, Maïté Formaglio, Hélène Mollion, Isabelle Roullet-Solignac, Alain Vighetto, Bernard Croisile, Mira Didic, Olivier Felician, Lejla Koric, Mathieu Ceccaldi, Audrey Gabelle, Cecilia Marelli, Pierre Labauge, Thérèse Rivasseau Jonveaux, Martine Vercelletto, Claire Boutoleau-Bretonnière, Giovanni Castelnovo, Claire Paquet, Julien Dumurgier, Jacques Hugon, Foucauld De Boisgueheneuc, Serge Belliard, Serge Bakchine, Marie Sarazin, Marie-Odile Barrellon, Bernard Laurent, Frédéric Blanc, Jérémie Pariente, Snejana Jurici
TL;DR: Despite different effect sizes and varying cumulative minor allele frequencies, the rare protein-truncating and missense-predicted damaging variants in TREM2, SORL1, and ABCA7 contribute similarly to the heritability of EOAD and explain between 1.1% and 1.5% of EOAD heritability each.

120 citations

Journal Article
TL;DR: The lower risk effect of the ApoE ε4 allele is likely due to ancestry-specific genetic factors near ApoE rather than non-genetic ethnic, cultural, and environmental factors.
Abstract: The ApoE e4 allele is the most significant genetic risk factor for late-onset Alzheimer disease. The risk conferred by e4, however, differs across populations, with populations of African ancestry showing lower e4 risk compared to those of European or Asian ancestry. The cause of this heterogeneity in risk effect is currently unknown; it may be due to environmental or cultural factors correlated with ancestry, or it may be due to genetic variation local to the ApoE region that differs among populations. Exploring these hypotheses may lead to novel, population-specific therapeutics and risk predictions. To test these hypotheses, we analyzed ApoE genotypes and genome-wide array data in individuals from African American and Puerto Rican populations. A total of 1,766 African American and 220 Puerto Rican individuals with late-onset Alzheimer disease, and 3,730 African American and 169 Puerto Rican cognitively healthy individuals (> 65 years) participated in the study. We first assessed average ancestry across the genome (“global” ancestry) and then tested it for interaction with ApoE genotypes. Next, we assessed the ancestral background of ApoE alleles (“local” ancestry) and tested if ancestry local to ApoE influenced Alzheimer disease risk while controlling for global ancestry. Measures of global ancestry showed no interaction with ApoE risk (Puerto Rican: p-value = 0.49; African American: p-value = 0.65). Conversely, ancestry local to the ApoE region showed an interaction with the ApoE e4 allele in both populations (Puerto Rican: p-value = 0.019; African American: p-value = 0.005). ApoE e4 alleles on an African background conferred a lower risk than those with a European ancestral background, regardless of population (Puerto Rican: OR = 1.26 on African background, OR = 4.49 on European; African American: OR = 2.34 on African background, OR = 3.05 on European background). 
The lower risk effect of the ApoE e4 allele is therefore likely due to ancestry-specific genetic factors near ApoE rather than to non-genetic ethnic, cultural, and environmental factors.

120 citations

References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller, and an alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller, and an alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal Article
TL;DR: BEDTools is a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The BEDTools utilities can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal Article
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, and comparing, and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, and comparing, and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

10,164 citations