Journal ArticleDOI

A global reference for human genetic variation.

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, and 514 more authors (90 institutions)
01 Oct 2015-Nature (Nature Publishing Group)-Vol. 526, Iss: 7571, pp 68-74
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. It reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Citations
Journal ArticleDOI
TL;DR: It is shown that a novel NLGN1 Pro89Leu (P89L) missense variant found in two ASD siblings leads to changes in cellular localization, protein degradation, and to the impairment of spine formation, and support that the NLGN synaptic pathway is of importance in the etiology of neuropsychiatric disorders.
Abstract: Genetic mutations contribute to the etiology of autism spectrum disorder (ASD), a common, heterogeneous neurodevelopmental disorder characterized by impairments in social interaction, communication, and repetitive and restricted patterns of behavior. Since neuroligin3 (NLGN3), a cell adhesion molecule at the neuronal synapse, was first identified as a risk gene for ASD, several additional variants in NLGN3 and NLGN4 were found in ASD patients. Moreover, synaptopathies are now known to cause several neuropsychiatric disorders including ASD. In humans, NLGNs consist of five family members, and neuroligin1 (NLGN1) is a major component forming a complex on excitatory glutamatergic synapses. However, the significance of NLGN1 in neuropsychiatric disorders remains unknown. Here, we systematically examine five missense variants of NLGN1 that were detected in ASD patients, and show molecular and cellular alterations caused by these variants. We show that a novel NLGN1 Pro89Leu (P89L) missense variant found in two ASD siblings leads to changes in cellular localization, protein degradation, and to the impairment of spine formation. Furthermore, we generated the knock-in P89L mice, and we show that the P89L heterozygote mice display abnormal social behavior, a core feature of ASD. These results, for the first time, implicate rare variants in NLGN1 as functionally significant and support that the NLGN synaptic pathway is of importance in the etiology of neuropsychiatric disorders.

86 citations

Journal ArticleDOI
TL;DR: The authors investigate an attribute of 3D genome architecture, the stability of TAD boundaries across cell types, and demonstrate its relevance to understanding how genetic variation in TADs contributes to complex disease by synthesizing TAD maps across 37 diverse cell types with 41 genome-wide association studies (GWASs).
Abstract: Topologically associating domains (TADs) are fundamental units of three-dimensional (3D) nuclear organization. The regions bordering TADs (TAD boundaries) contribute to the regulation of gene expression by restricting interactions of cis-regulatory sequences to their target genes. TAD and TAD-boundary disruption have been implicated in rare-disease pathogenesis; however, we have a limited framework for integrating TADs and their variation across cell types into the interpretation of common-trait-associated variants. Here, we investigate an attribute of 3D genome architecture, the stability of TAD boundaries across cell types, and demonstrate its relevance to understanding how genetic variation in TADs contributes to complex disease. By synthesizing TAD maps across 37 diverse cell types with 41 genome-wide association studies (GWASs), we investigate the differences in disease association and evolutionary pressure on variation in TADs versus TAD boundaries. We demonstrate that genetic variation in TAD boundaries contributes more to complex-trait heritability, especially for immunologic, hematologic, and metabolic traits. We also show that TAD boundaries are more evolutionarily constrained than TADs. Next, stratifying boundaries by their stability across cell types, we find substantial variation. Compared to boundaries unique to a specific cell type, boundaries stable across cell types are further enriched for complex-trait heritability, evolutionary constraint, CTCF binding, and housekeeping genes. Thus, considering TAD boundary stability across cell types provides valuable context for understanding the genome's functional landscape and enabling variant interpretation that takes 3D structure into account.

86 citations

Journal ArticleDOI
TL;DR: This protocol describes bioinformatics procedures to detect RNA editing in RNA-sequencing datasets using REDItools and REDIportal and shows how to identify dysregulated editing at specific recoding sites in post-mortem brain samples of Huntington disease donors.
Abstract: RNA editing is a widespread post-transcriptional mechanism able to modify transcripts through insertions/deletions or base substitutions. It is prominent in mammals, in which millions of adenosines are deaminated to inosines by members of the ADAR family of enzymes. A-to-I RNA editing has a plethora of biological functions, but its detection in large-scale transcriptome datasets is still an unsolved computational task. To this aim, we developed REDItools, the first software package devoted to the RNA editing profiling in RNA-sequencing (RNAseq) data. It has been successfully used in human transcriptomes, proving the tissue and cell type specificity of RNA editing as well as its pervasive nature. Outcomes from large-scale REDItools analyses on human RNAseq data have been collected in our specialized REDIportal database, containing more than 4.5 million events. Here we describe in detail two bioinformatic procedures based on our computational resources, REDItools and REDIportal. In the first procedure, we outline a workflow to detect RNA editing in the human cell line NA12878, for which transcriptome and whole genome data are available. In the second procedure, we show how to identify dysregulated editing at specific recoding sites in post-mortem brain samples of Huntington disease donors. On a 64-bit computer running Linux with ≥32 GB of random-access memory (RAM), both procedures should take ~76 h, using 4 to 24 cores. Our protocols have been designed to investigate RNA editing in different organisms with available transcriptomic and/or genomic reads. Scripts to complete both procedures and a docker image are available at https://github.com/BioinfoUNIBA/REDItools. This protocol describes bioinformatics procedures to detect RNA editing in RNA-sequencing datasets using REDItools and REDIportal. REDItools is a software package to profile RNA editing, while known editing sites are collected in the REDIportal database.

86 citations

Journal ArticleDOI
06 Aug 2020-Nature
TL;DR: The authors describe shared and population-specific patterns of genomic mutations and clonal selection in haematopoietic cells on the basis of 33,250 autosomal mosaic chromosomal alterations detected in 179,417 Japanese participants in the BioBank Japan cohort and compared with analogous data from the UK Biobank.
Abstract: The extent to which the biology of oncogenesis and ageing are shaped by factors that distinguish human populations is unknown. Haematopoietic clones with acquired mutations become common with advancing age and can lead to blood cancers (refs. 1–10). Here we describe shared and population-specific patterns of genomic mutations and clonal selection in haematopoietic cells on the basis of 33,250 autosomal mosaic chromosomal alterations that we detected in 179,417 Japanese participants in the BioBank Japan cohort and compared with analogous data from the UK Biobank. In this long-lived Japanese population, mosaic chromosomal alterations were detected in more than 35.0% (s.e.m., 1.4%) of individuals older than 90 years, which suggests that such clones trend towards inevitability with advancing age. Japanese and European individuals exhibited key differences in the genomic locations of mutations in their respective haematopoietic clones; these differences predicted the relative rates of chronic lymphocytic leukaemia (which is more common among European individuals) and T cell leukaemia (which is more common among Japanese individuals) in these populations. Three different mutational precursors of chronic lymphocytic leukaemia (including trisomy 12, loss of chromosomes 13q and 13q, and copy-neutral loss of heterozygosity) were between two and six times less common among Japanese individuals, which suggests that the Japanese and European populations differ in selective pressures on clones long before the development of clinically apparent chronic lymphocytic leukaemia. Japanese and British populations also exhibited very different rates of clones that arose from B and T cell lineages, which predicted the relative rates of B and T cell cancers in these populations. We identified six previously undescribed loci at which inherited variants predispose to mosaic chromosomal alterations that duplicate or remove the inherited risk alleles, including large-effect rare variants at NBN, MRE11 and CTU2 (odds ratio, 28–91). We suggest that selective pressures on clones are modulated by factors that are specific to human populations. Further genomic characterization of clonal selection and cancer in populations from around the world is therefore warranted. Population-specific patterns of genomic mutations and selection of haematopoietic clones in Japanese and European participants predict the divergent rates of chronic lymphocytic leukaemia and T cell leukaemia in these populations.

86 citations

Journal ArticleDOI
TL;DR: In this article, the authors outline the key applications enabled by multimodal artificial intelligence, along with the technical and analytical challenges, and survey the data, modeling and privacy challenges that must be overcome to realize the full potential of multimodal artificial intelligence in health.
Abstract: The increasing availability of biomedical data from large biobanks, electronic health records, medical imaging, wearable and ambient biosensors, and the lower cost of genome and microbiome sequencing have set the stage for the development of multimodal artificial intelligence solutions that capture the complexity of human health and disease. In this Review, we outline the key applications enabled, along with the technical and analytical challenges. We explore opportunities in personalized medicine, digital clinical trials, remote monitoring and care, pandemic surveillance, digital twin technology and virtual health assistants. Further, we survey the data, modeling and privacy challenges that must be overcome to realize the full potential of multimodal artificial intelligence in health.

86 citations

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal ArticleDOI
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations
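
To make the SAM format in the entry above more concrete, here is a minimal Python sketch that reads one tab-delimited alignment line into its mandatory fields and checks two standard FLAG bits. It is an illustration only and is not part of SAMtools; the names `SamRecord`, `parse_sam_line`, `is_unmapped` and `is_reverse_strand` are hypothetical, and the example read is made up.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """A subset of the 11 mandatory SAM alignment fields."""
    qname: str   # read name
    flag: int    # bitwise FLAG
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    seq: str     # read sequence


def parse_sam_line(line: str) -> SamRecord:
    """Parse one non-header SAM line (header lines start with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    return SamRecord(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        seq=fields[9],
    )


def is_unmapped(rec: SamRecord) -> bool:
    """FLAG bit 0x4 marks an unmapped read."""
    return bool(rec.flag & 0x4)


def is_reverse_strand(rec: SamRecord) -> bool:
    """FLAG bit 0x10 marks a read aligned to the reverse strand."""
    return bool(rec.flag & 0x10)


# Made-up example line, for illustration only.
example = "read1\t16\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII"
record = parse_sam_line(example)
```

For real data, an established library or the `samtools` command line is the better choice; this sketch only shows how the text layout maps to fields.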

Journal ArticleDOI
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations
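
The overlap search described in the entry above can be illustrated with a small Python sketch. It assumes BED-style intervals (0-based start, end-exclusive) and uses a naive pairwise comparison, which is not the algorithm BEDTools uses and would be too slow for large datasets. The names `Interval`, `overlap` and `intersect`, and the coordinates, are hypothetical.

```python
from typing import List, Optional, Tuple

# (chromosome, start, end) with a 0-based start and an exclusive end, as in BED.
Interval = Tuple[str, int, int]


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the shared region of two intervals, or None if they do not overlap."""
    if a[0] != b[0]:
        return None  # different chromosomes never overlap
    start = max(a[1], b[1])
    end = min(a[2], b[2])
    # Half-open intervals that merely touch (start == end) share no bases.
    return (a[0], start, end) if start < end else None


def intersect(features_a: List[Interval], features_b: List[Interval]) -> List[Interval]:
    """Naive O(n*m) intersection of two feature lists."""
    hits = []
    for a in features_a:
        for b in features_b:
            hit = overlap(a, b)
            if hit is not None:
                hits.append(hit)
    return hits


# Made-up coordinates, for illustration only.
genes = [("chr1", 100, 200)]
peaks = [("chr1", 150, 250), ("chr2", 150, 250), ("chr1", 200, 300)]
shared = intersect(genes, peaks)
```

Only the first peak overlaps the gene here: the second lies on another chromosome, and the third starts exactly where the gene ends.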

Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
TL;DR: VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API.
Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability: http://vcftools.sourceforge.net

10,164 citations
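
As a rough illustration of the VCF layout in the entry above, the following Python sketch computes the non-reference allele frequency at one site from the per-sample genotype (GT) fields. It is not part of VCFtools; the function name `alt_allele_frequency` and the example record are hypothetical. It assumes a single tab-delimited data line whose FORMAT column includes `GT`, and it skips missing alleles (`.`).

```python
import re


def alt_allele_frequency(vcf_line: str) -> float:
    """Fraction of called alleles at this site that are non-reference."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 ...
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")

    alt_count = 0
    called = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # '|' separates phased alleles, '/' separates unphased alleles.
        for allele in re.split(r"[|/]", genotype):
            if allele == ".":
                continue  # missing call
            called += 1
            if allele != "0":  # '0' is the reference allele
                alt_count += 1

    if called == 0:
        raise ValueError("no called alleles at this site")
    return alt_count / called


# Made-up record with four samples, one of them uncalled.
example = "1\t1000\t.\tA\tC\t100\tPASS\t.\tGT\t0|1\t1|1\t0|0\t.|."
frequency = alt_allele_frequency(example)
```

In this example three of the six called alleles are non-reference, so the site would count as common under the >1% frequency threshold mentioned in the main abstract.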