Author
Kati J. Buckingham
Other affiliations: Western Washington University
Bio: Kati J. Buckingham is an academic researcher from the University of Washington. The author has contributed to research on topics including exome sequencing and medicine. The author has an h-index of 17 and has co-authored 38 publications receiving 4,751 citations. Previous affiliations of Kati J. Buckingham include Western Washington University.
Papers
••
TL;DR: Exome sequencing of a small number of unrelated affected individuals is a powerful, efficient strategy for identifying the genes underlying rare mendelian disorders and will likely transform the genetic analysis of monogenic traits.
Abstract: We demonstrate the first successful application of exome sequencing to discover the gene for a rare mendelian disorder of unknown cause, Miller syndrome (MIM%263750). For four affected individuals in three independent kindreds, we captured and sequenced coding regions to a mean coverage of 40x and sufficient depth to call variants at approximately 97% of each targeted exome. Filtering against public SNP databases and eight HapMap exomes for genes with two previously unknown variants in each of the four individuals identified a single candidate gene, DHODH, which encodes a key enzyme in the pyrimidine de novo biosynthesis pathway. Sanger sequencing confirmed the presence of DHODH mutations in three additional families with Miller syndrome. Exome sequencing of a small number of unrelated affected individuals is a powerful, efficient strategy for identifying the genes underlying rare mendelian disorders and will likely transform the genetic analysis of monogenic traits.
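The filtering step described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data layout (each individual mapped to a list of `(gene, variant_id)` pairs), the function name `candidate_genes`, and the toy gene and variant names are all hypothetical. Only the logic comes from the text: discard variants already seen in reference sets, then keep genes that carry at least two previously unknown variants in every affected individual.

```python
from collections import Counter


def candidate_genes(variants_by_individual, known_variants, min_novel=2):
    """Return genes with >= min_novel previously unknown variants in EVERY individual.

    variants_by_individual: dict mapping an individual ID to a list of
        (gene, variant_id) pairs (hypothetical layout for this sketch).
    known_variants: set of variant IDs seen in reference sets (standing in
        for public SNP databases and control exomes).
    """
    per_individual = []
    for variants in variants_by_individual.values():
        # Drop variants already present in the reference sets.
        novel = {(gene, vid) for gene, vid in variants if vid not in known_variants}
        # Count distinct novel variants per gene for this individual.
        counts = Counter(gene for gene, _ in novel)
        per_individual.append({g for g, n in counts.items() if n >= min_novel})
    if not per_individual:
        return set()
    # A candidate gene must pass the threshold in all affected individuals.
    return set.intersection(*per_individual)


# Toy data (entirely made up for illustration).
known = {"v1", "v5"}
calls = {
    "A": [("GENE1", "v1"), ("GENE1", "v2"), ("GENE2", "v3"), ("GENE2", "v4")],
    "B": [("GENE2", "v6"), ("GENE2", "v7"), ("GENE1", "v5")],
}
print(candidate_genes(calls, known))
```

Requiring two novel variants per individual reflects a recessive model, in which each affected person is expected to carry two disrupted copies of the causal gene; a dominant model would use `min_novel=1`.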
1,980 citations
••
TL;DR: The results strongly suggest that mutations in MLL2, which encodes a Trithorax-group histone methyltransferase, are a major cause of Kabuki syndrome.
Abstract: We demonstrate the successful application of exome sequencing to discover a gene for an autosomal dominant disorder, Kabuki syndrome (OMIM%147920). We subjected the exomes of ten unrelated probands to massively parallel sequencing. After filtering against existing SNP databases, there was no compelling candidate gene containing previously unknown variants in all affected individuals. Less stringent filtering criteria allowed for the presence of modest genetic heterogeneity or missing data but also identified multiple candidate genes. However, genotypic and phenotypic stratification highlighted MLL2, which encodes a Trithorax-group histone methyltransferase: seven probands had newly identified nonsense or frameshift mutations in this gene. Follow-up Sanger sequencing detected MLL2 mutations in two of the three remaining individuals with Kabuki syndrome (cases) and in 26 of 43 additional cases. In families where parental DNA was available, the mutation was confirmed to be de novo (n = 12) or transmitted (n = 2) in concordance with phenotype. Our results strongly suggest that mutations in MLL2 are a major cause of Kabuki syndrome.
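The less stringent filtering mentioned in this abstract, which tolerates genetic heterogeneity or missing data, can be sketched as a threshold on the number of probands carrying a previously unknown variant in a gene. This is a hedged illustration only: the function name `genes_by_proband_count`, the input layout, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def genes_by_proband_count(variants_by_proband, known_variants, min_probands):
    """Rank genes by how many probands carry at least one novel variant in them.

    variants_by_proband: dict mapping a proband ID to a list of
        (gene, variant_id) pairs (hypothetical layout for this sketch).
    known_variants: set of variant IDs present in existing SNP databases.
    min_probands: smallest number of probands that must carry a novel
        variant in a gene; setting it below the cohort size allows for
        heterogeneity or missing data.
    """
    carriers = defaultdict(set)
    for proband, variants in variants_by_proband.items():
        for gene, vid in variants:
            if vid not in known_variants:
                carriers[gene].add(proband)
    hits = {g: len(p) for g, p in carriers.items() if len(p) >= min_probands}
    # Most widely shared genes first; ties broken alphabetically.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


# Toy data (entirely made up for illustration).
known_ids = {"k1"}
probands = {
    "P1": [("GENE_X", "n1"), ("GENE_Y", "n2"), ("GENE_Z", "k1")],
    "P2": [("GENE_X", "n3"), ("GENE_Z", "k1")],
    "P3": [("GENE_Z", "k1")],
}
print(genes_by_proband_count(probands, known_ids, min_probands=2))
```

Lowering `min_probands` widens the candidate list, which matches the abstract's observation that relaxed criteria surface multiple candidate genes and that further stratification is then needed to single one out.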
1,261 citations
••
TL;DR: This collaborative effort has identified 956 genes, including 375 not previously associated with human health, that underlie a Mendelian phenotype, providing insight into study design and analytical strategies, identifying novel mechanisms of disease, and revealing the extensive clinical variability of Mendelian phenotypes.
Abstract: Discovering the genetic basis of a Mendelian phenotype establishes a causal link between genotype and phenotype, making possible carrier and population screening and direct diagnosis. Such discoveries also contribute to our knowledge of gene function, gene regulation, development, and biological mechanisms that can be used for developing new therapeutics. As of February 2015, 2,937 genes underlying 4,163 Mendelian phenotypes have been discovered, but the genes underlying ∼50% (i.e., 3,152) of all known Mendelian phenotypes are still unknown, and many more Mendelian conditions have yet to be recognized. This is a formidable gap in biomedical knowledge. Accordingly, in December 2011, the NIH established the Centers for Mendelian Genomics (CMGs) to provide the collaborative framework and infrastructure necessary for undertaking large-scale whole-exome sequencing and discovery of the genetic variants responsible for Mendelian phenotypes. In partnership with 529 investigators from 261 institutions in 36 countries, the CMGs assessed 18,863 samples from 8,838 families representing 579 known and 470 novel Mendelian phenotypes as of January 2015. This collaborative effort has identified 956 genes, including 375 not previously associated with human health, that underlie a Mendelian phenotype. These results provide insight into study design and analytical strategies, identify novel mechanisms of disease, and reveal the extensive clinical variability of Mendelian phenotypes. Discovering the gene underlying every Mendelian phenotype will require tackling challenges such as worldwide ascertainment and phenotypic characterization of families affected by Mendelian conditions, improvement in sequencing and analytical techniques, and pervasive sharing of phenotypic and genomic data among researchers, clinicians, and families.
579 citations
••
University of Washington (1); University of Pennsylvania (2); Boston Children's Hospital (3); Nagasaki University (4); Yokohama City University (5); University of the Ryukyus (6); Dokkyo Medical University (7); Chang Gung University (8); Health Sciences University of Hokkaido (9); Central South University (10); University of Nevada, Reno (11); University of Manchester (12); University of Colorado Denver (13)
TL;DR: Screening of 110 families with Kabuki syndrome found mutations in MLL2, which encodes a Trithorax-group histone methyltransferase important in the epigenetic control of active chromatin states, in 81 of them (74%).
Abstract: Kabuki syndrome is a rare, multiple malformation disorder characterized by a distinctive facial appearance, cardiac anomalies, skeletal abnormalities, and mild to moderate intellectual disability. Simplex cases make up the vast majority of the reported cases with Kabuki syndrome, but parent-to-child transmission in more than a half-dozen instances indicates that it is an autosomal dominant disorder. We recently reported that Kabuki syndrome is caused by mutations in MLL2, a gene that encodes a Trithorax-group histone methyltransferase, a protein important in the epigenetic control of active chromatin states. Here, we report on the screening of 110 families with Kabuki syndrome. MLL2 mutations were found in 81/110 (74%) of families. In simplex cases for which DNA was available from both parents, 25 mutations were confirmed to be de novo, while a transmitted MLL2 mutation was found in two of three familial cases. The majority of variants found to cause Kabuki syndrome were novel nonsense or frameshift mutations that are predicted to result in haploinsufficiency. The clinical characteristics of MLL2 mutation-positive cases did not differ significantly from MLL2 mutation-negative cases with the exception that renal anomalies were more common in MLL2 mutation-positive cases. These results are important for understanding the phenotypic consequences of MLL2 mutations for individuals and their families as well as for providing a basis for the identification of additional genes for Kabuki syndrome.
170 citations
••
University of Washington (1); Boston Children's Hospital (2); Pontifical Catholic University of Chile (3); University of North Carolina at Chapel Hill (4); University of Utah (5); University of New Mexico (6); University of Manchester (7); University of California, San Francisco (8); Katholieke Universiteit Leuven (9); Christchurch Hospital (10); University of Florence (11); Cedars-Sinai Medical Center (12); University of British Columbia (13); University of Texas Health Science Center at Houston (14); University College London (15); University of Oxford (16); Ghent University (17); University of New South Wales (18); University of Otago (19); Cincinnati Children's Hospital Medical Center (20); University of Hawaii at Manoa (21); Uludağ University (22); Near East University (23); Indiana University (24); Southern General Hospital (25); Geisinger Medical Center (26)
TL;DR: Findings indicate that GS, DA5, and MWS, although traditionally considered separate disorders, are etiologically related and perhaps represent variable expressivity of the same condition.
Abstract: Gordon syndrome (GS), or distal arthrogryposis type 3, is a rare, autosomal-dominant disorder characterized by cleft palate and congenital contractures of the hands and feet. Exome sequencing of five GS-affected families identified mutations in piezo-type mechanosensitive ion channel component 2 (PIEZO2) in each family. Sanger sequencing revealed PIEZO2 mutations in five of seven additional families studied (for a total of 10/12 [83%] individuals), and nine families had an identical c.8057G>A (p.Arg2686His) mutation. The phenotype of GS overlaps with distal arthrogryposis type 5 (DA5) and Marden-Walker syndrome (MWS). Using molecular inversion probes for targeted sequencing to screen PIEZO2, we found mutations in 24/29 (82%) DA5-affected families and one of two MWS-affected families. The presence of cleft palate was significantly associated with c.8057G>A (Fisher's exact test, adjusted p value < 0.0001). Collectively, although GS, DA5, and MWS have traditionally been considered separate disorders, our findings indicate that they are etiologically related and perhaps represent variable expressivity of the same condition.
160 citations
Cited by
••
TL;DR: The ANNOVAR tool was developed to annotate single nucleotide variants and insertions/deletions, for example by examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP.
Abstract: High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
10,461 citations
••
TL;DR: A unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs is presented.
Abstract: Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets.
10,056 citations
••
Harvard University (1); Broad Institute (2); Boston Children's Hospital (3); University of Washington (4); University of Arizona (5); Cardiff University (6); Google (7); Icahn School of Medicine at Mount Sinai (8); Samsung Medical Center (9); Vertex Pharmaceuticals (10); University of Michigan (11); University of Cambridge (12); State University of New York Upstate Medical University (13); Karolinska Institutet (14); University of Eastern Finland (15); University of Oxford (16); Wellcome Trust Centre for Human Genetics (17); Cedars-Sinai Medical Center (18); University of Ottawa (19); University of Pennsylvania (20); University of North Carolina at Chapel Hill (21); University of Helsinki (22); University of California, San Diego (23); University of Mississippi Medical Center (24)
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
8,758 citations
••
TL;DR: The ability of CADD to prioritize functional, deleterious and pathogenic variants across many functional categories, effect sizes and genetic architectures is unmatched by any current single-annotation method.
Abstract: Our capacity to sequence human genomes has exceeded our ability to interpret genetic variation. Current genomic annotations tend to exploit a single information type (e.g. conservation) and/or are restricted in scope (e.g. to missense changes). Here, we describe Combined Annotation Dependent Depletion (CADD), a framework that objectively integrates many diverse annotations into a single, quantitative score. We implement CADD as a support vector machine trained to differentiate 14.7 million high-frequency human derived alleles from 14.7 million simulated variants. We pre-compute “C-scores” for all 8.6 billion possible human single nucleotide variants and enable scoring of short insertions/deletions. C-scores correlate with allelic diversity, annotations of functionality, pathogenicity, disease severity, experimentally measured regulatory effects, and complex trait associations, and highly rank known pathogenic variants within individual genomes. The ability of CADD to prioritize functional, deleterious, and pathogenic variants across many functional categories, effect sizes and genetic architectures is unmatched by any current annotation.
4,956 citations
••
TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.
4,913 citations