Author
Sean McGee
Bio: Sean McGee is an academic researcher from the University of Washington. The author has contributed to research in topics including exome sequencing and medicine. The author has an h-index of 5 and has co-authored 6 publications receiving 2007 citations.
Topics: Exome sequencing, Medicine, Exome, Genotyping, DNA sequencing
Papers
••
TL;DR: The findings suggest that most human variation is rare, not shared between populations, and that rare variants are likely to play a role in human health, and show that large sample sizes will be required to associate rare variants with complex traits.
Abstract: As a first step toward understanding how rare variants contribute to risk for complex diseases, we sequenced 15,585 human protein-coding genes to an average median depth of 111× in 2440 individuals of European (n = 1351) and African (n = 1088) ancestry. We identified over 500,000 single-nucleotide variants (SNVs), the majority of which were rare (86% with a minor allele frequency less than 0.5%), previously unknown (82%), and population-specific (82%). On average, 2.3% of the 13,595 SNVs each person carried were predicted to affect protein function of ~313 genes per genome, and ~95.7% of SNVs predicted to be functionally important were rare. This excess of rare functional variants is due to the combined effects of explosive, recent accelerated population growth and weak purifying selection. Furthermore, we show that large sample sizes will be required to associate rare variants with complex traits.
1,680 citations
••
TL;DR: It is concluded that new comprehensive genomic approaches have identified rare variants in BAG3 as causative of DCM.
Abstract: Dilated cardiomyopathy commonly causes heart failure and is the most frequent precipitating cause of heart transplantation. Familial dilated cardiomyopathy has been shown to be caused by rare variant mutations in more than 30 genes but only ∼35% of its genetic cause has been identified, principally by using linkage-based or candidate gene discovery approaches. In a multigenerational family with autosomal dominant transmission, we employed whole-exome sequencing in a proband and three of his affected family members, and genome-wide copy number variation in the proband and his affected father and unaffected mother. Exome sequencing identified 428 single point variants resulting in missense, nonsense, or splice site changes. Genome-wide copy number analysis identified 51 insertion deletions and 440 copy number variants > 1 kb. Of these, an 8733 bp deletion, encompassing exon 4 of the heat shock protein cochaperone BCL2-associated athanogene 3 (BAG3), was found in seven affected family members and was absent in 355 controls. To establish the relevance of variants in this protein class in genetic DCM, we sequenced the coding exons in BAG3 in 311 other unrelated DCM probands and identified one frameshift, two nonsense, and four missense rare variants absent in 355 control DNAs, four of which were familial and segregated with disease. Knockdown of bag3 in a zebrafish model recapitulated DCM and heart failure. We conclude that new comprehensive genomic approaches have identified rare variants in BAG3 as causative of DCM.
326 citations
••
TL;DR: Stargazer is a new bioinformatics tool that uses next-generation sequencing (NGS) data to call star alleles for CYP2D6, and subsequent allele calling with Stargazer will aid the implementation of precision drug therapy.
80 citations
••
TL;DR: In this article, the authors assessed the association of several rare European von Willebrand disease missense variants of VWF (including p.Arg2185Gln and p.His817Gln) with levels of FVIII in apparently healthy African Americans (AAs).
72 citations
••
Baylor College of Medicine¹, University of North Carolina at Chapel Hill², Partners HealthCare³, University of Washington⁴, National Institutes of Health⁵, Brigham and Women's Hospital⁶, Children's Hospital of Philadelphia⁷, University of Michigan⁸, Broad Institute⁹, Renaissance Computing Institute¹⁰, University of Pennsylvania¹¹, Harvard University¹²
TL;DR: A definition for reduced coverage regions is proposed and a set of standards for variant calling in clinical sequencing applications is described, providing a process road map for clinical sequencing centers looking to perform similar analyses on their data.
24 citations
Cited by
28 Jul 2005
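TL;DR: PfPMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and initiating transcription of different var gene variants provides the molecular basis for antigenic variation.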
18,940 citations
••
Harvard University¹, Broad Institute², Boston Children's Hospital³, University of Washington⁴, University of Arizona⁵, Cardiff University⁶, Google⁷, Icahn School of Medicine at Mount Sinai⁸, Samsung Medical Center⁹, Vertex Pharmaceuticals¹⁰, University of Michigan¹¹, University of Cambridge¹², State University of New York Upstate Medical University¹³, Karolinska Institutet¹⁴, University of Eastern Finland¹⁵, Wellcome Trust Centre for Human Genetics¹⁶, University of Oxford¹⁷, Cedars-Sinai Medical Center¹⁸, University of Ottawa¹⁹, University of Pennsylvania²⁰, University of North Carolina at Chapel Hill²¹, University of Helsinki²², University of California, San Diego²³, University of Mississippi Medical Center²⁴
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
8,758 citations
••
TL;DR: It is shown that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites.
Abstract: By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.
7,710 citations
••
TL;DR: The ability of CADD to prioritize functional, deleterious and pathogenic variants across many functional categories, effect sizes and genetic architectures is unmatched by any current single-annotation method.
Abstract: Our capacity to sequence human genomes has exceeded our ability to interpret genetic variation. Current genomic annotations tend to exploit a single information type (e.g. conservation) and/or are restricted in scope (e.g. to missense changes). Here, we describe Combined Annotation Dependent Depletion (CADD), a framework that objectively integrates many diverse annotations into a single, quantitative score. We implement CADD as a support vector machine trained to differentiate 14.7 million high-frequency human derived alleles from 14.7 million simulated variants. We pre-compute “C-scores” for all 8.6 billion possible human single nucleotide variants and enable scoring of short insertions/deletions. C-scores correlate with allelic diversity, annotations of functionality, pathogenicity, disease severity, experimentally measured regulatory effects, and complex trait associations, and highly rank known pathogenic variants within individual genomes. The ability of CADD to prioritize functional, deleterious, and pathogenic variants across many functional categories, effect sizes and genetic architectures is unmatched by any current annotation.
4,956 citations
••
TL;DR: PolyPhen‐2 (Polymorphism Phenotyping v2), available as software and via a Web server, predicts the possible impact of amino acid substitutions on the stability and function of human proteins using structural and comparative evolutionary considerations.
Abstract: PolyPhen-2 (Polymorphism Phenotyping v2), available as software and via a Web server, predicts the possible impact of amino acid substitutions on the stability and function of human proteins using structural and comparative evolutionary considerations. It performs functional annotation of single-nucleotide polymorphisms (SNPs), maps coding SNPs to gene transcripts, extracts protein sequence annotations and structural attributes, and builds conservation profiles. It then estimates the probability of the missense mutation being damaging based on a combination of all these properties. PolyPhen-2 features include a high-quality multiple protein sequence alignment pipeline and a prediction method employing machine-learning classification. The software also integrates the UCSC Genome Browser's human genome annotations and MultiZ multiple alignments of vertebrate genomes with the human genome. PolyPhen-2 is capable of analyzing large volumes of data produced by next-generation sequencing projects, thanks to built-in support for high-performance computing environments like Grid Engine and Platform LSF.
2,681 citations