Author

David Altshuler

Bio: David Altshuler is an academic researcher from University of Michigan. The author has contributed to research in topics: Genome-wide association study & Population. The author has an h-index of 162 and has co-authored 345 publications receiving 201,782 citations. Previous affiliations of David Altshuler include Vertex Pharmaceuticals & Massachusetts Institute of Technology.


Papers
Journal ArticleDOI
Leslie A. Lange, Youna Hu, He Zhang, Chenyi Xue, Ellen M. Schmidt, Zheng-Zheng Tang, Chris Bizon, Ethan M. Lange, Joshua D. Smith, Emily H. Turner, Goo Jun, Hyun Min Kang, Gina M. Peloso, Paul L. Auer, Kuo Ping Li, Jason Flannick, Ji Zhang, Christian Fuchsberger, Kyle J. Gaulton, Cecilia M. Lindgren, Adam E. Locke, Alisa K. Manning, Xueling Sim, Manuel A. Rivas, Oddgeir L. Holmen, Omri Gottesman, Yingchang Lu, Douglas M. Ruderfer, Eli A. Stahl, Qing Duan, Yun Li, Peter Durda, Shuo Jiao, Aaron Isaacs, Albert Hofman, Joshua C. Bis, Adolfo Correa, Michael Griswold, Johanna Jakobsdottir, Albert V. Smith, Pamela J. Schreiner, Mary F. Feitosa, Qunyuan Zhang, Jennifer E. Huffman, Jacy R Crosby, Christina L. Wassel, Ron Do, Nora Franceschini, Lisa W. Martin, Jennifer G. Robinson, Themistocles L. Assimes, David R. Crosslin, Elisabeth A. Rosenthal, Michael Y. Tsai, Mark J. Rieder, Deborah N. Farlow, Aaron R. Folsom, Thomas Lumley, Ervin R. Fox, Christopher S. Carlson, Ulrike Peters, Rebecca D. Jackson, Cornelia M. van Duijn, André G. Uitterlinden, Daniel Levy, Jerome I. Rotter, Herman A. Taylor, Vilmundur Gudnason, David S. Siscovick, Myriam Fornage, Ingrid B. Borecki, Caroline Hayward, Igor Rudan, Y. Eugene Chen, Erwin P. Bottinger, Ruth J. F. Loos, Pål Sætrom, Kristian Hveem, Michael Boehnke, Leif Groop, Mark I. McCarthy, Thomas Meitinger, Christie M. Ballantyne, Stacey Gabriel, Christopher J. O'Donnell, Wendy S. Post, Kari E. North, Alexander P. Reiner, Eric Boerwinkle, Bruce M. Psaty, David Altshuler, Sekar Kathiresan, Danyu Lin, Gail P. Jarvik, L. Adrienne Cupples, Charles Kooperberg, James G. Wilson, Deborah A. Nickerson, Gonçalo R. Abecasis, Stephen S. Rich, Russell P. Tracy, Cristen J. Willer
TL;DR: This large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.
Abstract: Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.

201 citations

Journal ArticleDOI
TL;DR: Saturation mutagenesis and prospective experimental characterization can support immediate diagnostic interpretation of newly discovered missense variants in disease-related genes.
Abstract: Amit Majithia and colleagues employ a pooled assay in human macrophages to assess the functional effects of all possible missense variants in PPARG. Their study shows the value of saturation mutagenesis and prospective experimental characterization to support diagnostic interpretation of newly discovered missense variants in disease-related genes. Clinical exome sequencing routinely identifies missense variants in disease-related genes, but functional characterization is rarely undertaken, leading to diagnostic uncertainty [1,2]. For example, mutations in PPARG cause Mendelian lipodystrophy [3,4] and increase risk of type 2 diabetes (T2D) [5]. Although approximately 1 in 500 people harbor missense variants in PPARG, most are of unknown consequence. To prospectively characterize PPARγ variants, we used highly parallel oligonucleotide synthesis to construct a library encoding all 9,595 possible single-amino acid substitutions. We developed a pooled functional assay in human macrophages, experimentally evaluated all protein variants, and used the experimental data to train a variant classifier by supervised machine learning. When applied to 55 new missense variants identified in population-based and clinical sequencing, the classifier annotated 6 variants as pathogenic; these were subsequently validated by single-variant assays. Saturation mutagenesis and prospective experimental characterization can support immediate diagnostic interpretation of newly discovered missense variants in disease-related genes.

196 citations

Journal ArticleDOI
10 May 2007 - Nature
TL;DR: A community resource project recently launched by the National Human Genome Research Institute to sequence large-insert clones from many individuals, systematically discovering and resolving these complex variants at the DNA sequence level is described.
Abstract: Large-scale studies of human genetic variation have focused largely on understanding the pattern and nature of single-nucleotide differences within the human genome. Recent studies that have identified larger polymorphisms, such as insertions, deletions and inversions, emphasize the value of investing in more comprehensive and systematic studies of human structural genetic variation. We describe a community resource project recently launched by the National Human Genome Research Institute (NHGRI) to sequence large-insert clones from many individuals, systematically discovering and resolving these complex variants at the DNA sequence level. The project includes the discovery of variants through development of clone resources, sequence resolution of variants, and accurate typing of variants in individuals of African, European or Asian ancestry. Sequence resolution of both single-nucleotide and larger-scale genomic variants will improve our picture of natural variation in human populations and will enhance our ability to link genetics and human health.

196 citations

Journal ArticleDOI
TL;DR: The consistency of allelic associations in diverse racial/ethnic groups is not predicted under the hypothesis of Goldstein regarding “synthetic associations” of rare mutations in T2D.
Abstract: It has been recently hypothesized that many of the signals detected in genome-wide association studies (GWAS) to T2D and other diseases, despite being observed to common variants, might in fact result from causal mutations that are rare. One prediction of this hypothesis is that the allelic associations should be population-specific, as the causal mutations arose after the migrations that established different populations around the world. We selected 19 common variants found to be reproducibly associated to T2D risk in European populations and studied them in a large multiethnic case-control study (6,142 cases and 7,403 controls) among men and women from 5 racial/ethnic groups (European Americans, African Americans, Latinos, Japanese Americans, and Native Hawaiians). In analysis pooled across ethnic groups, the allelic associations were in the same direction as the original report for all 19 variants, and 14 of the 19 were significantly associated with risk. In summing the number of risk alleles for each individual, the per-allele associations were highly statistically significant (P < 10⁻⁴) and similar in all populations (odds ratios 1.09–1.12), except in Japanese Americans, where the estimated effect per allele was larger than in the other populations (1.20; Phet = 3.8 × 10⁻⁴). We did not observe ethnic differences in the distribution of risk that would explain the increased prevalence of type 2 diabetes in these groups as compared to European Americans. The consistency of allelic associations in diverse racial/ethnic groups is not predicted under the hypothesis of Goldstein regarding “synthetic associations” of rare mutations in T2D.

193 citations

Journal ArticleDOI
TL;DR: Electroporation of CD34+ hematopoietic stem and progenitor cells obtained from healthy donors was performed, with CRISPR-Cas9 targeting the BCL11A erythroid-specific enhancer, and approximately 80% of the alleles at this locus were modified, with no evidence of off-target editing.
Abstract: Transfusion-dependent β-thalassemia (TDT) and sickle cell disease (SCD) are severe monogenic diseases with severe and potentially life-threatening manifestations. BCL11A is a transcription factor that represses γ-globin expression and fetal hemoglobin in erythroid cells. We performed electroporation of CD34+ hematopoietic stem and progenitor cells obtained from healthy donors, with CRISPR-Cas9 targeting the BCL11A erythroid-specific enhancer. Approximately 80% of the alleles at this locus were modified, with no evidence of off-target editing. After undergoing myeloablation, two patients - one with TDT and the other with SCD - received autologous CD34+ cells edited with CRISPR-Cas9 targeting the same BCL11A enhancer. More than a year later, both patients had high levels of allelic editing in bone marrow and blood, increases in fetal hemoglobin that were distributed pancellularly, transfusion independence, and (in the patient with SCD) elimination of vaso-occlusive episodes. (Funded by CRISPR Therapeutics and Vertex Pharmaceuticals; ClinicalTrials.gov numbers, NCT03655678 for CLIMB THAL-111 and NCT03745287 for CLIMB SCD-121.).

192 citations


Cited by
Journal ArticleDOI
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations

Journal ArticleDOI
TL;DR: The Gene Set Enrichment Analysis (GSEA) method as discussed by the authors focuses on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

34,830 citations

Journal ArticleDOI
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation, with a focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations

Journal ArticleDOI
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, and 245 more authors (29 institutions)
15 Feb 2001 - Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal ArticleDOI
TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147 citations