Author

David Altshuler

Bio: David Altshuler is an academic researcher from the University of Michigan. The author has contributed to research in topics: Genome-wide association study & Population. The author has an h-index of 162 and has co-authored 345 publications receiving 201,782 citations. Previous affiliations of David Altshuler include Vertex Pharmaceuticals & Massachusetts Institute of Technology.


Papers
Journal ArticleDOI
01 Feb 2017-Diabetes
TL;DR: The results suggest that functional characterization of variants within MODY genes may overcome the limitations of bioinformatics tools for the purposes of presymptomatic diabetes risk prediction in the general population.
Abstract: Variants in HNF1A encoding hepatocyte nuclear factor 1α (HNF-1A) are associated with maturity-onset diabetes of the young form 3 (MODY 3) and type 2 diabetes. We investigated whether functional classification of HNF1A rare coding variants can inform models of diabetes risk prediction in the general population by analyzing the effect of 27 HNF1A variants identified in well-phenotyped populations (n = 4,115). Bioinformatics tools classified 11 variants as likely pathogenic and showed no association with diabetes risk (combined minor allele frequency [MAF] 0.22%; odds ratio [OR] 2.02; 95% CI 0.73-5.60; P = 0.18). However, a different set of 11 variants that reduced HNF-1A transcriptional activity to <60% of normal (wild-type) activity was strongly associated with diabetes in the general population (combined MAF 0.22%; OR 5.04; 95% CI 1.99-12.80; P = 0.0007). Our functional investigations indicate that 0.44% of the population carry HNF1A variants that result in a substantially increased risk for developing diabetes. These results suggest that functional characterization of variants within MODY genes may overcome the limitations of bioinformatics tools for the purposes of presymptomatic diabetes risk prediction in the general population.

51 citations
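
The odds ratios, confidence intervals and P values quoted in the HNF1A abstract above can be cross-checked against each other. The following is a minimal sketch, assuming the intervals are Wald intervals that are symmetric on the log-odds scale (the abstract does not state which method was used); the function name `p_from_or_ci` is hypothetical and only the reported figures come from the abstract.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959964


def p_from_or_ci(odds_ratio, ci_low, ci_high):
    """Recover the standard error, z statistic and two-sided P value
    from an odds ratio and its 95% CI.

    Assumption: the CI is a Wald interval, exp(log(OR) +/- Z_95 * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Figures as reported in the abstract.
for label, args in [
    ("bioinformatics-classified variants", (2.02, 0.73, 5.60)),
    ("functionally classified variants", (5.04, 1.99, 12.80)),
]:
    se, z, p = p_from_or_ci(*args)
    print(f"{label}: SE={se:.3f}, z={z:.2f}, P={p:.4f}")
```

Under this assumption the recovered P values land close to the reported 0.18 and 0.0007, which suggests the published intervals are internally consistent.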

Journal ArticleDOI
TL;DR: A spectrum of kinase genes whose overexpression can overcome NSCLC cells’ reliance on EGFR are identified and their convergence on the PI3K-AKT and MEK-ERK signaling axes in sustaining EGFR-independent survival is underscored.
Abstract: Lung adenocarcinomas harboring activating mutations in the epidermal growth factor receptor (EGFR) represent a common molecular subset of non-small cell lung cancer (NSCLC) cases. EGFR mutations predict sensitivity to EGFR tyrosine kinase inhibitors (TKIs) and thus represent a dependency in NSCLCs harboring these alterations, but the genetic basis of EGFR dependence is not fully understood. Here, we applied an unbiased, ORF-based screen to identify genetic modifiers of EGFR dependence in EGFR-mutant NSCLC cells. This approach identified 18 kinase and kinase-related genes whose overexpression can substitute for EGFR in EGFR-dependent PC9 cells, and these genes include seven of nine Src family kinase genes, FGFR1, FGFR2, ITK, NTRK1, NTRK2, MOS, MST1R, and RAF1. A subset of these genes can complement loss of EGFR activity across multiple EGFR-dependent models. Unbiased gene-expression profiling of cells overexpressing EGFR bypass genes, together with targeted validation studies, reveals EGFR-independent activation of the MEK-ERK and phosphoinositide 3-kinase (PI3K)-AKT pathways. Combined inhibition of PI3K-mTOR and MEK restores EGFR dependence in cells expressing each of the 18 EGFR bypass genes. Together, these data uncover a broad spectrum of kinases capable of overcoming dependence on EGFR and underscore their convergence on the PI3K-AKT and MEK-ERK signaling axes in sustaining EGFR-independent survival.

50 citations

Journal ArticleDOI
TL;DR: This work proposes a new conditioning approach, which is based in part on the classical technique of liability threshold modeling, and shows that it outperforms both the no conditioning strategy and the standard conditioning strategy, with a properly controlled false-positive rate.
Abstract: Motivation: The question of how to best use information from known associated variants when conducting disease association studies has yet to be answered. Some studies compute a marginal P-value for each single-nucleotide polymorphism (SNP) independently, ignoring previously discovered variants. Other studies include known variants as covariates in logistic regression, but a weakness of this standard conditioning strategy is that it does not account for disease prevalence and non-random ascertainment, which can induce a correlation structure between candidate variants and known associated variants even if the variants lie on different chromosomes. Here, we propose a new conditioning approach, which is based in part on the classical technique of liability threshold modeling. Roughly, this method estimates model parameters for each known variant while accounting for the published disease prevalence from the epidemiological literature. Results: We show via simulation and application to empirical datasets that our approach outperforms both the no conditioning strategy and the standard conditioning strategy, with a properly controlled false-positive rate. Furthermore, in multiple datasets involving diseases of low prevalence, standard conditioning produces a severe drop in test statistics whereas our approach generally performs as well as or better than no conditioning. Our approach may substantially improve disease gene discovery for diseases with many known risk variants. Availability: LTSOFT software is available online at http://www.hsph.harvard.edu/faculty/alkes-price/software/ Contact: nzaitlen@hsph.harvard.edu; aprice@hsph.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online.

46 citations

Patent
14 Jun 2004
TL;DR: The invention describes methods of regulating metabolism and mitochondrial biogenesis, including methods of treating, preventing and diagnosing diseases associated with reduced mitochondrial function, together with a set of coordinately regulated genes that regulate oxidative phosphorylation.
Abstract: The invention relates to novel methods of regulating metabolism and mitochondrial biogenesis. Some aspects of the invention relate to methods of treating or preventing diseases in a patient associated with reduced mitochondrial function, to methods of identifying agents to treat such diseases, and to methods of diagnosing such diseases. Other aspects of the invention relate to a set of coordinately-regulated genes which regulate oxidative phosphorylation.

44 citations

Journal ArticleDOI
TL;DR: Biases in estimates of linkage disequilibrium that arise from study design and analytic methodology are dissected, showing that allele frequency, marker density, and sample size largely reconcile the large-scale surveys, with implications for whole-genome association studies.
Abstract: Genetic association studies of common disease often rely on linkage disequilibrium (LD) along the human genome and in the population under study. Although understanding the characteristics of this correlation has been the focus of many large-scale surveys (culminating in genomewide haplotype maps), the results of different studies have yielded wide-ranging estimates. Since understanding these differences (and whether they can be reconciled) has important implications for whole-genome association studies, in this article we dissect biases in these estimations that are due to known aspects of study design and analytic methodology. In particular, we document in the empirical data that the long-known complicating effects of allele frequency, marker density, and sample size largely reconcile all large-scale surveys. Two exceptions are an underappraisal of redundancy among single-nucleotide polymorphisms (SNPs) when evaluation is limited to short regions (as in candidate-gene resequencing studies) and an inflation in the extent of LD in HapMap phase I, which is likely due to oversampling of specific haplotypes in the creation of the public SNP map. Understanding these factors can guide the understanding of empirical LD surveys and has implications for genetic association studies.

43 citations
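
The linkage disequilibrium abstract above notes that allele frequency complicates comparisons between LD surveys. A minimal sketch of why, using the standard pairwise LD measures D, D' and r² computed from haplotype and allele frequencies; the function name `ld_stats` and the example frequencies are illustrative assumptions, not values from the paper.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # r^2 is the squared correlation between the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' rescales D by its maximum attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime, r2


# Hypothetical example: allele B (10%) always occurs on haplotypes carrying A (50%).
d, d_prime, r2 = ld_stats(p_ab=0.1, p_a=0.5, p_b=0.1)
print(f"D={d:.3f}, D'={d_prime:.2f}, r^2={r2:.3f}")
```

In this example D' is at its maximum while r² is only about 0.11, so two surveys that sample markers with different allele frequencies can report very different LD for the same underlying haplotype structure.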


Cited by
Journal ArticleDOI
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations

Journal ArticleDOI
TL;DR: Gene Set Enrichment Analysis (GSEA) is an analytical method for interpreting gene expression data that derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

34,830 citations

Journal ArticleDOI
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. It focuses on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations
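
The PLINK abstract above refers to identity-by-state (IBS) information between pairs of individuals. As a rough illustration of the concept only (not PLINK's implementation), the sketch below computes the proportion of alleles two individuals share IBS, assuming genotypes are coded as 0, 1 or 2 copies of a reference allele and missing calls as `None`; the function name and coding scheme are hypothetical.

```python
def ibs_proportion(geno_a, geno_b):
    """Proportion of alleles shared identical by state between two individuals.

    Genotypes are allele counts (0, 1 or 2) per SNP; None marks a missing call.
    At each SNP the pair shares 2 - |a - b| alleles IBS.
    """
    if len(geno_a) != len(geno_b):
        raise ValueError("genotype vectors must have the same length")
    shared = 0
    n_snps = 0
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue  # skip SNPs where either genotype is missing
        shared += 2 - abs(a - b)
        n_snps += 1
    if n_snps == 0:
        raise ValueError("no SNPs with both genotypes present")
    return shared / (2 * n_snps)


# Hypothetical genotypes at five SNPs; the last one is missing for individual 2.
ind1 = [0, 1, 2, 2, 1]
ind2 = [0, 1, 0, 1, None]
print(ibs_proportion(ind1, ind2))
```

Averaging such pairwise values over many markers gives a simple similarity measure of the kind used to detect population structure or unexpected relatedness; estimating identity by descent requires additional modeling beyond this count.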

Journal ArticleDOI
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, +245 more · Institutions (29)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal ArticleDOI
TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147 citations