Author

David Altshuler

Bio: David Altshuler is an academic researcher from the University of Michigan. The author has contributed to research in the topics Genome-wide association study and Population. The author has an h-index of 162 and has co-authored 345 publications receiving 201,782 citations. Previous affiliations of David Altshuler include Vertex Pharmaceuticals and the Massachusetts Institute of Technology.


Papers
Journal Article
TL;DR: Rare sarcomere protein variants, which cause dominant hypertrophic and dilated cardiomyopathies, were associated with an increased risk for adverse cardiovascular events in the FHS cohort, suggesting that cardiovascular risk assessment in the general population can benefit from rare variant analysis.
Abstract: Rare sarcomere protein variants cause dominant hypertrophic and dilated cardiomyopathies. To evaluate whether allelic variants in eight sarcomere genes are associated with cardiac morphology and function in the community, we sequenced 3,600 individuals from the Framingham Heart Study (FHS) and Jackson Heart Study (JHS) cohorts. Overall, 11.2% of individuals had one or more rare nonsynonymous sarcomere variants. The prevalence of likely pathogenic sarcomere variants was 0.6%, twice the previous estimates; however, only four of these 22 individuals had clinical manifestations of hypertrophic cardiomyopathy. Rare sarcomere variants were associated with an increased risk for adverse cardiovascular events (hazard ratio: 2.3) in the FHS cohort, suggesting that cardiovascular risk assessment in the general population can benefit from rare variant analysis.

127 citations

Journal Article
TL;DR: Accurate estimates of variant effect sizes from population-based sequencing are needed to avoid falsely predicting a substantial fraction of individuals as being at risk for MODY or other Mendelian diseases.
Abstract: Genome sequencing can identify individuals in the general population who harbor rare coding variants in genes for Mendelian disorders and who may consequently have increased disease risk. Previous studies of rare variants in phenotypically extreme individuals display ascertainment bias and may demonstrate inflated effect-size estimates. We sequenced seven genes for maturity-onset diabetes of the young (MODY) in well-phenotyped population samples (n = 4,003). We filtered rare variants according to two prediction criteria for disease-causing mutations: reported previously in MODY or satisfying stringent de novo thresholds (rare, conserved and protein damaging). Approximately 1.5% and 0.5% of randomly selected individuals from the Framingham and Jackson Heart Studies, respectively, carry variants from these two classes. However, the vast majority of carriers remain euglycemic through middle age. Accurate estimates of variant effect sizes from population-based sequencing are needed to avoid falsely predicting a substantial fraction of individuals as being at risk for MODY or other Mendelian diseases.

123 citations

Journal Article
TL;DR: It is found that while studies sampling from extremes have excellent power to discover rare variants, they have limited power to associate them with phenotype, suggesting high false-negative rates for upcoming studies.
Abstract: Next-generation sequencing technologies are making it possible to study the role of rare variants in human disease. Many studies balance statistical power with cost-effectiveness by (a) sampling from phenotypic extremes and (b) utilizing a two-stage design. Two-stage designs include a broad-based discovery phase and selection of a subset of potential causal genes/variants to be further examined in independent samples. We evaluate three parameters: first, the gain in statistical power due to extreme sampling to discover causal variants; second, the informativeness of initial (Phase I) association statistics to select genes/variants for follow-up; third, the impact of extreme and random sampling in (Phase II) replication. We present a quantitative method to select individuals from the phenotypic extremes of a binary trait, and simulate disease association studies under a variety of sample sizes and sampling schemes. First, we find that while studies sampling from extremes have excellent power to discover rare variants, they have limited power to associate them with phenotype, suggesting high false-negative rates for upcoming studies. Second, consistent with previous studies, we find that the effect sizes estimated in these studies are expected to be systematically larger than the overall population effect size; in a well-cited lipids study, we estimate the reported effect to be twofold larger. Third, replication studies require large samples from the general population to have sufficient power; extreme sampling could reduce the required sample size by as much as fourfold. Our observations offer practical guidance for the design and interpretation of studies that utilize extreme sampling. Genet. Epidemiol. 35: 236-246, 2011. (c) 2011 Wiley-Liss, Inc.

122 citations
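
The effect-size inflation described in this abstract can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not the paper's model: a single additive variant acts on a continuous, normally distributed liability (the paper works with a binary trait), and the lowest and highest 10% of that liability are sampled as the "extremes". The function names `ols_slope` and `simulate` and all parameter values are hypothetical.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def simulate(n=20000, maf=0.3, beta=0.3, tail=0.1, seed=7):
    """Return (population slope, extreme-sample slope) for one simulated variant.

    All parameters are hypothetical illustration values, not taken from the paper.
    """
    rng = random.Random(seed)
    # Genotype = number of risk alleles (0, 1 or 2), drawn independently per allele.
    g = [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]
    # Liability = additive genetic effect + standard normal noise.
    y = [beta * gi + rng.gauss(0.0, 1.0) for gi in g]
    # Extreme sampling: keep the lowest and the highest `tail` fraction of the liability.
    order = sorted(range(n), key=lambda i: y[i])
    k = int(tail * n)
    chosen = order[:k] + order[-k:]
    population_slope = ols_slope(g, y)
    extreme_slope = ols_slope([g[i] for i in chosen], [y[i] for i in chosen])
    return population_slope, extreme_slope


if __name__ == "__main__":
    pop, ext = simulate()
    print(f"population slope: {pop:.3f}  extreme-sample slope: {ext:.3f}")
```

With these settings the population slope stays close to the simulated `beta`, while the slope estimated in the extreme sample comes out clearly larger, because selecting on the tails of the phenotype exaggerates the phenotypic difference between genotype groups. The exact numbers depend on the seed.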

Journal Article
29 Jun 2017-Cell
TL;DR: It is demonstrated that SLC16A11 is a proton-coupled monocarboxylate transporter and that genetic perturbation of SLC16A11 induces changes in fatty acid and lipid metabolism that are associated with increased T2D risk, suggesting that increasing SLC16A11 function could be therapeutically beneficial for T2D.

120 citations

Journal Article
TL;DR: Inherited variation in IGF1 may play a role in the risk of prostate cancer; four blocks of strong linkage disequilibrium were identified, and two perfectly correlated SNPs could account for the haplotype findings.
Abstract: Background: Insulin-like growth factor I (IGF-I) appears to play a role in prostate development and carcinogenesis. We investigated whether genetic variation at the IGF1 locus is associated with prostate cancer risk. Methods: We sequenced IGF1 exons in germline DNA from 95 men with advanced prostate cancer to identify missense variants. IGF1 linkage disequilibrium patterns and common haplotypes were characterized by genotyping 64 single-nucleotide polymorphisms (SNPs) spanning 156 kilobases in 349 control subjects. Associations between IGF1 haplotypes and genotypes were investigated among 2320 patients with prostate cancer and 2290 control subjects from the Multiethnic Cohort. Odds ratios (ORs) and 95% confidence intervals (CIs) were estimated by unconditional logistic regression to determine the association between prostate cancer and IGF1 haplotypes and genotypes. We used permutation testing to correct for multiple hypothesis testing. All statistical tests were two-sided. Results: No IGF1 missense variants were observed. We identified four blocks of strong linkage disequilibrium and selected a subset of 29 tagging SNPs that could accurately predict both the common IGF1 haplotypes and the remaining SNPs. Haplotype analysis revealed nominally statistically significant associations with prostate cancer risk in each of the four haplotype blocks: haplotype 1B (OR = 1.21, 95% CI = 1.04 to 1.40), haplotype 2C (OR = 1.24, 95% CI = 1.06 to 1.44), haplotype 3C (OR = 1.25, 95% CI = 1.03 to 1.50), and haplotype 4D (OR = 1.19, 95% CI = 1.02 to 1.39). Two SNPs, rs7978742 (P trend = .002) and rs7965399 (P trend = .002), were perfectly correlated (correlation coefficient = 1.0) with one another and also associated with prostate cancer risk. These two SNPs were strong proxies for haplotypes 1B, 2C, 3C, and 4D and could account for the haplotype findings. Permutation testing revealed that a similarly strong result would be observed by chance only 5.6% of the time. Conclusion: Inherited variation in IGF1 may play a role in the risk of prostate cancer. [J Natl Cancer Inst 2006;98:123-34]

120 citations
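
The odds ratios and 95% confidence intervals in this abstract come from unconditional logistic regression. In the simplest case of one binary exposure and no covariates, the logistic-regression odds ratio equals the 2×2 cross-product ratio and its Wald interval equals Woolf's log-scale interval, which is what this minimal sketch computes. The function name and the counts are hypothetical; the study's own models and data are not reproduced.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale Wald) confidence interval for a 2x2 table.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


# Hypothetical counts (carriers vs. non-carriers), not data from the study.
print(odds_ratio_ci(a=60, b=40, c=50, d=50))
```

For these hypothetical counts the odds ratio is exactly 1.5; the interval is symmetric around it on the log scale, not on the raw scale, which is why published ORs such as 1.21 (1.04 to 1.40) sit off-centre in their intervals.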


Cited by
Journal Article
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations
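
The "full-text minute index" named in this abstract is the FM-index, which finds exact matches by backward search over the Burrows-Wheeler transform (BWT) of the reference. A minimal sketch, assuming a tiny in-memory text that does not contain `$`: it builds the BWT naively from sorted rotations and stores a full occurrence table, whereas production aligners such as Bowtie 2 use compressed, sampled structures and add the mismatch and gap handling (dynamic programming) that this code omits. The class and function names are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (only suitable for small demos)."""
    text += "$"  # unique sentinel; '$' sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


class FMIndex:
    def __init__(self, text):
        self.bwt = bwt(text)
        counts = {}
        for ch in self.bwt:
            counts[ch] = counts.get(ch, 0) + 1
        # C[ch] = number of characters in the text that sort before ch.
        self.C, total = {}, 0
        for ch in sorted(counts):
            self.C[ch] = total
            total += counts[ch]
        # occ[ch][i] = occurrences of ch in bwt[:i] (full table; real indexes sample it).
        self.occ = {ch: [0] * (len(self.bwt) + 1) for ch in counts}
        for i, b in enumerate(self.bwt):
            for ch in self.occ:
                self.occ[ch][i + 1] = self.occ[ch][i] + (b == ch)

    def count(self, pattern):
        """Count exact occurrences of pattern by backward search."""
        lo, hi = 0, len(self.bwt)  # half-open range of matching sorted suffixes
        for ch in reversed(pattern):
            if ch not in self.C:
                return 0
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    fm = FMIndex("GATTACAGATTACA")
    print(fm.count("ATTA"), fm.count("CAG"), fm.count("TTT"))
```

Each pattern character narrows the suffix range with two table lookups, so counting takes time proportional to the pattern length rather than the reference length, which is the property that makes the index attractive for short-read alignment.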

Journal Article
TL;DR: The Gene Set Enrichment Analysis (GSEA) method interprets gene expression data by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

34,830 citations
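
The core GSEA statistic is an enrichment score (ES): walking down the genes ranked by their correlation with the phenotype, a running sum steps up at genes in the set S (weighted by |r|^p) and steps down uniformly at genes outside it, and the ES is the running sum's maximum deviation from zero. A minimal sketch, assuming the ranking metric has already been computed; the permutation-based significance testing and normalization of the official software are not reproduced, and the function name is hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score in the style of GSEA.

    ranked_genes: gene IDs sorted from most to least correlated with the phenotype.
    scores: ranking-metric value (e.g. correlation) for each gene, same order.
    gene_set: set of gene IDs. p: weight exponent (p=0 gives a KS-like statistic).
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap, but not cover, the ranked list")
    # Normaliser for hits: sum of |r_j|^p over genes in the set.
    n_r = sum(abs(s) ** p for s, hit in zip(scores, in_set) if hit)
    running, best = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        # Step up at hits (weighted), step down at misses (uniformly).
        running += (abs(s) ** p) / n_r if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the signed maximum deviation from zero
    return best


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
    scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
    print(enrichment_score(genes, scores, {"g1", "g2"}, p=0))
```

A set concentrated at the top of the ranking gives a score near +1, a set concentrated at the bottom gives a score near -1, and a set scattered through the list stays near 0.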

Journal Article
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function (data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation), focusing on the estimation and use of identity-by-state and identity-by-descent information in population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations
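
Identity-by-state (IBS) sharing, one of the quantities discussed in this abstract, can be counted directly from genotype calls. A minimal sketch, assuming biallelic SNPs coded as minor-allele counts (0, 1, 2) with `None` for missing calls; it summarizes a pair of individuals as the proportion of alleles shared IBS. This is not PLINK's implementation, the identity-by-descent estimation built on such counts (which also needs allele frequencies) is not reproduced, and the function names are hypothetical.

```python
def ibs_counts(geno_a, geno_b):
    """Count SNPs where two individuals share 0, 1 or 2 alleles identical by state.

    Genotypes are minor-allele counts (0, 1, 2); None marks a missing call.
    """
    ibs = [0, 0, 0]
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue  # skip SNPs missing in either individual
        # For a biallelic SNP, alleles shared IBS = 2 - |difference in allele counts|.
        ibs[2 - abs(a - b)] += 1
    return tuple(ibs)


def ibs_similarity(geno_a, geno_b):
    """Proportion of alleles shared IBS: (IBS2 + 0.5 * IBS1) / (IBS0 + IBS1 + IBS2)."""
    ibs0, ibs1, ibs2 = ibs_counts(geno_a, geno_b)
    total = ibs0 + ibs1 + ibs2
    if total == 0:
        raise ValueError("no SNPs genotyped in both individuals")
    return (ibs2 + 0.5 * ibs1) / total


if __name__ == "__main__":
    a = [0, 1, 2, 2, None]
    b = [0, 2, 0, 2, 1]
    print(ibs_counts(a, b), ibs_similarity(a, b))
```

For the toy genotypes above the counts are (1, 1, 2) for IBS0, IBS1 and IBS2, giving a similarity of 0.625; across hundreds of thousands of markers, pairwise summaries like this are what make stratification and cryptic relatedness visible.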

Journal Article
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum and 245 more (29 institutions)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article
TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147 citations
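
The "information borrowing" in this abstract refers to limma's empirical Bayes step, which shrinks each gene's residual variance toward a common prior value before forming a moderated t-statistic. A minimal sketch for a single gene, assuming the prior variance `s0_sq` and prior degrees of freedom `d0` are already known (limma estimates them from all genes, which is not reproduced here); the function name and example numbers are hypothetical.

```python
import math


def moderated_t(beta_hat, s2, d, s0_sq, d0, v):
    """Empirical-Bayes moderated t-statistic for one gene, in the style of limma.

    beta_hat: estimated log-fold change; s2: residual variance on d degrees of freedom;
    s0_sq, d0: prior variance and prior degrees of freedom (assumed known here);
    v: unscaled variance of beta_hat (e.g. 1/n1 + 1/n2 for a two-group comparison).
    Returns (moderated t, posterior variance).
    """
    # Posterior variance: degrees-of-freedom-weighted average of prior and gene variance.
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)
    return beta_hat / math.sqrt(s2_post * v), s2_post


if __name__ == "__main__":
    # With d0 = 0 there is no borrowing and this is the ordinary t-statistic.
    print(moderated_t(1.0, 0.25, 4, 1.0, 0, 1.0))
    # With d0 = 4, a suspiciously small gene variance is pulled toward the prior.
    print(moderated_t(1.0, 0.25, 4, 1.0, 4, 1.0))
```

The moderated statistic is referred to a t distribution with d0 + d degrees of freedom, so with few replicates the borrowed prior information both stabilizes the variance estimate and adds degrees of freedom.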