Author

David Altshuler

Bio: David Altshuler is an academic researcher from the University of Michigan. The author has contributed to research in the topics Genome-wide association study and Population. The author has an h-index of 162 and has co-authored 345 publications receiving 201,782 citations. Previous affiliations of David Altshuler include Vertex Pharmaceuticals and the Massachusetts Institute of Technology.


Papers
Journal Article
TL;DR: The long-range LD around common alleles and limited diversity result in improved efficiency in genetic studies in this population and augment the power to detect association of 'hidden SNPs'.
Abstract: Whole-genome association studies are predicted to be especially powerful in isolated populations owing to increased linkage disequilibrium (LD) and decreased allelic diversity, but this possibility has not been empirically tested [1–3]. We compared genome-wide data on 113,240 SNPs typed in 30 trios from the Pacific island of Kosrae with the same markers typed in the 270 samples from the International HapMap Project [4,5]. LD extends farther and haplotype diversity is lower in Kosrae than in the HapMap populations. More than 98% of Kosraen haplotypes are present in HapMap populations, indicating that HapMap will be useful for genetic studies on Kosrae. The long-range LD around common alleles and the limited diversity improve the efficiency of genetic studies in this population and augment the power to detect association of ‘hidden SNPs’. The use of LD-based mapping strategies makes it practical to perform whole-genome association studies without typing every common variant in the human genome [6,7]. It has been suggested that the power of such studies is increased in isolated populations, and this has been demonstrated for rare alleles [8]. However, it remains unclear whether LD among common alleles in isolated populations extends significantly farther than in other populations, far enough to have an impact on studies that seek to identify alleles contributing to common complex traits [9–11]. Thus, although LD has been observed around rare alleles (such as disease genes) in isolated populations, facilitating disease gene mapping and cloning, this finding has not been extended to the common alleles and haplotypes that are the focus of the HapMap project. Even among isolated populations, variation in demographic history has powerful consequences for LD structure around rare alleles [3,12,13], necessitating an evaluation of each population to assess the power of an LD-based mapping strategy.
Thus, with the intense worldwide effort to generate a haplotype map for the general population, it is important to establish the utility of marker sets developed using samples from the HapMap project for use in other populations. We addressed these issues using samples from the island of Kosrae, Federated States of Micronesia [14], which was settled by a small number of Micronesian founders ~2,000 years ago [15]. In the current study, we generated a genome-wide high-density dataset of ~110,000 SNPs to assess the effects of Kosraen population history on genomic variation in 30 Kosraen (KOS) trios. These same SNPs have been typed in the samples used in the HapMap project (see Methods). The 30 Kosraen trios were chosen so that every pair of trios is separated by five or more generations, representing the most distant branches of the Kosrae pedigree. The individuals in these trios therefore sample the range of diversity on Kosrae more fully than a random set of individuals would. Comparing data on the same set of markers typed in samples included in the HapMap project permits examination of the relative extent of LD and haplotype diversity in Kosrae compared with these other populations.
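Comparisons of LD extent between Kosrae and HapMap rest on pairwise LD statistics. As an illustration only, the sketch below computes the standard D and r² measures for two biallelic SNPs from phased haplotypes. The function name `ld_r2` and the 0/1 allele coding are assumptions, not the study's pipeline, which would additionally track how r² decays with physical distance.

```python
def ld_r2(haplotypes):
    """Return (D, r^2) for two biallelic SNPs.

    haplotypes: one (allele_at_snp1, allele_at_snp2) pair per phased
    chromosome, with alleles coded 0/1 (an assumed coding).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # joint 1-1 haplotype frequency
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d, d * d / denom

print(ld_r2([(0, 0), (0, 0), (1, 1), (1, 1)]))  # prints (0.25, 1.0): perfect LD
print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))  # prints (0.0, 0.0): no LD
```

Longer-range LD, as reported for Kosrae, means r² between SNP pairs stays high over larger physical distances, so fewer typed markers are needed to tag untyped ("hidden") variants.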

71 citations

Journal Article
TL;DR: The findings highlight the notion that clinical testing for rare missense mutations within CHEK2 may have limited value in predicting breast cancer risk, but that testing for the 1100delC variant may be valuable in phenotypically‐ and geographically‐selected populations.
Abstract: The CHEK2-1100delC mutation is recurrent in the population and is a moderate risk factor for breast cancer. To identify additional CHEK2 mutations potentially contributing to breast cancer susceptibility, we sequenced 248 cases with early-onset disease; functionally characterized new variants and conducted a population-based case-control analysis to evaluate their contribution to breast cancer risk. We identified 1 additional null mutation and 5 missense variants in the germline of cancer patients. In vitro, the CHEK2-H143Y variant resulted in gross protein destabilization, while others had variable suppression of in vitro kinase activity using BRCA1 as a substrate. The germline CHEK2-1100delC mutation was present among 8/1,646 (0.5%) sporadic, 2/400 (0.5%) early-onset and 3/302 (1%) familial breast cancer cases, but undetectable amongst 2,105 multiethnic controls, including 633 from the US. CHEK2-positive breast cancer families also carried a deleterious BRCA1 mutation. 1100delC appears to be the only recurrent CHEK2 mutation associated with a potentially significant contribution to breast cancer risk in the general population. Another recurrent mutation with attenuated in vitro function, CHEK2-P85L, is not associated with increased breast cancer susceptibility, but exhibits a striking difference in frequency across populations with different ancestral histories. These observations illustrate the importance of genotyping ethnically diverse groups when assessing the impact of low-penetrance susceptibility alleles on population risk. Our findings highlight the notion that clinical testing for rare missense mutations within CHEK2 may have limited value in predicting breast cancer risk, but that testing for the 1100delC variant may be valuable in phenotypically- and geographically-selected populations.
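The carrier frequencies in parentheses are simple proportions of the quoted counts. A quick recomputation (plain arithmetic; the helper name `carrier_percent` is hypothetical) shows, for example, that the familial figure of 1% is 3/302 ≈ 0.99% after rounding.

```python
# Carrier counts quoted in the abstract: group -> (carriers, group size)
groups = {
    "sporadic cases": (8, 1646),
    "early-onset cases": (2, 400),
    "familial cases": (3, 302),
    "controls": (0, 2105),
}

def carrier_percent(carriers, total):
    """Return the carrier frequency as a percentage of the group size."""
    return 100.0 * carriers / total

for name, (k, n) in groups.items():
    print(f"{name}: {k}/{n} = {carrier_percent(k, n):.2f}%")
```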

71 citations

Journal Article
TL;DR: Within the Diabetes Prevention Program, the Ala12 allele influences central obesity, an effect which may differ by treatment group and dietary PUFA intake.
Abstract: Peroxisome proliferator-activated receptor γ (PPARγ), encoded by the PPARG gene, regulates insulin sensitivity and adipogenesis, and may bind polyunsaturated fatty acids (PUFA) and thiazolidinediones in a ligand-dependent manner. The PPARG proline-to-alanine substitution at position 12 (Pro12Ala polymorphism) has been associated with obesity both directly and via interaction with PUFA. We tested the effect-modifying role of Pro12Ala on the 1-year change in obesity-related traits in a randomised clinical trial of treatment with metformin (n = 989), troglitazone (n = 363) or lifestyle modification (n = 1,004) vs placebo (n = 1,000) for diabetes prevention in high-risk individuals. At baseline, Ala12 carriers had larger waists (p < 0.001) and, in a subset, more subcutaneous adipose tissue (SAT; lumbar 2/3; p = 0.04) than Pro12 homozygotes. There was a genotype-by-intervention interaction on 1-year weight change (p = 0.01); in the placebo arm, Pro12 homozygotes gained weight and Ala12 carriers lost weight (p = 0.001). In the metformin and lifestyle arms, weight loss occurred across genotypes, but was greatest in Ala12 carriers (p < 0.05). Troglitazone treatment induced weight gain, which tended to be greater in Ala12 carriers (p = 0.08). In the placebo group, SAT (lumbar 2/3, lumbar 4/5) decreased in Ala12 allele carriers, but was unchanged in Pro12 homozygotes (p ≤ 0.005). With metformin treatment, SAT decreased independently of genotype. In the lifestyle arm, SAT (lumbar 2/3) reductions occurred across genotypes, but were greater in Ala12 carriers (p = 0.03). A genotype-by-PUFA intake interaction on reduction in visceral fat (lumbar 4/5; p = 0.04) was also observed, which was most evident with metformin treatment (p < 0.001). Within the Diabetes Prevention Program, the Ala12 allele influences central obesity, an effect which may differ by treatment group and dietary PUFA intake (ClinicalTrials.gov ID no: NCT00004992).

71 citations

01 Jul 2011
TL;DR: In early-onset myocardial infarction, the 9p21.3 variant rs1333040 affects the progression of coronary atherosclerosis and the probability of coronary artery revascularization during long-term follow-up.
Abstract: OBJECTIVES The purpose of this study was to test whether the 9p21.3 variant rs1333040 influences the occurrence of new cardiovascular events and coronary atherosclerosis progression after early-onset myocardial infarction. BACKGROUND 9p21.3 genetic variants are associated with ischemic heart disease, but it is not known whether they influence prognosis after an acute coronary event. METHODS Within the Italian Genetic Study of Early-onset Myocardial Infarction, we genotyped rs1333040 in 1,508 patients hospitalized for a first myocardial infarction before the age of 45 years who underwent coronary angiography but no coronary revascularization for the index event. They were followed up for major cardiovascular events and angiographic coronary atherosclerosis progression. RESULTS Over 16,599 person-years, there were 683 cardiovascular events (77 cardiovascular deaths, 223 recurrences of myocardial infarction, and 383 coronary artery revascularizations) and 492 primary endpoints. The rs1333040 genotype had a significant influence (p = 0.01) on the primary endpoint, with an adjusted hazard ratio of 1.19 (95% confidence interval [CI]: 1.08 to 1.37) for heterozygous carriers and 1.41 (95% CI: 1.06 to 1.87) for homozygous carriers. Analysis of the individual components of the primary endpoint provided no significant evidence that the rs1333040 genotype influenced the hazard of cardiovascular death (p = 0.24) or of recurrent myocardial infarction (p = 0.57), but did provide significant evidence that it influenced the hazard of coronary revascularization, with adjusted heterozygous and homozygous hazard ratios of 1.38 (95% CI: 1.17 to 1.63) and 1.90 (95% CI: 1.36 to 2.65), respectively (p = 0.00015). It also significantly influenced the angiographic endpoint of coronary atherosclerosis progression (p = 0.002).
CONCLUSIONS In early-onset myocardial infarction, the 9p21.3 variant rs1333040 affects the progression of coronary atherosclerosis and the probability of coronary artery revascularization during long-term follow-up.
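Two pieces of arithmetic sit behind these results: the crude event rate over follow-up and the log-scale symmetry of hazard-ratio confidence intervals. The sketch below recomputes the first from the quoted counts and, for the homozygous revascularization hazard ratio, recovers the log-scale standard error and the point estimate implied by its 95% CI. It assumes Wald-type intervals on the log scale, which the abstract does not state, and the helper names are hypothetical.

```python
import math

# Figures quoted in the abstract
person_years = 16_599
events = {
    "cardiovascular deaths": 77,
    "myocardial infarction recurrences": 223,
    "coronary revascularizations": 383,
}

total_events = sum(events.values())                  # should equal the 683 reported
rate_per_1000 = 1000 * total_events / person_years   # crude rate, all events counted

def log_se_from_ci(lower, upper, z=1.96):
    """Standard error of log(HR) implied by a 95% CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def point_from_ci(lower, upper):
    """Point estimate implied by a log-symmetric CI (geometric mean of the limits)."""
    return math.sqrt(lower * upper)

print(total_events, round(rate_per_1000, 1))
print(round(log_se_from_ci(1.36, 2.65), 3), round(point_from_ci(1.36, 2.65), 2))
```

Because patients can contribute several events, this crude rate counts all 683 events rather than the 492 first (primary) endpoints.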

70 citations

Journal Article
Vasiliki Lagou, Reedik Mägi, Hottenga J-J. + 251 more. Institutions (89)
TL;DR: In this paper, the authors assess sex-dimorphic (73,089/50,404 women and 67,506/47,806 men) and sex-combined (151,188/105,056 individuals) fasting glucose/fasting insulin genetic effects via genome-wide association study meta-analyses.
Abstract: Differences between sexes contribute to variation in the levels of fasting glucose and insulin. Epidemiological studies have established a higher prevalence of impaired fasting glucose in men and of impaired glucose tolerance in women; however, the genetic component underlying this phenomenon is not established. We assess sex-dimorphic (73,089/50,404 women and 67,506/47,806 men) and sex-combined (151,188/105,056 individuals) fasting glucose/fasting insulin genetic effects via genome-wide association study meta-analyses in individuals of European descent without diabetes. Here we report sex dimorphism in allelic effects on fasting insulin at the IRS1 and ZNF12 loci, the latter showing higher RNA expression in whole blood in women compared to men. We also observe sex-homogeneous effects on fasting glucose at seven novel loci. Fasting insulin shows stronger genetic correlations with waist-to-hip ratio and anorexia nervosa in women than in men. Furthermore, waist-to-hip ratio is causally related to insulin resistance in women, but not in men. These results position dissection of metabolic and glycemic health sex dimorphism as a stepping stone for understanding differences in genetic effects between women and men in related phenotypes.
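The abstract does not say which statistic was used to call sex dimorphism. One widely used approach compares independent sex-specific effect estimates with a z test; the sketch below assumes that approach, and the effect sizes in it are made-up numbers, not results from the study.

```python
import math

def sex_difference_z(beta_women, se_women, beta_men, se_men):
    """z statistic for the difference between two independent effect estimates."""
    return (beta_women - beta_men) / math.sqrt(se_women ** 2 + se_men ** 2)

def two_sided_p(z):
    """Two-sided p-value of a standard normal z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical per-allele effects on fasting insulin (not from the study)
z = sex_difference_z(0.05, 0.01, 0.01, 0.01)
print(round(z, 2), two_sided_p(z))
```

The independence assumption holds when the women-only and men-only meta-analyses use non-overlapping samples, which is the usual design for such comparisons.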

69 citations


Cited by
Journal Article
TL;DR: Bowtie 2 combines the strengths of the full-text minute (FM) index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute (FM) index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the FM index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
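The full-text minute (FM) index lets an aligner find exact matches by walking a pattern backwards through the Burrows-Wheeler transform (BWT). The toy sketch below illustrates that backward search. It is not Bowtie 2's implementation: it builds the suffix array naively, keeps full occurrence tables instead of sampled ones, assumes the text contains no `$`, and handles only exact matches, whereas gapped alignment is where Bowtie 2 brings in the dynamic programming mentioned above.

```python
def build_fm_index(text):
    """Build a toy FM index for `text`, which must not contain '$'."""
    text += "$"                                            # unique terminator, sorts before letters
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)                 # character preceding each sorted suffix
    alphabet = sorted(set(text))
    # c_table[ch]: number of characters in text strictly smaller than ch
    c_table, running = {}, 0
    for ch in alphabet:
        c_table[ch] = running
        running += text.count(ch)
    # occ[ch][i]: occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c_table, occ

def find_exact(text, pattern):
    """Return the sorted start positions of `pattern` in `text` via backward search."""
    sa, c_table, occ = build_fm_index(text)
    lo, hi = 0, len(sa)                  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):         # extend the match one character to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])

print(find_exact("banana", "ana"))  # prints [1, 3]
```

Each pattern character costs two table lookups regardless of genome size, which is why FM-index search is fast; real aligners trade the full `occ` tables for sampled checkpoints to stay memory-efficient.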

37,898 citations

Journal Article
TL;DR: The Gene Set Enrichment Analysis (GSEA) method focuses on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
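The core quantity in GSEA is an enrichment score computed by walking down a ranked gene list. Below is a minimal sketch of the weighted running-sum score as commonly described for the method: genes in the set raise the sum in proportion to |correlation|^p, other genes lower it by a constant step, and the score is the maximum deviation from zero. It omits GSEA's permutation-based significance testing and score normalization, and the function name is hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score (ES) in the style of GSEA.

    ranked_genes: gene names ordered from most to least correlated with the phenotype.
    scores: the correlation for each gene, in the same order.
    gene_set: the genes in the set S being tested.
    p: weight exponent; p = 0 gives an unweighted Kolmogorov-Smirnov-like walk.
    """
    hits = [g in gene_set for g in ranked_genes]
    n, n_hits = len(ranked_genes), sum(hits)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must be a non-empty proper subset of the ranking")
    # Normalizer so that the increments for genes in the set sum to 1
    n_r = sum(abs(s) ** p for s, hit in zip(scores, hits) if hit)
    if n_r == 0:
        raise ValueError("all genes in the set have zero weight")
    miss_step = 1.0 / (n - n_hits)  # decrements for genes outside the set also sum to 1
    running, es = 0.0, 0.0
    for s, hit in zip(scores, hits):
        running += abs(s) ** p / n_r if hit else -miss_step
        if abs(running) > abs(es):  # keep the maximum deviation from zero
            es = running
    return es

example_genes = ["A", "B", "C", "D", "E"]
example_corr = [5.0, 4.0, 3.0, 2.0, 1.0]
print(enrichment_score(example_genes, example_corr, {"A", "B"}))
```

A set concentrated at the top of the ranking gives a large positive score, one concentrated at the bottom a large negative score; scattered sets stay near zero.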

34,830 citations

Journal Article
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation, with a focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
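Identity-by-state estimates start from counting, at each SNP, how many alleles two individuals share. The sketch below does that for genotypes coded as allele counts (0, 1, 2) and derives the IBS distance that PLINK reports as DST, (IBS2 + 0.5 × IBS1) / N. The function name and the None-for-missing convention are assumptions, and the sketch does not attempt PLINK's identity-by-descent (PI_HAT) estimation.

```python
def ibs_summary(geno1, geno2):
    """Compare two individuals' genotypes coded as allele counts (0, 1, 2).

    Missing genotypes are given as None and skipped.
    Returns (ibs0, ibs1, ibs2, dst): ibsK counts loci sharing K alleles
    identical by state, and dst = (ibs2 + 0.5 * ibs1) / loci compared.
    """
    counts = [0, 0, 0]
    for g1, g2 in zip(geno1, geno2):
        if g1 is None or g2 is None:
            continue
        # Opposite homozygotes share 0 alleles, a difference of one allele
        # shares 1, identical genotypes (including het/het) share 2.
        shared = 2 - abs(g1 - g2)
        counts[shared] += 1
    n = sum(counts)
    if n == 0:
        raise ValueError("no loci with both genotypes observed")
    dst = (counts[2] + 0.5 * counts[1]) / n
    return counts[0], counts[1], counts[2], dst

print(ibs_summary([0, 1, 2, 2, None], [0, 1, 0, 1, 2]))  # prints (1, 1, 2, 0.625)
```

Pairs with unusually high sharing across long runs of adjacent SNPs are the starting point for detecting the extended identical-by-descent segments the abstract describes.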

26,280 citations

Journal Article
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum + 245 more. Institutions (29)
15 Feb 2001, Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article
TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
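The "information borrowing" that helps limma cope with small sample sizes is empirical Bayes shrinkage of gene-wise variances toward a common prior. A minimal sketch of the resulting moderated t statistic, assuming the prior variance and prior degrees of freedom are already known (in limma they are estimated across all genes by `eBayes`, and the moderated t is referred to a t distribution with df + df_prior degrees of freedom); the Python function itself is hypothetical.

```python
import math

def moderated_t(coef, s2, df, s2_prior, df_prior, stdev_unscaled):
    """Moderated t statistic for one gene and one coefficient.

    coef: estimated coefficient (e.g. a log-fold change).
    s2, df: the gene's residual variance and residual degrees of freedom.
    s2_prior, df_prior: prior variance and prior degrees of freedom, supplied
        directly here rather than estimated from all genes.
    stdev_unscaled: square root of the coefficient's diagonal entry of (X'X)^-1.
    """
    # Posterior variance: a weighted average that shrinks s2 toward s2_prior
    s2_post = (df_prior * s2_prior + df * s2) / (df_prior + df)
    return coef / (math.sqrt(s2_post) * stdev_unscaled)

print(moderated_t(1.0, 0.04, 4, 0.09, 0, 0.5))  # df_prior = 0: ordinary t statistic
print(moderated_t(1.0, 0.04, 4, 0.09, 4, 0.5))  # shrunk toward the larger prior variance
```

With df_prior = 0 the statistic reduces to the ordinary t; as df_prior grows, a gene with an unusually small sample variance can no longer produce an inflated t.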

22,147 citations