Author

David Altshuler

Bio: David Altshuler is an academic researcher from the University of Michigan. The author has contributed to research on the topics Genome-wide association study and Population. The author has an h-index of 162 and has co-authored 345 publications receiving 201,782 citations. Previous affiliations of David Altshuler include Vertex Pharmaceuticals and the Massachusetts Institute of Technology.


Papers
Journal Article
TL;DR: The results in combination with data reported in the literature suggest that G6PC2, a glucose-6-phosphatase almost exclusively expressed in pancreatic islet cells, may underlie variation in fasting glucose, though it is possible that ABCB11, which is expressed primarily in liver, may also contribute to such variation.
Abstract: Identifying the genetic variants that regulate fasting glucose concentrations may further our understanding of the pathogenesis of diabetes. We therefore investigated the association of fasting glucose levels with SNPs in 2 genome-wide scans including a total of 5,088 nondiabetic individuals from Finland and Sardinia. We found a significant association between the SNP rs563694 and fasting glucose concentrations (P = 3.5 × 10⁻⁷). This association was further investigated in an additional 18,436 nondiabetic individuals of mixed European descent from 7 different studies. The combined P value for association in these follow-up samples was 6.9 × 10⁻²⁶, and combining results from all studies resulted in an overall P value for association of 6.4 × 10⁻³³. Across these studies, fasting glucose concentrations increased 0.01-0.16 mM with each copy of the major allele, accounting for approximately 1% of the total variation in fasting glucose. The rs563694 SNP is located between the genes glucose-6-phosphatase catalytic subunit 2 (G6PC2) and ATP-binding cassette, subfamily B (MDR/TAP), member 11 (ABCB11). Our results in combination with data reported in the literature suggest that G6PC2, a glucose-6-phosphatase almost exclusively expressed in pancreatic islet cells, may underlie variation in fasting glucose, though it is possible that ABCB11, which is expressed primarily in liver, may also contribute to such variation.

173 citations

Journal Article
TL;DR: Finding all common variants robustly associated with common diseases, and fully defining genotype-phenotype correlation, will be a key to translating initial clues into pathophysiological understanding and clinical prediction.
Abstract: Genome-wide association studies, exemplified by the Wellcome Trust Case Control Consortium and follow-up studies, have identified dozens of common variants robustly associated with common diseases, providing new clues about genetic architecture in humans. Finding all such loci, and fully defining genotype-phenotype correlation, will be a key to translating initial clues into pathophysiological understanding and clinical prediction.

166 citations

Journal Article
TL;DR: The combination of large-scale DNA sequencing and functional testing in the laboratory reveals that approximately 1 in 1,000 individuals carries a variant in PPARG that reduces function in a human adipocyte differentiation assay and is associated with a substantial risk of T2D.
Abstract: Peroxisome proliferator-activated receptor gamma (PPARG) is a master transcriptional regulator of adipocyte differentiation and a canonical target of antidiabetic thiazolidinedione medications. In rare families, loss-of-function (LOF) mutations in PPARG are known to cosegregate with lipodystrophy and insulin resistance; in the general population, the common P12A variant is associated with a decreased risk of type 2 diabetes (T2D). Whether and how rare variants in PPARG and defects in adipocyte differentiation influence risk of T2D in the general population remains undetermined. By sequencing PPARG in 19,752 T2D cases and controls drawn from multiple studies and ethnic groups, we identified 49 previously unidentified, nonsynonymous PPARG variants (MAF < 0.5%). Considered in aggregate (with or without computational prediction of functional consequence), these rare variants showed no association with T2D (OR = 1.35; P = 0.17). The function of the 49 variants was experimentally tested in a novel high-throughput human adipocyte differentiation assay, and nine were found to have reduced activity in the assay. Carrying any of these nine LOF variants was associated with a substantial increase in risk of T2D (OR = 7.22; P = 0.005). The combination of large-scale DNA sequencing and functional testing in the laboratory reveals that approximately 1 in 1,000 individuals carries a variant in PPARG that reduces function in a human adipocyte differentiation assay and is associated with a substantial risk of T2D.

156 citations

Journal Article
TL;DR: Although genetic variation in CYP19A1 produces measurable differences in estrogen levels among postmenopausal women, the magnitude of the change was insufficient to contribute detectably to breast cancer.
Abstract: The CYP19A1 gene encodes the enzyme aromatase, which is responsible for the final step in the biosynthesis of estrogens. In this study, we used a systematic two-step approach that included gene resequencing and a haplotype-based analysis to comprehensively survey common genetic variation across the CYP19A1 locus in relation to circulating postmenopausal steroid hormone levels and breast cancer risk. This study was conducted among 5,356 invasive breast cancer cases and 7,129 controls comprised primarily of White women of European descent drawn from five large prospective cohorts within the National Cancer Institute Breast and Prostate Cancer Cohort Consortium. A high-density single-nucleotide polymorphism (SNP) map of 103 common SNPs (≥5% frequency) was used to identify the linkage disequilibrium and haplotype patterns across the CYP19A1 locus, and 19 haplotype-tagging SNPs were selected to provide high predictability of the common haplotype patterns. We found haplotype-tagging SNPs and common haplotypes spanning the coding and proximal 5' region of CYP19A1 to be significantly associated with a 10% to 20% increase in endogenous estrogen levels in postmenopausal women [effect per copy of the two-SNP haplotype rs749292-rs727479 (A-A) versus noncarriers; P = 4.4 × 10⁻¹⁵]. No significant associations were observed, however, with these SNPs or common haplotypes and breast cancer risk. Thus, although genetic variation in CYP19A1 produces measurable differences in estrogen levels among postmenopausal women, the magnitude of the change was insufficient to contribute detectably to breast cancer.

156 citations

Journal Article
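TL;DR: The goal of this consortium is to characterize variations in approximately 50 genes that mediate two pathways associated with these cancers (the steroid-hormone metabolism pathway and the insulin-like growth factor signalling pathway) and to associate these variations with cancer risk.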
Abstract: Most cases of breast and prostate cancer are not associated with mutations in known high-penetrance genes, indicating the involvement of multiple low-penetrance risk alleles. Studies that have attempted to identify these genes have met with limited success. The National Cancer Institute Breast and Prostate Cancer Cohort Consortium--a pooled analysis of multiple large cohort studies with a total of more than 5,000 cases of breast cancer and 8,000 cases of prostate cancer--was therefore initiated. The goal of this consortium is to characterize variations in approximately 50 genes that mediate two pathways that are associated with these cancers--the steroid-hormone metabolism pathway and the insulin-like growth factor signalling pathway--and to associate these variations with cancer risk.

154 citations


Cited by
Journal Article
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations

Journal Article
TL;DR: Gene Set Enrichment Analysis (GSEA) is an analytical method for interpreting gene expression data that derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

34,830 citations
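
The gene-set idea in the GSEA abstract above can be illustrated with a simplified running-sum enrichment score. This is a minimal sketch and not the GSEA software itself: it assumes genes are already ranked by their correlation with a phenotype, and the function name `enrichment_score`, the weighting exponent `p`, and the toy gene names are hypothetical choices made for illustration. Significance testing by permutation is omitted.

```python
from typing import Sequence, Set


def enrichment_score(
    ranked_genes: Sequence[str],
    correlations: Sequence[float],
    gene_set: Set[str],
    p: float = 1.0,
) -> float:
    """Return the signed maximum deviation from zero of a running sum.

    Walking down the ranked list, the sum increases at genes in the set
    (weighted by |correlation| ** p) and decreases at genes outside it.
    """
    if len(ranked_genes) != len(correlations):
        raise ValueError("ranked_genes and correlations must have equal length")

    in_set = [gene in gene_set for gene in ranked_genes]
    n_miss = in_set.count(False)
    hit_weights = [
        abs(r) ** p if hit else 0.0 for r, hit in zip(correlations, in_set)
    ]
    total_hit = sum(hit_weights)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("need weighted hits and at least one gene outside the set")

    running = 0.0
    best = 0.0
    for weight, hit in zip(hit_weights, in_set):
        # Step up on a hit, step down on a miss; both kinds of step sum to 1.
        running += weight / total_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    genes = ["A", "B", "C", "D"]          # hypothetical genes, best-ranked first
    corr = [2.0, 1.0, -1.0, -2.0]         # hypothetical ranking metric
    print(enrichment_score(genes, corr, {"A", "B"}))
    print(enrichment_score(genes, corr, {"C", "D"}))
```

A set concentrated at the top of the ranking yields a positive score and a set concentrated at the bottom yields a negative one, which is why the method can detect coordinated shifts that single-gene tests miss.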

Journal Article
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. It focuses on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations
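
As a rough illustration of the identity-by-state information mentioned in the PLINK abstract above, the sketch below computes the average proportion of alleles two individuals share across markers. It is not PLINK's code: the 0/1/2 genotype coding (copies of one allele), the use of `None` for missing calls, and the function name `ibs_similarity` are assumptions made for this example.

```python
from typing import Optional, Sequence


def ibs_similarity(
    a: Sequence[Optional[int]], b: Sequence[Optional[int]]
) -> float:
    """Mean proportion of alleles shared identical by state (0.0 to 1.0).

    Genotypes are coded 0, 1, or 2; None marks a missing call, and markers
    missing in either individual are skipped.
    """
    if len(a) != len(b):
        raise ValueError("genotype vectors must have equal length")

    shared = 0
    compared = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue
        # Two diploid genotypes share 2, 1, or 0 alleles at a marker.
        shared += 2 - abs(ga - gb)
        compared += 1

    if compared == 0:
        raise ValueError("no markers with calls in both individuals")
    return shared / (2 * compared)


if __name__ == "__main__":
    person_1 = [0, 1, 2, None]   # hypothetical genotypes
    person_2 = [0, 2, 0, 1]
    print(ibs_similarity(person_1, person_2))
```

Pairwise values of this kind, computed over many markers, are the raw material for detecting population stratification or unexpected relatedness; estimating identity by descent requires additional modelling that this sketch does not attempt.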

Journal Article
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, and 245 more authors (29 institutions)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article
TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147 citations