David Altshuler

Other affiliations: Vertex Pharmaceuticals, Massachusetts Institute of Technology, Broad Institute ...

Bio: David Altshuler is an academic researcher from the University of Michigan. The author has contributed to research on topics including genome-wide association studies and population. The author has an h-index of 162 and has co-authored 345 publications receiving 201,782 citations. Previous affiliations of David Altshuler include Vertex Pharmaceuticals and the Massachusetts Institute of Technology.

Papers published on a yearly basis (chart not shown; years covered: 1992, 1993, 1998–2023)

Papers

Journal Article

A Map of Human Genome Variation From Population-Scale Sequencing

Gonçalo R. Abecasis¹, David Altshuler²,³, Adam Auton⁴, Lisa D Brooks⁵, Richard Durbin⁶, Richard A. Gibbs⁷, Matthew E. Hurles⁶, Gil McVean⁴

University of Michigan¹, Broad Institute², Harvard University³, University of Oxford⁴, Johns Hopkins University⁵, Wellcome Trust Sanger Institute⁶, Baylor College of Medicine⁷

28 Oct 2010-Nature

TL;DR: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. This paper presents the results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms.

Abstract: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

7,538 citations

Journal Article

The International HapMap Project

John W. Belmont¹, Paul Hardenbol, Thomas D. Willis, Fuli Yu¹, Huanming Yang², Lan Yang Ch'Ang, Wei Huang³, Bin Liu², Yan Shen³, Paul K.H. Tam⁴, Lap-Chee Tsui⁴, Mary M.Y. Waye⁵, Jeffrey Tze Fei Wong⁶, Changqing Zeng², Qingrun Zhang², Mark S. Chee⁷, Luana Galver⁷, Semyon Kruglyak⁷, Sarah S. Murray⁷, Arnold Oliphant⁷, Alexandre Montpetit⁸, Fanny Chagnon⁸, Vincent Ferretti⁸, Martin Leboeuf⁸, Michael S. Phillips⁸, Andrei Verner⁸, Shenghui Duan⁹, Denise L. Lind¹⁰, Raymond D. Miller⁹, John P. Rice⁹, Nancy L. Saccone⁹, Patricia Taillon-Miller⁹, Ming Xiao¹⁰, Akihiro Sekine, Koki Sorimachi, Yoichi Tanaka, Tatsuhiko Tsunoda, Eiji Yoshino, David R. Bentley¹¹, Sarah E. Hunt¹¹, Don Powell¹¹, Houcan Zhang¹², Ichiro Matsuda¹³, Yoshimitsu Fukushima¹⁴, Darryl Macer¹⁵, Eiko Suda¹⁵, Charles N. Rotimi¹⁶, Clement Adebamowo¹⁷, Toyin Aniagwu¹⁷, Patricia A. Marshall¹⁸, Olayemi Matthew¹⁷, Chibuzor Nkwodimmah¹⁷, Charmaine D.M. Royal¹⁶, Mark Leppert¹⁹, Missy Dixon¹⁹, Fiona Cunningham²⁰, Ardavan Kanani²⁰, Gudmundur A. Thorisson²⁰, Peter E. Chen²¹, David J. Cutler²¹, Carl S. Kashuk²¹, Peter Donnelly²², Jonathan Marchini²², Gilean McVean²², Simon Myers²², Lon R. Cardon²², Andrew P. Morris²², Bruce S. Weir²³, James C. Mullikin²⁴, Michael Feolo²⁴, Mark J. Daly²⁵, Renzong Qiu²⁶, Alastair Kent, Georgia M. Dunston¹⁶, Kazuto Kato²⁷, Norio Niikawa²⁸, Jessica Watkin²⁹, Richard A. Gibbs¹, Erica Sodergren¹, George M. Weinstock¹, Richard K. Wilson⁹, Lucinda Fulton⁹, Jane Rogers¹¹, Bruce W. Birren²⁵, Hua Han², Hongguang Wang, Martin Godbout³⁰, John C. Wallenburg⁸, Paul L'Archevêque, Guy Bellemare, Kazuo Todani, Takashi Fujita, Satoshi Tanaka, Arthur L. Holden, Francis S. Collins²⁴, Lisa D. Brooks²⁴, Jean E. McEwen²⁴, Mark S. Guyer²⁴, Elke Jordan³¹, Jane Peterson²⁴, Jack Spiegel²⁴, Lawrence M. Sung³², Lynn F. Zacharia²⁴, Karen Kennedy²⁹, Michael Dunn²⁹, Richard Seabrook²⁹, Mark Shillito, Barbara Skene²⁹, John Stewart²⁹, David Valle²¹, Ellen Wright Clayton³³, Lynn B. Jorde¹⁹, Aravinda Chakravarti²¹, Mildred K. Cho³⁴, Troy Duster³⁵,³⁶, Morris W. Foster³⁷, Maria Jasperse³⁸, Bartha Maria Knoppers³⁹, Pui-Yan Kwok¹⁰, Julio Licinio⁴⁰, Jeffrey C. Long⁴¹, Pilar N. Ossorio⁴², Vivian Ota Wang³³, Charles N. Rotimi¹⁶, Patricia Spallone²⁹,⁴³, Sharon F. Terry⁴⁴, Eric S. Lander²⁵, Eric H. Lai⁴⁵, Deborah A. Nickerson⁴⁶, Gonçalo R. Abecasis⁴¹, David Altshuler⁴⁷, Michael Boehnke⁴¹, Panos Deloukas¹¹, Julie A. Douglas⁴¹, Stacey Gabriel²⁵, Richard R. Hudson⁴⁸, Thomas J. Hudson⁸, Leonid Kruglyak⁴⁹, Yusuke Nakamura⁵⁰, Robert L. Nussbaum²⁴, Stephen F. Schaffner²⁵, Stephen T. Sherry²⁴, Lincoln Stein²⁰, Toshihiro Tanaka

Baylor College of Medicine¹, Chinese Academy of Sciences², Chinese National Human Genome Center³, University of Hong Kong⁴, The Chinese University of Hong Kong⁵, Hong Kong University of Science and Technology⁶, Illumina⁷, McGill University⁸, Washington University in St. Louis⁹, University of California, San Francisco¹⁰, Wellcome Trust Sanger Institute¹¹, Beijing Normal University¹², Health Sciences University of Hokkaido¹³, Shinshu University¹⁴, University of Tsukuba¹⁵, Howard University¹⁶, University of Ibadan¹⁷, Case Western Reserve University¹⁸, University of Utah¹⁹, Cold Spring Harbor Laboratory²⁰, Johns Hopkins University²¹, University of Oxford²², North Carolina State University²³, National Institutes of Health²⁴, Massachusetts Institute of Technology²⁵, Chinese Academy of Social Sciences²⁶, Kyoto University²⁷, Nagasaki University²⁸, Wellcome Trust²⁹, Genome Canada³⁰, Foundation for the National Institutes of Health³¹, University of Maryland, Baltimore³², Vanderbilt University³³, Stanford University³⁴, University of California, Berkeley³⁵, New York University³⁶, University of Oklahoma³⁷, University of New Mexico³⁸, Université de Montréal³⁹, University of California, Los Angeles⁴⁰, University of Michigan⁴¹, University of Wisconsin-Madison⁴², London School of Economics and Political Science⁴³, Genetic Alliance⁴⁴, GlaxoSmithKline⁴⁵, University of Washington⁴⁶, Harvard University⁴⁷, University of Chicago⁴⁸, Fred Hutchinson Cancer Research Center⁴⁹, University of Tokyo⁵⁰

18 Dec 2003-Nature

TL;DR: The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance the ability to choose targets for therapeutic intervention.

Abstract: The goal of the International HapMap Project is to determine the common patterns of DNA sequence variation in the human genome and to make this information freely available in the public domain. An international consortium is developing a map of these patterns across the genome by determining the genotypes of one million or more sequence variants, their frequencies and the degree of association between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe. The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance our ability to choose targets for therapeutic intervention.

5,926 citations

Journal Article

The Structure of Haplotype Blocks in the Human Genome

Stacey Gabriel¹, Stephen F. Schaffner¹, Huy Nguyen¹, Jamie Moore¹, Jessica Roy¹, Brendan Blumenstiel¹, John M. Higgins¹, Matthew DeFelice¹, Amy L. Lochner¹, Maura Faggart¹, Shau Neen Liu-Cordero¹, Charles N. Rotimi², Adebowale Adeyemo³, Richard S. Cooper⁴, Ryk Ward⁵, Eric S. Lander¹, Mark J. Daly¹, David Altshuler¹,⁶

Massachusetts Institute of Technology¹, Howard University², University of Ibadan³, Loyola University Chicago⁴, University of Oxford⁵, Harvard University⁶

21 Jun 2002-Science

TL;DR: It is shown that the human genome can be parsed objectively into haplotype blocks: sizable regions over which there is little evidence for historical recombination and within which only a few common haplotypes are observed.

Abstract: Haplotype-based methods offer a powerful approach to disease gene mapping, based on the association between causal mutations and the ancestral haplotypes on which they arose. As part of The SNP Consortium Allele Frequency Projects, we characterized haplotype patterns across 51 autosomal regions (spanning 13 megabases of the human genome) in samples from Africa, Europe, and Asia. We show that the human genome can be parsed objectively into haplotype blocks: sizable regions over which there is little evidence for historical recombination and within which only a few common haplotypes are observed. The boundaries of blocks and specific haplotypes they contain are highly correlated across populations. We demonstrate that such haplotype frameworks provide substantial statistical power in association studies of common genetic variation across each region. Our results provide a foundation for the construction of a haplotype map of the human genome, facilitating comprehensive genetic association studies of human disease.

5,634 citations

Journal Article

A haplotype map of the human genome

John W. Belmont¹, Andrew Boudreau, Suzanne M. Leal¹, Paul Hardenbol, and 229 more authors (40 institutions)

27 Oct 2005

TL;DR: A public database of common variation in the human genome: more than one million single nucleotide polymorphisms for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted.

Abstract: Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.

5,479 citations

Journal Article

From FastQ Data to High‐Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline

Géraldine A. Van der Auwera¹, Mauricio O. Carneiro¹, Christopher Hartl¹, Ryan Poplin¹, Guillermo del Angel¹, Ami Levy-Moonshine¹, Tadeusz Jordan¹, Khalid Shakir¹, David Roazen¹, Joel Thibault¹, Eric Banks¹, Kiran V. Garimella², David Altshuler¹, Stacey Gabriel¹, Mark A. DePristo¹

Broad Institute¹, Wellcome Trust Centre for Human Genetics²

15 Oct 2013-Current protocols in human genetics

TL;DR: This unit describes how to use BWA and the Genome Analysis Toolkit to map genome sequencing data to a reference and produce high‐quality variant calls that can be used in downstream analyses.

Abstract: This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK.

5,150 citations

Cited by

Journal Article

Fast gapped-read alignment with Bowtie 2

Ben Langmead¹, Steven L. Salzberg¹,²,³

University of Maryland, College Park¹, Johns Hopkins University School of Medicine², Johns Hopkins University³

01 Apr 2012-Nature Methods

TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations

Journal Article

Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles

Aravind Subramanian¹, Pablo Tamayo¹, Vamsi K. Mootha², Sayan Mukherjee³, Benjamin L. Ebert², Michael A. Gillette², Amanda G. Paulovich⁴, Scott L. Pomeroy², Todd R. Golub², Eric S. Lander¹, Jill P. Mesirov¹

Massachusetts Institute of Technology¹, Harvard University², Duke University³, Fred Hutchinson Cancer Research Center⁴

25 Oct 2005-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: The Gene Set Enrichment Analysis (GSEA) method focuses on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.

Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

34,830 citations

Journal Article

PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses

Shaun Purcell¹,², Benjamin M. Neale¹,³, Kathe Todd-Brown², Lori Thomas², Manuel A. R. Ferreira², David Bender¹,², Julian Maller¹,², Pamela Sklar¹,², Paul I.W. de Bakker¹,², Mark J. Daly¹,², Pak C. Sham⁴

Massachusetts Institute of Technology¹, Harvard University², University of London³, University of Hong Kong⁴

01 Sep 2007-American Journal of Human Genetics

TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. It focuses on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.

Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations

Journal Article

Initial sequencing and analysis of the human genome.

Eric S. Lander¹, Lauren Linton¹, Bruce W. Birren¹, Chad Nusbaum¹, and 245 more authors (29 institutions)

15 Feb 2001-Nature

TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.

Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article

limma powers differential expression analyses for RNA-sequencing and microarray studies

Matthew E. Ritchie¹, Belinda Phipson², Di Wu³, Yifang Hu¹, Charity W. Law⁴, Wei Shi¹, Gordon K. Smyth¹,⁵

Walter and Eliza Hall Institute of Medical Research¹, Royal Children's Hospital², Harvard University³, University of Zurich⁴, University of Melbourne⁵

20 Apr 2015-Nucleic Acids Research

TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147 citations
