Author

Arthur L. Holden

Bio: Arthur L. Holden is an academic researcher from Illumina. The author has contributed to research on the International HapMap Project and single-nucleotide polymorphisms. The author has an h-index of 7 and has co-authored 8 publications that have received 17,330 citations.

Papers

Journal Article•DOI•

The International HapMap Project

John W. Belmont¹, Paul Hardenbol, Thomas D. Willis, Fuli Yu¹, Huanming Yang², Lan Yang Ch'Ang, Wei Huang³, Bin Liu², Yan Shen³, Paul K.H. Tam⁴, Lap-Chee Tsui⁴, Mary M.Y. Waye⁵, Jeffrey Tze Fei Wong⁶, Changqing Zeng², Qingrun Zhang², Mark S. Chee⁷, Luana Galver⁷, Semyon Kruglyak⁷, Sarah S. Murray⁷, Arnold Oliphant⁷, Alexandre Montpetit⁸, Fanny Chagnon⁸, Vincent Ferretti⁸, Martin Leboeuf⁸, Michael S. Phillips⁸, Andrei Verner⁸, Shenghui Duan⁹, Denise L. Lind¹⁰, Raymond D. Miller⁹, John P. Rice⁹, Nancy L. Saccone⁹, Patricia Taillon-Miller⁹, Ming Xiao¹⁰, Akihiro Sekine, Koki Sorimachi, Yoichi Tanaka, Tatsuhiko Tsunoda, Eiji Yoshino, David R. Bentley¹¹, Sarah E. Hunt¹¹, Don Powell¹¹, Houcan Zhang¹², Ichiro Matsuda¹³, Yoshimitsu Fukushima¹⁴, Darryl Macer¹⁵, Eiko Suda¹⁵, Charles N. Rotimi¹⁶, Clement Adebamowo¹⁷, Toyin Aniagwu¹⁷, Patricia A. Marshall¹⁸, Olayemi Matthew¹⁷, Chibuzor Nkwodimmah¹⁷, Charmaine D.M. Royal¹⁶, Mark Leppert¹⁹, Missy Dixon¹⁹, Fiona Cunningham²⁰, Ardavan Kanani²⁰, Gudmundur A. Thorisson²⁰, Peter E. Chen²¹, David J. Cutler²¹, Carl S. Kashuk²¹, Peter Donnelly²², Jonathan Marchini²², Gilean McVean²², Simon Myers²², Lon R. Cardon²², Andrew P. Morris²², Bruce S. Weir²³, James C. Mullikin²⁴, Michael Feolo²⁴, Mark J. Daly²⁵, Renzong Qiu²⁶, Alastair Kent, Georgia M. Dunston¹⁶, Kazuto Kato²⁷, Norio Niikawa²⁸, Jessica Watkin²⁹, Richard A. Gibbs¹, Erica Sodergren¹, George M. Weinstock¹, Richard K. Wilson⁹, Lucinda Fulton⁹, Jane Rogers¹¹, Bruce W. Birren²⁵, Hua Han², Hongguang Wang, Martin Godbout³⁰, John C. Wallenburg⁸, Paul L'Archevêque, Guy Bellemare, Kazuo Todani, Takashi Fujita, Satoshi Tanaka, Arthur L. Holden, Francis S. Collins²⁴, Lisa D. Brooks²⁴, Jean E. McEwen²⁴, Mark S. Guyer²⁴, Elke Jordan³¹, Jane Peterson²⁴, Jack Spiegel²⁴, Lawrence M. Sung³², Lynn F. Zacharia²⁴, Karen Kennedy²⁹, Michael Dunn²⁹, Richard Seabrook²⁹, Mark Shillito, Barbara Skene²⁹, John Stewart²⁹, David Valle²¹, Ellen Wright Clayton³³, Lynn B. Jorde¹⁹, Aravinda Chakravarti²¹, Mildred K. Cho³⁴, Troy Duster³⁵, Troy Duster³⁶, Morris W. Foster³⁷, Maria Jasperse³⁸, Bartha Maria Knoppers³⁹, Pui-Yan Kwok¹⁰, Julio Licinio⁴⁰, Jeffrey C. Long⁴¹, Pilar N. Ossorio⁴², Vivian Ota Wang³³, Charles N. Rotimi¹⁶, Patricia Spallone²⁹, Patricia Spallone⁴³, Sharon F. Terry⁴⁴, Eric S. Lander²⁵, Eric H. Lai⁴⁵, Deborah A. Nickerson⁴⁶, Gonçalo R. Abecasis⁴¹, David Altshuler⁴⁷, Michael Boehnke⁴¹, Panos Deloukas¹¹, Julie A. Douglas⁴¹, Stacey Gabriel²⁵, Richard R. Hudson⁴⁸, Thomas J. Hudson⁸, Leonid Kruglyak⁴⁹, Yusuke Nakamura⁵⁰, Robert L. Nussbaum²⁴, Stephen F. Schaffner²⁵, Stephen T. Sherry²⁴, Lincoln Stein²⁰, Toshihiro Tanaka +142 more•Institutions (50)

Baylor College of Medicine¹, Chinese Academy of Sciences², Chinese National Human Genome Center³, University of Hong Kong⁴, The Chinese University of Hong Kong⁵, Hong Kong University of Science and Technology⁶, Illumina⁷, McGill University⁸, Washington University in St. Louis⁹, University of California, San Francisco¹⁰, Wellcome Trust Sanger Institute¹¹, Beijing Normal University¹², Health Sciences University of Hokkaido¹³, Shinshu University¹⁴, University of Tsukuba¹⁵, Howard University¹⁶, University of Ibadan¹⁷, Case Western Reserve University¹⁸, University of Utah¹⁹, Cold Spring Harbor Laboratory²⁰, Johns Hopkins University²¹, University of Oxford²², North Carolina State University²³, National Institutes of Health²⁴, Massachusetts Institute of Technology²⁵, Chinese Academy of Social Sciences²⁶, Kyoto University²⁷, Nagasaki University²⁸, Wellcome Trust²⁹, Genome Canada³⁰, Foundation for the National Institutes of Health³¹, University of Maryland, Baltimore³², Vanderbilt University³³, Stanford University³⁴, University of California, Berkeley³⁵, New York University³⁶, University of Oklahoma³⁷, University of New Mexico³⁸, Université de Montréal³⁹, University of California, Los Angeles⁴⁰, University of Michigan⁴¹, University of Wisconsin-Madison⁴², London School of Economics and Political Science⁴³, Genetic Alliance⁴⁴, GlaxoSmithKline⁴⁵, University of Washington⁴⁶, Harvard University⁴⁷, University of Chicago⁴⁸, Fred Hutchinson Cancer Research Center⁴⁹, University of Tokyo⁵⁰

18 Dec 2003-Nature

TL;DR: The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance the ability to choose targets for therapeutic intervention.

Abstract: The goal of the International HapMap Project is to determine the common patterns of DNA sequence variation in the human genome and to make this information freely available in the public domain. An international consortium is developing a map of these patterns across the genome by determining the genotypes of one million or more sequence variants, their frequencies and the degree of association between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe. The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance our ability to choose targets for therapeutic intervention.

5,926 citations

Journal Article•DOI•

A haplotype map of the human genome

John W. Belmont¹, Andrew Boudreau, Suzanne M. Leal¹, Paul Hardenbol +229 more•Institutions (40)

27 Oct 2005

TL;DR: A public database of common variation in the human genome: more than one million single nucleotide polymorphisms for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted.

Abstract: Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.

5,479 citations

Journal Article•DOI•

A second generation human haplotype map of over 3.1 million SNPs

Kelly A. Frazer¹, Dennis G. Ballinger, David R. Cox, David A. Hinds +234 more•Institutions (48)

18 Oct 2007-Nature

TL;DR: The Phase II HapMap is described, which characterizes over 3.1 million human single nucleotide polymorphisms genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed, and increased differentiation at non-synonymous, compared to synonymous, SNPs is demonstrated.

Abstract: We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.

4,565 citations

Journal Article•DOI•

Genome-wide detection and characterization of positive selection in human populations

Pardis C. Sabeti¹, Pardis C. Sabeti², Patrick Varilly¹, Patrick Varilly² +255 more•Institutions (50)

18 Oct 2007-Nature

TL;DR: Over 3 million polymorphisms from HapMap Phase 2 are analysed using 'long-range haplotype' methods, which were developed to identify alleles segregating in a population that have undergone recent selection, together with new methods based on cross-population comparisons that discover alleles that have swept to near-fixation within a population.

Abstract: With the advent of dense maps of human genetic variation, it is now possible to detect positive natural selection across the human genome. Here we report an analysis of over 3 million polymorphisms from the International HapMap Project Phase 2 (HapMap2). We used 'long-range haplotype' methods, which were developed to identify alleles segregating in a population that have undergone recent selection, and we also developed new methods that are based on cross-population comparisons to discover alleles that have swept to near-fixation within a population. The analysis reveals more than 300 strong candidate regions. Focusing on the strongest 22 regions, we develop a heuristic for scrutinizing these regions to identify candidate targets of selection. In a complementary analysis, we identify 26 non-synonymous, coding, single nucleotide polymorphisms showing regional evidence of positive selection. Examination of these candidates highlights three cases in which two genes in a common biological process have apparently undergone positive selection in the same population: LARGE and DMD, both related to infection by the Lassa virus, in West Africa; SLC24A5 and SLC45A2, both involved in skin pigmentation, in Europe; and EDAR and EDA2R, both involved in development of hair follicles, in Asia.

1,778 citations

Journal Article•DOI•

A 3.9-Centimorgan-Resolution Human Single-Nucleotide Polymorphism Linkage Map and Screening Set

Tara C. Matise¹, Ravi Sachidanandam², Andrew G. Clark³, Andrew G. Clark⁴, Leonid Kruglyak⁵, Ellen M. Wijsman⁶, Jerzy M. Kakol², Steven Buyske¹, Buena Chui⁷, Patrick Cohen⁸, Claudia de Toma⁸, Margaret G. Ehm⁹, Stephen Glanowski³, Chunsheng He¹, Jeremy Heil³, Kyriacos Markianos⁵, Ivy McMullen³, Margaret A. Pericak-Vance¹⁰, Arkadiy Silbergleit⁷, Lincoln Stein², Michael J. Wagner⁹, Alexander F. Wilson¹¹, Jeffrey D. Winick⁷, Emily S. Winn-Deen³, Emily S. Winn-Deen¹², Carl T. Yamashiro⁷, Howard M. Cann⁸, Eric H. Lai⁹, Arthur L. Holden +25 more•Institutions (12)

Rutgers University¹, Cold Spring Harbor Laboratory², Celera Corporation³, Cornell University⁴, Fred Hutchinson Cancer Research Center⁵, University of Washington⁶, Amersham plc⁷, Council on Education for Public Health⁸, Research Triangle Park⁹, Duke University¹⁰, National Institutes of Health¹¹, Hoffmann-La Roche¹²

01 Aug 2003-American Journal of Human Genetics

TL;DR: Evaluations indicate that this SNP screening set is more informative than the Marshfield Clinic's commonly used microsatellite-based screening set and provides a resource for fast genome scanning for disease genes.

Abstract: Recent advances in technologies for high-throughput single-nucleotide polymorphism (SNP)–based genotyping have improved efficiency and cost so that it is now becoming reasonable to consider the use of SNPs for genomewide linkage analysis. However, a suitable screening set of SNPs and a corresponding linkage map have yet to be described. The SNP maps described here fill this void and provide a resource for fast genome scanning for disease genes. We have evaluated 6,297 SNPs in a diversity panel composed of European Americans, African Americans, and Asians. The markers were assessed for assay robustness, suitable allele frequencies, and informativeness of multi-SNP clusters. Individuals from 56 Centre d'Etude du Polymorphisme Humain pedigrees, with >770 potentially informative meioses altogether, were genotyped with a subset of 2,988 SNPs, for map construction. Extensive genotyping-error analysis was performed, and the resulting SNP linkage map has an average map resolution of 3.9 cM, with map positions containing either a single SNP or several tightly linked SNPs. The order of markers on this map compares favorably with several other linkage and physical maps. We compared map distances between the SNP linkage map and the interpolated SNP linkage map constructed by the deCode Genetics group. We also evaluated cM/Mb distance ratios in females and males, along each chromosome, showing broadly defined regions of increased and decreased rates of recombination. Evaluations indicate that this SNP screening set is more informative than the Marshfield Clinic’s commonly used microsatellite-based screening set.

118 citations

Cited by

Journal Article•DOI•

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Aaron McKenna¹, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark J. Daly, Mark A. DePristo +7 more•Institutions (1)

Broad Institute¹

01 Sep 2010-Genome Research

TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations

Journal Article•DOI•

Haploview: analysis and visualization of LD and haplotype maps

Jeffrey C. Barrett¹, Ben Fry¹, Julian Maller¹, Mark J. Daly¹•Institutions (1)

Massachusetts Institute of Technology¹

15 Jan 2005-Bioinformatics

TL;DR: Haploview is a software package that provides computation of linkage disequilibrium statistics and population haplotype patterns from primary genotype data in a visually appealing and interactive interface.

Abstract: Summary: Research over the last few years has revealed significant haplotype structure in the human genome. The characterization of these patterns, particularly in the context of medical genetic association studies, is becoming a routine research activity. Haploview is a software package that provides computation of linkage disequilibrium statistics and population haplotype patterns from primary genotype data in a visually appealing and interactive interface. Availability: http://www.broad.mit.edu/mpg/haploview/ Contact: jcbarret@broad.mit.edu

13,862 citations

Journal Article•DOI•

DnaSP v5

Pablo Librado¹, Julio Rozas¹•Institutions (1)

University of Barcelona¹

01 Jun 2009-Bioinformatics

TL;DR: Version 5 implements a number of new features and analytical methods allowing extensive DNA polymorphism analyses on large datasets, including visualizing sliding window results integrated with available genome annotations in the UCSC browser.

Abstract: Motivation: DnaSP is a software package for a comprehensive analysis of DNA polymorphism data. Version 5 implements a number of new features and analytical methods allowing extensive DNA polymorphism analyses on large datasets. Among other features, the newly implemented methods allow for: (i) analyses on multiple data files; (ii) haplotype phasing; (iii) analyses on insertion/deletion polymorphism data; (iv) visualizing sliding window results integrated with available genome annotations in the UCSC browser. Availability: Freely available to academic users from: http://www.ub.edu/dnasp Contact: [email protected]

13,511 citations

Journal Article•DOI•

A global reference for human genetic variation.

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations

Journal Article•DOI•

Principal components analysis corrects for stratification in genome-wide association studies

Alkes L. Price¹, Alkes L. Price², Nick Patterson², Robert M. Plenge³, Robert M. Plenge², Michael E. Weinblatt³, Nancy A. Shadick³, David Reich², David Reich¹ +5 more•Institutions (3)

Harvard University¹, Broad Institute², Brigham and Women's Hospital³

23 Jul 2006-Nature Genetics

TL;DR: This work describes a method that enables explicit detection and correction of population stratification on a genome-wide scale and uses principal components analysis to explicitly model ancestry differences between cases and controls.

Abstract: Population stratification (allele frequency differences between cases and controls due to systematic ancestry differences) can cause spurious associations in disease studies. We describe a method that enables explicit detection and correction of population stratification on a genome-wide scale. Our method uses principal components analysis to explicitly model ancestry differences between cases and controls. The resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. Our simple, efficient approach can easily be applied to disease studies with hundreds of thousands of markers. Because the effects of stratification vary in proportion to the number of samples, stratification will be an increasing problem in the large-scale association studies of the future, which will analyze thousands of samples in an effort to detect common genetic variants of weak effect. The two prevailing methods for dealing with stratification are genomic control and structured association. Although genomic control and structured association have proven useful in a variety of contexts, they have limitations. Genomic control corrects for stratification by adjusting association statistics at each marker by a uniform overall inflation factor. However, some markers differ in their allele frequencies across ancestral populations more than others. Thus, the uniform adjustment applied by genomic control may be insufficient at markers having unusually strong differentiation across ancestral populations and may be superfluous at markers devoid of such differentiation, leading to a loss in power.
Structured association uses a program such as STRUCTURE to assign the samples to discrete subpopulation clusters and then aggregates evidence of association within each cluster. If fractional membership in more than one cluster is allowed, the method cannot currently be applied to genome-wide association studies because of its intensive computational cost on large data sets. Furthermore, assignments of individuals to clusters are highly sensitive to the number of clusters, which is not well defined.

9,387 citations
