
Genome-wide and fine-resolution association analysis of malaria in West Africa

Muminatou Jallow, +90 more
- 01 Jun 2009 - 
- Vol. 41, Iss: 6, pp 657-665
Abstract
We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10⁻⁷ to P = 4 × 10⁻¹⁴, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.


Muminatou Jallow¹,³⁴, Yik Ying Teo²,³,³⁴, Kerrin S Small²,³,³⁴, Kirk A Rockett²,³, Panos Deloukas³,
Taane G Clark²,³, Katja Kivinen³, Kalifa A Bojang¹, David J Conway¹, Margaret Pinder¹, Giorgio Sirugo¹,
Fatou Sisay-Joof¹, Stanley Usen¹, Sarah Auburn²,³, Suzannah J Bumpstead³, Susana Campino²,³,
Alison Coffey³, Andrew Dunham³, Andrew E Fry², Angela Green², Rhian Gwilliam³, Sarah E Hunt³,
Michael Inouye³, Anna E Jeffreys², Alieu Mendy², Aarno Palotie³, Simon Potter³, Jiannis Ragoussis²,
Jane Rogers³, Kate Rowlands², Elilan Somaskantharajah³, Pamela Whittaker³, Claire Widden³,
Peter Donnelly²,⁴, Bryan Howie⁴, Jonathan Marchini²,⁴, Andrew Morris², Miguel SanJoaquin²,⁵,
Eric Akum Achidi⁶, Tsiri Agbenyega⁷, Angela Allen⁸,⁹, Olukemi Amodu¹⁰, Patrick Corran¹¹,
Abdoulaye Djimde¹², Amagana Dolo¹², Ogobara K Doumbo¹², Chris Drakeley¹³,¹⁴, Sarah Dunstan¹⁵,
Jennifer Evans⁷,¹⁶, Jeremy Farrar¹⁵, Deepika Fernando¹⁷, Tran Tinh Hien¹⁵, Rolf D Horstmann¹⁶,
Muntaser Ibrahim¹⁸, Nadira Karunaweera¹⁷, Gilbert Kokwaro¹⁹, Kwadwo A Koram²⁰, Martha Lemnge²¹,
Julie Makani²², Kevin Marsh¹⁹, Pascal Michon⁸, David Modiano²³, Malcolm E Molyneux⁵, Ivo Mueller⁸,
Michael Parker²⁴, Norbert Peshu¹⁹, Christopher V Plowe²⁵,²⁶, Odile Puijalon²⁷, John Reeder⁸,
Hugh Reyburn¹³,¹⁴, Eleanor M Riley¹³,¹⁴, Anavaj Sakuntabhai²⁷, Pratap Singhasivanon²⁸, Sodiomon Sirima²⁹,
Adama Tall³⁰, Terrie E Taylor²⁵,³¹, Mahamadou Thera¹², Marita Troye-Blomberg³², Thomas N Williams¹⁹,
Michael Wilson²⁰ & Dominic P Kwiatkowski²,³, Wellcome Trust Case Control Consortium³³ &
Malaria Genomic Epidemiology Network³³
Received 14 October 2008; accepted 27 April 2009; published online 24 May 2009; doi:10.1038/ng.388
¹MRC Laboratories, Fajara, Banjul, Gambia.
²Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
³The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
⁴Department of Statistics, Oxford University, Oxford, UK.
⁵Malawi–Liverpool–Wellcome Trust Clinical Research Programme, College of Medicine, University of Malawi, Chichiri, Blantyre, Malawi.
⁶The University of Buea, Buea, South West Province, Cameroon.
⁷Kwame Nkrumah University of Science and Technology, Kumasi, Ghana.
⁸Papua New Guinea Institute of Medical Research, Madang, Papua New Guinea.
⁹Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK.
¹⁰Institute of Child Health, College of Medicine, University of Ibadan, Ibadan, Nigeria.
¹¹National Institute for Biological Standards and Control, Hertfordshire, UK.
¹²The Malaria Research & Training Centre, University of Bamako, Bamako, Mali.
¹³London School of Hygiene & Tropical Medicine, London, UK.
¹⁴Joint Malaria Programme, Kilimanjaro Christian Medical Centre, Moshi, Tanzania.
¹⁵Oxford University Clinical Research Unit, The Hospital for Tropical Diseases, Ho Chi Minh City, Vietnam.
¹⁶Department of Molecular Medicine, Bernhard Nocht Institute for Tropical Medicine, Hamburg, Germany.
¹⁷Faculty of Medicine, University of Colombo, Colombo, Sri Lanka.
¹⁸Institute for Endemic Diseases, University of Khartoum, Medical Service Science Campus, Khartoum, Sudan.
¹⁹Kenyan Medical Research Institute (KEMRI)–Wellcome Trust Programme, Kilifi, Kenya.
²⁰Noguchi Memorial Institute for Medical Research, University of Ghana, Accra, Ghana.
²¹National Institute for Medical Research, Dar es Salaam, Tanzania.
²²Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania.
²³University of Rome ‘La Sapienza’, Rome, Italy.
²⁴The Ethox Centre, Department of Public Health and Primary Health Care, University of Oxford, Headington, Oxford, UK.
²⁵Blantyre Malaria Project, Chichiri, Blantyre 3, Malawi.
²⁶Howard Hughes Medical Institute/University of Maryland School of Medicine, Baltimore, Maryland, USA.
²⁷Institut Pasteur, Unité d’Immunologie Moléculaire des Parasites, Paris, France.
²⁸Faculty of Tropical Medicine, Mahidol University, Ratchathewi, Bangkok, Thailand.
²⁹Centre National de Recherche et Formation sur le Paludisme, Ouagadougou, Burkina Faso.
³⁰Institut Pasteur de Dakar, Dakar, Senegal.
³¹Michigan State University, Department of Internal Medicine, College of Osteopathic Medicine, East Lansing, Michigan, USA.
³²The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden.
³³A full list of members is provided in the Supplementary Note online.
³⁴These authors contributed equally to this work.
Correspondence should be addressed to D.P.K. (dominic@sanger.ac.uk).
NATURE GENETICS, VOLUME 41, NUMBER 6, JUNE 2009, p. 657 (ARTICLES)
© 2009 Nature America, Inc. All rights reserved.

The malaria parasite Plasmodium falciparum kills on the order of a
million African children each year¹, and this is a small fraction of the
number of infected individuals in the population¹⁻³. In communities
where everyone is repeatedly infected with P. falciparum, host genetic
factors account for ~25% of the risk of severe malaria, that is,
life-threatening forms of the disease³. The strongest known determinant of
risk, hemoglobin S (HbS), accounts for ~2% of the total variation,
implying that only a small fraction of genetic resistance factors have so
far been discovered³. Identifying the genetic basis of protective
immunity against severe malaria may provide important insights for
vaccine development.
Here we examine the possibility of approaching this problem by
genome-wide association (GWA) analysis. There are many unsolved
methodological questions about how to conduct an effective GWA
study in Africa⁴. High levels of ethnic diversity may result in
false-positive associations owing to population structure. Variations in
haplotype structure between different ethnic groups may reduce
power to detect GWA signals, particularly when data are amalgamated
across multiple study sites. Low LD implies the need for denser
genotyping arrays than are currently available: a crude estimate is
that an African GWA study with 1.5 million SNPs would have
approximately the same statistical power as a European study with
0.6 million SNPs⁵, but this is based on HapMap data from a single
ethnic group and a larger number of SNPs may be needed to achieve
adequate power across different ethnic groups.
We carried out an initial GWA study in Gambian children that
explores these methodological questions. Genotyping of ~500,000
SNPs was conducted on 1,060 cases of severe malaria and 1,500
population controls using the Affymetrix GeneChip 500K Mapping
Array Set. The results reported here are based on a set of 402,814 SNPs
in 958 cases and 1,382 controls that passed
stringent quality control procedures. Access
to these data may be requested online (see
URLs section in Methods).
RESULTS
Examining population structure
Subjects were recruited from an area of approximately 400 square miles in
the Kombos region of The Gambia. Four ethnic groups, Mandinka, Jola, Wolof
and Fula, accounted for 89% of cases and 86% of controls (Supplementary
Table 1 online). Using Wright’s F_ST across all autosomal SNPs, we found
that differences between ethnic groups accounted for a small fraction of
genetic variation within the population as a whole (F_ST = 0.004). The
greatest differentiation was seen between Fula and Jola (F_ST = 0.007) and
the least between Mandinka and Wolof (F_ST = 0.002) (Supplementary Table 2
online).
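As an illustration of the statistic used above, the following is a minimal sketch of Wright's F_ST for a single biallelic SNP, computed from per-group allele frequencies as (H_T − H_S)/H_T. The function name, the sample-size weighting and the allele frequencies are illustrative assumptions; they are not the estimator or the data used in the study, and genome-wide values are usually obtained by combining information across many SNPs.

```python
def fst_single_snp(freqs, sizes):
    """Wright's F_ST for one biallelic SNP.

    freqs: frequency of one allele in each subpopulation
    sizes: number of sampled individuals per subpopulation (used as weights)
    """
    total = sum(sizes)
    weights = [n / total for n in sizes]
    # Allele frequency in the pooled population
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    # Expected heterozygosity in the pooled population
    h_t = 2 * p_bar * (1 - p_bar)
    # Weighted mean expected heterozygosity within subpopulations
    h_s = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))
    if h_t == 0:
        return 0.0  # a monomorphic SNP carries no information
    return (h_t - h_s) / h_t


# Hypothetical frequencies for two equally sized groups
fst = fst_single_snp([0.30, 0.36], [500, 500])
print(round(fst, 4))
```

With these made-up inputs, a six-percentage-point difference in allele frequency gives a value close to 0.004, which conveys how small the between-group component reported above is.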
To investigate the relationship between population structure and
self-reported ethnicity, we carried out principal components analysis
(PCA) of 100,715 SNPs, selected to reduce LD between markers
(Fig. 1)⁶. The first two principal components distinguished Fula and
Jola, and the third principal component separated the Mandinka and
Wolof from others. Some individuals could be confidently assigned to
a specific ethnic group, whereas others seemed to have a more
complex ancestry. These findings were verified using the STRUCTURE⁷
program on 8,000 SNPs, which gave an optimal model of population
structure with four genetic subpopulations corresponding to the four
most common ethnic groups (Supplementary Fig. 1 online).
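The principal components analysis described above can be sketched as follows. This is a simplified illustration in the spirit of EIGENSTRAT, not the program itself: the function name, the normalization details and the simulated genotypes are assumptions made for the example, and NumPy is assumed to be available.

```python
import numpy as np


def genotype_pcs(genotypes, n_components=3):
    """Principal components of an individuals-by-SNPs matrix of 0/1/2 allele counts."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                  # allele frequency per SNP
    keep = (p > 0) & (p < 1)                  # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    # Centre each SNP and scale by its binomial standard deviation
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    # Left singular vectors, scaled by singular values, give per-individual scores
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Simulated example: two groups of 50 with modestly different allele frequencies
rng = np.random.default_rng(0)
n_snps = 5000
freq_a = rng.uniform(0.1, 0.9, size=n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.1, size=n_snps), 0.05, 0.95)
group_a = rng.binomial(2, freq_a, size=(50, n_snps))
group_b = rng.binomial(2, freq_b, size=(50, n_snps))
pcs = genotype_pcs(np.vstack([group_a, group_b]))
print(pcs[:50, 0].mean(), pcs[50:, 0].mean())
```

In this toy setting the first component separates the two simulated groups; the columns of `pcs` are the kind of covariates that can be entered into a regression model to correct for structure.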
To place these findings in the context of global population structure,
we compared the Gambian sample with populations studied by the
HapMap project⁵,⁸. The Gambian sample can be clearly distinguished
by PCA from the Yoruba people of Ibadan, Nigeria (a different part of
West Africa) but is much closer to Yoruba than to European, Han
Chinese or Japanese samples (Fig. 2a). Individual ethnic groups
within The Gambia seem to have greater genetic diversity than the
HapMap Yoruba sample (compare Fig. 1a with Fig. 2b, Supplementary
Fig. 2 online). This may reflect the fact that Gambian samples
were recruited from the general population, whereas the HapMap
Yoruba samples were collected in a particular community from
individuals with four Yoruba grandparents.
Figure 1 Principal components analysis of population structure within The
Gambia. Plots of the first three principal components from EIGENSTRAT using
100,715 SNPs selected to minimize intermarker LD. Each solid circle
represents an individual, and the color is assigned according to
self-reported ethnicity (Mandinka, Wolof, Fula, Jola, Serahuli, Serere,
Manjago, Aku or not recorded). (a) Plot of the first two principal
components for all Gambian samples. (b) Plot of the second and third
principal components for all Gambian samples.

Figure 2 Principal components analysis of population structure for the
Gambian study sample in relation to HapMap reference panels. Plots of the
first two principal components from EIGENSTRAT using 100,715 SNPs selected
to minimize intermarker LD. Each solid circle represents an individual
(Gambia, HapMap CHB+JPT, HapMap CEU or HapMap YRI). (a) Plot of the first
two principal components for HapMap and Gambian samples. (b) Plot of the
first two principal components for HapMap YRI and Gambian samples.

Genome-wide association of severe malaria
To evaluate the likelihood of false-positive GWA findings due to
population structure, we conducted a trend test of association in
cases versus controls on all SNPs, and compared observed χ² values to
expected values under the null hypothesis in a quantile-quantile plot
(Fig. 3). The overdispersion of association test statistics (λ = 1.23)
implies a high number of false-positive associations in the raw data,
but this was greatly reduced by correction for self-reported ethnicity
(λ = 1.07), and became negligible when the first three principal
components from the eigenanalysis of population structure were
entered as covariates in logistic regression analysis (λ = 1.02). For
comparison, λ has been estimated to be 1.03–1.11 in case-control
studies in the British population, a range considered acceptable for
GWA analysis⁹. Thus, with appropriate statistical correction,
false-positive GWA findings arising from the Gambian population structure
can be reduced to a very low level.
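The inflation factor λ quoted above is commonly computed as the median of the observed χ² statistics divided by the median of the χ² distribution with 1 d.f. (about 0.455). The article does not spell out its exact procedure, so the sketch below assumes that median-based definition; the function name and the simulated statistics are illustrative only.

```python
import random
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 d.f.


def genomic_inflation(chi2_stats):
    """Inflation factor: observed median statistic over the expected null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


# Simulated null statistics (squared standard normal draws), then an
# artificially inflated copy to mimic uncorrected population structure.
random.seed(1)
null_stats = [random.gauss(0, 1) ** 2 for _ in range(20000)]
inflated = [1.23 * x for x in null_stats]

lam_null = genomic_inflation(null_stats)
lam_inflated = genomic_inflation(inflated)
print(round(lam_null, 2), round(lam_inflated, 2))
```

Because the median is insensitive to a handful of true associations in the tail, λ mainly reflects genome-wide inflation of the kind produced by stratification.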
After PCA correction for population structure, we tested each SNP
for disease association using an unguided genotypic test with 2 degrees
of freedom (d.f.), as well as tests with 1 d.f. for models of dominance,
recessiveness, heterozygous advantage and trend. Cluster plots were
visually inspected on all potentially significant results, which yielded
139 SNPs with unequivocal genotype results in 100 independent
regions of the genome with P < 10⁻⁴ (Supplementary Table 3
online), including 6 with P < 10⁻⁶ (Fig. 4). The strongest signal of
association was close to the HBB gene on chromosome 11p15, where
the HbS polymorphism is located, with 13 SNPs at P < 10⁻⁴ and a
minimum of P = 3.9 × 10⁻⁷ by trend test. In the following sections, we
examine the signal of association around the HbS polymorphism,
evaluate other known and putative malaria resistance–associated genes
and describe newly identified signals of association.
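For readers unfamiliar with the trend test mentioned above, here is a minimal sketch of the Cochran-Armitage trend test on a 2 × 3 table of genotype counts with additive scores 0, 1, 2. It is the uncorrected form; the study additionally adjusted for principal components in a logistic regression, which this sketch does not attempt. The function name and the counts are hypothetical.

```python
import math


def trend_test(case_counts, control_counts, scores=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts ordered by allele dosage."""
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    n = sum(totals)
    n_cases = sum(case_counts)
    n_controls = n - n_cases
    sum_sr = sum(w * r for w, r in zip(scores, case_counts))   # score-weighted cases
    sum_sn = sum(w * t for w, t in zip(scores, totals))        # score-weighted totals
    sum_s2n = sum(w * w * t for w, t in zip(scores, totals))
    numerator = n * (n * sum_sr - n_cases * sum_sn) ** 2
    denominator = n_cases * n_controls * (n * sum_s2n - sum_sn ** 2)
    chi2 = numerator / denominator
    # Upper tail of a chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts of the three genotypes in cases and controls
chi2, p_value = trend_test((60, 30, 10), (40, 40, 20))
print(chi2, p_value)
```

The statistic equals N times the squared correlation between allele dosage and case status, which is why it has 1 d.f. and is sensitive to additive effects.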
Fine-resolution association mapping at the HbS locus
HbS provides a benchmark for evaluating GWA methods, as the causal
polymorphism responsible for the malaria-protective effect is known:
it is a SNP (rs334) in the coding region of HBB on chromosome
11p15.4 which results in replacement of glutamic acid with valine at
amino-acid residue 6 of the β-globin chain. When we genotyped rs334
on the same samples used in the GWA study, using the Sequenom
iPlex platform, we found a much stronger signal of association
(P = 1.3 × 10⁻²⁸). This raises several questions: why was the GWA
signal (P = 3.9 × 10⁻⁷) much weaker than the true effect; is there an
effective way to increase the GWA signal; and is there an effective way
to get from the GWA signal to identification of the causal variant?
To investigate these questions, we sequenced 111 kb in the center of
the GWA signal on chromosome 11p15 in a reference panel of 62
randomly selected Gambian controls (see Methods). These reference
data were used to impute genotypes for all ~2,500 individuals in the
GWA study with the IMPUTE program¹⁰, and a trend test of
association was conducted at each imputed SNP. Out of 202 SNPs
examined across this 111 kb region, three imputed SNPs had stronger
signals of association than any of the SNPs genotyped on the initial
GWA scan (Fig. 5). The HbS causal polymorphism (rs334) stands out
as the imputed SNP with the strongest association (P = 4.5 × 10⁻¹⁴),
several orders of magnitude more significant than the strongest signal
from SNPs that were directly genotyped (P = 3.9 × 10⁻⁷).
This result provides proof of principle that it is possible to
identify the causal polymorphism within a GWA signal by regional
sequencing followed by multipoint association mapping using
model-based imputation, provided the appropriate reference
panel is used. We observed two features of LD in this region of
the genome in this population, which may together be favorable for
fine mapping. First, over a 1-Mb region we identified 55 SNPs with
D′ = 1 in relation to the HbS causal polymorphism rs334 (Fig. 6):
this is consistent with previous evidence that the HbS allele is
associated with an extended haplotype as a result of recent positive
selection⁵,¹¹. Second, the region as a whole has weak LD with a
well-known recombination hot spot¹², and the correlation between
rs334 and neighboring SNPs does not exceed an r² of 0.36
(Fig. 6). In other words, there are no neighboring SNPs that are
sufficiently strongly correlated with rs334 to imitate the true signal
of association generated from the causal variant. We are still at an
early stage of understanding how the process of fine mapping is
affected by different patterns of natural variation in the human
genome, and this example of extended haplotype within a region of
generally low LD provides an interesting case study.
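The combination described above, D′ = 1 together with a low r², can look surprising at first. The sketch below computes both measures from haplotype and allele frequencies using their standard definitions. The frequencies are hypothetical, chosen so that a rare allele sits entirely on a more common background allele; they are not estimates from the Gambian data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude possible given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical case: allele A (frequency 0.08) occurs only on haplotypes that
# carry allele B (frequency 0.25), so every A haplotype is an AB haplotype.
d_prime, r2 = ld_stats(p_ab=0.08, p_a=0.08, p_b=0.25)
print(round(d_prime, 3), round(r2, 3))
```

Here D′ is 1 because no recombinant haplotype exists, yet r² is only about 0.26 because the two alleles differ in frequency, so the common allele is a poor proxy for the rare one.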
In general, the performance of imputation strategies depends on the
overall information content the genotyped SNPs carry for the untyped
SNPs in the region, which was estimated at only 40% for rs334 for our
data. This may explain, in part, why the imputed association signal
(P = 4.5 × 10⁻¹⁴) was weaker than the value obtained when we
genotyped rs334 directly on the same samples (P = 1.3 × 10⁻²⁸).
Figure 3 Quantile-quantile plots of association test statistic (observed
quantile against expected quantile). (a–c) Quantile-quantile plots of the
trend test statistic for the unstratified analysis, which uses all 958 cases
and 1,382 controls (a); the ethnic-stratified analysis, which tests 854
cases, 1,195 controls from the four major ethnic groups (b); and the PCA
analysis, which corrects for the first three principal components from
EIGENSTRAT and uses all 958 cases and 1,382 controls (c). The shaded region
in gray represents the lower and upper 95% probability bounds for the
expected quantiles.

Figure 4 Genome-wide signals of association with severe malaria. Plot of
the –log₁₀ P values, by chromosome (1–22), for the trend test correcting for
the first three principal components from EIGENSTRAT. Each point represents
a SNP from the 402,814 remaining after quality control filters were applied.
Different bands of blue are used to differentiate SNPs on consecutive
autosomal chromosomes. SNPs with P values less than 10⁻⁴ are represented
by red points.

The HapMap Yoruba sample has been used as the basis for
designing GWA genotyping arrays intended for African populations
in general, and these data provide an example of where this approach
may fail. When viewed at a macroscopic level, patterns of LD in The
Gambia and the HapMap Yoruba sample are similar, both for the
genome as a whole (Supplementary Fig. 3 online) and for the
genomic region around the HbS locus (Supplementary Fig. 4 online).
However, when we attempted to impute rs334 genotypes in our
Gambian data using the HapMap Yoruba as the reference panel, we
failed to identify any association signal (P = 0.06). This may be
explained by the fact that the SNP on the Affymetrix array with the
strongest LD with rs334 in The Gambia (rs11036238, r² = 0.32) has
negligible LD in the HapMap Yoruba samples (r² = 0.009); conversely,
the SNP in strongest LD with rs334 in the HapMap Yoruba
samples (rs7936221, r² = 0.35) is not in LD with rs334 in The
Gambia (r² = 0.005). This is consistent with evidence that the HbS
allele has arisen independently in different African populations¹¹,¹³,¹⁴.
Although the HbS allele may not be representative of genomic
variation as a whole, it highlights the possibility of local anomalies
particularly in regions under strong selective pressure, and thus raises
important questions about the design of an optimal SNP tagging
strategy for African populations in general.
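To see why the r² values above matter so much, a widely used rule of thumb, which the article itself does not state, is that the expected χ² statistic at a marker is roughly r² times the statistic that would be obtained at the causal variant. The sketch below applies that approximation to 1-d.f. tests using only the standard library; the function names are hypothetical, and the output is an expectation under the approximation, not a reproduction of any observed P value.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(x / 2))


def chi2_isf_1df(p, lo=0.0, hi=2000.0):
    """Invert chi2_sf_1df by bisection (the tail probability decreases with x)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf_1df(mid) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def expected_tag_p(p_causal, r2):
    """Approximate P value at a tag SNP with squared correlation r2 to the causal SNP."""
    return chi2_sf_1df(r2 * chi2_isf_1df(p_causal))


# P value from direct genotyping of rs334 and the best Gambian tag r² from the text
p_tag = expected_tag_p(1.3e-28, 0.32)
# The same calculation with the r² that this tag SNP has in HapMap Yoruba
p_tag_yri = expected_tag_p(1.3e-28, 0.009)
print(p_tag, p_tag_yri)
```

Under this approximation, a tag with r² near 0.3 retains a strong but heavily attenuated signal, whereas a tag with r² near 0.01 retains essentially none, in line with the qualitative point made above.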
Taken together, these findings support the view that low LD in
African populations can help to distinguish the causal polymorphism
from neighboring polymorphisms. But they also highlight the
importance of understanding regional variations in
haplotype structure when designing and interpreting GWA studies
in African populations, particularly for loci that are under
selective pressure.
Signals at other known loci
The GWA analysis did not identify any of the well-known erythrocyte
variants that have been selected by malaria, other than HbS. This can
partly be explained by population genetic factors; for example, the
Duffy FY*O allele has reached fixation in The Gambia, whereas other
variants such as those affecting hemoglobin C and southeast Asian
ovalocytosis are rare or absent in this population. We might have
expected associations at G6PD and HBA1-HBA2, the loci for
glucose-6-phosphate dehydrogenase deficiency¹⁵⁻¹⁷ and α+ thalassaemia¹⁸⁻²⁰,
respectively, but our GWA dataset had no SNP within 100 kb of G6PD and
only one SNP within 50 kb of HBA1-HBA2.
To investigate G6PD in more detail, we used the Sequenom iPlex
platform to genotype rs1050828, a coding polymorphism that has
received considerable attention as a marker of protection against
severe malaria¹⁵,¹⁷, although there are other polymorphisms associated
with reduced G6PD enzyme activity that have been less well studied in
malaria and could possibly also be involved²¹. The minor allele
frequency of rs1050828 in the Gambian control sample was 0.03,
considerably lower than in samples from Kenya (0.18) and Malawi (0.19)
that we genotyped by the same method. Power to detect association with
rs1050828 in The Gambia is affected by this low allele frequency, and the
results were consistent with a modest protective effect but were not
statistically significant: odds ratio (OR) for male hemizygotes 0.71
(95% CI = 0.34–1.49) and for female heterozygotes 0.79 (95% CI =
0.43–1.46). Even if it had been a strong effect, it would not have
given a GWA signal because the best tagging SNP for rs1050828 on the
Affymetrix 500K array had r² = 0.06.
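Odds ratios with confidence intervals of the kind reported above can be obtained from a 2 × 2 table. The following sketch uses the standard log (Woolf) method; the article does not state how its intervals were computed, so this is only one plausible approach, and the counts are hypothetical rather than taken from the study.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio and approximate 95% confidence interval (Woolf's log method)."""
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical carrier / non-carrier counts in cases and controls
or_, lower, upper = odds_ratio_ci(20, 80, 30, 70)
print(round(or_, 2), round(lower, 2), round(upper, 2))
```

When the allele is rare the counts of carriers are small, the standard error is large, and the interval spans 1 even if the point estimate suggests protection, which is the situation described for rs1050828.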
We also examined the ABO locus, where the functional variant is known and
an effect has been conclusively replicated across different populations²².
A previous study combining case-control and family-based analyses of
~9,000 individuals in three populations found that individuals who are
not of blood group O (as defined by the functional variant rs8176719, a
splice-site insertion in the ABO gene) have ~1.2-fold increased risk of
severe malaria with a combined P value of 2 × 10⁻⁷ (ref. 22). We
genotyped rs8176719 in our GWA sample and found an association that
was entirely consistent with previous data (OR = 1.26, 95% CI =
1.11–1.44, P = 5 × 10⁻⁴) but which would not have passed our initial
GWA significance threshold of P < 10⁻⁴. The lack of a GWA signal can be
explained by the fact that the best tagging SNP had r² = 0.15.

Figure 5 Association signal at the HBB locus. The top panel shows the
association signals across a 1-Mb region on chromosome 11 centering on
rs334, with the vertical axis representing the –log₁₀ P values from the
Armitage trend test. Points in black represent SNPs that are found on the
Affymetrix array, and points in red represent SNPs imputed with the
resequenced Gambian reference panel. The dashed lines in red indicate the
start and end of the sequenced region. The bottom panels focus on the
110-kb sequenced region, together with a map of the recombination rates
(cM/Mb) and genes found in the region (HBG2: G-gamma globin; HBB: beta
globin; HBD: delta globin; HBG1: A-gamma globin; HBE1: epsilon globin;
OR51B4: olfactory receptor). Labeled SNPs are rs11036238, rs334, a novel
SNP and rs11036711. Recombination rates and genes were extracted from the
HapMap Genome Browser.
Other SNP associations have been reported for malaria but have not
been conclusively replicated in large studies across different
populations, and are mostly thought to be markers rather than true causal
variants. At seven loci (CD36, CD40LG, CR1, ICAM1, IL22, NOS2,
TNF) we genotyped nine candidate SNPs previously reported to show
association with malaria (Supplementary Table 4 online). A weak
association was identified at TNF for rs2516486 (P = 0.02) but this
did not result in a GWA signal, as the best tagging SNP had r² = 0.51.
Other SNPs tested showed no significant association, but had they
done so it might have been missed by GWA analysis, as all candidate
SNPs were poorly tagged by the Affymetrix 500K array (median
r² = 0.45; range 0.01–0.61) (Supplementary Table 4).
In summary, the lack of GWA signals corresponding to previously
reported malaria associations can at least in part be explained by low
tagging efficiency of the Affymetrix 500K array in this population and
other causes of low statistical power, particularly low allele frequencies.
However these data also raise the question of how many previously
reported associations may have been false positives. In some cases an
authentic association may fail to replicate because the effect size was
overestimated in initial reports (‘winner’s curse’); because the
frequency of the causal variant varies between populations; because LD
between the marker SNP and the causal variant varies between
populations; or because the effect is complex, for example, due to
allelic heterogeneity or epistasis. These issues are currently being
addressed by the MalariaGEN consortium in a multicenter study
across 11 different malaria-endemic populations⁴.
Loci identified by genome-wide association analysis
From the above analyses it is clear that in the Gambian population the
Affymetrix 500K array may fail to detect authentic resistance loci with
weak effects, and that even strong genetic determinants may give
relatively weak GWA signals. In the following analysis we focus
primarily on GWA signals with P values < 10⁻⁴, although it will be
important to follow up weaker GWA signals in future work.
Figure 6 Extent of LD surrounding HbS. Each point shows r² (top panel) and
D′ (bottom panel) between the HbS SNP (rs334) and SNPs in the Gambian
reference panel, plotted against position (Mb). The shaded pink region
indicates the boundaries of the resequenced region. The dashed vertical
line indicates the position of rs334.
Table 1 Regions of the genome showing association

| Chr. | Region (Mb) | No. of SNPs | SNP | Minor allele | Model | Case MAF | Control MAF | P value | Odds ratio | Nearby gene |
|---|---|---|---|---|---|---|---|---|---|---|
| 1p34.1 | 46.0–46.08 | 4 | rs10890361 | A | Het | 0.37 | 0.36 | 9.4 × 10⁻⁶ | 1.46 (1.24–1.73) | MAST2 |
| 1p31.1 | 72.9–72.96 | 2 | rs10889990 | A | Trend | 0.39 | 0.34 | 5.1 × 10⁻⁵ | 1.29 (1.14–1.46) | |
| 1p31.1 | 76.1–76.15 | 2 | rs12405994 | T | Dom | 0.01 | 0.03 | 8.2 × 10⁻⁷ | 0.33 (0.20–0.53) | ASB17 |
| 2q37.1 | 231.7–231.71 | 5 | rs10192428 | G | Dom | 0.33 | 0.38 | 5.1 × 10⁻⁷ | 0.65 (0.55–0.77) | SPATA3 |
| 3p22.1 | 43.10–43.11 | 1 | rs488069 | C | Trend | 0.27 | 0.21 | 7.6 × 10⁻⁷ | 1.42 (1.24–1.64) | C3orf39 |
| 4p15.2 | 26.1–26.19 | 2 | rs2046784 | G | Trend | 0.27 | 0.33 | 2.5 × 10⁻⁵ | 0.76 (0.66–0.86) | CCKAR |
| 4q24 | 107.6–107.67 | 2 | rs2949632 | G | Trend | 0.27 | 0.33 | 6.5 × 10⁻⁶ | 0.74 (0.65–0.85) | SCYE1 |
| 5p12 | 43.0–43.13 | 3 | rs316414 | A | Het | 0.16 | 0.18 | 4.5 × 10⁻⁷ | 0.66 (0.49–0.88) | ZNF131 |
| 7p12.2 | 50.39–50.40 | 3 | rs10249420 | A | Het | 0.18 | 0.21 | 6.8 × 10⁻⁵ | 0.69 (0.57–0.83) | DDC |
| 7q32.3 | 131.5–131.53 | 2 | rs10269601 | A | Trend | 0.28 | 0.34 | 1.1 × 10⁻⁵ | 0.75 (0.66–0.85) | PLXNA4 |
| 8p22 | 13.6–13.61 | 2 | rs1384057 | A | Rec | 0.16 | 0.20 | 2.4 × 10⁻⁵ | 0.32 (0.18–0.57) | DLC1 |
| 10p12.2 | 23.1–23.14 | 2 | rs11013140 | T | Rec | 0.31 | 0.36 | 5.6 × 10⁻⁶ | 0.53 (0.40–0.70) | PIP4K2A |
| 10q24.2 | 101.0–101.1 | 2 | rs11190062 | A | Trend | 0.12 | 0.09 | 1.3 × 10⁻⁵ | 0.65 (0.54–0.79) | CNNM1 |
| 10q26.13 | 126.2–126.4 | 2 | rs7076268 | C | Het | 0.22 | 0.25 | 1.6 × 10⁻⁵ | 0.68 (0.56–0.81) | FAM53B |
| 11p15.4 | 4.6–5.6 | 13 | rs11036238 | C | Trend | 0.09 | 0.14 | 3.9 × 10⁻⁷ | 0.61 (0.50–0.74) | HBB |
| 13q31.3 | 92.49–92.52 | 2 | rs1444227 | C | Het | 0.15 | 0.11 | 3.6 × 10⁻⁵ | 1.54 (1.25–1.89) | GPC6 |
| 14q21.2 | 45.43–45.54 | 3 | rs17728971 | G | Rec | 0.16 | 0.12 | 6.5 × 10⁻⁷ | 1.63 (1.34–1.98) | |
| 16q13 | 56.0–56.1 | 2 | rs16957051 | C | Dom | 0.15 | 0.20 | 3.4 × 10⁻⁵ | 0.68 (0.56–0.81) | CIAPIN1 |
| 19p13.3 | 6.8–6.9 | 2 | rs460375 | G | Dom | 0.45 | 0.48 | 1.2 × 10⁻⁵ | 0.66 (0.55–0.79) | EMR1 |

Shown are regions meeting either of two criteria: (i) at least two SNPs with P value < 10⁻⁴ for the PC-corrected analysis within 250 kb of each other or (ii) a SNP with P value < 10⁻⁶. For each region, we report the SNP with the strongest signal and the model which gave the signal. The minor allele is defined with respect to the controls and each odds ratio is defined for the minor alleles. The most significant model is reported, either trend, dominant (dom), recessive (rec) or heterozygous advantage (het). Nearby gene is defined as the closest gene within 200 kb of the region.
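The two inclusion criteria in the table footnote can be expressed as a small filter. The sketch below is one reading of them: it chains SNPs with P < 10⁻⁴ into a region whenever consecutive hits on the same chromosome lie within 250 kb, which is an assumption about how "within 250 kb of each other" was applied. The function name and the example SNPs are hypothetical.

```python
from itertools import groupby


def select_regions(snps, p_cut=1e-4, strong_cut=1e-6, max_gap=250_000):
    """Group significant SNPs into regions and keep those meeting either criterion.

    snps: iterable of (chromosome, position_bp, p_value) tuples
    """
    hits = sorted(s for s in snps if s[2] < p_cut)
    clusters = []
    for _, group in groupby(hits, key=lambda s: s[0]):
        cluster = []
        for snp in group:
            # Start a new cluster when the gap to the previous hit is too large
            if cluster and snp[1] - cluster[-1][1] > max_gap:
                clusters.append(cluster)
                cluster = []
            cluster.append(snp)
        clusters.append(cluster)
    # Criterion (i): at least two hits; criterion (ii): one very strong hit
    return [c for c in clusters
            if len(c) >= 2 or min(s[2] for s in c) < strong_cut]


# Hypothetical input: a two-SNP cluster, an isolated weak hit, an isolated
# strong hit and a non-significant SNP
example = [
    ("1", 1_000_000, 5e-5), ("1", 1_100_000, 2e-5),
    ("1", 5_000_000, 3e-5),
    ("3", 43_100_000, 7.6e-7),
    ("7", 200_000, 0.5),
]
regions = select_regions(example)
print(regions)
```

An isolated SNP with P between 10⁻⁶ and 10⁻⁴ is dropped by this filter, which matches the single-SNP region in the table having a P value below 10⁻⁶.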

References
- Inference of population structure using multilocus genotype data.
- Principal components analysis corrects for stratification in genome-wide association studies.
- Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Paul Burton, +195 more. 07 Jun 2007.
- A haplotype map of the human genome. John W. Belmont, +232 more.
- A second generation human haplotype map of over 3.1 million SNPs. Kelly A. Frazer, +237 more. 18 Oct 2007.
Frequently Asked Questions (17)
Q1. What are the contributions mentioned in the paper "Genome-wide and fine-resolution association analysis of malaria in West Africa"?

In this paper, Muminatou Jallow et al., working within the Malaria Genomic Epidemiology Network (MalariaGEN), report a genome-wide association study of severe malaria in The Gambia.

In the near future, this limiting factor will be overcome by advances in genome sequencing technologies, through initiatives such as the 1000 Genomes Project. 

The problem is that replication of association at multiple locations depends on the allele frequency of the marker SNP and the causal variant, as well as the LD between the marker SNP and the causal variant, being relatively constant across locations. 
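
The dependence on LD can be made concrete with the standard pairwise measure r². The following is a minimal sketch, not the authors' code: the function names and the example frequencies are hypothetical, and the rule that the sample size needed to detect a signal through a marker scales roughly as 1/r² is a textbook approximation rather than a result stated in this paper.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Return r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (marker) and allele B (causal)
    p_a:  frequency of allele A at the marker SNP
    p_b:  frequency of allele B at the causal variant
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom <= 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


def effective_sample_size(n: int, r2: float) -> float:
    """Approximate sample size 'seen' at the marker (assumption: scales as n * r^2)."""
    return n * r2


# Hypothetical frequencies, chosen only for illustration.
r2_strong = ld_r_squared(p_ab=0.10, p_a=0.10, p_b=0.10)  # marker tags the causal allele perfectly
r2_weak = ld_r_squared(p_ab=0.10, p_a=0.30, p_b=0.10)    # same causal allele, commoner marker
```

With these illustrative numbers the same causal variant is tagged perfectly in one setting and only weakly in the other, which is the situation in which a marker-based signal fails to replicate across locations.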

Development of an optimal genome-wide SNP genotyping platform for use in Africa would help to strengthen the signals of association that are directly observed at the first stage of GWA analysis, as well as increase the accuracy of imputation. 

MalariaGEN’s primary funding is from the Wellcome Trust (grant number 077383/Z/05/Z) and from the Bill & Melinda Gates Foundation, through the Foundation for the National Institutes of Health (grant number 566) as part of the Grand Challenges in Global Health initiative. 

The major limiting factor, at all stages of GWA analysis in Africa, is the need for population-specific data on genome sequence variation. 

The authors found that the GWA signal around the HbS variant can be boosted by several orders of magnitude by imputation, from P = 3.9 × 10^-7 to P = 4.5 × 10^-14. 
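
As a quick arithmetic check on "several orders of magnitude", the gain can be expressed as a difference in log10 P-values. This is a small sketch using only the two P-values reported for the HbS locus; the variable names are ours.

```python
import math

p_before = 3.9e-7   # HbS signal before imputation
p_after = 4.5e-14   # HbS signal after multipoint imputation

# Gain in orders of magnitude: difference of the two -log10(P) values.
gain_orders = math.log10(p_before) - math.log10(p_after)
```

The difference comes to just under seven orders of magnitude.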

The challenge is to determine the optimal number of genotyped SNPs that, when combined with genome-wide resequencing data from a representative sample of the same population, would allow accurate imputation of all common variants. 

To avoid edge effects in haplotype phasing and imputation, the data for each sequenced sample was extended by including SNPs from the Affymetrix array flanking both ends of the sequenced region, creating a 1-Mb region centered on rs334 (at 5,204,808 in build 35) from 4,705,000 to 5,705,000 spanning 453 SNPs in total. 

The MalariaGEN Resource Centre is part of the European Union Network of Excellence on the Biology and Pathology of Malaria Parasites. 

The sequenced region spans 110 kb from 5,179,297 to 5,289,530, and encompasses all five beta-globin genes (HBB, HBD, HBE1, HBG1, HBG2) and an olfactory receptor (OR51B1). 
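
The build 35 coordinates quoted in this section can be checked and used to pick out flanking array SNPs. The sketch below is illustrative only: the helper names and the list of SNP positions are hypothetical, and it is not the pipeline the authors used.

```python
RS334_POS = 5_204_808               # HbS causal variant (build 35)
WINDOW = (4_705_000, 5_705_000)     # 1-Mb imputation region
SEQUENCED = (5_179_297, 5_289_530)  # resequenced beta-globin region


def span(region):
    """Length of a (start, end) interval in base pairs."""
    return region[1] - region[0]


def contains(region, pos):
    """True if pos lies within the closed interval region."""
    return region[0] <= pos <= region[1]


def flanking_array_snps(positions, window, sequenced):
    """Keep array SNP positions inside the window but outside the sequenced core."""
    return [p for p in positions
            if contains(window, p) and not contains(sequenced, p)]


# Hypothetical array SNP positions, for illustration only.
example_positions = [4_700_000, 4_800_000, 5_200_000, 5_600_000, 5_800_000]
flanks = flanking_array_snps(example_positions, WINDOW, SEQUENCED)
```

The window is exactly 1,000,000 bp long, the sequenced core is 110,233 bp, and rs334 falls inside the sequenced core, consistent with the description above.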

The genetic diversity found across Africa increases the imperative for the data underpinning imputation to be population specific. 

By estimating the protective effect of the HbAS genotype in this study sample the authors can exclude high rates of diagnostic misclassification, which can arise when other severe diseases mimic the clinical features of severe malaria: the authors found ORs of 0.12 (95% CI = 0.07–0.21) for cerebral malaria, 0.10 (0.04–0.24) for severe malaria anemia, 0.08 (0.02–0.38) for respiratory distress and 0.09 (0.05–0.16) for severe malaria in general. 
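
To gauge how strong these protective effects are relative to their uncertainty, one can back out an approximate standard error from a reported interval. The sketch below assumes the 95% CI is a Wald interval that is symmetric on the log odds scale; the paper excerpt does not say how the intervals were computed, so this is an assumption, and the function names are ours.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def log_or_standard_error(ci_low: float, ci_high: float, z: float = Z_95) -> float:
    """Approximate SE of log(OR), assuming a Wald CI symmetric on the log scale."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * z)


def wald_z(odds_ratio: float, ci_low: float, ci_high: float) -> float:
    """Approximate Wald z-statistic for log(OR) under the same assumption."""
    return math.log(odds_ratio) / log_or_standard_error(ci_low, ci_high)


# Reported values for cerebral malaria: OR 0.12, 95% CI 0.07 to 0.21.
se_cerebral = log_or_standard_error(0.07, 0.21)
z_cerebral = wald_z(0.12, 0.07, 0.21)
```

Under this assumption the cerebral malaria estimate sits more than seven standard errors below the null, which fits the authors' argument that diagnostic misclassification cannot be high.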

The Wellcome Trust (Sanger Institute core funding) and the Medical Research Council (grant number G0600230) provide additional support for genotyping, bioinformatics and analysis. 

This is an additional reason to carry out high-resolution multipoint imputation at the first stage of GWA analysis, as it allows putative causal variants to be tested directly in different populations. 

In practice it is difficult to achieve this threshold in Africa, because of weak LD between the marker SNPs that are genotyped and causal variants. 

At the first stage of GWA analysis, screening many SNPs across the genome, a stringent threshold for statistical significance is used to reduce false-positive rates.
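
One common way to set such a threshold is a Bonferroni correction. The sketch below assumes roughly 500,000 tests (the nominal size of the Affymetrix 500K array) and a family-wise error rate of 0.05; neither figure is given in this excerpt as the threshold the authors applied, so both are assumptions.

```python
N_TESTS = 500_000  # assumption: nominal SNP count of the 500K array
ALPHA = 0.05       # assumption: desired family-wise error rate


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_TESTS)

# HbS P-values reported in the paper, before and after imputation.
passes_before = 3.9e-7 < threshold
passes_after = 4.5e-14 < threshold
```

Under these assumed settings the threshold is about 1 × 10^-7, so the HbS signal would miss it before imputation and clear it afterwards.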