
Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types

Abstract: Most autoimmune-disease-risk effects identified by genome-wide association studies (GWAS) localize to open chromatin with gene-regulatory activity. GWAS loci are also enriched in expression quantitative trait loci (eQTLs), thus suggesting that most risk variants alter gene expression. However, because causal variants are difficult to identify, and cis-eQTLs occur frequently, it remains challenging to identify specific instances of disease-relevant changes to gene regulation. Here, we used a novel joint likelihood framework with higher resolution than that of previous methods to identify loci where autoimmune-disease risk and an eQTL are driven by a single shared genetic effect. Using eQTLs from three major immune subpopulations, we found shared effects in only ∼25% of the loci examined. Thus, we show that a fraction of gene-regulatory changes suggest strong mechanistic hypotheses for disease risk, but we conclude that most risk mechanisms are not likely to involve changes in basal gene expression.


Citation
Chun, Sung, Alexandra Casparino, Nikolaos A Patsopoulos, Damien C Croteau-Chonka, Benjamin A Raby, Philip L De Jager, Shamil R Sunyaev, and Chris Cotsapas. 2017. “Limited statistical evidence for shared genetic effects of eQTLs and autoimmune disease-associated loci in three major immune cell types.” Nature Genetics 49 (4): 600-605. doi:10.1038/ng.3795. http://dx.doi.org/10.1038/ng.3795.
Published Version
doi:10.1038/ng.3795
Permanent link
http://nrs.harvard.edu/urn-3:HUL.InstRepos:34375208
Terms of Use
This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Limited statistical evidence for shared genetic effects of eQTLs and autoimmune disease-associated loci in three major immune cell types
Sung Chun (1,2,3), Alexandra Casparino (4), Nikolaos A Patsopoulos (3,5,6), Damien C Croteau-Chonka (2,7), Benjamin A Raby (2,7), Philip L De Jager (2,3,5,6), Shamil R Sunyaev (1,2,3,*), and Chris Cotsapas (3,4,8,*)
1 Division of Genetics, Brigham and Women’s Hospital, Boston MA, USA
2 Department of Medicine, Harvard School of Medicine, Boston MA, USA
3 Broad Institute of Harvard and MIT, Cambridge MA, USA
4 Department of Neurology, Yale School of Medicine, New Haven CT, USA
5 Department of Neurology, Brigham and Women’s Hospital, Boston MA, USA
6 Ann Romney Center for Neurological Diseases, Brigham and Women’s Hospital, Boston MA, USA
7 Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston MA, USA
8 Department of Genetics, Yale School of Medicine, New Haven CT, USA

Most autoimmune disease risk effects identified by genome-wide association studies (GWAS) localize to open chromatin with gene regulatory activity. GWAS loci are also enriched for expression quantitative trait loci (eQTLs), suggesting that most risk variants alter gene expression [1,2]. However, because causal variants are difficult to identify and cis-eQTLs occur frequently, it remains challenging to identify specific instances of disease-relevant changes to gene regulation. Here, we use a novel joint likelihood framework with higher resolution than previous methods to identify loci where autoimmune disease risk and an eQTL are driven by a single, shared genetic effect. Using eQTLs from three major immune subpopulations, we find shared effects in only ~25% of loci. Thus, we identify a subset of gene regulatory changes that provide strong mechanistic hypotheses for disease risk, but conclude that most risk mechanisms likely do not involve changes to basal gene expression.

Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
* Correspondence to: SRS (ssunyaev@rics.bwh.harvard.edu) and CC (cotsapas@broadinstitute.org).
Code availability
The current implementation of JLIM is available from the Cotsapas and Sunyaev labs: http://www.github.com/cotsapaslab/jlim and http://genetics.bwh.harvard.edu/wiki/sunyaevlab/jlim
Data availability
The publicly available 1000 Genomes genotype data were downloaded from: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. The publicly available gEUVADIS LCL eQTL data were accessed via the EBI ArrayExpress site under accession E-GEUV-1. Gene expression data for CD4+ T cells and CD14+ monocytes were accessed via NCBI Gene Expression Omnibus accession no. GSE56035. Immunochip GWAS summary statistics are available at http://www.immunobase.org.
Author Contributions
SC designed and performed research and authored the manuscript; AC performed research; NP contributed data and approved the manuscript; DCC contributed data and approved the manuscript; BR contributed data and approved the manuscript; PDJ contributed data and approved the manuscript; SRS designed and performed research and authored the manuscript; CC designed and performed research and authored the manuscript.
Competing Financial Interests statement
The authors declare no competing financial interests.
HHS Public Access
Author manuscript
Nat Genet. Author manuscript; available in PMC 2017 August 20.
Published in final edited form as: Nat Genet. 2017 April; 49(4): 600–605. doi:10.1038/ng.3795.

The autoimmune and inflammatory diseases (AID) are heritable, complex diseases where loss of tolerance to self-antigens results in either systemic or tissue-specific immune attack [3,4]. GWAS have identified hundreds of genomic regions mediating risk to several AID. These associations are primarily non-coding: lead GWAS SNPs are more likely to be associated with expression levels of neighboring genes than expected by chance [12,13], and the same lead SNPs are enriched in regulatory regions marked by chromatin accessibility and modification [1,14]. Fine-mapping reveals enrichment of AID-associated variants in enhancer elements active in stimulated T cell subpopulations [15], with heritability strongly enriched in such regulatory regions [16,17]. Collectively, these strands of evidence suggest that the majority of disease risk is mediated by changes to gene regulation in specific cell subpopulations. However, these bulk analyses do not formally assess whether expression levels and disease risk can be attributed to a single underlying variant or to independent effects in a locus [18,19].
Though several methods have been developed to assess these alternatives using eQTL data [20–23], they have limited resolution to detect cases where distinct disease and eQTL causal variants are in linkage disequilibrium (LD). Here, we present an approach to test whether a GWAS risk association and an eQTL are driven by the same underlying genetic effect, accounting for the LD between causal variants. Using data from ImmunoChip studies of seven AID comprising >180,000 samples in total (Supplementary Table 1), we test whether associations in 272 known risk loci are consistent with cis-eQTLs for genes in each region, measured in three relevant immune cell populations: lymphoblastoid cell lines (LCLs), CD4+ T cells and CD14+ monocytes [24,25].
When associations to two traits (here, a disease trait and an eQTL) are driven by the same underlying causal variant, the joint evidence of association should be maximized at the markers in tightest LD with the causal variant [19,26]. Here, we directly evaluate this joint likelihood (Supplementary Figure 1), unlike previous approaches that look for similarities in the shape of the association curve over multiple markers [20,21,27,28]. When the underlying causal effect is shared, the joint likelihood is maximized when we model the same causal variant in both traits; conversely, when the underlying causal variants are different, we expect the maximum joint likelihood when we model their closest proxies. We empirically derive the null distribution of the joint likelihood ratio statistic by comparing disease associations to permuted eQTL data (see Methods, Supplementary Figure 2 and Supplementary Notes). We thus directly evaluate whether two associations in the same locus, observed in different cohorts, are due to the same underlying effect.
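The contrast between "same causal variant in both traits" and "different causal variants modeled by their closest proxies" can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the JLIM statistic or code: it assumes one causal marker per trait, treats the marginal z-scores as multivariate normal (under which the maximized log-likelihood ratio for a causal marker c is z_c²/2), calls two causal markers "distinct" when their r² falls below a hypothetical cutoff `r2_max = 0.8`, models disease as a quantitative trait, and simulates genotypes with crude distance-decaying LD. All function names (`simulate_genotypes`, `zscores`, `joint_llr`, `permutation_pvalue`, `demo`) are hypothetical. As in the text, the null is built by permuting the eQTL phenotype.

```python
import numpy as np


def simulate_genotypes(rng, n, m, rho=0.9, threshold=0.5):
    """Toy 0/1/2 genotypes whose LD decays with distance along the markers.

    Each haplotype is a thresholded AR(1) Gaussian process, a crude stand-in
    for real haplotype structure such as a 1000 Genomes reference panel.
    """
    def haplotypes():
        x = np.empty((n, m))
        x[:, 0] = rng.standard_normal(n)
        for j in range(1, m):
            x[:, j] = rho * x[:, j - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        return (x > threshold).astype(float)  # allele frequency about 0.31
    return haplotypes() + haplotypes()


def zscores(genotypes, phenotype):
    """Marginal per-marker association z-scores for a quantitative phenotype."""
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    y = (phenotype - phenotype.mean()) / phenotype.std()
    n = len(y)
    return (g.T @ y / n) * np.sqrt(n)  # per-marker correlation times sqrt(n)


def joint_llr(z1, z2, r2, r2_max=0.8):
    """Toy statistic: best shared-variant fit minus best distinct-variant fit.

    If z ~ N(lam * R[:, c], R) with one causal marker c, the maximized
    log-likelihood ratio against "no effect" is z_c**2 / 2.
    """
    l1, l2 = z1**2 / 2, z2**2 / 2
    shared = np.max(l1 + l2)              # same causal marker for both traits
    pair = l1[:, None] + l2[None, :]      # marker c1 for trait 1, c2 for trait 2
    distinct = np.max(np.where(r2 < r2_max, pair, -np.inf))  # c1, c2 not in strong LD
    return shared - distinct              # > 0 favours a shared effect


def permutation_pvalue(g1, disease, g2, expression, r2, n_perm, rng):
    """Compare the observed statistic with those from permuted eQTL phenotypes."""
    z1 = zscores(g1, disease)
    observed = joint_llr(z1, zscores(g2, expression), r2)
    null = [joint_llr(z1, zscores(g2, rng.permutation(expression)), r2)
            for _ in range(n_perm)]
    return (1 + sum(t >= observed for t in null)) / (1 + n_perm)


def demo(seed=1, n=500, m=40, n_perm=200):
    rng = np.random.default_rng(seed)
    g1 = simulate_genotypes(rng, n, m)    # "GWAS" cohort
    g2 = simulate_genotypes(rng, n, m)    # independent "eQTL" cohort
    r2 = np.corrcoef(g1, rowvar=False) ** 2

    def trait(g, causal):                 # standardized causal genotype plus noise
        x = g[:, causal]
        return 0.5 * (x - x.mean()) / x.std() + rng.standard_normal(len(x))

    p_shared = permutation_pvalue(g1, trait(g1, 20), g2, trait(g2, 20), r2, n_perm, rng)
    p_distinct = permutation_pvalue(g1, trait(g1, 35), g2, trait(g2, 5), r2, n_perm, rng)
    return p_shared, p_distinct


print(demo())
```

With a shared causal marker the observed statistic should exceed every permuted value, giving a small p-value; with far-apart causal markers it should sit well below the permutation null. Real analyses differ in many respects (case-control traits, reference-panel LD, multiple causal variants), which this sketch ignores.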
To assess the performance of our method, we benchmarked it against three recently reported methods: coloc [20], a well-calibrated Bayesian framework that considers spatial similarities in association data across sets of markers; gwas-pw [29], which extends this idea to hierarchical priors and optimizes model parameters; and HEIDI/SMR [22], which applies Mendelian
randomization between traits. We simulated pairs of case-control cohorts with either the same or distinct causal variants driving association, and find that our approach shows the best overall performance (Supplementary Tables 2 and 3). When independent causal variants (i.e., not in LD) drive GWAS and eQTL associations, our own method, coloc and gwas-pw all had excellent performance. As the LD between the causal variants increases, our method shows the best performance, maintaining high resolution even when the underlying causal variants are in strong LD (AUC = 0.883 when 0.7 < r² < 0.8; Supplementary Figures 3 and 4), whereas the other methods show substantial false positive rates, reporting distinct effects as shared. We also found that our method is robust to within-continent levels of population structure (Supplementary Figures 5 and 6) and to limiting the analysis to a subset of SNPs for computational efficiency (Supplementary Figure 7; coloc fares similarly, Supplementary Figure 8). Our method also performs well when multiple independent causal variants affect one or both traits (Supplementary Figures 9–11). In practice, our resolution becomes limited at high LD levels (r² > 0.8), where the false positive rate increases dramatically. We also have limited resolution when the eQTL effect is very weak (p > 0.01; Supplementary Figures 12–15). Thus, within these limits, we can accurately detect cases of shared genetic effects between two traits.
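The benchmark summarizes performance as an AUC within bins of LD between the two distinct causal variants (for example 0.7 < r² < 0.8). A minimal sketch of that bookkeeping follows, assuming each simulation yields one score where higher means more support for a shared effect, and that the same set of shared-effect simulations is compared against the distinct-effect simulations in each bin. The function names and the example scores are hypothetical and are not taken from the supplementary analyses.

```python
def auc(shared_scores, distinct_scores):
    """Probability that a random shared-effect simulation outscores a random
    distinct-effect simulation (Mann-Whitney form of the ROC AUC; ties count 1/2)."""
    wins = 0.0
    for s in shared_scores:
        for d in distinct_scores:
            if s > d:
                wins += 1.0
            elif s == d:
                wins += 0.5
    return wins / (len(shared_scores) * len(distinct_scores))


def auc_by_ld_bin(shared_scores, distinct_runs, edges):
    """AUC of shared vs. distinct simulations, stratified by the r^2 between
    the two distinct causal variants. distinct_runs holds (r2, score) pairs."""
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [score for r2, score in distinct_runs if lo < r2 < hi]
        if in_bin:  # skip empty bins rather than divide by zero
            result[(lo, hi)] = auc(shared_scores, in_bin)
    return result


# Hypothetical scores (higher = more evidence for a shared effect).
shared = [0.9, 0.7, 0.4]
distinct = [(0.10, 0.5), (0.15, 0.2), (0.75, 0.8)]
print(auc_by_ld_bin(shared, distinct, [0.0, 0.2, 0.7, 0.8]))
# AUC is 5/6 in the low-LD bin and 1/3 in the 0.7-0.8 bin; the empty bin is skipped.
```

Stratifying by LD makes the failure mode described above visible: as distinct causal variants approach strong LD, their scores drift toward those of shared effects and the per-bin AUC falls.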
To dissect AID risk loci, we first identified densely genotyped ImmunoChip loci showing genome-wide significant association, excluding the major histocompatibility complex (MHC) region due to its extensive LD structure (immunobase.org; Table 1). We next identified genes in a 1 Mb window centered on the most associated variant in each locus. Consistent with previous observations that eQTLs are frequently found in GWAS loci, we found that 260/272 loci had at least one gene with an eQTL (p < 0.01) in at least one cell type, with most such effects common across all three tissues (Table 1). We tested whether any eQTLs in these loci appear to be driven by the same underlying effect as the disease associations. We find evidence for shared effects for only 77/5,749 pairs in 55/260 (21%) loci across all diseases, with the proportion varying from 4/34 (12%) for rheumatoid arthritis loci to 6/10 (60%) for ulcerative colitis loci (false discovery rate < 5%; Tables 1 and 2). Of these 77 shared effects, 45 also pass the more stringent family-wise multiple testing correction (Bonferroni-corrected P < 0.05). Thus, our analysis reveals that in the majority of AID loci, variants causally involved in disease phenotypes do not overlap variants responsible for eQTL signals in the three broad cell populations we analyzed, which represent the major arms of the immune lineages. Overall, we find that >75% of tested disease-eQTL pairs appear to be associated with distinct genetic variants in the same locus (Figure 1).
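The two multiple-testing criteria above differ in what they control: a false discovery rate bounds the expected fraction of false calls among the reported pairs, while Bonferroni bounds the chance of any false call across all 5,749 pairs, which is why it retains fewer effects (45 of 77). The sketch below applies both to a list of p-values. Benjamini-Hochberg is used as one standard FDR procedure for illustration; the text does not state which FDR method produced the 5% threshold, and the function names are hypothetical.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Family-wise control: reject when m * p < alpha, i.e. p < alpha / m."""
    m = len(pvalues)
    return [p < alpha / m for p in pvalues]


def bh_reject(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    # Largest rank k whose sorted p-value satisfies p_(k) <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:  # reject the k_max smallest p-values
        reject[i] = True
    return reject


# With 5,749 disease-eQTL pairs, the Bonferroni cutoff is 0.05 / 5749, about 8.7e-6.
print(0.05 / 5749)
print(bh_reject([0.01, 0.04, 0.03, 0.005]), bonferroni_reject([0.01, 0.04, 0.03, 0.005]))
```

In the small example, all four p-values pass the FDR procedure but only the two smallest pass Bonferroni, mirroring in miniature the gap between the 77 FDR-significant and 45 Bonferroni-significant pairs.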
We sought to explain this lack of overlap between disease associations and eQTLs, despite their frequent co-occurrence in the same loci. In particular, although our method showed good performance in simulated data (Supplementary Figure 4), we remained concerned that this lack of overlap may be due to low statistical power in the eQTL data, which come from cohorts of limited sample size. However, we find that even amongst the most strongly supported eQTLs (nominal p < 10⁻⁵), <25% show evidence of shared effects with disease associations. Conversely, we find strong evidence for distinct effects for the majority of disease-eQTL pairs, with only a subset of comparisons being ambiguous, suggesting that our method is adequately powered to detect shared effects where they exist (Figure 1a and
Supplementary Figures 16–18). To assess whether power limits the total number of loci that can be resolved, rather than the number of individual eQTLs, we looked more deeply at our significance threshold settings. We find that more liberal thresholds do not increase the number of true positive results after adjusting for the false positive rate, indicating that most loci do not contain any gene with an eQTL consistent with the disease association (Figure 1b and Supplementary Figure 19). Cumulatively, our results demonstrate that only a minority of AID risk effects drive eQTLs in the three cell populations we tested, which are drawn from diverse lineages of the immune system.
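One simple way to read "true positives after adjusting for false positive rate" is to subtract the number of hits expected by chance from the number observed at each threshold. The function below is a hypothetical illustration of that bookkeeping, not the procedure behind Supplementary Figure 19: it assumes independent tests and treats every test as null when counting expected false positives (alpha times the number of tests), which overstates false positives when real signals exist. The p-values are invented for the example.

```python
def excess_discoveries(pvalues, alphas):
    """For each threshold, return (alpha, observed hits, observed - expected by chance).

    Treating every test as null gives alpha * n expected false positives, so
    observed - alpha * n is a rough, conservative estimate of true positives.
    """
    n = len(pvalues)
    rows = []
    for alpha in alphas:
        observed = sum(p < alpha for p in pvalues)
        rows.append((alpha, observed, observed - alpha * n))
    return rows


# Hypothetical p-values: two strong signals, the rest spread over (0, 1).
pvals = [0.001, 0.002, 0.2, 0.5, 0.04, 0.7, 0.9, 0.3, 0.6, 0.8]
for alpha, observed, excess in excess_discoveries(pvals, [0.01, 0.05, 0.1]):
    print(f"alpha={alpha}: {observed} hits, ~{excess:.1f} beyond chance")
```

If loosening the threshold mostly adds hits that chance alone would produce, the excess stops growing, which is the pattern the text uses to argue that most loci lack a disease-consistent eQTL rather than an adequately powered test.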
We next focused on the subset of 77 disease/eQTL pairs in 55 loci where we could detect strong evidence of a shared effect (Table 2). We find that 59/77 (77%) of effects are restricted to one cell population, indicating that tissue-specific eQTLs are important components of the molecular underpinnings of disease (Supplementary Figures 20 and 21). The remaining 18 effects are detected in multiple cell populations; for example, the multiple sclerosis association at rs10783847 on chromosome 12 is consistent with eQTLs for the transcript of methyltransferase-like 21B (METTL21B) in both CD4+ T cells and CD14+ monocytes, but not for the remaining 31 genes in the immediate locus (Figure 2). Although METTL21B is expressed in LCLs, there is no evidence of an eQTL in this tissue within 1 Mb of rs10783847. Similarly, for the multiple sclerosis association at rs1966115 on chromosome 8 and eQTLs for ZC2HC1A, and for the inflammatory bowel disease association at rs55770741 on chromosome 5 and eQTLs for ERAP2, we detect a shared effect in all three cell populations. In several cases we find tissue-specific shared effects despite strong eQTLs for the same gene in other tissues: for ZFP90 and ulcerative colitis risk at rs889561 on chromosome 16, we find shared effects in CD4+ T cells and CD14+ monocytes but not in LCLs, where we observe a ZFP90 eQTL at p = 0.005 that has a low likelihood of shared effect with GWAS (joint likelihood P = 0.85). Instead, we find evidence of sharing between disease risk and an eQTL for NFAT5 in LCLs. Thus, despite the presence of eQTLs for a gene in multiple tissues, not all of these effects are consistent with disease associations, suggesting that disease-relevant eQTLs are tissue specific.
We also find cases where an eQTL is consistent with associations to multiple diseases. The ankyrin repeat domain 55 (ANKRD55) transcript encoded on chromosome 5 has an eQTL in CD4+ T cells that is shared with associations to multiple sclerosis, Crohn disease and rheumatoid arthritis (Figure 3; all observations are significant after Bonferroni correction). We also find weaker evidence for shared effects between all three diseases and an eQTL for interleukin 6 signal transducer (IL6ST) in CD4+ T cells, which passes the false discovery rate threshold but not Bonferroni correction (Supplementary Figure 22). Similarly, a CD4+ T cell eQTL for ELMO1 on chromosome 7 is consistent with associations to both celiac disease and multiple sclerosis (Supplementary Figure 23), a CD14+ monocyte eQTL for RGS1 on chromosome 1 is consistent with associations to both celiac disease and multiple sclerosis (Supplementary Figure 24), and three other eQTLs are consistent with associations in multiple diseases (Supplementary Figures 25–27). In all cases, these are the only genome-wide significant disease associations reported in these loci. As we consider each disease association independently, these results indicate that the same underlying risk variants drive risk to multiple diseases in these loci by altering gene expression, consistent with observations of shared effects across diseases [7].