Journal ArticleDOI

Genomic control for association studies.

01 Dec 1999-Biometrics-Vol. 55, Iss. 4, pp. 997-1004
TL;DR: The performance of the genomic control method is quite good for plausible effects of liability genes, which bodes well for future genetic analyses of complex disorders.
Abstract: A dense set of single nucleotide polymorphisms (SNP) covering the genome and an efficient method to assess SNP genotypes are expected to be available in the near future. An outstanding question is how to use these technologies efficiently to identify genes affecting liability to complex disorders. To achieve this goal, we propose a statistical method that has several optimal properties: It can be used with case control data and yet, like family-based designs, controls for population heterogeneity; it is insensitive to the usual violations of model assumptions, such as cases failing to be strictly independent; and, by using Bayesian outlier methods, it circumvents the need for Bonferroni correction for multiple tests, leading to better performance in many settings while still constraining risk for false positives. The performance of our genomic control method is quite good for plausible effects of liability genes, which bodes well for future genetic analyses of complex disorders.


Citations
Journal ArticleDOI
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. It focuses on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

26,280 citations

Journal ArticleDOI
TL;DR: This work describes a method that enables explicit detection and correction of population stratification on a genome-wide scale and uses principal components analysis to explicitly model ancestry differences between cases and controls.
Abstract: Population stratification (allele frequency differences between cases and controls due to systematic ancestry differences) can cause spurious associations in disease studies. We describe a method that enables explicit detection and correction of population stratification on a genome-wide scale. Our method uses principal components analysis to explicitly model ancestry differences between cases and controls. The resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. Our simple, efficient approach can easily be applied to disease studies with hundreds of thousands of markers.
Because the effects of stratification vary in proportion to the number of samples [9], stratification will be an increasing problem in the large-scale association studies of the future, which will analyze thousands of samples in an effort to detect common genetic variants of weak effect. The two prevailing methods for dealing with stratification are genomic control and structured association [9-14]. Although genomic control and structured association have proven useful in a variety of contexts, they have limitations. Genomic control corrects for stratification by adjusting association statistics at each marker by a uniform overall inflation factor. However, some markers differ in their allele frequencies across ancestral populations more than others. Thus, the uniform adjustment applied by genomic control may be insufficient at markers having unusually strong differentiation across ancestral populations and may be superfluous at markers devoid of such differentiation, leading to a loss in power.
Structured association uses a program such as STRUCTURE [15] to assign the samples to discrete subpopulation clusters and then aggregates evidence of association within each cluster. If fractional membership in more than one cluster is allowed, the method cannot currently be applied to genome-wide association studies because of its intensive computational cost on large data sets. Furthermore, assignments of individuals to clusters are highly sensitive to the number of clusters, which is not well defined [14,16].
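
The uniform adjustment described above can be illustrated with a minimal sketch. It assumes the common convention of estimating the inflation factor as the median of the observed 1-df chi-square statistics divided by the median of the null chi-square distribution (about 0.4549), and of never letting the factor fall below 1. The function names and the example statistics are hypothetical and are not taken from any of the papers listed here.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Estimate lambda as median(observed statistics) / median under the null."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # Assumption: floor lambda at 1 so statistics are never inflated by the correction.
    return max(lam, 1.0)


def genomic_control(chi2_stats):
    """Divide every marker's statistic by the same overall inflation factor."""
    lam = inflation_factor(chi2_stats)
    return [s / lam for s in chi2_stats]


# Hypothetical per-marker association statistics, for illustration only.
stats = [0.2, 0.5, 0.91, 1.3, 12.0]
lam = inflation_factor(stats)
adjusted = genomic_control(stats)
print(f"lambda = {lam:.3f}")
print([round(s, 3) for s in adjusted])
```

Because every statistic is divided by the same factor, a marker with unusually strong differentiation across ancestral populations receives exactly the same correction as a marker with none, which is the limitation the abstract points out.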

9,387 citations

Journal ArticleDOI
Paul Burton, David Clayton, Lon R. Cardon, Nicholas John Craddock, and 192 more
07 Jun 2007-Nature
TL;DR: This study demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; generated a genome-wide genotype database for future studies of common diseases in the British population; and showed that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest.
Abstract: There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined approximately 2,000 individuals for each of 7 major diseases and a shared set of approximately 3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 × 10⁻⁷: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn's disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a large number of further signals (including 58 loci with single-point P values between 10⁻⁵ and 5 × 10⁻⁷) likely to yield additional susceptibility loci. The importance of appropriately large samples was confirmed by the modest effect sizes observed at most loci identified. This study thus represents a thorough validation of the GWA approach. It has also demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; has generated a genome-wide genotype database for future studies of common diseases in the British population; and has shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest. Our findings offer new avenues for exploring the pathophysiology of these important disorders. We anticipate that our data, results and software, which will be widely available to other investigators, will provide a powerful resource for human genetics research.

9,244 citations

Journal ArticleDOI
21 Jun 2012-Nature
TL;DR: The results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome, and identify novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort.
Abstract: The elucidation of breast cancer subgroups and their molecular drivers requires integrated views of the genome and transcriptome from representative numbers of patients. We present an integrated analysis of copy number and gene expression in a discovery and validation set of 997 and 995 primary breast tumours, respectively, with long-term clinical follow-up. Inherited variants (copy number variants and single nucleotide polymorphisms) and acquired somatic copy number aberrations (CNAs) were associated with expression in 40% of genes, with the landscape dominated by cis- and trans-acting CNAs. By delineating expression outlier genes driven in cis by CNAs, we identified putative cancer genes, including deletions in PPP2R2A, MTAP and MAP2K4. Unsupervised analysis of paired DNA–RNA profiles revealed novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort. These include a high-risk, oestrogen-receptor-positive 11q13/14 cis-acting subgroup and a favourable prognosis subgroup devoid of CNAs. Trans-acting aberration hotspots were found to modulate subgroup-specific gene networks, including a TCR deletion-mediated adaptive immune response in the ‘CNA-devoid’ subgroup and a basal-specific chromosome 5 deletion-associated mitotic network. Our results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome.

4,722 citations


Cites background from "Genomic control for association stu..."

  • ...5, similar to the genomic control approach [37]....

    [...]

Journal ArticleDOI
TL;DR: An approach to studying population structure (principal components analysis) is discussed that was first applied to genetic data by Cavalli-Sforza and colleagues, and results from modern statistics are used to develop formal significance tests for population differentiation.
Abstract: Current methods for inferring population structure from genetic data do not provide formal significance tests for population differentiation. We discuss an approach to studying population structure (principal components analysis) that was first applied to genetic data by Cavalli-Sforza and colleagues. We place the method on a solid statistical footing, using results from modern statistics to develop formal significance tests. We also uncover a general “phase change” phenomenon about the ability to detect structure in genetic data, which emerges from the statistical theory we use, and has an important implication for the ability to discover structure in genetic data: for a fixed but large dataset size, divergence between two populations (as measured, for example, by a statistic like FST) below a threshold is essentially undetectable, but a little above threshold, detection will be easy. This means that we can predict the dataset size needed to detect structure.

4,456 citations


Cites background from "Genomic control for association stu..."

  • ...If e([1]) is the principal eigenvector of the matrix X, this means that the sum of squares...

    [...]

  • ...The second eigenvector e([2]) maximizes the same expression with the constraint that e([1]), e([2]) are orthogonal, and so on....

    [...]

  • ...Are the individuals from a homogeneous population or from a population containing subgroups that are genetically distinct? Can we find evidence for substructure in the data, and quantify it? This question of detecting and quantifying structure arises in medical genetics, for instance, in case-control studies where uncorrected population structure can induce false positives [1]....

    [...]
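
The eigenvector description in the excerpts above can be sketched in a few lines. This is a simplified illustration, not the cited paper's procedure: it assumes a small hypothetical genotype matrix (rows are individuals, columns are markers), centers each marker without any further normalization, and finds the leading eigenvectors by power iteration with deflation. All function names are made up for this example.

```python
import math


def center_markers(genotypes):
    """Subtract each marker's (column's) mean so every column sums to zero."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]


def individual_covariance(genotypes):
    """n x n matrix X = M M^T / m, where M is the centered genotype matrix."""
    centered = center_markers(genotypes)
    m = len(centered[0])
    return [[sum(a * b for a, b in zip(r1, r2)) / m for r2 in centered]
            for r1 in centered]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm < 1e-12:
        raise ValueError("near-zero vector: matrix rank is too low for this k")
    return [a / norm for a in v]


def top_eigenvectors(x, k=2, iters=200):
    """Power iteration; each new vector is kept orthogonal to those already found."""
    n = len(x)
    found = []
    for _component in range(k):
        # Fixed start vector for reproducibility (a random start is more usual).
        v = normalize([float((i + 1) ** 2) for i in range(n)])
        for _step in range(iters):
            w = [dot(row, v) for row in x]
            for e in found:
                proj = dot(w, e)
                w = [a - proj * b for a, b in zip(w, e)]
            v = normalize(w)
        found.append(v)
    return found


# Hypothetical genotype counts (0, 1 or 2 copies of an allele), illustration only:
# individuals 0-1 and individuals 2-3 form two groups with different frequencies.
genotypes = [
    [0, 0, 1],
    [0, 1, 0],
    [2, 2, 1],
    [2, 1, 2],
]
X = individual_covariance(genotypes)
e1, e2 = top_eigenvectors(X, k=2)
print([round(a, 3) for a in e1])
print([round(a, 3) for a in e2])
```

In this toy data the first eigenvector assigns one sign to individuals 0 and 1 and the opposite sign to individuals 2 and 3, so it separates the two groups, and the second eigenvector is orthogonal to the first by construction.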

References
Book
22 Dec 2012
TL;DR: An overview of statistical decision theory, which emphasizes the use and application of the philosophical ideas and mathematical structure of decision theory.
Abstract: 1. Basic concepts 2. Utility and loss 3. Prior information and subjective probability 4. Bayesian analysis 5. Minimax analysis 6. Invariance 7. Preposterior and sequential analysis 8. Complete and essentially complete classes Appendices.

5,573 citations


"Genomic control for association stu..." refers background in this paper

  • ...Determining if is 1 or 0 is a binary Bayesian hypothesis testing problem (Berger 1985)....

    [...]

Journal ArticleDOI
13 Sep 1996-Science
TL;DR: The identification of the genetic basis of complex human diseases such as schizophrenia and diabetes has proven difficult; Risch and Merikangas propose that this goal can best be accomplished by combining the power of the human genome project with association studies.
Abstract: The identification of the genetic basis of complex human diseases such as schizophrenia and diabetes has proven difficult. In their Perspective, Risch and Merikangas propose that we can best accomplish this goal by combining the power of the human genome project with association studies, a method for determining the basis of a genetic disease.

5,143 citations

Journal ArticleDOI
TL;DR: An introduction to population genetics theory.
Abstract: An introduction to population genetics theory.

4,817 citations

Book
01 Jan 1970
TL;DR: An introduction to population genetics theory.
Abstract: An introduction to population genetics theory.

4,273 citations


"Genomic control for association stu..." refers background in this paper

  • ...Several conditions must be met: (a) the loci under study must not have very different mutation rates (Chakraborty and Jin, 1992); (b) they cannot be under strong and subpopulation-specific selection (Crow and Kimura, 1970); and (c) with respect to population substructure, F should not vary greatly across loci....

    [...]