Genome-wide association studies with high-dimensional phenotypes

doi:10.1515/SAGMB-2012-0032

Open Access · Journal Article

Pekka Marttinen, +4 more

- 01 Aug 2013 -

Statistical Applications in Genetics and...

- Vol. 12, Iss: 4, pp 413-431

TLDR

The experiments show that canonical correlation analysis has higher power than alternative methods, while remaining computationally tractable for routine use in the GWAS setting, provided the number of samples is sufficient compared to the numbers of phenotype and genotype variables tested.

Abstract:

High-dimensional phenotypes hold promise for richer findings in association studies, but testing several phenotype traits aggravates the grand challenge of association studies, that of multiple testing. Several methods have recently been proposed for jointly testing all traits in a high-dimensional vector of phenotypes, with the prospect of increased power to detect small effects that would be missed if the traits were tested individually. However, the methods have rarely been compared thoroughly enough to assess their relative merits and to set up guidelines on which method to use, and how to use it. We compare the methods on simulated data and on a real metabolomics data set comprising 137 highly correlated variables and approximately 550,000 SNPs. Applying the methods to genome-wide data with hundreds of thousands of markers inevitably requires dividing the problem into manageable parts that facilitate parallel processing, for example parts corresponding to individual genetic variants, pathways, or genes. Here we utilize a straightforward formulation in which the genome is divided into blocks of nearby correlated genetic markers, and each block is tested jointly for association with the phenotypes. This formulation is computationally feasible, reduces the number of tests, and lets the methods take advantage of combining information over several correlated variables not only on the phenotype side, but also on the genotype side. Our experiments show that canonical correlation analysis has higher power than alternative methods, while remaining computationally tractable for routine use in the GWAS setting, provided the number of samples is sufficient compared to the numbers of phenotype and genotype variables tested. Sparse canonical correlation analysis and regression models with latent confounding factors show promising performance when the number of samples is small compared to the dimensionality of the data.
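The block-wise formulation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the fixed block size, the simulated genotypes and traits, the function names `canonical_correlations` and `bartlett_statistic`, and the use of Bartlett's chi-square approximation to Wilks' lambda as the test statistic. It also assumes that each genotype block and the phenotype matrix have full column rank and that the sample size is well above the number of variables tested.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X (n x p) and Y (n x q)."""
    Xc = X - X.mean(axis=0)          # center each variable
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)         # orthonormal basis of the centered X columns
    Qy, _ = np.linalg.qr(Yc)         # orthonormal basis of the centered Y columns
    # The singular values of Qx^T Qy are the canonical correlations.
    r = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(r, 0.0, 1.0)      # guard against floating-point overshoot

def bartlett_statistic(X, Y):
    """Bartlett's chi-square approximation for 'no association between X and Y'.

    Returns (statistic, degrees_of_freedom). This choice of test is an
    assumption of the sketch, not taken from the paper.
    """
    n, p = X.shape
    q = Y.shape[1]
    r = canonical_correlations(X, Y)
    wilks = np.prod(1.0 - r ** 2)    # Wilks' lambda
    stat = -(n - 1 - (p + q + 1) / 2.0) * np.log(wilks)
    return float(stat), p * q

# Simulated toy data (hypothetical sizes, not the study's data).
rng = np.random.default_rng(0)
n, n_snps, n_traits, block_size = 500, 40, 5, 10
G = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)  # genotypes coded 0/1/2
Y = rng.normal(size=(n, n_traits))                         # phenotype vector per sample
Y += 0.5 * G[:, [3]]             # SNP 3 (inside the first block) shifts every trait

# Test each block of neighbouring SNPs jointly against all traits.
block_starts = range(0, n_snps, block_size)
results = [bartlett_statistic(G[:, s:s + block_size], Y) for s in block_starts]
stats = [stat for stat, _ in results]
dfs = [df for _, df in results]
```

Each block yields one statistic, so the number of tests drops from the number of SNPs to the number of blocks. A p-value per block could then be taken from the upper tail of a chi-square distribution with the returned degrees of freedom, for example with `scipy.stats.chi2.sf(stat, df)`. In real data, blocks would be defined by nearby correlated markers rather than by a fixed size, and highly correlated SNPs may need pruning or regularization to keep the block full rank.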

Citations

Journal Article

Regularized Machine Learning in the Genetic Prediction of Complex Traits

Sebastian Okser, +5 more

- 13 Nov 2014 -

PLOS Genetics

TL;DR: It is argued here that many medical applications of machine learning models in genetic disease risk prediction rely essentially on two factors: effective model regularization and rigorous model validation.

Journal Article

metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

Anna Cichonska, +11 more

- 01 Jul 2016 -

Bioinformatics

TL;DR: metaCCA is a computational framework for summary statistics-based analysis of a single study or multiple studies that allows multivariate representation of both genotype and phenotype, and employs a covariance shrinkage algorithm to achieve robustness.

Book Chapter

Association mapping in plants in the post-GWAS genomics era.

Pushpendra Kumar Gupta, +2 more

- 01 Jan 2019 -

Advances in Genetics

TL;DR: The second half of the review is devoted to activities in the post-GWAS era, which include methods for identifying and prioritizing causal variants, functional characterization of candidate signals, gene- and gene-set-based association mapping, GWAS using high-dimensional data through machine learning, etc.

Journal Article

A Tutorial on Canonical Correlation Methods

Viivi Uurtio, +5 more

- 22 Nov 2017 -

ACM Computing Surveys

TL;DR: Canonical correlation analysis is a family of multivariate statistical methods for the analysis of paired sets of variables. It has been extended to extract relations between two sets of variables when the sample size is insufficient in relation to the data dimensionality, when the relations are considered to be non-linear, and when the dimensionality is too large for human interpretation.

Journal Article

Penalized Multimarker vs. Single-Marker Regression Methods for Genome-Wide Association Studies of Quantitative Traits

Hui Yi, +5 more

- 01 Jan 2015 -

Genetics

TL;DR: It is found that penalized regression (PR) with FDR control provides substantially more power than single-marker association analysis (SMA) with genome-wide type-I error control but somewhat less power than SMA with Benjamini-Hochberg FDR control (SMA-BH), and the analytic FDR criterion may currently be the best approach to SNP selection using PR for GWAS.

References

Journal Article

Regression Shrinkage and Selection via the Lasso

Robert Tibshirani

- 01 Jan 1996 -

Journal of the royal statistical society...

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.

Journal Article

Haploview: analysis and visualization of LD and haplotype maps

Jeffrey C. Barrett, +3 more

- 15 Jan 2005 -

Bioinformatics

TL;DR: Haploview is a software package that provides computation of linkage disequilibrium statistics and population haplotype patterns from primary genotype data in a visually appealing and interactive interface.

Journal Article

dbSNP: the NCBI database of genetic variation

Stephen T. Sherry, +6 more

- 01 Jan 2001 -

Nucleic Acids Research

TL;DR: The dbSNP database is a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, and is integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data.

Book Chapter

Relations Between Two Sets of Variates

Harold Hotelling

- 01 Dec 1936 -

Biometrika

TL;DR: The concepts of correlation and regression may be applied not only to ordinary one-dimensional variates but also to variates of two or more dimensions. For example, in studying the deviations of shots fired at a target, only the correlation of the horizontal components is ordinarily discussed, whereas the complex consisting of horizontal and vertical deviations may be even more interesting.

Journal Article

A second generation human haplotype map of over 3.1 million SNPs

Kelly A. Frazer, +237 more

- 18 Oct 2007 -

Nature

TL;DR: The Phase II HapMap is described, which characterizes over 3.1 million human single nucleotide polymorphisms genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed, and increased differentiation at non-synonymous, compared to synonymous, SNPs is demonstrated.