Genome-wide association studies with high-dimensional phenotypes

Abstract
High-dimensional phenotypes hold promise for richer findings in association studies, but testing several phenotype traits aggravates the grand challenge of association studies: multiple testing. Several methods have recently been proposed for jointly testing all traits in a high-dimensional vector of phenotypes, with the prospect of increased power to detect small effects that would be missed if the traits were tested individually. However, the methods have rarely been compared thoroughly enough to assess their relative merits and to set up guidelines on which method to use and how to use it. We compare the methods on simulated data and on a real metabolomics data set comprising 137 highly correlated variables and approximately 550,000 SNPs. Applying the methods to genome-wide data with hundreds of thousands of markers inevitably requires dividing the problem into manageable parts that facilitate parallel processing; the parts may correspond to individual genetic variants, pathways, or genes, for example. Here we use a straightforward formulation in which the genome is divided into blocks of nearby correlated genetic markers, and each block is tested jointly for association with the phenotypes. This formulation is computationally feasible, reduces the number of tests, and lets the methods combine information over several correlated variables not only on the phenotype side but also on the genotype side. Our experiments show that canonical correlation analysis has higher power than alternative methods, while remaining computationally tractable for routine use in the GWAS setting, provided the number of samples is sufficient compared to the numbers of phenotype and genotype variables tested. Sparse canonical correlation analysis and regression models with latent confounding factors show promising performance when the number of samples is small compared to the dimensionality of the data.
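
The block-wise formulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how blocks are delimited or which test statistic is used, so the fixed-size SNP windows, the Wilks' lambda statistic, the permutation p-value, and all function names (`canonical_correlations`, `wilks_lambda`, `block_permutation_pvalue`, `scan_blocks`) are assumptions made for illustration. The sketch also assumes that NumPy is available, that the number of samples exceeds the combined number of genotype and phenotype columns in each test, and that no column is constant.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between X (n x p) and Y (n x q).

    Assumes n > p + q and full column rank after centering.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases of the two column spaces (reduced QR).
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations.
    r = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(r, 0.0, 1.0)


def wilks_lambda(X, Y):
    """Wilks' lambda = prod(1 - r_i^2); small values suggest association."""
    r = canonical_correlations(X, Y)
    return float(np.prod(1.0 - r ** 2))


def block_permutation_pvalue(G_block, P, n_perm=200, rng=None):
    """Permutation p-value for one genotype block against all phenotypes."""
    rng = np.random.default_rng(rng)
    observed = wilks_lambda(G_block, P)
    hits = 0
    for _ in range(n_perm):
        # Shuffling phenotype rows breaks any genotype-phenotype link.
        permuted = P[rng.permutation(P.shape[0])]
        if wilks_lambda(G_block, permuted) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def scan_blocks(G, P, block_size=5, n_perm=200, seed=0):
    """Test consecutive fixed-size SNP windows (a hypothetical stand-in
    for blocks of nearby correlated markers)."""
    rng = np.random.default_rng(seed)
    results = []
    for start in range(0, G.shape[1], block_size):
        block = G[:, start:start + block_size]
        results.append((start, block_permutation_pvalue(block, P, n_perm, rng)))
    return results


# Simulated toy data: 300 samples, 20 SNPs coded 0/1/2, 6 traits.
rng = np.random.default_rng(1)
n, n_snps, n_traits = 300, 20, 6
G = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)
P = rng.normal(size=(n, n_traits))
P[:, :3] += 1.0 * G[:, [7]]  # planted effect of SNP 7 on the first three traits

results = scan_blocks(G, P, block_size=5, n_perm=200, seed=0)
for start, pval in results:
    print(f"SNPs {start}-{start + 4}: permutation p-value = {pval:.4f}")
```

Testing one block at a time keeps each CCA small, and the blocks are independent of one another, so they could be distributed across workers. A real analysis would still need a multiple-testing correction across blocks and far more permutations (or an analytic approximation) to reach genome-wide significance thresholds.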
