Genome-wide association studies with high-dimensional phenotypes
TLDR
The experiments show that canonical correlation analysis has higher power than alternative methods, while remaining computationally tractable for routine use in the GWAS setting, provided the number of samples is sufficient compared to the numbers of phenotype and genotype variables tested.
Abstract:
High-dimensional phenotypes hold promise for richer findings in association studies, but testing of several phenotype traits aggravates the grand challenge of association studies, that of multiple testing. Several methods have recently been proposed for testing jointly all traits in a high-dimensional vector of phenotypes, with prospect of increased power to detect small effects that would be missed if tested individually. However, the methods have rarely been compared to the extent of enabling assessment of their relative merits and setting up guidelines on which method to use, and how to use it. We compare the methods on simulated data and with a real metabolomics data set comprising 137 highly correlated variables and approximately 550,000 SNPs. Applying the methods to genome-wide data with hundreds of thousands of markers inevitably requires division of the problem into manageable parts facilitating parallel processing, parts corresponding to individual genetic variants, pathways, or genes, for example. Here we utilize a straightforward formulation according to which the genome is divided into blocks of nearby correlated genetic markers, tested jointly for association with the phenotypes. This formulation is computationally feasible, reduces the number of tests, and lets the methods take advantage of combining information over several correlated variables not only on the phenotype side, but also on the genotype side. Our experiments show that canonical correlation analysis has higher power than alternative methods, while remaining computationally tractable for routine use in the GWAS setting, provided the number of samples is sufficient compared to the numbers of phenotype and genotype variables tested. Sparse canonical correlation analysis and regression models with latent confounding factors show promising performance when the number of samples is small compared to the dimensionality of the data.
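The block formulation described in the abstract can be illustrated with a small sketch. The abstract does not say how blocks are delimited, so the greedy rule below (join a marker to the current block when its squared correlation with the previous marker reaches a threshold) is an assumption made purely for illustration, as are the names `partition_into_blocks`, `r2_threshold`, `max_block_size`, and `bonferroni_threshold`. The sketch shows only how grouping correlated neighbours reduces the number of tests; the joint association test itself (for example canonical correlation analysis) is not implemented here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a marker with no variation is treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def partition_into_blocks(genotypes, r2_threshold=0.5, max_block_size=10):
    """Greedily group adjacent markers into blocks (illustrative rule only).

    genotypes: list of markers ordered by genomic position; each marker is a
    list of allele counts (0, 1 or 2), one entry per sample.
    Returns a list of blocks, each a list of marker indices.
    """
    blocks = []
    for j, marker in enumerate(genotypes):
        if blocks:
            previous = genotypes[blocks[-1][-1]]
            r2 = pearson_r(previous, marker) ** 2
            if r2 >= r2_threshold and len(blocks[-1]) < max_block_size:
                blocks[-1].append(j)
                continue
        blocks.append([j])  # start a new block
    return blocks


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


# Toy data: 5 markers genotyped in 6 samples (values are made up).
genotypes = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 1],  # strongly correlated with marker 0
    [2, 0, 1, 1, 0, 2],  # weakly correlated with marker 1
    [2, 0, 1, 1, 0, 2],  # identical to marker 2
    [1, 1, 1, 1, 1, 1],  # monomorphic marker
]
blocks = partition_into_blocks(genotypes)
print(blocks)
print(bonferroni_threshold(len(genotypes)), bonferroni_threshold(len(blocks)))
```

In this toy case five single-marker tests collapse into three block-level tests, so the per-test threshold under a Bonferroni correction becomes less stringent. In practice each block would then be tested jointly against the phenotype vector, and blocks can be processed in parallel.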