PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses
Shaun Purcell, Benjamin M. Neale, Kathe Todd-Brown, Lori Thomas, Manuel A. R. Ferreira, David Bender, Julian Maller, Pamela Sklar, Paul I.W. de Bakker, Mark J. Daly, Pak C. Sham
TLDR
This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes its five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. It focuses on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.
Abstract:
Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
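The abstract's notion of identity-by-state (IBS) sharing can be illustrated with a minimal sketch. This is not PLINK's implementation. It assumes genotypes are coded as minor-allele counts (0, 1, 2) with -1 for missing, and the function names `ibs_counts`, `ibs_similarity`, and `pairwise_ibs` are hypothetical. At each SNP two individuals share `2 - |g1 - g2|` alleles IBS, and the pairwise similarity is taken here as the average proportion of alleles shared over non-missing SNPs.

```python
from itertools import combinations

MISSING = -1  # assumed missing-genotype code for this sketch


def ibs_counts(g1, g2):
    """Return [n_IBS0, n_IBS1, n_IBS2]: SNP counts by number of alleles shared IBS."""
    counts = [0, 0, 0]
    for a, b in zip(g1, g2):
        if a == MISSING or b == MISSING:
            continue  # skip SNPs where either genotype is missing
        counts[2 - abs(a - b)] += 1
    return counts


def ibs_similarity(g1, g2):
    """Average proportion of alleles shared IBS over non-missing SNPs."""
    ibs0, ibs1, ibs2 = ibs_counts(g1, g2)
    n = ibs0 + ibs1 + ibs2
    if n == 0:
        return float("nan")  # no SNP genotyped in both individuals
    return (ibs2 + 0.5 * ibs1) / n


def pairwise_ibs(genotypes):
    """Map each unordered pair of individual IDs to its IBS similarity."""
    return {
        (i, j): ibs_similarity(genotypes[i], genotypes[j])
        for i, j in combinations(sorted(genotypes), 2)
    }


if __name__ == "__main__":
    # Toy data: three individuals, five SNPs (illustrative values only).
    toy = {
        "A": [0, 1, 2, 2, MISSING],
        "B": [0, 1, 2, 0, 1],
        "C": [2, 1, 0, 0, 1],
    }
    for pair, sim in pairwise_ibs(toy).items():
        print(pair, round(sim, 3))
```

A matrix of such pairwise values is the kind of input one could pass to a clustering or scaling method to look for population stratification. Estimating identity by descent from IBS additionally requires allele frequencies and is not attempted in this sketch.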
Citations
Journal Article
A genome-wide association study of COPD identifies a susceptibility locus on chromosome 19q13
Michael H. Cho, Peter J. Castaldi, Emily S. Wan, Mateusz Siedlinski, Craig P. Hersh, Dawn L. DeMeo, Blanca E. Himes, Jody S. Sylvia, Barbara J. Klanderman, John Ziniti, Christoph Lange, Augusto A. Litonjua, David Sparrow, Elizabeth A. Regan, Barry J. Make, John E. Hokanson, Tanda Murray, Jacqueline B. Hetmanski, Sreekumar G. Pillai, Xiangyang Kong, Wayne H. Anderson, Ruth Tal-Singer, David A. Lomas, Harvey O. Coxson, Lisa D. Edwards, William MacNee, Jørgen Vestbo, Julie C. Yates, Alvar Agusti, Peter M.A. Calverley, Bartolome R. Celli, Courtney Crim, Stephen I. Rennard, Emiel F.M. Wouters, Per Bakke, Amund Gulsvik, James D. Crapo, Terri H. Beaty, Edwin K. Silverman
TL;DR: Identifies a new genome-wide significant locus on chromosome 19q13 that includes RAB4B, EGLN2, MIA, and CYP2A6 and was previously associated with cigarette smoking behavior.
Journal Article
Predictive Accuracy of a Polygenic Risk Score Compared With a Clinical Risk Score for Incident Coronary Heart Disease.
Jonathan D. Mosley, Deepak K. Gupta, Jingyi Tan, Jie Yao, Quinn S. Wells, Christian M. Shaffer, Suman Kundu, Cassianne Robinson-Cohen, Bruce M. Psaty, Stephen S. Rich, Wendy S. Post, Xiuqing Guo, Jerome I. Rotter, Dan M. Roden, Robert E. Gerszten, Thomas J. Wang
TL;DR: It is suggested that a polygenic risk score may not enhance risk prediction in a general, white middle-aged population.
Journal Article
An application of Random Forests to a genome-wide association dataset: Methodological considerations & new findings
TL;DR: This study presents one of the first illustrations of successfully analyzing GWA data with a machine learning algorithm, showing that RF is computationally feasible for GWA data and that the results make biological sense based on previous studies.
Journal Article
Genetic variants and risk of lung cancer in never smokers: a genome-wide association study
Yafei Li, Chau-Chyun Sheu, Yuanqing Ye, Mariza de Andrade, Liang Wang, Shen Chih Chang, Marie Christine Aubry, Jeremiah A. Aakre, Mark S. Allen, Feng Chen, Julie M. Cunningham, Claude Deschamps, Ruoxiang Jiang, Jie Lin, Randolph S. Marks, V. Shane Pankratz, Li Su, Yan Li, Zhifu Sun, Hui Tang, George Vasmatzis, Curtis C. Harris, Margaret R. Spitz, Jin Jen, Renyi Wang, Zuo-Feng Zhang, David C. Christiani, Xifeng Wu, Ping Yang
TL;DR: Genetic variants at 13q31.3 alter the expression of GPC5 and are associated with susceptibility to lung cancer in never smokers; a cis-eQTL analysis showed a strong correlation between genotypes of the replicated SNPs and the transcription level of GPC5 in normal lung tissues.
Journal Article
Dissection of the genetics of Parkinson's disease identifies an additional association 5′ of SNCA and multiple associated haplotypes at 17q21
Chris C. A. Spencer, Vincent Plagnol, Amy Strange, Michelle Gardner, Coro Paisán-Ruiz, Gavin Band, Roger A. Barker, Céline Bellenguez, Kailash P. Bhatia, Hannah Blackburn, Jennie M. Blackwell, Elvira Bramon, Martin A. Brown, Matthew A. Brown, David J. Burn, Juan-Pablo Casas, Patrick F. Chinnery, Carl E Clarke, Aiden Corvin, Nicholas John Craddock, Panos Deloukas, Sarah Edkins, J.M. Evans, Colin Freeman, Emma Gray, John Hardy, Gavin Hudson, Sarah E. Hunt, Janusz Jankowski, Cordelia Langford, Andrew J. Lees, Hugh S. Markus, Christopher G. Mathew, Mark I. McCarthy, Karen E. Morrison, Colin N. A. Palmer, J. P. Pearson, Leena Peltonen, Matti Pirinen, Robert Plomin, Simon C. Potter, Anna Rautanen, Stephen Sawcer, Zhan Su, Richard C. Trembath, Ananth C. Viswanathan, Nigel W. Williams, Huw R. Morris, Peter Donnelly, Nicholas W. Wood
TL;DR: A genome-wide association study in 1705 Parkinson's disease UK patients and 5175 UK controls, the largest sample size so far for a PD GWAS, found weak but consistent evidence of association for common variants located in three previously published associated regions.