Genome-wide association analysis by lasso penalized logistic regression
TLDR
This article evaluates lasso penalized logistic regression for case-control disease gene mapping with a large number of SNP (single nucleotide polymorphism) predictors. Its findings on coeliac disease replicate previous SNP results and shed light on possible interactions among the SNPs.
Abstract:
Motivation: In ordinary regression, imposition of a lasso penalty makes continuous model selection straightforward. Lasso penalized regression is particularly advantageous when the number of predictors far exceeds the number of observations.
Method: The present article evaluates the performance of lasso penalized logistic regression in case–control disease gene mapping with a large number of SNP (single nucleotide polymorphism) predictors. The strength of the lasso penalty can be tuned to select a predetermined number of the most relevant SNPs and other predictors. For a given value of the tuning constant, the penalized likelihood is quickly maximized by cyclic coordinate ascent. Once the most potent marginal predictors are identified, their two-way and higher order interactions can also be examined by lasso penalized logistic regression.
Results: This strategy is tested on both simulated and real data. Our findings on coeliac disease replicate the previous SNP results and shed light on possible interactions among the SNPs.
Availability: The software discussed is available in Mendel 9.0 at the UCLA Human Genetics web site.
Contact: klange@ucla.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
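Illustration: The Method paragraph above can be made concrete with a minimal sketch in Python with NumPy. It is not the Mendel implementation. It assumes the penalized objective L(b0, beta) - lambda * sum_j |beta_j| with an unpenalized intercept, standardized predictors, and a coordinate update that replaces the exact one-dimensional maximization with a soft-thresholded step based on the curvature bound p(1 - p) <= 1/4. The tuning constant is found by plain bisection, which is also an assumption here. All function names and the simulated data are hypothetical.

```python
from itertools import combinations

import numpy as np


def sigmoid(eta):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * eta))


def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_logistic(X, y, lam, n_sweeps=200, tol=1e-6):
    """Cyclic coordinate ascent on L(b0, beta) - lam * sum(|beta_j|)."""
    n, p = X.shape
    ybar = y.mean()                      # assumes both cases and controls are present
    b0 = np.log(ybar / (1.0 - ybar))     # intercept-only maximum likelihood start
    beta = np.zeros(p)
    eta = np.full(n, b0)                 # current linear predictor
    h = 0.25 * (X ** 2).sum(axis=0)      # upper bounds on the coordinate curvatures
    for _ in range(n_sweeps):
        # Unpenalized intercept update.
        step = (y - sigmoid(eta)).sum() / (0.25 * n)
        b0 += step
        eta += step
        max_change = abs(step)
        for j in range(p):
            if h[j] == 0.0:
                continue
            g = X[:, j] @ (y - sigmoid(eta))   # score for coordinate j
            new = soft_threshold(beta[j] + g / h[j], lam / h[j])
            delta = new - beta[j]
            if delta != 0.0:
                beta[j] = new
                eta += delta * X[:, j]
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b0, beta


def tune_lambda(X, y, k, n_bisect=30):
    """Bisect on lam until exactly k predictors have nonzero coefficients."""
    lo, hi = 0.0, np.abs(X.T @ (y - y.mean())).max()
    for _ in range(n_bisect):
        lam = 0.5 * (lo + hi)
        b0, beta = lasso_logistic(X, y, lam)
        nnz = np.count_nonzero(beta)
        if nnz == k:
            break
        if nnz > k:
            lo = lam      # too many predictors: strengthen the penalty
        else:
            hi = lam      # too few predictors: weaken the penalty
    return lam, b0, beta


def interaction_design(X, selected):
    """Selected main effects followed by all their two-way products."""
    cols = [X[:, j] for j in selected]
    cols += [X[:, a] * X[:, b] for a, b in combinations(selected, 2)]
    return np.column_stack(cols)


# Simulated example (hypothetical data): 3 causal SNPs among 20.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(800, 20)).astype(float)   # genotypes coded 0/1/2
X = (G - G.mean(axis=0)) / G.std(axis=0)                 # standardized predictors
true_beta = np.zeros(20)
true_beta[:3] = 1.0
y = (rng.random(800) < sigmoid(X @ true_beta)).astype(float)

lam, b0, beta = tune_lambda(X, y, k=3)
selected = np.flatnonzero(beta)
Z = interaction_design(X, selected)      # input for a second lasso pass
print(lam, selected, Z.shape)
```

The second pass over main effects plus pairwise products would simply call the same fitting routine on Z. Because the bound-based step never overshoots, each update cannot decrease the objective, at the cost of more sweeps than an exact coordinate-wise Newton update would need.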