Fast Numerical Optimization for Genome Sequencing Data in Population Biobanks
Ruilin Li, Christopher C. Chang, Yosuke Tanigawa, Balasubramanian Narasimhan, Trevor Hastie, Robert Tibshirani, Manuel A. Rivas
TLDR
In this article, Li et al. developed two efficient solvers for optimization problems arising from large-scale regularized regressions on millions of genetic variants sequenced from hundreds of thousands of individuals.
Abstract:
Motivation: Large-scale and high-dimensional genome sequencing data pose computational challenges. General-purpose optimization tools are usually not optimal in terms of computational and memory performance for genetic data.
Results: We develop two efficient solvers for optimization problems arising from large-scale regularized regressions on millions of genetic variants sequenced from hundreds of thousands of individuals. These genetic variants are encoded by values in the set {0, 1, 2, NA}. We take advantage of this fact and use two bits to represent each entry in a genetic matrix, which reduces the memory requirement by a factor of 32 compared with a double-precision floating-point representation (64 bits per entry). Using this representation, we implemented an iteratively reweighted least squares (IRLS) algorithm to solve Lasso regressions on genetic matrices, which we name snpnet-2.0. When the dataset contains many rare variants, the predictors can be encoded in a sparse matrix. We utilize the sparsity of the predictor matrix to further reduce the memory requirement and the computation time. Our sparse genetic matrix implementation uses both the compact 2-bit representation and a simplified version of the compressed sparse block format, so that matrix-vector multiplications can be effectively parallelized on multiple CPU cores. To demonstrate the effectiveness of this representation, we implement an accelerated proximal gradient method to solve group Lasso on these sparse genetic matrices. This solver is named sparse-snpnet and will also be included as part of the snpnet R package. Our implementation can solve Lasso and group Lasso problems for linear, logistic, and Cox regression on sparse genetic matrices that contain 1,000,000 variants and almost 100,000 individuals within 10 minutes, using less than 32 GB of memory.
Availability: https://github.com/rivas-lab/snpnet/tree/compact
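The 2-bit encoding can be illustrated with a minimal NumPy sketch. The code assignment (0, 1, 2 stored as themselves, NA stored as 3), the bit order of four entries per byte, the helper names `pack`, `unpack`, and `packed_dot`, and the mean imputation of NA are all assumptions for illustration, not the layout used by snpnet-2.0. Each genotype occupies 2 bits instead of the 64 bits of a double, which is where the factor of 32 comes from. Values are decoded on the fly through a four-entry lookup table when a column is multiplied by a vector.

```python
import numpy as np

NA = -1      # how missing genotypes are passed to pack() in this sketch
NA_CODE = 3  # hypothetical 2-bit code for NA; 0, 1, 2 are stored as themselves

def pack(g):
    """Pack genotypes in {0, 1, 2, NA} into uint8, four 2-bit entries per byte."""
    codes = np.where(g == NA, NA_CODE, g).astype(np.uint8)
    # Pad with zeros so the length is a multiple of four.
    codes = np.concatenate([codes, np.zeros((-len(codes)) % 4, dtype=np.uint8)])
    c = codes.reshape(-1, 4)
    # Entry k of each group of four occupies bits 2k and 2k+1 of its byte.
    return c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)

def unpack(packed, n):
    """Recover the first n 2-bit codes (0, 1, 2, or NA_CODE) from packed bytes."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    return ((packed[:, None] >> shifts) & 0b11).reshape(-1)[:n]

def packed_dot(packed, n, r):
    """Compute x . r for a packed column x, mean-imputing NA (an assumption)."""
    codes = unpack(packed, n)
    observed = codes != NA_CODE
    fill = codes[observed].mean() if observed.any() else 0.0
    lut = np.array([0.0, 1.0, 2.0, fill])  # 2-bit code -> floating-point value
    return lut[codes] @ r

g = np.array([0, 1, 2, NA, 2, 0, 1, 1, 2, NA])
p = pack(g)
print(p.nbytes, g.astype(np.float64).nbytes)  # 3 80
```

For these 10 genotypes, padding to a multiple of four gives 3 bytes against 80 bytes for float64; for long columns the ratio approaches 32.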
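The IRLS approach for a Lasso-penalized logistic regression can be sketched as follows: each outer step forms the weights and working response of a quadratic approximation to the log-likelihood, and an inner solver minimizes the resulting penalized weighted least-squares problem. Here the inner solver is cyclic coordinate descent in the style of glmnet, which is an assumption about how the inner problem might be solved; the function names are hypothetical, and the sketch uses a dense floating-point matrix rather than the 2-bit representation.

```python
import numpy as np

def soft_threshold(a, t):
    """S(a, t) = sign(a) * max(|a| - t, 0), the proximal operator of t * |.|."""
    return np.sign(a) * max(abs(a) - t, 0.0)

def lasso_logistic_irls(X, y, lam, n_outer=25, n_inner=200, tol=1e-8):
    """Minimize -(1/n) * loglik(b0, beta) + lam * ||beta||_1 (intercept unpenalized)."""
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_outer):
        b0_old, beta_old = b0, beta.copy()
        eta = b0 + X @ beta
        prob = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(prob * (1.0 - prob), 1e-5, None)  # IRLS weights
        r = (y - prob) / w  # residual z - eta for the working response z
        # Inner loop: cyclic coordinate descent on
        # (1/2n) * sum_i w_i (z_i - b0 - x_i^T beta)^2 + lam * ||beta||_1.
        for _ in range(n_inner):
            d0 = (w @ r) / w.sum()  # exact update of the unpenalized intercept
            b0 += d0
            r -= d0
            max_delta = abs(d0)
            for j in range(p):
                xj = X[:, j]
                denom = (w * xj) @ xj / n
                if denom == 0.0:
                    continue
                bj = soft_threshold((w * xj) @ r / n + denom * beta[j], lam) / denom
                delta = bj - beta[j]
                if delta != 0.0:
                    r -= delta * xj  # keep the residual in sync with beta
                    beta[j] = bj
                    max_delta = max(max_delta, abs(delta))
            if max_delta < tol:
                break
        if abs(b0 - b0_old) < tol and np.max(np.abs(beta - beta_old)) < tol:
            break
    return b0, beta
```

With a penalty large enough to zero out every coefficient, the outer loop reduces to Newton's method on the intercept and returns the log-odds of the sample mean of `y`.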
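The idea behind a block layout for sparse genotype matrices can be sketched as below. This is a hypothetical, simplified container in the spirit of compressed sparse blocks, not the sparse-snpnet data structure: the class and method names, the block sizes, storing one code per byte instead of packing four, and mapping NA to a fixed `na_value` are assumptions. Tiling the matrix lets `X @ v` be split over block-rows and `X.T @ u` over block-columns, so each task writes to a disjoint slice of the output and no locking is needed. In this sketch, indices local to a block also fit in 16 bits.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

NA, NA_CODE = -1, 3  # NA input marker and its 2-bit code (assumed convention)

class BlockSparseGenotypes:
    """Hypothetical, simplified compressed-sparse-block container.

    The matrix is tiled into block_rows x block_cols blocks; each non-empty
    block keeps its nonzeros as (local row, local col, code) triples.
    """

    def __init__(self, dense, block_rows=1024, block_cols=1024, na_value=0.0):
        assert block_rows <= 2**16 and block_cols <= 2**16  # local indices fit uint16
        self.n, self.p = dense.shape
        self.br, self.bc = block_rows, block_cols
        self.lut = np.array([0.0, 1.0, 2.0, na_value])  # code -> value
        self.by_row, self.by_col = {}, {}  # block-row / block-column offset -> blocks
        for i0 in range(0, self.n, block_rows):
            for j0 in range(0, self.p, block_cols):
                sub = dense[i0:i0 + block_rows, j0:j0 + block_cols]
                r, c = np.nonzero(sub)  # NA (-1) is nonzero, so it is kept
                if r.size == 0:
                    continue
                codes = np.where(sub[r, c] == NA, NA_CODE, sub[r, c]).astype(np.uint8)
                blk = (i0, j0, r.astype(np.uint16), c.astype(np.uint16), codes)
                self.by_row.setdefault(i0, []).append(blk)
                self.by_col.setdefault(j0, []).append(blk)

    def matvec(self, v, workers=4):
        """X @ v, parallel over block-rows; each task owns a disjoint slice of y."""
        y = np.zeros(self.n)

        def task(i0):
            m = min(self.br, self.n - i0)
            for _, j0, r, c, codes in self.by_row[i0]:
                vals = self.lut[codes] * v[j0 + c.astype(np.intp)]
                y[i0:i0 + m] += np.bincount(r, weights=vals, minlength=m)

        with ThreadPoolExecutor(workers) as ex:
            list(ex.map(task, self.by_row))  # list() re-raises worker exceptions
        return y

    def rmatvec(self, u, workers=4):
        """X.T @ u, parallel over block-columns; each task owns a disjoint slice."""
        out = np.zeros(self.p)

        def task(j0):
            m = min(self.bc, self.p - j0)
            for i0, _, r, c, codes in self.by_col[j0]:
                vals = self.lut[codes] * u[i0 + r.astype(np.intp)]
                out[j0:j0 + m] += np.bincount(c, weights=vals, minlength=m)

        with ThreadPoolExecutor(workers) as ex:
            list(ex.map(task, self.by_col))
        return out
```

In CPython the speedup from threads depends on how much of the work NumPy runs with the GIL released; the point of the sketch is the conflict-free partitioning, which a compiled implementation can exploit with native threads.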
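An accelerated proximal gradient method for group Lasso alternates a gradient step on the smooth loss with the group soft-thresholding proximal operator, followed by a momentum step. The sketch below applies FISTA to a linear-regression group Lasso on a dense matrix. The group weights sqrt(|g|), the fixed step size 1/L computed from the spectral norm of X, and the function names are assumptions; the paper's solver operates on sparse 2-bit matrices and may make different choices.

```python
import numpy as np

def group_soft_threshold(v, groups, thresh):
    """Prox of thresh * sum_g sqrt(|g|) * ||v_g||_2 (group weights are assumed)."""
    out = np.zeros_like(v)
    for g in groups:
        norm = np.linalg.norm(v[g])
        t = thresh * np.sqrt(len(g))
        if norm > t:  # otherwise the whole group is set exactly to zero
            out[g] = (1.0 - t / norm) * v[g]
    return out

def group_lasso_fista(X, y, groups, lam, n_iter=500, tol=1e-8):
    """FISTA for (1/2n) * ||y - X b||^2 + lam * sum_g sqrt(|g|) * ||b_g||_2."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1/L, L = Lipschitz constant of the gradient
    b = np.zeros(p)
    z, t = b.copy(), 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ z - y) / n
        b_new = group_soft_threshold(z - step * grad, groups, step * lam)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = b_new + ((t - 1.0) / t_new) * (b_new - b)  # momentum (extrapolation) step
        if np.linalg.norm(b_new - b) <= tol * max(1.0, np.linalg.norm(b)):
            b = b_new
            break
        b, t = b_new, t_new
    return b
```

When X^T X / n is the identity, the solution is a single group soft-thresholding of X^T y / n, which gives a simple check of the iteration.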