Fast and Scalable Private Genotype Imputation Using Machine Learning and Partially Homomorphic Encryption

TLDR
This article combines machine learning (ML) with the Paillier cryptosystem, a standardized homomorphic encryption scheme, to perform privacy-preserving genotype imputation, achieving up to 99% micro area under curve score on real-world large-scale datasets of up to 80,000 targets.
Abstract
Recent advances in genome sequencing technologies provide unprecedented opportunities to understand the relationship between human genetic variation and diseases. However, genotyping whole genomes from a large cohort of individuals is still cost-prohibitive. Imputation methods that predict the genotypes of missing genetic variants are widely used, especially for genome-wide association studies. Accurate genotype imputation requires complex statistical methods. Because the problem is data- and compute-intensive, imputation is increasingly outsourced, raising serious privacy concerns. In this work, we investigate solutions for fast, scalable, and accurate privacy-preserving genotype imputation using Machine Learning (ML) and a standardized homomorphic encryption scheme, the Paillier cryptosystem. ML-based privacy-preserving inference has been largely optimized for computation-heavy non-linear functions in a single-output multi-class classification setting. However, having a large number of multi-class outputs per genome per individual calls for further optimizations and/or approximations specific to this application. Here we explore the effectiveness of linear models for genotype imputation and convert them to privacy-preserving equivalents using standardized homomorphic encryption schemes. Our results show that the performance of our privacy-preserving genotype imputation method is equivalent to that of state-of-the-art plaintext solutions, achieving up to 99% micro area under curve score, even on real-world large-scale datasets of up to 80,000 targets.
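To illustrate why linear models pair naturally with the Paillier cryptosystem, the following is a minimal sketch of encrypted linear scoring. It is not the authors' implementation. It assumes a toy setting: the client encrypts genotype dosages (0, 1, 2), the server holds plaintext integer weights obtained by fixed-point scaling, and the server combines ciphertexts using only the two operations Paillier supports (ciphertext addition and multiplication by a plaintext constant). The function names, the `SCALE` factor, the example weights, and the small demonstration primes are all hypothetical, and the key size is far too small for real use.

```python
import math
import secrets

def keygen(p, q):
    """Paillier key generation with generator g = n + 1 (Python 3.8+ for pow(x, -1, n))."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                               # inverse of lam modulo n
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (negative values are encoded modulo n)."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n mod n^2
    return (1 + (m % n) * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Decrypt and map the result back to a signed integer."""
    lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m

def encrypted_linear_score(n, enc_x, int_weights, int_bias):
    """Server side: compute Enc(sum_i w_i * x_i + b) without seeing any x_i."""
    n2 = n * n
    acc = encrypt(n, int_bias)
    for c, w in zip(enc_x, int_weights):
        # Enc(x)^w = Enc(w * x); multiplying ciphertexts adds the plaintexts
        acc = acc * pow(c, w % n, n2) % n2
    return acc

# Demonstration only: two Mersenne primes, far too small for real security.
P, Q = 2**61 - 1, 2**89 - 1
SCALE = 1000                        # hypothetical fixed-point scaling factor
weights = [0.75, -0.4, 1.2]         # hypothetical model weights for one output
bias = -0.3
int_weights = [round(w * SCALE) for w in weights]
int_bias = round(bias * SCALE)

n, priv = keygen(P, Q)
genotypes = [2, 0, 1]               # client-side plaintext dosages
enc_x = [encrypt(n, x) for x in genotypes]
enc_score = encrypted_linear_score(n, enc_x, int_weights, int_bias)
score = decrypt(n, priv, enc_score) / SCALE
print(score)
```

In this sketch only the key holder can decrypt the score, and the server never sees the genotypes. A multi-class, multi-target setting such as the one described above would repeat the same scoring once per class and per target variant, which is why the cost of each homomorphic operation matters at scale.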
