Privacy Risks from Genomic Data-Sharing Beacons
TLDR
The results show that beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. The authors discuss risk mitigation through policies and standards, such as disallowing anonymous pings of genetic beacons and requiring minimum beacon sizes.
Abstract:
The human genetics community needs robust protocols that enable secure sharing of genomic data from participants in genetic research. Beacons are web servers that answer allele-presence queries, such as “Do you have a genome that has a specific nucleotide (e.g., A) at a specific genomic position (e.g., position 11,272 on chromosome 1)?”, with either “yes” or “no.” Here, we show that individuals in a beacon are susceptible to re-identification even if the only data shared include presence or absence information about alleles in a beacon. Specifically, we propose a likelihood-ratio test of whether a given individual is present in a given genetic beacon. Our test is not dependent on allele frequencies and is the most powerful test for a specified false-positive rate. Through simulations, we showed that in a beacon with 1,000 individuals, re-identification is possible with just 5,000 queries. Relatives can also be identified in the beacon. Re-identification is possible even in the presence of sequencing errors and variant-calling differences. In a beacon constructed with 65 European individuals from the 1000 Genomes Project, we demonstrated that it is possible to detect membership in the beacon with just 250 SNPs. With just 1,000 SNP queries, we were able to detect the presence of an individual genome from the Personal Genome Project in an existing beacon. Our results show that beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. We discuss risk mitigation through policies and standards such as not allowing anonymous pings of genetic beacons and requiring minimum beacon sizes.
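The likelihood-ratio idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on stated assumptions: the function name `beacon_llr` is hypothetical; `d_n` stands for the assumed probability that none of the N beacon genomes carries a queried allele; `d_n_minus_1` is the same quantity for N-1 genomes; and `delta` is an assumed per-site sequencing or calling mismatch rate. All three are supplied by the caller, not estimated here. Each query targets an allele that the tested individual carries, and the beacon's answer is recorded as 1 ("yes") or 0 ("no").

```python
import math


def beacon_llr(responses, d_n, d_n_minus_1, delta=1e-6):
    """Log-likelihood ratio log L(member) - log L(non-member).

    responses   : iterable of 0/1 beacon answers for alleles the tested
                  individual carries (1 = "yes", 0 = "no").
    d_n         : assumed P(no beacon genome carries the allele), N genomes.
    d_n_minus_1 : the same probability for N-1 genomes.
    delta       : assumed probability that the individual's own copy is
                  missed (sequencing or variant-calling mismatch).
    """
    # Non-member hypothesis: a "yes" needs at least one of N genomes to carry it.
    p_yes_null = 1.0 - d_n
    # Member hypothesis: a "no" needs the individual's copy to be missed
    # AND none of the other N-1 genomes to carry the allele.
    p_no_alt = delta * d_n_minus_1
    p_yes_alt = 1.0 - p_no_alt

    llr = 0.0
    for x in responses:
        if x:
            llr += math.log(p_yes_alt) - math.log(p_yes_null)
        else:
            llr += math.log(p_no_alt) - math.log(d_n)
    return llr


# Illustrative values only: 250 "yes" answers, then the same with one "no".
all_yes = beacon_llr([1] * 250, d_n=0.3, d_n_minus_1=0.31)
one_no = beacon_llr([1] * 249 + [0], d_n=0.3, d_n_minus_1=0.31)
print(all_yes > 0, one_no < all_yes)
```

Under these assumptions, a long run of "yes" answers pushes the statistic upward (toward membership), while a single "no" pulls it down sharply because a member's allele should almost always be present. A decision threshold would still have to be calibrated to the desired false-positive rate.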
Citations
Proceedings ArticleDOI
Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting
TL;DR: The effect that overfitting and influence have on the ability of an attacker to learn information about the training data from machine learning models, either through training set membership inference or attribute inference attacks is examined.
Posted Content
Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting
TL;DR: This article examined the effect of overfitting and influence on the ability of an attacker to learn information about the training data from machine learning models, either through training set membership inference or attribute inference attacks.
Journal ArticleDOI
A federated ecosystem for sharing genomic, clinical data
Angela Page, David Baker, Martin Bobrow, Kym M. Boycott, John Burn, S. J. Chanock, S Donnelly, Edward S. Dove, Richard Durbin, Stephanie O.M. Dyke, Marc Fiume, Paul Flicek, David Glazer, Peter Goodhand, David Haussler, Kazuto Kato, Stephen Keenan, Bartha Maria Knoppers, R Liao, David Lloyd, Nicola Mulder, Arcadi Navarro, Kathryn N. North, Anthony A. Philippakis, Nazneen Rahman, Heidi L. Rehm, Charles L. Sawyers, Adrian Thorogood, James F. Wilson, David Altshuler, Thomas J. Hudson
TL;DR: This data-sharing effort has led to improved variant interpretation and development of treatments for rare diseases and some cancer types, but such benefits will only be available to the general population if researchers and clinicians can access and make comparisons across data from millions of individuals.
Journal ArticleDOI
Secure genome-wide association analysis using multiparty computation.
TL;DR: This work describes a protocol for large-scale genome-wide analysis that facilitates quality control and population stratification correction in 9K, 13K, and 23K individuals while maintaining the confidentiality of underlying genotypes and phenotypes and shows it could feasibly scale to a million individuals.
Journal ArticleDOI
Deriving genomic diagnoses without revealing patient genomes.
TL;DR: A solution that combines a protocol from modern cryptography with frequency-based clinical genetics used to diagnose causal disease mutations in patients with monogenic disorders is described, which correctly identified the causal gene in cases involving actual patients, while protecting more than 99% of individual participants' most private variants.
References
Journal ArticleDOI
An integrated map of genetic variation from 1,092 human genomes
Gonçalo R. Abecasis, Adam Auton, Lisa D. Brooks, Mark A. DePristo, Richard Durbin, Robert E. Handsaker, Hyun Min Kang, Gabor T. Marth, Gil McVean
TL;DR: It is shown that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites.
Journal ArticleDOI
Meta-analysis of the heritability of human traits based on fifty years of twin studies
Tinca J. C. Polderman, Beben Benyamin, Christiaan de Leeuw, Patrick F. Sullivan, Arjen van Bochoven, Peter M. Visscher, Danielle Posthuma
TL;DR: This study provides the most comprehensive analysis of the causes of individual differences in human traits thus far and will guide future gene-mapping efforts.
Journal ArticleDOI
Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays.
Nils Homer, Szabolcs Szelinger, Margot Redman, David Duggan, Waibhav Tembe, Jill Muehling, John V. Pearson, Dietrich A. Stephan, Stanley F. Nelson, David Craig
TL;DR: High-density single nucleotide polymorphism genotyping microarrays are used to demonstrate the ability to accurately and robustly determine whether individuals are in a complex genomic DNA mixture, and suggest future research efforts into assessing the viability of previously sub-optimal DNA sources due to sample contamination.
Journal ArticleDOI
Whole-genome sequence variation, population structure and demographic history of the Dutch population
Laurent C. Francioli, Androniki Menelaou, Sara L. Pulit, Freerk van Dijk, Pier Francesco Palamara, Clara C. Elbers, Pieter B. Neerincx, Kai Ye, Victor Guryev, Wigard P. Kloosterman, Patrick Deelen, Abdel Abdellaoui, Elisabeth M. van Leeuwen, Mannis van Oven, Martijn Vermaat, Mingkun Li, Jeroen F. J. Laros, Lennart C. Karssen, Alexandros Kanterakis, Najaf Amin, Jouke-Jan Hottenga, Eric-Wubbo Lameijer, Mathijs Kattenberg, Martijn Dijkstra, Heorhiy Byelas, Jessica van Setten, Barbera D. C. van Schaik, Jan Bot, Isaac J. Nijman, Ivo Renkens, Tobias Marschall, Alexander Schönhuth, Jayne Y. Hehir-Kwa, Robert E. Handsaker, Paz Polak, Mashaal Sohail, Dana Vuzman, Fereydoun Hormozdiari, David van Enckevort, Hailiang Mei, Vyacheslav Koval, Matthijs Moed, K. Joeri van der Velde, Fernando Rivadeneira, Karol Estrada, Carolina Medina-Gomez, Aaron Isaacs, Steven A. McCarroll, Marian Beekman, Anton J. M. de Craen, H. Eka D. Suchiman, Albert Hofman, Ben A. Oostra, André G. Uitterlinden, Gonneke Willemsen, Mathieu Platteel, Jan H. Veldink, Leonard H. van den Berg, Steven J. Pitts, Shobha Potluri, Purnima Sundar, David R. Cox, Shamil R. Sunyaev, Johan T. den Dunnen, Mark Stoneking, Peter de Knijff, Manfred Kayser, Qibin Li, Yingrui Li, Yuanping Du, Ruoyan Chen, Hongzhi Cao, Ning Li, Sujie Cao, Jun Wang, Jasper A. Bovenberg, Itsik Pe'er, P. Eline Slagboom, Cornelia M. van Duijn, Dorret I. Boomsma, Gert-Jan B. van Ommen, Paul I.W. de Bakker, Morris A. Swertz, Cisca Wijmenga
TL;DR: The Genome of the Netherlands (GoNL) Project is described, in which the whole genomes of 250 Dutch parent-offspring families were sequenced and a haplotype map of 20.4 million single-nucleotide variants and 1.2 million insertions and deletions was constructed.
Journal ArticleDOI
Routes for breaching and protecting genetic privacy.
Yaniv Erlich, Arvind Narayanan
TL;DR: An overview of genetic privacy breaching strategies is presented, outlining the principles of each technique, the underlying assumptions, and their technological complexity and maturation, as well as highlighting different cases that are relevant to genetic applications.