Open Access, Journal Article

Privacy Risks from Genomic Data-Sharing Beacons

Suyash Shringarpure, +1 more
05 Nov 2015, Vol. 97, Iss. 5, pp. 631-646
TLDR
Beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. The authors discuss risk mitigation through policies and standards, such as disallowing anonymous pings of genetic beacons and requiring minimum beacon sizes.
Abstract
The human genetics community needs robust protocols that enable secure sharing of genomic data from participants in genetic research. Beacons are web servers that answer allele-presence queries—such as “Do you have a genome that has a specific nucleotide (e.g., A) at a specific genomic position (e.g., position 11,272 on chromosome 1)?”—with either “yes” or “no.” Here, we show that individuals in a beacon are susceptible to re-identification even if the only data shared include presence or absence information about alleles in a beacon. Specifically, we propose a likelihood-ratio test of whether a given individual is present in a given genetic beacon. Our test is not dependent on allele frequencies and is the most powerful test for a specified false-positive rate. Through simulations, we showed that in a beacon with 1,000 individuals, re-identification is possible with just 5,000 queries. Relatives can also be identified in the beacon. Re-identification is possible even in the presence of sequencing errors and variant-calling differences. In a beacon constructed with 65 European individuals from the 1000 Genomes Project, we demonstrated that it is possible to detect membership in the beacon with just 250 SNPs. With just 1,000 SNP queries, we were able to detect the presence of an individual genome from the Personal Genome Project in an existing beacon. Our results show that beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. We discuss risk mitigation through policies and standards such as not allowing anonymous pings of genetic beacons and requiring minimum beacon sizes.
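
The membership test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the two response probabilities are already known (`p_yes_absent` and `p_yes_present` are hypothetical parameters), uses a toy simulated beacon with made-up allele frequencies, and sums per-query log-likelihood ratios over the SNPs where the queried genome carries the allele. Setting `p_yes_present` slightly below 1 leaves room for mismatches such as sequencing errors.

```python
import math
import random


def log_likelihood_ratio(responses, p_yes_absent, p_yes_present):
    """Log-likelihood ratio of 'target is in the beacon' vs. 'target is absent'.

    responses: list of booleans, one per query (True = beacon answered "yes").
    p_yes_absent / p_yes_present: assumed probabilities of a "yes" answer
    under each hypothesis (hypothetical inputs, both strictly between 0 and 1).
    """
    llr = 0.0
    for yes in responses:
        if yes:
            llr += math.log(p_yes_present / p_yes_absent)
        else:
            llr += math.log((1.0 - p_yes_present) / (1.0 - p_yes_absent))
    return llr


def carries_alt(freq, rng):
    """True if a diploid genome has at least one alternate allele at a SNP."""
    return rng.random() < 1.0 - (1.0 - freq) ** 2


def simulate(n_members=100, n_snps=5000, seed=0):
    """Build a toy beacon; allele frequencies here are invented for illustration."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.0005, 0.02) for _ in range(n_snps)]
    members = [[carries_alt(f, rng) for f in freqs] for _ in range(n_members)]
    outsider = [carries_alt(f, rng) for f in freqs]
    # The beacon answers "yes" at a SNP if any member carries the allele.
    beacon = [any(m[j] for m in members) for j in range(n_snps)]
    return members, outsider, beacon


def query(genome, beacon):
    """Query the beacon only at SNPs where the given genome carries the allele."""
    return [beacon[j] for j, has_alt in enumerate(genome) if has_alt]


if __name__ == "__main__":
    members, outsider, beacon = simulate()
    for label, genome in (("member", members[0]), ("outsider", outsider)):
        score = log_likelihood_ratio(query(genome, beacon), 0.9, 0.999)
        print(label, round(score, 2))
```

A positive score favors membership and a negative score favors absence. Choosing a decision threshold for a target false-positive rate is left out of this sketch.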

