Showing papers by "Emiliano De Cristofaro published in 2019"


Proceedings ArticleDOI
19 May 2019
TL;DR: Passive and active inference attacks are developed to exploit the leakage of information about participants' training data in collaborative and federated learning: an adversarial participant can infer the presence of exact data points in others' training data (membership inference), as well as properties that hold only for a subset of the training data and are independent of the properties the joint model aims to capture.
Abstract: Collaborative machine learning and related techniques such as federated learning allow multiple participants, each with his own training dataset, to build a joint model by training locally and periodically exchanging model updates. We demonstrate that these updates leak unintended information about participants' training data and develop passive and active inference attacks to exploit this leakage. First, we show that an adversarial participant can infer the presence of exact data points -- for example, specific locations -- in others' training data (i.e., membership inference). Then, we show how this adversary can infer properties that hold only for a subset of the training data and are independent of the properties that the joint model aims to capture. For example, he can infer when a specific person first appears in the photos used to train a binary gender classifier. We evaluate our attacks on a variety of tasks, datasets, and learning configurations, analyze their limitations, and discuss possible defenses.
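As a concrete illustration, here is a minimal sketch of the passive membership signal described above. It is not the paper's actual attack (which trains a classifier on gradient and embedding features); it only shows the core intuition that records in some participant's training set tend to have lower loss under the joint-model snapshots an adversarial participant observes each round. All names and data are placeholders.

```python
# Minimal sketch (not the paper's actual attack, which trains a classifier
# on gradient/embedding features) of the passive membership signal: records
# in some participant's training set tend to have lower loss under the
# joint-model snapshots observed across training rounds.
import numpy as np

def logistic_loss(w, x, y):
    """Cross-entropy loss of a linear model w on one example (x, y)."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def membership_score(snapshots, x, y):
    """Average loss over observed snapshots; lower = more likely a member."""
    return np.mean([logistic_loss(w, x, y) for w in snapshots])

# Toy usage: `snapshots` stands for the joint-model parameters an
# adversarial participant sees at each federated-learning round.
rng = np.random.default_rng(0)
snapshots = [rng.normal(size=10) for _ in range(5)]
candidate_x, candidate_y = rng.normal(size=10), 1
print(membership_score(snapshots, candidate_x, candidate_y))
```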

1,084 citations


Journal ArticleDOI
01 Jan 2019
TL;DR: The first membership inference attacks against generative models are presented: given a data point, the adversary determines whether or not it was used to train the model. The attacks leverage Generative Adversarial Networks (GANs), which combine a discriminative and a generative model, using the discriminator's capacity to learn statistical differences in distributions to detect overfitting and recognize inputs that were part of the training dataset.
Abstract: Generative models estimate the underlying distribution of a dataset to generate realistic samples according to that distribution. In this paper, we present the first membership inference attacks against generative models: given a data point, the adversary determines whether or not it was used to train the model. Our attacks leverage Generative Adversarial Networks (GANs), which combine a discriminative and a generative model, to detect overfitting and recognize inputs that were part of training datasets, using the discriminator's capacity to learn statistical differences in distributions. We present attacks based on both white-box and black-box access to the target model, against several state-of-the-art generative models, over datasets of complex representations of faces (LFW), objects (CIFAR-10), and medical images (Diabetic Retinopathy). We also discuss the sensitivity of the attacks to different training parameters, and their robustness against mitigation strategies, finding that defenses are either ineffective or lead to significantly worse performance of the generative models in terms of training stability and/or sample quality.
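For intuition, the white-box variant of such an attack can be sketched in a few lines: score every candidate record with the trained discriminator and predict the highest-scoring ones as training members, since the discriminator assigns higher confidence to overfitted inputs. The discriminator below is a stand-in placeholder, not a real GAN.

```python
# Toy sketch of the white-box attack: rank candidate records by the
# trained discriminator's confidence and flag the top-n as training-set
# members, exploiting the discriminator's overfitting. The lambda below
# is a stand-in for a real GAN discriminator D(x).
import numpy as np

def membership_attack(discriminator, candidates, n_members):
    scores = np.array([discriminator(x) for x in candidates])
    top = np.argsort(scores)[::-1][:n_members]   # highest-scoring first
    predictions = np.zeros(len(candidates), dtype=bool)
    predictions[top] = True
    return predictions

rng = np.random.default_rng(0)
fake_discriminator = lambda x: float(x.sum())    # placeholder D(x)
candidates = [rng.random(2) for _ in range(10)]
print(membership_attack(fake_discriminator, candidates, n_members=3))
```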

266 citations


Journal ArticleDOI
09 Apr 2019
TL;DR: MAMADROID is a static-analysis-based system that abstracts an app's API calls to their class, package, or family, and models the call sequences obtained from the app's call graph as Markov chains.
Abstract: As Android has become increasingly popular, so has malware targeting it, thus motivating the research community to propose different detection techniques. However, the constant evolution of the Android ecosystem, and of malware itself, makes it hard to design robust tools that can operate for long periods of time without the need for modifications or costly re-training. Aiming to address this issue, we set out to detect malware from a behavioral point of view, modeled as the sequence of abstracted API calls. We introduce MAMADROID, a static-analysis-based system that abstracts an app's API calls to their class, package, or family, and models the call sequences obtained from the app's call graph as Markov chains. This ensures that the model is more resilient to API changes and that the feature set is of manageable size. We evaluate MAMADROID using a dataset of 8.5K benign and 35.5K malicious apps collected over a period of 6 years, showing that it effectively detects malware (with up to 0.99 F-measure) and keeps its detection capabilities for long periods of time (up to 0.87 F-measure 2 years after training). We also show that MAMADROID significantly outperforms DROIDAPIMINER, a state-of-the-art detection system that relies on the frequency of (raw) API calls. Aiming to assess whether MAMADROID's effectiveness mainly stems from the API abstraction or from the sequence modeling, we also evaluate a variant of it that uses the frequency (instead of the sequences) of abstracted API calls. We find that it is not as accurate, failing to capture maliciousness when trained on malware samples that include API calls that are equally or more frequently used by benign apps.
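A toy sketch of the feature construction described above may help: abstract each API call to a coarser label, then turn the call sequence into Markov-chain transition probabilities. The real system extracts call sequences from an app's call graph via static analysis and abstracts to family/package/class; this sketch only shows the chain-building step on made-up call names.

```python
# Illustrative sketch of MaMaDroid's core feature construction: abstract
# API calls, then model the resulting sequence as a Markov chain whose
# transition probabilities become the feature vector. Call-graph
# extraction is omitted; the calls below are made-up examples.
from collections import defaultdict

def abstract_call(api_call):
    """Abstract 'android.telephony.SmsManager.sendTextMessage()' to its
    package prefix; MaMaDroid abstracts to family/package/class."""
    return ".".join(api_call.split(".")[:2])

def markov_features(call_sequence):
    counts = defaultdict(lambda: defaultdict(int))
    abstracted = [abstract_call(c) for c in call_sequence]
    for src, dst in zip(abstracted, abstracted[1:]):
        counts[src][dst] += 1
    features = {}
    for src, targets in counts.items():
        total = sum(targets.values())   # row-normalise into probabilities
        features[src] = {dst: n / total for dst, n in targets.items()}
    return features

seq = ["android.telephony.SmsManager.sendTextMessage()",
       "java.lang.Runtime.exec()",
       "android.telephony.TelephonyManager.getDeviceId()"]
print(markov_features(seq))
```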

216 citations


Proceedings ArticleDOI
13 May 2019
TL;DR: The authors analyze 27K tweets posted by 1K Twitter users identified as having ties with Russia's Internet Research Agency, and thus likely state-sponsored trolls, finding that these trolls managed to stay active for long periods of time and to reach a substantial number of Twitter users with their tweets.
Abstract: Over the past couple of years, anecdotal evidence has emerged linking coordinated campaigns by state-sponsored actors with efforts to manipulate public opinion on the Web, often around major political events, through dedicated accounts, or "trolls." Although they are often involved in spreading disinformation on social media, there is little understanding of how these trolls operate, what type of content they disseminate, and, most importantly, their influence on the information ecosystem. In this paper, we shed light on these questions by analyzing 27K tweets posted by 1K Twitter users identified as having ties with Russia's Internet Research Agency and thus likely state-sponsored trolls. We compare their behavior to a random set of Twitter users, finding interesting differences in terms of the content they disseminate, the evolution of their accounts, as well as their general behavior and use of Twitter. Then, using Hawkes Processes, we quantify the influence that trolls had on the dissemination of news on social platforms like Twitter, Reddit, and 4chan. Overall, our findings indicate that Russian trolls managed to stay active for long periods of time and to reach a substantial number of Twitter users with their tweets. When looking at their ability to spread news content and make it viral, however, we find that their effect on social platforms was minor, with the significant exception of news published by the Russian state-sponsored news outlet RT (Russia Today).
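For reference, the influence analysis mentioned above relies on multivariate Hawkes processes. In the standard exponential-kernel formulation (the paper's exact parameterisation may differ), the event intensity on platform $i$ at time $t$ is:

```latex
\lambda_i(t) \;=\; \mu_i \;+\; \sum_{j=1}^{K} \sum_{t_k^j < t} \alpha_{j \to i}\, e^{-\beta\,(t - t_k^j)}
```

Here $\mu_i$ is the platform's background rate, and the fitted weight $\alpha_{j \to i}$ captures how many additional events on platform $i$ an event on platform $j$ is expected to trigger, which is how cross-platform influence can be quantified.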

119 citations


Journal ArticleDOI
TL;DR: This work presents a robust methodology to distinguish bullies and aggressors from normal Twitter users by considering text, user, and network-based attributes, and discusses the current status of Twitter user accounts marked as abusive by the methodology and the performance of potential mechanisms that can be used by Twitter to suspend users in the future.
Abstract: Cyberbullying and cyberaggression are increasingly worrisome phenomena affecting people across all demographics. More than half of young social media users worldwide have been exposed to such prolonged and/or coordinated digital harassment. Victims can experience a wide range of emotions, with negative consequences such as embarrassment, depression, and isolation from other community members, which carry the risk of escalating to even more critical consequences, such as suicide attempts. In this work, we take the first concrete steps to understand the characteristics of abusive behavior on Twitter, one of today's largest social media platforms. We analyze 1.2 million users and 2.1 million tweets, comparing users participating in discussions around seemingly normal topics like the NBA to those more likely to be hate-related, such as the Gamergate controversy or the gender pay inequality at the BBC. We also explore specific manifestations of abusive behavior, i.e., cyberbullying and cyberaggression, in one of the hate-related communities (Gamergate). We present a robust methodology to distinguish bullies and aggressors from normal Twitter users by considering text, user, and network-based attributes. Using various state-of-the-art machine-learning algorithms, we classify these accounts with over 90% accuracy and AUC. Finally, we discuss the current status of Twitter user accounts marked as abusive by our methodology and study the performance of potential mechanisms that can be used by Twitter to suspend users in the future.
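To make the classification setup concrete, here is a hedged sketch: text-, user-, and network-based attributes are combined into one feature vector per account and fed to an off-the-shelf classifier. The feature names and values are invented placeholders, and the paper evaluates several algorithms, not just a random forest.

```python
# Hedged sketch of the classification setup: combine text-, user-, and
# network-based attributes into one feature vector per account and train
# a standard classifier. Features and values are invented placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: [avg_tweet_sentiment, hashtag_count, account_age_days,
#            follower_count, clustering_coefficient]   (example features)
X = np.array([[-0.8, 5.0,  120.0,  40.0, 0.02],   # aggressive-looking user
              [ 0.3, 1.0, 2000.0, 900.0, 0.15],   # normal user
              [-0.6, 7.0,   90.0,  25.0, 0.01],
              [ 0.4, 0.0, 1500.0, 700.0, 0.20]])
y = np.array([1, 0, 1, 0])   # 1 = bully/aggressor, 0 = normal

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([[-0.7, 6.0, 100.0, 30.0, 0.02]]))
```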

80 citations


Journal ArticleDOI
07 Nov 2019
TL;DR: This paper proposes an automated solution to identify YouTube videos that are likely to be targeted by coordinated harassers from fringe communities like 4chan, using an ensemble of classifiers to determine the likelihood that a video will be raided, with very good results (AUC up to 94%).
Abstract: Video sharing platforms like YouTube are increasingly targeted by aggression and hate attacks. Prior work has shown how these attacks often take place as a result of "raids," i.e., organized efforts by ad-hoc mobs coordinating from third-party communities. Despite the increasing relevance of this phenomenon, however, online services often lack effective countermeasures to mitigate it. Unlike well-studied problems like spam and phishing, coordinated aggressive behavior both targets and is perpetrated by humans, making defense mechanisms that look for automated activity unsuitable. Therefore, the de-facto solution is to reactively rely on user reports and human moderation. In this paper, we propose an automated solution to identify YouTube videos that are likely to be targeted by coordinated harassers from fringe communities like 4chan. First, we characterize and model YouTube videos along several axes (metadata, audio transcripts, thumbnails) based on a ground truth dataset of videos that were targeted by raids. Then, we use an ensemble of classifiers to determine the likelihood that a video will be raided with very good results (AUC up to 94%). Overall, our work provides an important first step towards deploying proactive systems to detect and mitigate coordinated hate attacks on platforms like YouTube.
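A minimal sketch of the ensemble idea follows: train one classifier per modality and combine them by soft voting over predicted probabilities. In the paper, each classifier sees features from its own modality (metadata, transcripts, thumbnails); here, for brevity, both see the same synthetic stand-in features.

```python
# Minimal sketch of the ensemble idea: several classifiers combined by
# soft voting. In the paper each classifier sees its own modality's
# features; here both see the same synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((40, 6))            # stand-in per-video feature vectors
y = np.array([0, 1] * 20)          # 1 = video was raided (toy labels)

ensemble = VotingClassifier(
    estimators=[("meta", LogisticRegression(max_iter=1000)),
                ("text", RandomForestClassifier(n_estimators=50))],
    voting="soft")                 # average the predicted probabilities
ensemble.fit(X, y)
print(ensemble.predict_proba(X[:3])[:, 1])   # raid likelihood per video
```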

77 citations


Journal ArticleDOI
TL;DR: This paper presents a novel technique for privately releasing generative models and entire high-dimensional datasets produced by these models, and evaluates it on the MNIST dataset, showing that it produces realistic synthetic samples, which can also be used to accurately compute an arbitrary number of counting queries.
Abstract: Generative models are used in a wide range of applications building on large amounts of contextually rich information. Due to possible privacy violations of the individuals whose data is used to train these models, however, publishing or sharing generative models is not always viable. In this paper, we present a novel technique for privately releasing generative models and entire high-dimensional datasets produced by these models. We model the generator distribution of the training data with a mixture of $k$ generative neural networks. These are trained together and collectively learn the generator distribution of a dataset. Data is divided into $k$ clusters, using a novel differentially private kernel $k$-means, then each cluster is given to a separate generative neural network, such as a Restricted Boltzmann Machine or a Variational Autoencoder, which is trained only on its own cluster using differentially private gradient descent. We evaluate our approach using the MNIST dataset, as well as call detail records and transit datasets, showing that it produces realistic synthetic samples, which can also be used to accurately compute an arbitrary number of counting queries.
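One ingredient of the approach above can be sketched compactly: making centroid updates differentially private by adding calibrated noise to per-cluster sums and counts. This is only the noisy-centroid idea; the paper uses a kernel k-means variant and trains the per-cluster generative models with differentially private gradient descent, neither of which is shown.

```python
# Toy sketch of one ingredient: a differentially private centroid update
# that noises both the per-cluster sum and count with Gaussian noise.
# The paper uses a kernel k-means variant and DP gradient descent for
# the per-cluster generative models (not shown).
import numpy as np

def dp_centroid_update(X, assignments, k, sigma, rng):
    centroids = []
    for c in range(k):
        members = X[assignments == c]
        noisy_sum = members.sum(axis=0) + rng.normal(0, sigma, X.shape[1])
        noisy_count = max(len(members) + rng.normal(0, sigma), 1.0)
        centroids.append(noisy_sum / noisy_count)
    return np.stack(centroids)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
assignments = rng.integers(0, 3, size=100)   # stand-in cluster labels
print(dp_centroid_update(X, assignments, k=3, sigma=1.0, rng=rng))
```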

65 citations


Proceedings ArticleDOI
21 Oct 2019
TL;DR: This paper focuses on identifying key challenges that might disrupt continuing efforts to decentralise the web, and empirically highlight a number of properties that are creating natural pressures towards re-centralisation.
Abstract: The Decentralised Web (DW) has recently seen a renewed momentum, with a number of DW platforms like Mastodon, PeerTube, and Hubzilla gaining increasing traction. These offer alternatives to traditional social networks like Twitter, YouTube, and Facebook, by enabling the operation of web infrastructure and services without centralised ownership or control. Although their services differ greatly, modern DW platforms mostly rely on two key innovations: first, their open source software allows anybody to set up independent servers ("instances") that people can sign up to and use within a local community; and second, they build on top of federation protocols so that instances can mesh together, in a peer-to-peer fashion, to offer a globally integrated platform. In this paper, we present a measurement-driven exploration of these two innovations, using a popular DW microblogging platform (Mastodon) as a case study. We focus on identifying key challenges that might disrupt continuing efforts to decentralise the web, and empirically highlight a number of properties that are creating natural pressures towards re-centralisation. Finally, our measurements shed light on the behaviour of both administrators (i.e., people setting up instances) and regular users who sign up to the platforms, also discussing a few techniques that may address some of the issues observed.

36 citations


Posted Content
TL;DR: The authors present a measurement-driven exploration of the two key innovations underpinning the Decentralised Web, independent instances and federation, using a popular DW microblogging platform (Mastodon) as a case study, identifying key challenges that might disrupt continuing efforts to decentralise the web and empirically highlighting a number of properties that are creating natural pressures towards recentralisation.
Abstract: The Decentralised Web (DW) has recently seen a renewed momentum, with a number of DW platforms like Mastodon, PeerTube, and Hubzilla gaining increasing traction. These offer alternatives to traditional social networks like Twitter, YouTube, and Facebook, by enabling the operation of web infrastructure and services without centralised ownership or control. Although their services differ greatly, modern DW platforms mostly rely on two key innovations: first, their open source software allows anybody to set up independent servers ("instances") that people can sign up to and use within a local community; and second, they build on top of federation protocols so that instances can mesh together, in a peer-to-peer fashion, to offer a globally integrated platform. In this paper, we present a measurement-driven exploration of these two innovations, using a popular DW microblogging platform (Mastodon) as a case study. We focus on identifying key challenges that might disrupt continuing efforts to decentralise the web, and empirically highlight a number of properties that are creating natural pressures towards recentralisation. Finally, our measurements shed light on the behaviour of both administrators (i.e., people setting up instances) and regular users who sign up to the platforms, also discussing a few techniques that may address some of the issues observed.

36 citations


Posted Content
TL;DR: This paper presents a measurement study shedding light on how genetic testing is discussed in Web communities on Reddit and 4chan, using NLP and computer vision tools to identify trends, themes, and topics of discussion.
Abstract: Progress in genomics has enabled the emergence of a booming market for "direct-to-consumer" genetic testing. Nowadays, companies like 23andMe and AncestryDNA provide affordable health, genealogy, and ancestry reports, and have already tested tens of millions of customers. At the same time, alt- and far-right groups have also taken an interest in genetic testing, using it to attack minorities and prove their genetic "purity." In this paper, we present a measurement study shedding light on how genetic testing is being discussed in Web communities on Reddit and 4chan. We collect 1.3M comments posted over 27 months on the two platforms, using a set of 280 keywords related to genetic testing. We then use NLP and computer vision tools to identify trends, themes, and topics of discussion. Our analysis shows that genetic testing attracts a lot of attention on Reddit and 4chan, with discussions often including highly toxic language expressed through hateful, racist, and misogynistic comments. In particular, on 4chan's politically incorrect board (/pol/), content from genetic testing conversations involves several alt-right personalities and openly antisemitic rhetoric, often conveyed through memes. Finally, we find that discussions build around user groups, from technology enthusiasts to communities promoting fringe political views.
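The NLP side of such a study can be illustrated with a short topic-modeling sketch using LDA; the four comments below stand in for the 1.3M-comment corpus, and the paper's word-embedding and computer-vision analyses (e.g., of memes) are not shown.

```python
# Illustrative topic-modeling sketch: extract discussion topics from
# comments with LDA. The tiny corpus below is a stand-in for the real
# 1.3M Reddit/4chan comments.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

comments = ["got my 23andme ancestry results back today",
            "ancestry report says mostly northern european",
            "privacy risks of sharing raw dna data",
            "who owns your genetic data after testing"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(comments)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for topic in lda.components_:
    print([terms[i] for i in topic.argsort()[-4:]])  # top terms per topic
```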

29 citations


Journal ArticleDOI
01 Jan 2019
TL;DR: In this article, the authors contextualize and critically analyze the current knowledge on privacy-enhancing technologies used for testing, storing, and sharing genomic data, based on a representative sample of the work published in the past decade.
Abstract: Rapid advances in human genomics are enabling researchers to gain a better understanding of the role of the genome in our health and well-being, stimulating hope for more effective and cost-efficient healthcare. However, this also prompts a number of security and privacy concerns stemming from the distinctive characteristics of genomic data. To address them, a new research community has emerged and produced a large number of publications and initiatives. In this paper, we rely on a structured methodology to contextualize and provide a critical analysis of the current knowledge on privacy-enhancing technologies used for testing, storing, and sharing genomic data, using a representative sample of the work published in the past decade. We identify and discuss limitations, technical challenges, and issues faced by the community, focusing in particular on those that are inherently tied to the nature of the problem and are harder for the community alone to address. Finally, we report on the importance and difficulty of the identified challenges based on an online survey of genome data privacy experts.

Posted Content
TL;DR: The first study of images shared by state-sponsored accounts, based on a ground-truth dataset of 1.8M images posted to Twitter by accounts controlled by the Russian Internet Research Agency, shows that the trolls were more effective in disseminating politics-related imagery than other images.
Abstract: State-sponsored organizations are increasingly linked to efforts aimed to exploit social media for information warfare and manipulating public opinion. Typically, their activities rely on a number of social network accounts they control, aka trolls, that post and interact with other users disguised as "regular" users. These accounts often use images and memes, along with textual content, in order to increase the engagement and the credibility of their posts. In this paper, we present the first study of images shared by state-sponsored accounts by analyzing a ground truth dataset of 1.8M images posted to Twitter by accounts controlled by the Russian Internet Research Agency. First, we analyze the content of the images as well as their posting activity. Then, using Hawkes Processes, we quantify their influence on popular Web communities like Twitter, Reddit, 4chan's Politically Incorrect board (/pol/), and Gab, with respect to the dissemination of images. We find that the extensive image posting activity of Russian trolls coincides with real-world events (e.g., the Unite the Right rally in Charlottesville), and shed light on their targets as well as the content disseminated via images. Finally, we show that the trolls were more effective in disseminating politics-related imagery than other images.
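A common building block for analysing image posting at this scale is grouping near-duplicate images by perceptual hash. The sketch below is one plausible way to implement such a step, not necessarily the paper's pipeline; file paths are hypothetical, and it requires the Pillow and imagehash packages.

```python
# Hedged sketch: bucket near-duplicate images whose perceptual hashes
# (pHash) are within a small Hamming distance. Paths are hypothetical;
# requires the Pillow and imagehash packages.
from PIL import Image
import imagehash

def group_near_duplicates(paths, threshold=8):
    """Group images whose pHashes differ by <= threshold bits."""
    groups = []   # list of (representative_hash, [paths])
    for p in paths:
        h = imagehash.phash(Image.open(p))
        for rep, members in groups:
            if h - rep <= threshold:   # Hamming distance between hashes
                members.append(p)
                break
        else:
            groups.append((h, [p]))
    return groups

# Usage (hypothetical files):
# print(group_near_duplicates(["meme1.png", "meme2.png", "photo.jpg"]))
```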

Posted Content
TL;DR: It is shown that, while there is no silver bullet that enables arbitrary analysis, there are defenses that provide reasonable utility for particular tasks while reducing the extent of the inference.
Abstract: Aggregate location statistics are used in a number of mobility analytics to express how many people are in a certain location at a given time (but not who). However, prior work has shown that an adversary with some prior knowledge of a victim's mobility patterns can mount membership inference attacks to determine whether or not that user contributed to the aggregates. In this paper, we set out to understand why such inferences are successful and what can be done to mitigate them. We conduct an in-depth feature analysis, finding that the volume of data contributed and the regularity and particularity of mobility patterns play a crucial role in the attack. We then use these insights to adapt defenses proposed in the location privacy literature to the aggregate setting, and evaluate their privacy-utility trade-offs for common mobility analytics. We show that, while there is no silver bullet that enables arbitrary analysis, there are defenses that provide reasonable utility for particular tasks while reducing the extent of the inference.
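The basic shape of such a membership inference attack can be sketched with synthetic data: build aggregates with and without the victim's trace (the adversary's prior knowledge) and train a classifier to tell them apart. Everything below is a toy stand-in for real mobility data.

```python
# Minimal sketch of a membership-inference test on aggregate location
# counts: aggregates computed with vs. without the victim's trace, and a
# classifier trained to distinguish them. Synthetic Poisson "traces"
# stand in for real mobility data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_users, n_bins = 50, 24                  # users x hourly count bins
traces = rng.poisson(1.0, size=(n_users, n_bins))
victim = traces[0]

def aggregate(include_victim):
    """Sum the traces of 30 random non-victim users (+ victim or not)."""
    others = traces[1:][rng.choice(n_users - 1, 30, replace=False)]
    agg = others.sum(axis=0)
    return agg + victim if include_victim else agg

X = np.array([aggregate(i % 2 == 0) for i in range(200)])
y = np.array([i % 2 == 0 for i in range(200)])
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("in-sample attack accuracy:", clf.score(X, y))
```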

Posted Content
TL;DR: Measurements show that there does not exist a unique generic defense that can preserve the utility of the analytics for arbitrary applications, and provide useful insights regarding the disclosure of sanitized aggregate location time-series.
Abstract: While location data is extremely valuable for various applications, disclosing it prompts serious threats to individuals' privacy. To limit such concerns, organizations often provide analysts with aggregate time-series that indicate, e.g., how many people are in a location at a time interval, rather than raw individual traces. In this paper, we perform a measurement study to understand Membership Inference Attacks (MIAs) on aggregate location time-series, where an adversary tries to infer whether a specific user contributed to the aggregates. We find that the volume of contributed data, as well as the regularity and particularity of users' mobility patterns, play a crucial role in the attack's success. We experiment with a wide range of defenses based on generalization, hiding, and perturbation, and evaluate their ability to thwart the attack vis-a-vis the utility loss they introduce for various mobility analytics tasks. Our results show that some defenses fail across the board, while others work for specific tasks on aggregate location time-series. For instance, suppressing small counts can be used for ranking hotspots, data generalization for forecasting traffic, hotspot discovery, and map inference, while sampling is effective for location labeling and anomaly detection when the dataset is sparse. Differentially private techniques provide reasonable accuracy only in very specific settings, e.g., discovering hotspots and forecasting their traffic, and more so when using weaker privacy notions like crowd-blending privacy. Overall, our measurements show that there does not exist a unique generic defense that can preserve the utility of the analytics for arbitrary applications, and provide useful insights regarding the disclosure of sanitized aggregate location time-series.
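For instance, the small-count suppression defense mentioned above is straightforward; a minimal sketch:

```python
# Sketch of one defense evaluated above: suppress small counts in the
# released aggregate time-series (cells below k are zeroed), which the
# study finds adequate for tasks like ranking hotspots.
import numpy as np

def suppress_small_counts(aggregates, k=5):
    """Zero out any location/time cell whose count is below k."""
    out = aggregates.copy()
    out[out < k] = 0
    return out

agg = np.array([[12, 3, 0,  7],
                [ 1, 9, 4, 15]])   # rows: locations, cols: time slots
print(suppress_small_counts(agg, k=5))
```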

Proceedings ArticleDOI
13 May 2019
TL;DR: Private Data Donor (PDD) is presented, a decentralized and private-by-design platform providing crowd-sourced Web searches to researchers; it builds on a cryptographic protocol for privacy-preserving data aggregation and addresses practical challenges to add reliability into the system with regard to users disconnecting or stopping using the platform.
Abstract: Search engines play an important role on the Web, helping users find relevant resources and answers to their questions. At the same time, search logs can also be of great utility to researchers. For instance, a number of recent research efforts have relied on them to build prediction and inference models, for applications ranging from economics and marketing to public health surveillance. However, companies rarely release search logs, also due to the related privacy issues that ensue, as they are inherently hard to anonymize. As a result, it is very difficult for researchers to have access to search data, and even if they do, they are fully dependent on the company providing them. Aiming to overcome these issues, this paper presents Private Data Donor (PDD), a decentralized and private-by-design platform providing crowd-sourced Web searches to researchers. We build on a cryptographic protocol for privacy-preserving data aggregation, and address a few practical challenges to add reliability into the system with regard to users disconnecting or stopping using the platform. We discuss how PDD can be used to build a flu monitoring model, and evaluate the impact of the privacy-preserving layer on the quality of the results. Finally, we present the implementation of our platform, as a browser extension and a server, and report on a pilot deployment with real users.
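The kind of privacy-preserving aggregation PDD builds on can be illustrated with a toy pairwise-masking scheme: each pair of users shares a random mask that one adds and the other subtracts, so the aggregator sees only masked values, yet the masks cancel in the sum. Real protocols add key agreement and fault tolerance for dropouts, which are omitted here.

```python
# Toy sketch of pairwise-masked aggregation: every pair of users shares
# a random mask that one adds and the other subtracts (mod M), hiding
# individual contributions while all masks cancel in the total.
import secrets

M = 2**32   # modulus for blinded arithmetic

def masked_contributions(values):
    masked = [v % M for v in values]
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            mask = secrets.randbelow(M)
            masked[i] = (masked[i] + mask) % M   # user i adds the mask
            masked[j] = (masked[j] - mask) % M   # user j subtracts it
    return masked

counts = [3, 0, 5, 1]                 # e.g., per-user searches for "flu"
masked = masked_contributions(counts)
print(masked)                         # individually meaningless values
print(sum(masked) % M)                # aggregator recovers 9, not who
```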

Proceedings ArticleDOI
11 Nov 2019
TL;DR: The authors analyze the real-world security guarantees provided by GenoGuard (Huang et al., 2015), showing that if the adversary has access to side information in the form of partial information from the target sequence, access to a GenoGuard ciphertext does appreciably increase her power in determining the rest of the sequence.
Abstract: Due to its hereditary nature, genomic data is not only linked to its owner but to that of close relatives as well. As a result, its sensitivity does not really degrade over time; in fact, the relevance of a genomic sequence is likely to outlast the security provided by encryption. This prompts the need for specialized techniques providing long-term security for genomic data, yet the only available tool for this purpose is GenoGuard (Huang et al., 2015). By relying on Honey Encryption, GenoGuard is secure against an adversary that can brute force all possible keys; i.e., whenever an attacker tries to decrypt using an incorrect password, she will obtain an incorrect but plausible-looking decoy sequence. In this paper, we set out to analyze the real-world security guarantees provided by GenoGuard; specifically, we assess how much more information access to a ciphertext encrypted using GenoGuard yields, compared to one that was not. Overall, we find that, if the adversary has access to side information in the form of partial information from the target sequence, the use of GenoGuard does appreciably increase her power in determining the rest of the sequence. We show that, in the case of a sequence encrypted using an easily guessable (low-entropy) password, the adversary is able to rule out most decoy sequences, and obtain the target sequence with just 2.5% of it available as side information. In the case of a harder-to-guess (high-entropy) password, we show that the adversary still obtains, on average, better accuracy in guessing the rest of the target sequence than using state-of-the-art genomic sequence inference methods, obtaining up to a 15% improvement in accuracy.
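To see why a wrong password yields a plausible decoy, consider this toy illustration of the Honey Encryption idea. It is not GenoGuard's actual construction, which encodes sequences under a realistic genomic model; here the encoding is trivial, so every decryption, under any password, decodes to a valid-looking DNA string.

```python
# Toy Honey-Encryption illustration (NOT GenoGuard's construction): the
# plaintext is mapped to a seed, and decryption with ANY password decodes
# back into a valid-looking sequence, so wrong guesses yield decoys.
import hashlib

ALPHABET = "ACGT"

def pad(password, n):
    """Derive an n-byte keystream from the password (toy KDF)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(password.encode() + bytes([counter])).digest()
        counter += 1
    return out[:n]

def encrypt(seq, password):
    seed = bytes(ALPHABET.index(c) for c in seq)            # trivial "DTE"
    return bytes(s ^ p for s, p in zip(seed, pad(password, len(seed))))

def decrypt(ct, password):
    seed = bytes(c ^ p for c, p in zip(ct, pad(password, len(ct))))
    return "".join(ALPHABET[b % 4] for b in seed)           # always decodes

ct = encrypt("ACGTACGT", "correct horse")
print(decrypt(ct, "correct horse"))   # ACGTACGT (the real sequence)
print(decrypt(ct, "wrong guess"))     # a decoy that still looks like DNA
```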

Journal ArticleDOI
28 Jan 2019
TL;DR: In this article, the authors present a measurement study of state-of-the-art collaborative predictive blacklisting (CPB) techniques, aiming to shed light on the actual impact of collaboration.
Abstract: Collaborative predictive blacklisting (CPB) enables forecasting future attack sources based on logs and alerts contributed by multiple organizations. Unfortunately, however, research on CPB has only focused on increasing the number of predicted attacks and has not considered the impact on false positives and false negatives. Moreover, sharing alerts is often hindered by confidentiality, trust, and liability issues, which motivates the need for privacy-preserving approaches to the problem. In this paper, we present a measurement study of state-of-the-art CPB techniques, aiming to shed light on the actual impact of collaboration. To this end, we reproduce and measure two systems: a non-privacy-friendly one that uses a trusted coordinating party with access to all alerts [12], and a peer-to-peer one using privacy-preserving data sharing [8]. We show that, while collaboration boosts the number of predicted attacks, it also yields high false positives, ultimately leading to poor accuracy. This motivates us to present a hybrid approach, using a semi-trusted central entity, aiming to increase utility from collaboration while, at the same time, limiting information disclosure and false positives. This leads to a better trade-off between true and false positive rates, while also addressing privacy concerns.
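The prediction side of CPB can be sketched with a simple time-decayed scoring of attack sources over the alert logs shared by collaborating organizations; the IPs and windows below are illustrative, not the systems' actual algorithms.

```python
# Minimal sketch of the time-series side of collaborative predictive
# blacklisting: score each attack source by an exponentially decayed
# count of past shared alerts, and blacklist the top-scoring sources.
from collections import defaultdict

def predict_blacklist(alert_windows, top_n=2, decay=0.5):
    """alert_windows: list (oldest..newest) of lists of attacker IPs."""
    scores = defaultdict(float)
    for age, window in enumerate(reversed(alert_windows)):
        for ip in window:
            scores[ip] += decay ** age     # newer alerts weigh more
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

shared_logs = [["1.2.3.4", "5.6.7.8"],     # day 1 (all orgs' alerts)
               ["1.2.3.4"],                # day 2
               ["9.9.9.9", "1.2.3.4"]]     # day 3
print(predict_blacklist(shared_logs))      # ['1.2.3.4', '9.9.9.9']
```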

Journal ArticleDOI
TL;DR: This work presents the design and implementation of SplitBox, a system for privacy-preserving processing of network functions outsourced to cloud middleboxes, i.e., without revealing the policies governing these functions, while providing provably secure guarantees.

Posted Content
TL;DR: Overall, it is found that, if the adversary has access to side information in the form of partial information from the target sequence, the use of GenoGuard does appreciably increase her power in determining the rest of the sequence.
Abstract: Due to its hereditary nature, genomic data is not only linked to its owner but to that of close relatives as well. As a result, its sensitivity does not really degrade over time; in fact, the relevance of a genomic sequence is likely to outlast the security provided by encryption. This prompts the need for specialized techniques providing long-term security for genomic data, yet the only available tool for this purpose is GenoGuard (Huang et al., 2015). By relying on Honey Encryption, GenoGuard is secure against an adversary that can brute force all possible keys; i.e., whenever an attacker tries to decrypt using an incorrect password, she will obtain an incorrect but plausible-looking decoy sequence. In this paper, we set out to analyze the real-world security guarantees provided by GenoGuard; specifically, we assess how much more information access to a ciphertext encrypted using GenoGuard yields, compared to one that was not. Overall, we find that, if the adversary has access to side information in the form of partial information from the target sequence, the use of GenoGuard does appreciably increase her power in determining the rest of the sequence. We show that, in the case of a sequence encrypted using an easily guessable (low-entropy) password, the adversary is able to rule out most decoy sequences, and obtain the target sequence with just 2.5% of it available as side information. In the case of a harder-to-guess (high-entropy) password, we show that the adversary still obtains, on average, better accuracy in guessing the rest of the target sequence than using state-of-the-art genomic sequence inference methods, obtaining up to a 15% improvement in accuracy.

Posted Content
TL;DR: This paper analyzes 1.2 million users and 2.1 million tweets, comparing users participating in discussions around seemingly normal topics like the NBA to those more likely to be hate-related, such as the Gamergate controversy or the gender pay inequality at the BBC.
Abstract: Cyberbullying and cyberaggression are increasingly worrisome phenomena affecting people across all demographics. More than half of young social media users worldwide have been exposed to such prolonged and/or coordinated digital harassment. Victims can experience a wide range of emotions, with negative consequences such as embarrassment, depression, and isolation from other community members, which carry the risk of escalating to even more critical consequences, such as suicide attempts. In this work, we take the first concrete steps to understand the characteristics of abusive behavior on Twitter, one of today's largest social media platforms. We analyze 1.2 million users and 2.1 million tweets, comparing users participating in discussions around seemingly normal topics like the NBA to those more likely to be hate-related, such as the Gamergate controversy or the gender pay inequality at the BBC. We also explore specific manifestations of abusive behavior, i.e., cyberbullying and cyberaggression, in one of the hate-related communities (Gamergate). We present a robust methodology to distinguish bullies and aggressors from normal Twitter users by considering text, user, and network-based attributes. Using various state-of-the-art machine learning algorithms, we classify these accounts with over 90% accuracy and AUC. Finally, we discuss the current status of Twitter user accounts marked as abusive by our methodology, and study the performance of potential mechanisms that can be used by Twitter to suspend users in the future.