scispace - formally typeset
Proceedings ArticleDOI

I am Robot: (Deep) Learning to Break Semantic Image CAPTCHAs

Reads0
Chats0
TLDR
A comprehensive study of reCaptcha is conducted, and a novel low-cost attack that leverages deep learning technologies for the semantic annotation of images is designed, which is extremely effective and applies to the Facebook image captcha.
Abstract
Since their inception, captchas have been widely used for preventing fraudsters from performing illicit actions. Nevertheless, economic incentives have resulted in an arms race, where fraudsters develop automated solvers and, in turn, captcha services tweak their design to break the solvers. Recent work, however, presented a generic attack that can be applied to any text-based captcha scheme. Fittingly, Google recently unveiled the latest version of reCaptcha. The goal of their new system is twofold, to minimize the effort for legitimate users, while requiring tasks that are more challenging to computers than text recognition. ReCaptcha is driven by an "advanced risk analysis system" that evaluates requests and selects the difficulty of the captcha that will be returned. Users may be required to click in a checkbox, or solve a challenge by identifying images with similar content. In this paper, we conduct a comprehensive study of reCaptcha, and explore how the risk analysis process is influenced by each aspect of the request. Through extensive experimentation, we identify flaws that allow adversaries to effortlessly influence the risk analysis, bypass restrictions, and deploy large-scale attacks. Subsequently, we design a novel low-cost attack that leverages deep learning technologies for the semantic annotation of images. Our system is extremely effective, automatically solving 70.78% of the image reCaptcha challenges, while requiring only 19 seconds per challenge. We also apply our attack to the Facebook image captcha and achieve an accuracy of 83.5%. Based on our experimental findings, we propose a series of safeguards and modifications for impacting the scalability and accuracy of our attacks. Overall, while our study focuses on reCaptcha, our findings have wide implications, as the semantic information conveyed via images is increasingly within the realm of automated reasoning, the future of captchas relies on the exploration of novel directions.

read more

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI

Chameleon: A Hybrid Secure Computation Framework for Machine Learning Applications

TL;DR: Chameleon as mentioned in this paper is a hybrid mixed protocol for secure function evaluation (SFE) which enables two parties to jointly compute a function without disclosing their private inputs, but does not support signed fixed-point numbers.
Posted Content

Chameleon: A Hybrid Secure Computation Framework for Machine Learning Applications

TL;DR: Chameleon combines the best aspects of generic SFE protocols with the ones that are based upon additive secret sharing, and improves the efficiency of mining and classification of encrypted data for algorithms based upon heavy matrix multiplications.
Posted Content

Defeating Image Obfuscation with Deep Learning

TL;DR: It is shown how to train artificial neural networks to successfully identify faces and recognize objects and handwritten digits even if the images are protected using any of the above obfuscation techniques.
Journal ArticleDOI

No Bot Expects the DeepCAPTCHA! Introducing Immutable Adversarial Examples, With Applications to CAPTCHA Generation

TL;DR: This paper introduces DeepCAPTCHA, a new and secure CAPTCHA scheme based on adversarial examples, an inherit limitation of the current DL networks, and implements a proof of concept system, which shows that the scheme offers high security and good usability compared with the best previously existing CAPTCHAs.
Proceedings ArticleDOI

Yet Another Text Captcha Solver: A Generative Adversarial Network Based Approach

TL;DR: This paper presents a generic, yet effective text captcha solver based on the generative adversarial network and demonstrates that the attack is generally applicable and can bypass the advanced security features employed by most modern text captcha schemes.
References
More filters
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Posted Content

Efficient Estimation of Word Representations in Vector Space

TL;DR: This paper proposed two novel model architectures for computing continuous vector representations of words from very large data sets, and the quality of these representations is measured in a word similarity task and the results are compared to the previously best performing techniques based on different types of neural networks.
Posted Content

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe as discussed by the authors is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Proceedings ArticleDOI

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Proceedings Article

Intriguing properties of neural networks

TL;DR: It is found that there is no distinction between individual highlevel units and random linear combinations of high level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains of the semantic information in the high layers of neural networks.