Journal ArticleDOI

Breaking text-based CAPTCHAs with variable word and character orientation

TL;DR: The very satisfactory results obtained confirm that the proposed approach may be used to develop new security mechanisms that protect users against cyber-criminal activities and Internet threats.
About: This article was published in Pattern Recognition on 2015-04-01. It has received 59 citations to date. The article focuses on the topic: CAPTCHA.
Citations
Proceedings ArticleDOI
21 Mar 2016
TL;DR: A comprehensive study of reCaptcha is conducted, and a novel low-cost attack that leverages deep learning technologies for the semantic annotation of images is designed, which is extremely effective and applies to the Facebook image captcha.
Abstract: Since their inception, captchas have been widely used for preventing fraudsters from performing illicit actions. Nevertheless, economic incentives have resulted in an arms race, where fraudsters develop automated solvers and, in turn, captcha services tweak their design to break the solvers. Recent work, however, presented a generic attack that can be applied to any text-based captcha scheme. Fittingly, Google recently unveiled the latest version of reCaptcha. The goal of their new system is twofold, to minimize the effort for legitimate users, while requiring tasks that are more challenging to computers than text recognition. ReCaptcha is driven by an "advanced risk analysis system" that evaluates requests and selects the difficulty of the captcha that will be returned. Users may be required to click in a checkbox, or solve a challenge by identifying images with similar content. In this paper, we conduct a comprehensive study of reCaptcha, and explore how the risk analysis process is influenced by each aspect of the request. Through extensive experimentation, we identify flaws that allow adversaries to effortlessly influence the risk analysis, bypass restrictions, and deploy large-scale attacks. Subsequently, we design a novel low-cost attack that leverages deep learning technologies for the semantic annotation of images. Our system is extremely effective, automatically solving 70.78% of the image reCaptcha challenges, while requiring only 19 seconds per challenge. We also apply our attack to the Facebook image captcha and achieve an accuracy of 83.5%. Based on our experimental findings, we propose a series of safeguards and modifications for impacting the scalability and accuracy of our attacks. Overall, while our study focuses on reCaptcha, our findings have wide implications, as the semantic information conveyed via images is increasingly within the realm of automated reasoning, the future of captchas relies on the exploration of novel directions.

119 citations


Cites background from "Breaking text-based CAPTCHAs with v..."

  • ...Various attacks have been demonstrated against previous text versions of reCaptcha [32]–[34]....


Posted Content
TL;DR: In this paper, a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages is introduced, establishing a framework that combines the strengths of probabilistic programming and deep learning methods.
Abstract: We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. We call what we do "compilation of inference" because our method transforms a denotational specification of an inference problem in the form of a probabilistic program written in a universal programming language into a trained neural network denoted in a neural network specification language. When at test time this neural network is fed observational data and executed, it performs approximate inference in the original model specified by the probabilistic program. Our training objective and learning procedure are designed to allow the trained neural network to be used as a proposal distribution in a sequential importance sampling inference engine. We illustrate our method on mixture models and Captcha solving and show significant speedups in the efficiency of inference.

87 citations

Journal ArticleDOI
Mengyun Tang1, Haichang Gao1, Yang Zhang1, Yi Liu1, Ping Zhang1, Ping Wang1 
TL;DR: A novel image-based Captcha named Style Area Captcha (SACaptcha) is proposed in this paper, which is based on semantic information understanding, pixel-level segmentation, and deep learning techniques. The authors hope that this proposal shows promise in the development of image-based Captchas using deep learning techniques.
Abstract: The ability of hackers to infiltrate computer systems using computer attack programs and bots led to the development of Captchas or Completely Automated Public Turing Tests to Tell Computers and Humans Apart. The text Captcha is the most popular Captcha scheme given its ease of construction and user friendliness. However, the next generation of hackers and programmers has decreased the expected security of these mechanisms, leaving websites open to attack. Text Captchas are still widely used, because it is believed that the attack speeds are slow, typically two to five seconds per image, and this is not seen as a critical threat. In this paper, we introduce a simple, generic, and fast attack on text Captchas that effectively challenges that supposition. With deep learning techniques, our attack demonstrates a high success rate in breaking the Roman-character-based text Captchas deployed by the top 50 most popular international websites and three Chinese Captchas that use a larger character set. These targeted schemes cover almost all existing resistance mechanisms, demonstrating that our attack techniques are also applicable to other existing Captchas. Does this work then spell the beginning of the end for text-based Captcha? We believe so. A novel image-based Captcha named Style Area Captcha (SACaptcha) is proposed in this paper, which is based on semantic information understanding, pixel-level segmentation, and deep learning techniques. Having demonstrated that text Captchas are no longer secure, we hope that our proposal shows promise in the development of image-based Captchas using deep learning techniques.

71 citations


Cites background from "Breaking text-based CAPTCHAs with v..."

  • ...and [25]–[27], of which [25]–[27] reported various attacks on previous text versions of reCAPTCHA, whereas [6] presented...


Proceedings ArticleDOI
01 May 2017
TL;DR: A formal connection is drawn between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning, and successful breaking of real-world Captchas currently used by Facebook and Wikipedia is demonstrated.
Abstract: We draw a formal connection between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning. In particular, training a neural network using synthetic data can be viewed as learning a proposal distribution generator for approximate inference in the synthetic-data generative model. We demonstrate this connection in a recognition task where we develop a novel Captcha-breaking architecture and train it using synthetic data, demonstrating both state-of-the-art performance and a way of computing task-specific posterior uncertainty. Using a neural network trained this way, we also demonstrate successful breaking of real-world Captchas currently used by Facebook and Wikipedia. Reasoning from these empirical results and drawing connections with Bayesian modeling, we discuss the robustness of synthetic data results and suggest important considerations for ensuring good neural network generalization when training with synthetic data.

57 citations


Cites methods from "Breaking text-based CAPTCHAs with v..."

  • ...We wrote synthetic data generative models for seven different Captcha styles, covering the types frequently found in the Captcha breaking literature [18, 17, 20, 19]....


Posted Content
TL;DR: In this article, the authors draw a formal connection between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning and demonstrate successful breaking of real-world Captchas currently used by Facebook and Wikipedia.
Abstract: We draw a formal connection between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning. In particular, training a neural network using synthetic data can be viewed as learning a proposal distribution generator for approximate inference in the synthetic-data generative model. We demonstrate this connection in a recognition task where we develop a novel Captcha-breaking architecture and train it using synthetic data, demonstrating both state-of-the-art performance and a way of computing task-specific posterior uncertainty. Using a neural network trained this way, we also demonstrate successful breaking of real-world Captchas currently used by Facebook and Wikipedia. Reasoning from these empirical results and drawing connections with Bayesian modeling, we discuss the robustness of synthetic data results and suggest important considerations for ensuring good neural network generalization when training with synthetic data.

51 citations

References
Proceedings ArticleDOI
17 Oct 2011
TL;DR: It is found that 13 current visual CAPTCHAs based on distorted characters that are augmented with anti-segmentation techniques from popular web sites are vulnerable to automated attacks.
Abstract: We carry out a systematic study of existing visual CAPTCHAs based on distorted characters that are augmented with anti-segmentation techniques. Applying a systematic evaluation methodology to 15 current CAPTCHA schemes from popular web sites, we find that 13 are vulnerable to automated attacks. Based on this evaluation, we identify a series of recommendations for CAPTCHA designers and attackers, and possible future directions for producing more reliable human/computer distinguishers.

312 citations

Proceedings ArticleDOI
13 Apr 2010
TL;DR: This paper shows that this new CAPTCHA scheme deployed until very recently by Megaupload can be segmented using a simple but new automated attack with a success rate of 78%.
Abstract: CAPTCHA is a standard security technology that presents tests to tell computers and humans apart. In this paper, we examine the security of a new CAPTCHA that was deployed until very recently by Megaupload, a leading online storage and delivery website. The security of this scheme relies on a novel segmentation resistance mechanism. However, we show that this CAPTCHA can be segmented using a simple but new automated attack with a success rate of 78%. It takes about 120 ms on average to segment each challenge on a standard desktop computer.

72 citations

Proceedings ArticleDOI
06 Dec 2010
TL;DR: This paper reports the first comprehensive study on e-banking CAPTCHAs deployed around the world, and proposes a new set of image processing and pattern recognition techniques that can be used to break all e-banking CAPTCHA schemes found over the Internet.
Abstract: Many financial institutions have deployed CAPTCHAs to protect their services (e.g., e-banking) from automated attacks. In addition to CAPTCHAs for login, CAPTCHAs are also used to prevent malicious manipulation of e-banking transactions by automated Man-in-the-Middle (MitM) attackers. Despite serious financial risks, security of e-banking CAPTCHAs is largely unexplored. In this paper, we report the first comprehensive study on e-banking CAPTCHAs deployed around the world. A new set of image processing and pattern recognition techniques is proposed to break all e-banking CAPTCHA schemes that we found over the Internet, including three e-banking CAPTCHA schemes for transaction verification and 41 schemes for login. These broken e-banking CAPTCHA schemes are used by thousands of financial institutions worldwide, which are serving hundreds of millions of e-banking customers. The success rates of our proposed attacks are either equal to or close to 100%. We also discuss possible improvements to these e-banking CAPTCHA schemes and show essential difficulties of designing e-banking CAPTCHAs that are both secure and usable.

65 citations

Journal ArticleDOI
TL;DR: The proposed algorithm generates a face image-based CAPTCHA that offers better human accuracy and lower machine attack rates compared to existing approaches.

64 citations

Proceedings ArticleDOI
26 Apr 2014
TL;DR: This paper describes how two new CAPTCHA schemes for Google that focus on maximizing usability are designed and tested, and how the resulting scheme is now an integral part of the production system and is served to millions of users.
Abstract: Websites present users with puzzles called CAPTCHAs to curb abuse caused by computer algorithms masquerading as people. While CAPTCHAs are generally effective at stopping abuse, they might impair website usability if they are not properly designed. In this paper we describe how we designed two new CAPTCHA schemes for Google that focus on maximizing usability. We began by running an evaluation on Amazon Mechanical Turk with over 27,000 respondents to test the usability of different feature combinations. Then we studied user preferences using Google's consumer survey infrastructure. Finally, drawing on the insights gleaned during those studies, we tested our new captcha schemes first on Mechanical Turk and then on a fraction of production traffic. The resulting scheme is now an integral part of our production system and is served to millions of users. Our scheme achieved a 95.3% human accuracy, a 6.7.

54 citations