Breaking text-based CAPTCHAs with variable word and character orientation

doi:10.1016/J.PATCOG.2014.09.006

Journal Article•DOI•

Breaking text-based CAPTCHAs with variable word and character orientation

Oleg Starostenko¹, Claudia Cruz-Perez¹, Fernando Uceda-Ponga¹, Vicente Alarcon-Aquino¹•Institutions (1)

Universidad de las Américas Puebla¹

01 Apr 2015-Pattern Recognition (Pergamon)-Vol. 48, Iss: 4, pp 1101-1112

TL;DR: The results obtained are very satisfactory and confirm that the proposed approach may be used to develop new security mechanisms that protect users against cyber-criminal activities and Internet threats.

About: This article was published in Pattern Recognition on 2015-04-01. It has received 59 citations to date. The article focuses on the topic: CAPTCHA.

Citations

Proceedings Article•DOI•

I am Robot: (Deep) Learning to Break Semantic Image CAPTCHAs

Suphannee Sivakorn¹, Iasonas Polakis¹, Angelos D. Keromytis¹•Institutions (1)

Columbia University¹

21 Mar 2016

TL;DR: A comprehensive study of reCaptcha is conducted, and a novel low-cost attack that leverages deep learning technologies for the semantic annotation of images is designed, which is extremely effective and applies to the Facebook image captcha.

Abstract: Since their inception, captchas have been widely used for preventing fraudsters from performing illicit actions. Nevertheless, economic incentives have resulted in an arms race, where fraudsters develop automated solvers and, in turn, captcha services tweak their design to break the solvers. Recent work, however, presented a generic attack that can be applied to any text-based captcha scheme. Fittingly, Google recently unveiled the latest version of reCaptcha. The goal of their new system is twofold, to minimize the effort for legitimate users, while requiring tasks that are more challenging to computers than text recognition. ReCaptcha is driven by an "advanced risk analysis system" that evaluates requests and selects the difficulty of the captcha that will be returned. Users may be required to click in a checkbox, or solve a challenge by identifying images with similar content. In this paper, we conduct a comprehensive study of reCaptcha, and explore how the risk analysis process is influenced by each aspect of the request. Through extensive experimentation, we identify flaws that allow adversaries to effortlessly influence the risk analysis, bypass restrictions, and deploy large-scale attacks. Subsequently, we design a novel low-cost attack that leverages deep learning technologies for the semantic annotation of images. Our system is extremely effective, automatically solving 70.78% of the image reCaptcha challenges, while requiring only 19 seconds per challenge. We also apply our attack to the Facebook image captcha and achieve an accuracy of 83.5%. Based on our experimental findings, we propose a series of safeguards and modifications for impacting the scalability and accuracy of our attacks. Overall, while our study focuses on reCaptcha, our findings have wide implications, as the semantic information conveyed via images is increasingly within the realm of automated reasoning, the future of captchas relies on the exploration of novel directions.

119 citations

Cites background from "Breaking text-based CAPTCHAs with v..."

...Various attacks have been demonstrated against previous text versions of reCaptcha [32]–[34]....

Posted Content•

Inference Compilation and Universal Probabilistic Programming

Tuan Anh Le¹, Atilim Gunes Baydin¹, Frank Wood¹•Institutions (1)

University of Oxford¹

31 Oct 2016-arXiv: Artificial Intelligence

TL;DR: This paper introduces a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods.

Abstract: We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. We call what we do "compilation of inference" because our method transforms a denotational specification of an inference problem in the form of a probabilistic program written in a universal programming language into a trained neural network denoted in a neural network specification language. When at test time this neural network is fed observational data and executed, it performs approximate inference in the original model specified by the probabilistic program. Our training objective and learning procedure are designed to allow the trained neural network to be used as a proposal distribution in a sequential importance sampling inference engine. We illustrate our method on mixture models and Captcha solving and show significant speedups in the efficiency of inference.

87 citations

Journal Article•DOI•

Research on Deep Learning Techniques in Breaking Text-Based Captchas and Designing Image-Based Captcha

Mengyun Tang¹, Haichang Gao¹, Yang Zhang¹, Yi Liu¹, Ping Zhang¹, Ping Wang¹•Institutions (1)

Xidian University¹

29 Mar 2018-IEEE Transactions on Information Forensics and Security

TL;DR: This paper proposes a novel image-based Captcha named Style Area Captcha (SACaptcha), which is based on semantic information understanding, pixel-level segmentation, and deep learning techniques. The authors hope that the proposal shows promise for the development of image-based Captchas using deep learning techniques.

Abstract: The ability of hackers to infiltrate computer systems using computer attack programs and bots led to the development of Captchas or Completely Automated Public Turing Tests to Tell Computers and Humans Apart. The text Captcha is the most popular Captcha scheme given its ease of construction and user friendliness. However, the next generation of hackers and programmers has decreased the expected security of these mechanisms, leaving websites open to attack. Text Captchas are still widely used, because it is believed that the attack speeds are slow, typically two to five seconds per image, and this is not seen as a critical threat. In this paper, we introduce a simple, generic, and fast attack on text Captchas that effectively challenges that supposition. With deep learning techniques, our attack demonstrates a high success rate in breaking the Roman-character-based text Captchas deployed by the top 50 most popular international websites and three Chinese Captchas that use a larger character set. These targeted schemes cover almost all existing resistance mechanisms, demonstrating that our attack techniques are also applicable to other existing Captchas. Does this work then spell the beginning of the end for text-based Captcha? We believe so. A novel image-based Captcha named Style Area Captcha (SACaptcha) is proposed in this paper, which is based on semantic information understanding, pixel-level segmentation, and deep learning techniques. Having demonstrated that text Captchas are no longer secure, we hope that our proposal shows promise in the development of image-based Captchas using deep learning techniques.

71 citations

Cites background from "Breaking text-based CAPTCHAs with v..."

...and [25]–[27], of which [25]–[27] reported various attacks on previous text versions of reCAPTCHA, whereas [6] presented...

Proceedings Article•DOI•

Using synthetic data to train neural networks is model-based reasoning

Tuan Anh Le¹, Atilim Gunes Baydin¹, Robert Zinkov², Frank Wood¹•Institutions (2)

University of Oxford¹, Indiana University²

01 May 2017

TL;DR: A formal connection is drawn between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning, and successful breaking of real-world Captchas currently used by Facebook and Wikipedia is demonstrated.

Abstract: We draw a formal connection between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning. In particular, training a neural network using synthetic data can be viewed as learning a proposal distribution generator for approximate inference in the synthetic-data generative model. We demonstrate this connection in a recognition task where we develop a novel Captcha-breaking architecture and train it using synthetic data, demonstrating both state-of-the-art performance and a way of computing task-specific posterior uncertainty. Using a neural network trained this way, we also demonstrate successful breaking of real-world Captchas currently used by Facebook and Wikipedia. Reasoning from these empirical results and drawing connections with Bayesian modeling, we discuss the robustness of synthetic data results and suggest important considerations for ensuring good neural network generalization when training with synthetic data.

57 citations

Cites methods from "Breaking text-based CAPTCHAs with v..."

...We wrote synthetic data generative models for seven different Captcha styles, covering the types frequently found in the Captcha breaking literature [18, 17, 20, 19]....

Posted Content•

Using Synthetic Data to Train Neural Networks is Model-Based Reasoning

Tuan Anh Le¹, Atilim Gunes Baydin¹, Robert Zinkov², Frank Wood¹•Institutions (2)

University of Oxford¹, Indiana University²

02 Mar 2017-arXiv: Learning

TL;DR: In this article, the authors draw a formal connection between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning and demonstrate successful breaking of real-world Captchas currently used by Facebook and Wikipedia.

51 citations

References

Proceedings Article•DOI•

Text-based CAPTCHA strengths and weaknesses

Elie Bursztein¹, Matthieu Martin¹, John C. Mitchell¹•Institutions (1)

Stanford University¹

17 Oct 2011

TL;DR: A systematic evaluation of 15 current visual CAPTCHA schemes from popular web sites, all based on distorted characters augmented with anti-segmentation techniques, finds that 13 are vulnerable to automated attacks.

Abstract: We carry out a systematic study of existing visual CAPTCHAs based on distorted characters that are augmented with anti-segmentation techniques. Applying a systematic evaluation methodology to 15 current CAPTCHA schemes from popular web sites, we find that 13 are vulnerable to automated attacks. Based on this evaluation, we identify a series of recommendations for CAPTCHA designers and attackers, and possible future directions for producing more reliable human/computer distinguishers.

312 citations

Proceedings Article•DOI•

The robustness of a new CAPTCHA

Ahmad Salah El Ahmad¹, Jeff Yan¹, Lindsay Marshall¹•Institutions (1)

Newcastle University¹

13 Apr 2010

TL;DR: This paper shows that this new CAPTCHA scheme deployed until very recently by Megaupload can be segmented using a simple but new automated attack with a success rate of 78%.

Abstract: CAPTCHA is a standard security technology that presents tests to tell computers and humans apart. In this paper, we examine the security of a new CAPTCHA that was deployed until very recently by Megaupload, a leading online storage and delivery website. The security of this scheme relies on a novel segmentation resistance mechanism. However, we show that this CAPTCHA can be segmented using a simple but new automated attack with a success rate of 78%. It takes about 120 ms on average to segment each challenge on a standard desktop computer.

72 citations

Proceedings Article•DOI•

Breaking e-banking CAPTCHAs

Shujun Li¹, S. Amier Haider Shah², M. Asad Usman Khan², Syed Ali Khayam², Ahmad-Reza Sadeghi³, Roland Schmitz•Institutions (3)

University of Konstanz¹, National University of Sciences and Technology², Ruhr University Bochum³

06 Dec 2010

TL;DR: This paper reports the first comprehensive study on e-banking CAPTCHAs deployed around the world, and proposes a new set of image processing and pattern recognition techniques that break all the e-banking CAPTCHA schemes the authors found over the Internet.

Abstract: Many financial institutions have deployed CAPTCHAs to protect their services (e.g., e-banking) from automated attacks. In addition to CAPTCHAs for login, CAPTCHAs are also used to prevent malicious manipulation of e-banking transactions by automated Man-in-the-Middle (MitM) attackers. Despite serious financial risks, security of e-banking CAPTCHAs is largely unexplored. In this paper, we report the first comprehensive study on e-banking CAPTCHAs deployed around the world. A new set of image processing and pattern recognition techniques is proposed to break all e-banking CAPTCHA schemes that we found over the Internet, including three e-banking CAPTCHA schemes for transaction verification and 41 schemes for login. These broken e-banking CAPTCHA schemes are used by thousands of financial institutions worldwide, which are serving hundreds of millions of e-banking customers. The success rates of our proposed attacks are either equal to or close to 100%. We also discuss possible improvements to these e-banking CAPTCHA schemes and show essential difficulties of designing e-banking CAPTCHAs that are both secure and usable.

65 citations

Journal Article•DOI•

FaceDCAPTCHA: Face detection based color image CAPTCHA

Gaurav Goswami¹, Brian M. Powell², Mayank Vatsa¹, Richa Singh¹, Afzel Noore²•Institutions (2)

Indraprastha Institute of Information Technology¹, West Virginia University²

01 Feb 2014-Future Generation Computer Systems

TL;DR: The proposed algorithm generates a face image-based CAPTCHA that offers better human accuracy and lower machine attack rates compared to existing approaches.

64 citations

Proceedings Article•DOI•

Easy does it: more usable CAPTCHAs

Elie Bursztein¹, Angelique Moscicki¹, Celine Fabry², Steven Bethard², John C. Mitchell², Dan Jurafsky²•Institutions (2)

Google¹, Stanford University²

26 Apr 2014

TL;DR: This paper describes how two new CAPTCHA schemes for Google that focus on maximizing usability were designed and tested, and reports that the resulting scheme is now an integral part of the production system and is served to millions of users.

Abstract: Websites present users with puzzles called CAPTCHAs to curb abuse caused by computer algorithms masquerading as people. While CAPTCHAs are generally effective at stopping abuse, they might impair website usability if they are not properly designed. In this paper we describe how we designed two new CAPTCHA schemes for Google that focus on maximizing usability. We began by running an evaluation on Amazon Mechanical Turk with over 27,000 respondents to test the usability of different feature combinations. Then we studied user preferences using Google's consumer survey infrastructure. Finally, drawing on the insights gleaned during those studies, we tested our new captcha schemes first on Mechanical Turk and then on a fraction of production traffic. The resulting scheme is now an integral part of our production system and is served to millions of users. Our scheme achieved a 95.3% human accuracy, a 6.7.

54 citations