Topic

CAPTCHA

About: CAPTCHA is a research topic with 1304 publications that have received 18987 citations. The topic is also known as Completely Automated Public Turing test to tell Computers and Humans Apart, or captcha.



Papers


The end is nigh: generic solving of text-based CAPTCHAs

Elie Bursztein¹, Jonathan Aigrain², Angelika Moscicki¹, John C. Mitchell²

Google¹, Stanford University²

19 Aug 2014

TL;DR: The effectiveness and universality of the results suggest that combining segmentation and recognition is the next evolution of captcha solving, and that it supersedes the sequential approach used in earlier works.


Abstract: Over the last decade, it has become well-established that a captcha's ability to withstand automated solving lies in the difficulty of segmenting the image into individual characters. The standard approach to solving captchas automatically has been a sequential process wherein a segmentation algorithm splits the image into segments that contain individual characters, followed by a character recognition step that uses machine learning. While this approach has been effective against particular captcha schemes, its generality is limited by the segmentation step, which is hand-crafted to defeat the distortion at hand. No general algorithm is known for the character collapsing anti-segmentation technique used by most prominent real world captcha schemes. This paper introduces a novel approach to solving captchas in a single step that uses machine learning to attack the segmentation and the recognition problems simultaneously. Performing both operations jointly allows our algorithm to exploit information and context that is not available when they are done sequentially. At the same time, it removes the need for any hand-crafted component, making our approach generalize to new captcha schemes where the previous approach cannot. We were able to solve all the real world captcha schemes we evaluated accurately enough to consider the scheme insecure in practice, including Yahoo (5.33%) and ReCaptcha (33.34%), without any adjustments to the algorithm or its parameters. Our success against the Baidu (38.68%) and CNN (51.09%) schemes that use occluding lines as well as character collapsing leads us to believe that our approach is able to defeat occluding lines in an equally general manner. The effectiveness and universality of our results suggest that combining segmentation and recognition is the next evolution of captcha solving, and that it supersedes the sequential approach used in earlier works. More generally, our approach raises questions about how to develop sufficiently secure captchas in the future.
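
The segmentation step of the sequential approach described in the abstract can be illustrated with a minimal sketch. It uses vertical-projection segmentation, a classic hand-crafted technique chosen here only for illustration; it is not the method of this paper or of any specific prior attack. The function name `segment_columns` and the toy binary images are assumptions made for the example.

```python
from typing import List, Tuple


def segment_columns(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Split a binary image (1 = ink) into column ranges separated by blank columns.

    Returns half-open (start, end) column ranges, one per run of inked columns.
    """
    width = len(image[0]) if image else 0
    # A column counts as inked if any pixel in it is set.
    inked = [any(row[x] for row in image) for x in range(width)]

    segments: List[Tuple[int, int]] = []
    start = None
    for x, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = x                      # a new run of ink begins
        elif not has_ink and start is not None:
            segments.append((start, x))    # a blank column ends the run
            start = None
    if start is not None:
        segments.append((start, width))    # run reaches the right edge
    return segments


# Hypothetical toy input: two glyphs separated by one blank column.
spaced = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
]

# The same glyphs pushed together ("character collapsing"): no blank column remains.
collapsed = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]

print(segment_columns(spaced))     # two segments, one per glyph
print(segment_columns(collapsed))  # a single segment covering both glyphs
```

On the spaced input the heuristic finds one segment per glyph, while on the collapsed input it returns a single merged segment, so a recognizer that expects one character per segment would fail. This is the kind of scheme-specific weakness that motivates treating segmentation and recognition jointly.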


128 citations

Proceedings Article

Yet Another Text Captcha Solver: A Generative Adversarial Network Based Approach

Guixin Ye¹, Zhanyong Tang¹, Dingyi Fang¹, Zhanxing Zhu², Yansong Feng², Pengfei Xu¹, Xiaojiang Chen¹, Zheng Wang³

Northwest University (China)¹, Peking University², Lancaster University³

15 Oct 2018

TL;DR: This paper presents a generic, yet effective text captcha solver based on the generative adversarial network and demonstrates that the attack is generally applicable and can bypass the advanced security features employed by most modern text captcha schemes.


Abstract: Although several attacks have been proposed, text-based CAPTCHAs are still being widely used as a security mechanism. One of the reasons for the pervasive use of text captchas is that many of the prior attacks are scheme-specific and require a labor-intensive and time-consuming process to construct. This means that a change in the captcha security features, like a noisier background, can simply invalidate an earlier attack. This paper presents a generic, yet effective text captcha solver based on the generative adversarial network. Unlike prior machine-learning-based approaches that need a large volume of manually-labeled real captchas to learn an effective solver, our approach requires significantly fewer real captchas but yields much better performance. This is achieved by first learning a captcha synthesizer to automatically generate synthetic captchas to learn a base solver, and then fine-tuning the base solver on a small set of real captchas using transfer learning. We evaluate our approach by applying it to 33 captcha schemes, including 11 schemes that are currently being used by 32 of the top-50 popular websites including Microsoft, Wikipedia, eBay and Google. Our approach is the most capable attack on text captchas seen to date. It outperforms four state-of-the-art text-captcha solvers by not only delivering a significantly higher accuracy on all testing schemes, but also successfully attacking schemes where others have zero chance. We show that our approach is highly efficient as it can solve a captcha within 0.05 second using a desktop GPU. We demonstrate that our attack is generally applicable because it can bypass the advanced security features employed by most modern text captcha schemes. We hope the results of our work can encourage the community to revisit the design and practical use of text captchas.


126 citations

Proceedings Article

I am Robot: (Deep) Learning to Break Semantic Image CAPTCHAs

Suphannee Sivakorn¹, Iasonas Polakis¹, Angelos D. Keromytis¹

Columbia University¹

21 Mar 2016

TL;DR: A comprehensive study of reCaptcha is conducted, and a novel low-cost attack that leverages deep learning technologies for the semantic annotation of images is designed, which is extremely effective and applies to the Facebook image captcha.


Abstract: Since their inception, captchas have been widely used for preventing fraudsters from performing illicit actions. Nevertheless, economic incentives have resulted in an arms race, where fraudsters develop automated solvers and, in turn, captcha services tweak their design to break the solvers. Recent work, however, presented a generic attack that can be applied to any text-based captcha scheme. Fittingly, Google recently unveiled the latest version of reCaptcha. The goal of their new system is twofold: to minimize the effort for legitimate users, while requiring tasks that are more challenging to computers than text recognition. ReCaptcha is driven by an "advanced risk analysis system" that evaluates requests and selects the difficulty of the captcha that will be returned. Users may be required to click in a checkbox, or solve a challenge by identifying images with similar content. In this paper, we conduct a comprehensive study of reCaptcha, and explore how the risk analysis process is influenced by each aspect of the request. Through extensive experimentation, we identify flaws that allow adversaries to effortlessly influence the risk analysis, bypass restrictions, and deploy large-scale attacks. Subsequently, we design a novel low-cost attack that leverages deep learning technologies for the semantic annotation of images. Our system is extremely effective, automatically solving 70.78% of the image reCaptcha challenges, while requiring only 19 seconds per challenge. We also apply our attack to the Facebook image captcha and achieve an accuracy of 83.5%. Based on our experimental findings, we propose a series of safeguards and modifications for impacting the scalability and accuracy of our attacks. Overall, while our study focuses on reCaptcha, our findings have wide implications; as the semantic information conveyed via images is increasingly within the realm of automated reasoning, the future of captchas relies on the exploration of novel directions.


119 citations

Proceedings Article

Balancing usability and security in a video CAPTCHA

Kurt Alfred Kluever¹, Richard Zanibbi²

Google¹, Rochester Institute of Technology²

15 Jul 2009

TL;DR: The usability and security of the authors' video CAPTCHA appear to be comparable to existing CAPTCHAs, and a majority of participants indicated that they found the video CAPTCHAs more enjoyable than traditional CAPTCHAs in which distorted text must be transcribed.


Abstract: We present a technique for using content-based video labeling as a CAPTCHA task. Our CAPTCHAs are generated from YouTube videos, which contain labels (tags) supplied by the person who uploaded the video. They are graded using a video's tags, as well as tags from related videos. In a user study involving 184 participants, we were able to increase the human success rate on our video CAPTCHA from roughly 70% to 90%, while keeping the success rate of a tag frequency-based attack fixed at around 13%. Through a different parameterization of the challenge generation and grading algorithms, we were able to reduce the success rate of the same attack to 2%, while still increasing the human success rate from 70% to 75%. The usability and security of our video CAPTCHA appear to be comparable to existing CAPTCHAs, and a majority of participants (60%) indicated that they found the video CAPTCHAs more enjoyable than traditional CAPTCHAs in which distorted text must be transcribed.
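
The grading idea in this abstract (accepting a response based on a video's own tags plus tags from related videos) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' grading algorithm: the function names `normalize` and `grade_response`, the case-insensitive exact-match rule, and the example tags are all hypothetical.

```python
from typing import Iterable


def normalize(tag: str) -> str:
    """Lowercase and trim a tag so comparison ignores case and stray spaces."""
    return tag.strip().lower()


def grade_response(
    submitted: Iterable[str],
    video_tags: Iterable[str],
    related_tags: Iterable[str],
) -> bool:
    """Pass the challenge if any submitted tag is in the accepted set.

    The accepted set is the union of the video's own tags and the tags
    gathered from related videos (an assumed, simplified matching rule).
    """
    accepted = {normalize(t) for t in video_tags} | {normalize(t) for t in related_tags}
    return any(normalize(t) in accepted for t in submitted)


# Hypothetical example data.
own = ["dog", "puppy"]
related = ["pets"]

print(grade_response(["Dog", "park"], own, related))  # matches the video's own tag
print(grade_response(["pets"], own, related))         # matches only via related videos
print(grade_response(["car"], own, related))          # no match
```

Under this sketch, enlarging the accepted set with related-video tags makes it easier for a human to pass, but it also gives an attacker who submits frequently used tags more chances to match. That tension is consistent with the trade-off between human and attack success rates reported in the abstract.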


111 citations

Proceedings Article

Using a text-to-speech synthesizer to generate a reverse Turing test

Tsz-Yan Chan¹

City University of Hong Kong¹

03 Nov 2003

TL;DR: The recognition rate of synthesized utterances in a noisy environment is reported to show that the performance of an HMM recognizer is not too bad even in the presence of background noise, and there seems to be a gap in the ability of understanding synthesized speech with background noise between humans and computers.


Abstract: Recognition of synthesized speech by a diphone synthesizer is thought to be easy for a machine due to the small variation of the synthesized speech. In this paper, we report the recognition rate of synthesized utterances in a noisy environment. Our experiments show that the performance of an HMM recognizer is not too bad even in the presence of background noise. These recognition results nearly approach the performance of a human. Thus, although there seems to be a gap in the ability of understanding synthesized speech with background noise between humans and computers, our results discourage using this gap to build an audio-based CAPTCHA (completely automated public Turing test to tell computers and humans apart), i.e., a reverse Turing test which can tell computers and humans apart. Moreover, we explored the possible use of a classification and regression tree to control the hardness of our CAPTCHA.


106 citations

Metrics

Papers: 1,462

Citations: 21,185

No. of papers in the topic in previous years
Year	Papers
2023	51
2022	111
2021	72
2020	92
2019	77
2018	88
