Topic

CAPTCHA

About: CAPTCHA is a research topic. Over its lifetime, 1,304 publications have been published within this topic, receiving 18,987 citations. The topic is also known as Completely Automated Public Turing test to tell Computers and Humans Apart and captcha.


Papers
19 Aug 2014
TL;DR: The effectiveness and universality of the results suggest that combining segmentation and recognition is the next evolution of captcha solving, and that it supersedes the sequential approach used in earlier works.
Abstract: Over the last decade, it has become well established that a captcha's ability to withstand automated solving lies in the difficulty of segmenting the image into individual characters. The standard approach to solving captchas automatically has been a sequential process wherein a segmentation algorithm splits the image into segments that contain individual characters, followed by a character recognition step that uses machine learning. While this approach has been effective against particular captcha schemes, its generality is limited by the segmentation step, which is hand-crafted to defeat the distortion at hand. No general algorithm is known for the character collapsing anti-segmentation technique used by most prominent real-world captcha schemes. This paper introduces a novel approach to solving captchas in a single step that uses machine learning to attack the segmentation and the recognition problems simultaneously. Performing both operations jointly allows our algorithm to exploit information and context that is not available when they are done sequentially. At the same time, it removes the need for any hand-crafted component, making our approach generalize to new captcha schemes where the previous approach cannot. We were able to solve all the real-world captcha schemes we evaluated accurately enough to consider each scheme insecure in practice, including Yahoo (5.33%) and ReCaptcha (33.34%), without any adjustments to the algorithm or its parameters. Our success against the Baidu (38.68%) and CNN (51.09%) schemes, which use occluding lines as well as character collapsing, leads us to believe that our approach is able to defeat occluding lines in an equally general manner. The effectiveness and universality of our results suggest that combining segmentation and recognition is the next evolution of captcha solving, and that it supersedes the sequential approach used in earlier works.
More generally, our approach raises questions about how to develop sufficiently secure captchas in the future.

128 citations
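The difference between the sequential pipeline and joint segmentation plus recognition described in the abstract above can be illustrated with a small dynamic program. This is a minimal sketch of the general idea, not the paper's algorithm: it assumes a hypothetical classifier `score_segment(start, end)` that returns the most likely character for the image columns between two candidate cut points together with its log-probability, and it then chooses the sequence of cuts whose segments are jointly most probable. Segmentation decisions are therefore driven by recognition scores instead of being fixed in advance by a hand-crafted segmenter.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical classifier interface (an assumption, not from the paper):
# given the column range [start, end) of the image, return
# (best_character, log_probability).
SegmentScorer = Callable[[int, int], Tuple[str, float]]


def joint_decode(cuts: Sequence[int], score_segment: SegmentScorer,
                 max_span: int = 3) -> Tuple[str, float]:
    """Choose cut points and characters together by maximizing total log-probability.

    `cuts` are candidate cut positions sorted left to right; the first and last
    are the image borders. A segment may skip over up to `max_span - 1`
    intermediate cuts, so over-segmented characters can be merged back together.
    """
    n = len(cuts)
    neg_inf = float("-inf")
    best = [neg_inf] * n  # best[j]: best score of a decoding ending at cuts[j]
    back: List[Tuple[int, str]] = [(-1, "")] * n  # (previous cut index, character)
    best[0] = 0.0
    for j in range(1, n):
        for i in range(max(0, j - max_span), j):
            if best[i] == neg_inf:
                continue
            char, logp = score_segment(cuts[i], cuts[j])
            if best[i] + logp > best[j]:
                best[j] = best[i] + logp
                back[j] = (i, char)
    # Follow the back-pointers from the right border to recover the text.
    chars: List[str] = []
    j = n - 1
    while j > 0:
        i, char = back[j]
        chars.append(char)
        j = i
    return "".join(reversed(chars)), best[n - 1]
```

For example, with candidate cuts at columns 0, 10, 20 and 30, a scorer that is confident about one wide character spanning columns 0 to 20 lets the decoder merge the first two slices rather than read them as two separate characters; a sequential pipeline would have had to make that choice before any recognition took place.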

Proceedings Article
15 Oct 2018
TL;DR: This paper presents a generic, yet effective text captcha solver based on the generative adversarial network and demonstrates that the attack is generally applicable and can bypass the advanced security features employed by most modern text captcha schemes.
Abstract: Although several attacks have been proposed, text-based CAPTCHAs are still widely used as a security mechanism. One of the reasons for the pervasive use of text captchas is that many of the prior attacks are scheme-specific and require a labor-intensive and time-consuming process to construct. This means that a change in the captcha security features, such as a noisier background, can simply invalidate an earlier attack. This paper presents a generic, yet effective text captcha solver based on the generative adversarial network. Unlike prior machine-learning-based approaches that need a large volume of manually labeled real captchas to learn an effective solver, our approach requires significantly fewer real captchas but yields much better performance. This is achieved by first learning a captcha synthesizer to automatically generate synthetic captchas to learn a base solver, and then fine-tuning the base solver on a small set of real captchas using transfer learning. We evaluate our approach by applying it to 33 captcha schemes, including 11 schemes that are currently being used by 32 of the top-50 popular websites, including Microsoft, Wikipedia, eBay and Google. Our approach is the most capable attack on text captchas seen to date. It outperforms four state-of-the-art text-captcha solvers by not only delivering a significantly higher accuracy on all testing schemes, but also successfully attacking schemes where others have zero chance. We show that our approach is highly efficient, as it can solve a captcha within 0.05 seconds using a desktop GPU. We demonstrate that our attack is generally applicable because it can bypass the advanced security features employed by most modern text captcha schemes. We hope the results of our work can encourage the community to revisit the design and practical use of text captchas.

126 citations
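The two-stage recipe in the abstract above (train a base solver on plentiful synthetic captchas, then fine-tune it on a small set of real ones) can be shown at toy scale. The sketch below replaces the paper's GAN-based synthesizer and deep solver with a hypothetical one-feature logistic-regression "solver" trained by full-batch gradient descent. The `synthetic` and `real` data sets are illustrative placeholders, not data from the paper; only the pattern of starting fine-tuning from the pretrained weights is meant to carry over.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature, label in {0, 1}); toy stand-in for an image


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w: float, b: float, data: List[Sample]) -> float:
    """Mean binary cross-entropy of the model p(y=1|x) = sigmoid(w*x + b)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train(data: List[Sample], w: float = 0.0, b: float = 0.0,
          lr: float = 0.5, steps: int = 200) -> Tuple[float, float]:
    """Full-batch gradient descent on log_loss, starting from the given weights."""
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y  # derivative of the loss w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b


# Stage 1: plentiful "synthetic" samples (hypothetical toy data).
synthetic = [(i / 50.0 - 1.0, int(i >= 50)) for i in range(100)]
# Stage 2: a handful of "real" samples whose decision boundary is shifted.
real = [(-0.8, 0), (-0.2, 0), (0.1, 0), (0.3, 1), (0.6, 1), (0.9, 1)]

w0, b0 = train(synthetic)                       # base solver
w1, b1 = train(real, w0, b0, lr=0.5, steps=50)  # fine-tuned from the base weights
```

Starting stage 2 from `(w0, b0)` instead of from zero is the transfer-learning step: the small real set only has to correct the base model, not train it from scratch.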

Proceedings Article
21 Mar 2016
TL;DR: A comprehensive study of reCaptcha is conducted, and a novel low-cost attack is designed that leverages deep learning technologies for the semantic annotation of images; the attack is extremely effective and also applies to the Facebook image captcha.
Abstract: Since their inception, captchas have been widely used for preventing fraudsters from performing illicit actions. Nevertheless, economic incentives have resulted in an arms race, where fraudsters develop automated solvers and, in turn, captcha services tweak their design to break the solvers. Recent work, however, presented a generic attack that can be applied to any text-based captcha scheme. Fittingly, Google recently unveiled the latest version of reCaptcha. The goal of their new system is twofold: to minimize the effort for legitimate users, while requiring tasks that are more challenging to computers than text recognition. ReCaptcha is driven by an "advanced risk analysis system" that evaluates requests and selects the difficulty of the captcha that will be returned. Users may be required to click a checkbox, or to solve a challenge by identifying images with similar content. In this paper, we conduct a comprehensive study of reCaptcha, and explore how the risk analysis process is influenced by each aspect of the request. Through extensive experimentation, we identify flaws that allow adversaries to effortlessly influence the risk analysis, bypass restrictions, and deploy large-scale attacks. Subsequently, we design a novel low-cost attack that leverages deep learning technologies for the semantic annotation of images. Our system is extremely effective, automatically solving 70.78% of the image reCaptcha challenges, while requiring only 19 seconds per challenge. We also apply our attack to the Facebook image captcha and achieve an accuracy of 83.5%. Based on our experimental findings, we propose a series of safeguards and modifications for impacting the scalability and accuracy of our attacks. Overall, while our study focuses on reCaptcha, our findings have wide implications: as the semantic information conveyed via images is increasingly within the realm of automated reasoning, the future of captchas relies on the exploration of novel directions.

119 citations
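The image attack in the abstract above rests on turning each image into semantic labels and then picking the candidates whose labels match the challenge. The sketch below covers only that matching step, under stated assumptions: the labels are presumed to come from some image-annotation model or service (represented here by plain dictionaries of hypothetical tags and confidence scores), and `min_conf` and `min_overlap` are illustrative parameters, not values from the paper.

```python
from typing import Dict, List, Set

# Hypothetical annotation output: image id -> {tag: confidence}.
Annotations = Dict[str, Dict[str, float]]


def tags_above(scores: Dict[str, float], min_conf: float) -> Set[str]:
    """Keep only the tags the (assumed) annotator is reasonably sure about."""
    return {tag.lower() for tag, conf in scores.items() if conf >= min_conf}


def select_matches(hint_tags: Set[str], candidates: Annotations,
                   min_conf: float = 0.5, min_overlap: int = 1) -> List[str]:
    """Return the ids of candidate images sharing at least `min_overlap` tags with the hint."""
    hint = {t.lower() for t in hint_tags}
    chosen = []
    for image_id, scores in candidates.items():
        if len(hint & tags_above(scores, min_conf)) >= min_overlap:
            chosen.append(image_id)
    return sorted(chosen)
```

Raising `min_conf` or `min_overlap` trades missed correct images for fewer wrong clicks; how a real attack balances the two depends on how the challenge is graded.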

Proceedings Article
15 Jul 2009
TL;DR: The usability and security of the authors' video CAPTCHA appear to be comparable to existing CAPTCHAs, and a majority of participants indicated that they found the video CAPTCHAs more enjoyable than traditional CAPTCHAs in which distorted text must be transcribed.
Abstract: We present a technique for using content-based video labeling as a CAPTCHA task. Our CAPTCHAs are generated from YouTube videos, which contain labels (tags) supplied by the person who uploaded the video. They are graded using a video's tags, as well as tags from related videos. In a user study involving 184 participants, we were able to increase the human success rate on our video CAPTCHA from roughly 70% to 90%, while keeping the success rate of a tag frequency-based attack fixed at around 13%. Through a different parameterization of the challenge generation and grading algorithms, we were able to reduce the success rate of the same attack to 2%, while still increasing the human success rate from 70% to 75%. The usability and security of our video CAPTCHA appear to be comparable to existing CAPTCHAs, and a majority of participants (60%) indicated that they found the video CAPTCHAs more enjoyable than traditional CAPTCHAs in which distorted text must be transcribed.

111 citations
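The abstract above says responses are graded against a video's own tags plus tags from related videos. The sketch below shows one simple way such grading could work; it is an assumption for illustration, not the authors' exact algorithm. Tags are normalized, the accepted set is the union of the video's tags and its related videos' tags, a hypothetical stop-list of very common tags is removed (guessing popular tags is what a tag frequency-based attack relies on), and the user passes if any submitted tag is in the set.

```python
from typing import Iterable, List, Set


def normalize(tag: str) -> str:
    """Lower-case and strip surrounding whitespace and punctuation (a simple assumption)."""
    return tag.strip().strip(".,!?\"'").strip().lower()


def accepted_tags(video_tags: Iterable[str], related_tags: List[Iterable[str]],
                  common_tags: Iterable[str] = ()) -> Set[str]:
    """Ground-truth set: the video's own tags plus the tags of related videos,
    minus a hypothetical stop-list of very frequent tags."""
    accepted = {normalize(t) for t in video_tags}
    for tags in related_tags:
        accepted |= {normalize(t) for t in tags}
    return accepted - {normalize(t) for t in common_tags}


def grade(user_tags: Iterable[str], accepted: Set[str]) -> bool:
    """Pass if at least one submitted tag is in the accepted set."""
    return any(normalize(t) in accepted for t in user_tags)
```

Widening the accepted set (more related videos) makes the test easier for humans, while a longer stop-list makes blind guessing less rewarding; this mirrors the usability and security trade-off between the two parameterizations reported in the abstract.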

Proceedings Article
03 Nov 2003
TL;DR: The recognition rate of synthesized utterances in a noisy environment is reported, showing that an HMM recognizer performs reasonably well even in the presence of background noise; although there seems to be a gap between humans and computers in understanding synthesized speech with background noise, the results discourage using this gap to build an audio CAPTCHA.
Abstract: Recognition of synthesized speech by a diphone synthesizer is thought to be easy for a machine due to the small variation of the synthesized speech. In this paper, we report the recognition rate of synthesized utterances in a noisy environment. Our experiments show that the performance of an HMM recognizer remains reasonably good even in the presence of background noise. These recognition results nearly approach human performance. Thus, although there seems to be a gap between humans and computers in the ability to understand synthesized speech with background noise, our results discourage using this gap to build an audio-based CAPTCHA (completely automated public Turing test to tell computers and humans apart), i.e., a reverse Turing test that can tell computers and humans apart. Moreover, we explored the possible use of a classification and regression tree to control the hardness of our CAPTCHA.

106 citations
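The abstract above mentions using a classification and regression tree to control how hard each audio challenge is. As a minimal illustration of the underlying idea (not the authors' model or features), the sketch below fits a one-split regression tree, a "stump" chosen by the CART squared-error criterion, that predicts a recognizer's accuracy from a single hypothetical challenge parameter such as signal-to-noise ratio.

```python
from typing import List, Tuple


def fit_stump(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Fit a depth-1 regression tree by minimizing the summed squared error.

    Returns (threshold, mean_left, mean_right): inputs <= threshold are
    predicted as mean_left, all others as mean_right.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between equal feature values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2  # midpoint between neighbours
        if best is None or sse < best[0]:
            best = (sse, threshold, ml, mr)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]


def predict(stump: Tuple[float, float, float], x: float) -> float:
    """Predicted recognizer accuracy for challenge parameter x."""
    threshold, ml, mr = stump
    return ml if x <= threshold else mr
```

A full CART model would apply the same split search recursively and over several features. Given such a model, a generator could keep only settings whose predicted machine accuracy falls below a chosen target, e.g. `[s for s in candidate_snrs if predict(stump, s) < target]`, where `candidate_snrs` and `target` are hypothetical.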


Network Information
Related Topics (5)
Encryption: 98.3K papers, 1.4M citations (82% related)
Server: 79.5K papers, 1.4M citations (78% related)
Wireless ad hoc network: 49K papers, 1.1M citations (77% related)
Wireless sensor network: 142K papers, 2.4M citations (76% related)
Network packet: 159.7K papers, 2.2M citations (76% related)
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    51
2022    111
2021    72
2020    92
2019    77
2018    88