We report the results of a large scale study of password use andpassword re-use habits. The study involved half a million users over athree month period. A client component on users' machines recorded a variety of password strength, usage and frequency metrics. This allows us to measure or estimate such quantities as the average number of passwords and average number of accounts each user has, how many passwords she types per day, how often passwords are shared among sites, and how often they are forgotten. We get extremely detailed data on password strength, the types and lengths of passwords chosen, and how they vary by site. The data is the first large scale study of its kind, and yields numerous other insights into the role the passwords play in users' online experience.

/pdf/a-large-scale-study-of-web-password-habits-23a4auo2u7.pdf

A large-scale study of web password habits

コンピュータ・サイエンス : ACM computing surveys

We report on the largest corpus of user-chosen passwords ever studied, consisting of anonymized password histograms representing almost 70 million Yahoo! users, mitigating privacy concerns while enabling analysis of dozens of subpopulations based on demographic factors and site usage characteristics. This large data set motivates a thorough statistical treatment of estimating guessing difficulty by sampling from a secret distribution. In place of previously used metrics such as Shannon entropy and guessing entropy, which cannot be estimated with any realistically sized sample, we develop partial guessing metrics including a new variant of guesswork parameterized by an attacker's desired success rate. Our new metric is comparatively easy to approximate and directly relevant for security engineering. By comparing password distributions with a uniform distribution which would provide equivalent security against different forms of guessing attack, we estimate that passwords provide fewer than 10 bits of security against an online, trawling attack, and only about 20 bits of security against an optimal offline dictionary attack. We find surprisingly little variation in guessing difficulty; every identifiable group of users generated a comparably weak password distribution. Security motivations such as the registration of a payment card have no greater impact than demographic factors such as age and nationality. Even proactive efforts to nudge users towards better password choices with graphical feedback make little difference. More surprisingly, even seemingly distant language communities choose the same weak passwords and an attacker never gains more than a factor of 2 efficiency gain by switching from the globally optimal dictionary to a population-specific lists.

/pdf/the-science-of-guessing-analyzing-an-anonymized-corpus-of-70-3japkv7a47.pdf

The Science of Guessing: Analyzing an Anonymized Corpus of 70 Million Passwords

Starting around 1999, a great many graphical password schemes have been proposed as alternatives to text-based password authentication. We provide a comprehensive overview of published research in the area, covering both usability and security aspects as well as system evaluation. The article first catalogues existing approaches, highlighting novel features of selected schemes and identifying key usability or security advantages. We then review usability requirements for knowledge-based authentication as they apply to graphical passwords, identify security threats that such systems must address and review known attacks, discuss methodological issues related to empirical evaluation, and identify areas for further research and improved methodology.

/pdf/graphical-passwords-learning-from-the-first-twelve-years-1h8jsqlrwr.pdf

Graphical passwords: Learning from the first twelve years

We study the efficiency of statistical attacks on human authentication systems relying on personal knowledge questions. We adapt techniques from guessing theory to measure security against a trawling attacker attempting to compromise a large number of strangers’ accounts. We then examine a diverse corpus of real-world statistical distributions for likely answer categories such as the names of people, pets, and places and find that personal knowledge questions are significantly less secure than graphical or textual passwords. We also demonstrate that statistics can be used to increase security by proactively shaping the answer distribution to lower the prevalence of common responses.

/pdf/what-s-in-a-name-evaluating-statistical-attacks-on-personal-15bt6d7i2f.pdf

What's in a name? Evaluating statistical attacks on personal knowledge questions

Users rarely choose passwords that are both hard to guess and easy to remember. To determine how to help users choose good passwords, the authors performed a controlled trial of the effects of giving users different kinds of advice. Some of their results challenge the established wisdom.

Password memorability and security: empirical results

CAPTCHA is now almost a standard security technology. The most widely deployed CAPTCHAs are text-based schemes, which typically require users to solve a text recognition task. The state of the art of CAPTCHA design suggests that such text-based schemes should rely on segmentation resistance to provide security guarantee, as individual character recognition after segmentation can be solved with a high success rate by standard methods such as neural networks.In this paper, we present new character segmentation techniques of general value to attack a number of text CAPTCHAs, including the schemes designed and deployed by Microsoft, Yahoo and Google. In particular, the Microsoft CAPTCHA has been deployed since 2002 at many of their online services including Hotmail, MSN and Windows Live. Designed to be segmentation-resistant, this scheme has been studied and tuned by its designers over the years. However, our simple attack has achieved a segmentation success rate of higher than 90% against this scheme. It took on average ~80 ms for the attack to completely segment a challenge on an ordinary desktop computer. As a result, we estimate that this CAPTCHA could be instantly broken by a malicious bot with an overall (segmentation and then recognition) success rate of more than 60%. On the contrary, the design goal was that automated attacks should not achieve a success rate of higher than 0.01%. For the first time, this paper shows that CAPTCHAs that are carefully designed to be segmentation-resistant are vulnerable to novel but simple attacks.

/pdf/a-low-cost-attack-on-a-microsoft-captcha-4386yiatdj.pdf

A low-cost attack on a Microsoft captcha

CAPTCHA is now almost a standard security technology, and has found widespread application in commercial websites. Usability and robustness are two fundamental issues with CAPTCHA, and they often interconnect with each other. This paper discusses usability issues that should be considered and addressed in the design of CAPTCHAs. Some of these issues are intuitive, but some others have subtle implications for robustness (or security). A simple but novel framework for examining CAPTCHA usability is also proposed.

/pdf/usability-of-captchas-or-usability-issues-in-captcha-design-4ivc2rbkl1.pdf

Usability of CAPTCHAs or usability issues in CAPTCHA design

While trawling online/offline password guessing has been intensively studied, only a few studies have examined targeted online guessing, where an attacker guesses a specific victim's password for a service, by exploiting the victim's personal information such as one sister password leaked from her another account and some personally identifiable information (PII). A key challenge for targeted online guessing is to choose the most effective password candidates, while the number of guess attempts allowed by a server's lockout or throttling mechanisms is typically very small. We propose TarGuess, a framework that systematically characterizes typical targeted guessing scenarios with seven sound mathematical models, each of which is based on varied kinds of data available to an attacker. These models allow us to design novel and efficient guessing algorithms. Extensive experiments on 10 large real-world password datasets show the effectiveness of TarGuess. Particularly, TarGuess I~IV capture the four most representative scenarios and within 100 guesses: (1) TarGuess-I outperforms its foremost counterpart by 142% against security-savvy users and by 46% against normal users; (2) TarGuess-II outperforms its foremost counterpart by 169% on security-savvy users and by 72% against normal users; and (3) Both TarGuess-III and IV gain success rates over 73% against normal users and over 32% against security-savvy users. TarGuess-III and IV, for the first time, address the issue of cross-site online guessing when given the victim's one sister password and some PII.

/pdf/targeted-online-password-guessing-an-underestimated-threat-2px05fkqe1.pdf

Targeted Online Password Guessing: An Underestimated Threat

Visual CAPTCHAs have been widely used across the Internet to defend against undesirable or malicious bot programs. In this paper, we document how we have broken most such visual schemes provided at Captchaservice.org, a publicly available web service for CAPTCHA generation. These schemes were effectively resistant to attacks conducted using a high-quality Optical Character Recognition program, but were broken with a near 100% success rate by our novel attacks. In contrast to early work that relied on sophisticated computer vision or machine learning algorithms, we used simple pattern recognition algorithms but exploited fatal design errors that we discovered in each scheme. Surprisingly, our simple attacks can also break many other schemes deployed on the Internet at the time of writing: their design had similar errors. We also discuss defence against our attacks and new insights on the design of visual CAPTCHA schemes.

/pdf/breaking-visual-captchas-with-naive-pattern-recognition-3voblmwmo8.pdf

Jeff Yan

Papers

Password memorability and security: empirical results

A low-cost attack on a Microsoft captcha

Usability of CAPTCHAs or usability issues in CAPTCHA design

Targeted Online Password Guessing: An Underestimated Threat

Breaking Visual CAPTCHAs with Naive Pattern Recognition Algorithms