scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

Password Cracking Using Probabilistic Context-Free Grammars

17 May 2009-pp 391-405
TL;DR: This paper discusses a new method that generates password structures in highest probability order by automatically creating a probabilistic context-free grammar based upon a training set of previously disclosed passwords, and then generating word-mangling rules to be used in password cracking.
Abstract: Choosing the most effective word-mangling rules to use when performing a dictionary-based password cracking attack can be a difficult task In this paper we discuss a new method that generates password structures in highest probability order We first automatically create a probabilistic context-free grammar based upon a training set of previously disclosed passwords This grammar then allows us to generate word-mangling rules, and from them, password guesses to be used in password cracking We will also show that this approach seems to provide a more effective way to crack passwords as compared to traditional methods by testing our tools and techniques on real password sets In one series of experiments, training on a set of disclosed passwords, our approach was able to crack 28% to 129% more passwords than John the Ripper, a publicly available standard password cracking program
Citations
More filters
Proceedings ArticleDOI
20 May 2012
TL;DR: It is estimated that passwords provide fewer than 10 bits of security against an online, trawling attack, and only about 20 bits ofSecurity against an optimal offline dictionary attack, when compared with a uniform distribution which would provide equivalent security against different forms of guessing attack.
Abstract: We report on the largest corpus of user-chosen passwords ever studied, consisting of anonymized password histograms representing almost 70 million Yahoo! users, mitigating privacy concerns while enabling analysis of dozens of subpopulations based on demographic factors and site usage characteristics. This large data set motivates a thorough statistical treatment of estimating guessing difficulty by sampling from a secret distribution. In place of previously used metrics such as Shannon entropy and guessing entropy, which cannot be estimated with any realistically sized sample, we develop partial guessing metrics including a new variant of guesswork parameterized by an attacker's desired success rate. Our new metric is comparatively easy to approximate and directly relevant for security engineering. By comparing password distributions with a uniform distribution which would provide equivalent security against different forms of guessing attack, we estimate that passwords provide fewer than 10 bits of security against an online, trawling attack, and only about 20 bits of security against an optimal offline dictionary attack. We find surprisingly little variation in guessing difficulty; every identifiable group of users generated a comparably weak password distribution. Security motivations such as the registration of a payment card have no greater impact than demographic factors such as age and nationality. Even proactive efforts to nudge users towards better password choices with graphical feedback make little difference. More surprisingly, even seemingly distant language communities choose the same weak passwords and an attacker never gains more than a factor of 2 efficiency gain by switching from the globally optimal dictionary to a population-specific lists.

711 citations

Proceedings ArticleDOI
20 May 2012
TL;DR: An efficient distributed method is developed for calculating how effectively several heuristic password-guessing algorithms guess passwords, and the relationship between guess ability, as measured with password-cracking algorithms, and entropy estimates is investigated.
Abstract: Text-based passwords remain the dominant authentication method in computer systems, despite significant advancement in attackers' capabilities to perform password cracking. In response to this threat, password composition policies have grown increasingly complex. However, there is insufficient research defining metrics to characterize password strength and using them to evaluate password-composition policies. In this paper, we analyze 12,000 passwords collected under seven composition policies via an online study. We develop an efficient distributed method for calculating how effectively several heuristic password-guessing algorithms guess passwords. Leveraging this method, we investigate (a) the resistance of passwords created under different conditions to guessing, (b) the performance of guessing algorithms under different training sets, (c) the relationship between passwords explicitly created under a given composition policy and other passwords that happen to meet the same requirements, and (d) the relationship between guess ability, as measured with password-cracking algorithms, and entropy estimates. Our findings advance understanding of both password-composition policies and metrics for quantifying password security.

464 citations


Cites background or methods from "Password Cracking Using Probabilist..."

  • ...present a novel password-cracking technique that uses the text structure from training data while applying mangling rules to the text itself [25]....

    [...]

  • ..., [2], [13], [25], [30]), this nevertheless creates an ethical conundrum: Should our research use passwords acquired illicitly? Since this data has already been made public and is easily available, using it in our research does not increase the harm to the victims....

    [...]

  • ...The Weir algorithm determines guessing order based on the probabilities of different password structures, or patterns of character types such as letters, digits, and symbols [25]....

    [...]

  • ...net, and a variety of Finnish websites, with user valuations that are similarly difficult to assess [2], [25]....

    [...]

Proceedings ArticleDOI
01 Jan 2014
TL;DR: This paper investigates for the first time how an attacker can leverage a known password from one site to more easily guess that user's password at other sites and develops the first cross-site password-guessing algorithm, able to guess 30% of transformed passwords within 100 attempts.
Abstract: Today's Internet services rely heavily on text-based passwords for user authentication. The pervasiveness of these services coupled with the difficulty of remembering large numbers of secure passwords tempts users to reuse passwords at multiple sites. In this paper, we investigate for the first time how an attacker can leverage a known password from one site to more easily guess that user's password at other sites. We study several hundred thousand leaked passwords from eleven web sites and conduct a user survey on password reuse; we estimate that 43- 51% of users reuse the same password across multiple sites. We further identify a few simple tricks users often employ to transform a basic password between sites which can be used by an attacker to make password guessing vastly easier. We develop the first cross-site password-guessing algorithm, which is able to guess 30% of transformed passwords within 100 attempts compared to just 14% for a standard password-guessing algorithm without cross-site password knowledge.

426 citations


Cites background from "Password Cracking Using Probabilist..."

  • ...However, password meters have been shown experimentally [46] and in practice [21] to make a relatively modest impact on password choices....

    [...]

Proceedings ArticleDOI
04 Oct 2010
TL;DR: This paper attempts to determine the effectiveness of using entropy, as defined in NIST SP800-63, as a measurement of the security provided by various password creation policies, by modeling the success rate of current password cracking techniques against real user passwords.
Abstract: In this paper we attempt to determine the effectiveness of using entropy, as defined in NIST SP800-63, as a measurement of the security provided by various password creation policies. This is accomplished by modeling the success rate of current password cracking techniques against real user passwords. These data sets were collected from several different websites, the largest one containing over 32 million passwords. This focus on actual attack methodologies and real user passwords quite possibly makes this one of the largest studies on password security to date. In addition we examine what these results mean for standard password creation policies, such as minimum password length, and character set requirements.

423 citations


Cites methods from "Password Cracking Using Probabilist..."

  • ...One potential solution to this problem is to take the grammar we described in a previous paper [18] and use it to evaluate the probability of user selected passwords....

    [...]

Proceedings ArticleDOI
24 Oct 2016
TL;DR: TarGuess, a framework that systematically characterizes typical targeted guessing scenarios with seven sound mathematical models, each of which is based on varied kinds of data available to an attacker, is proposed to design novel and efficient guessing algorithms.
Abstract: While trawling online/offline password guessing has been intensively studied, only a few studies have examined targeted online guessing, where an attacker guesses a specific victim's password for a service, by exploiting the victim's personal information such as one sister password leaked from her another account and some personally identifiable information (PII). A key challenge for targeted online guessing is to choose the most effective password candidates, while the number of guess attempts allowed by a server's lockout or throttling mechanisms is typically very small. We propose TarGuess, a framework that systematically characterizes typical targeted guessing scenarios with seven sound mathematical models, each of which is based on varied kinds of data available to an attacker. These models allow us to design novel and efficient guessing algorithms. Extensive experiments on 10 large real-world password datasets show the effectiveness of TarGuess. Particularly, TarGuess I~IV capture the four most representative scenarios and within 100 guesses: (1) TarGuess-I outperforms its foremost counterpart by 142% against security-savvy users and by 46% against normal users; (2) TarGuess-II outperforms its foremost counterpart by 169% on security-savvy users and by 72% against normal users; and (3) Both TarGuess-III and IV gain success rates over 73% against normal users and over 32% against security-savvy users. TarGuess-III and IV, for the first time, address the issue of cross-site online guessing when given the victim's one sister password and some PII.

304 citations


Cites background or methods from "Password Cracking Using Probabilist..."

  • ...’s algorithm [27]: it differentiates a segment’s...

    [...]

  • ..., L4 → love and L4 → Suny) can add up to 1 [27]....

    [...]

  • ...It builds on Weir et al.’s PCFG-based algorithm [35] which has been shown a great success in dealing with trawling guessing scenarios....

    [...]

  • ...In the training phase of [27], each password is seen as a combination of letter(L)-, digit(D)- and symbol(S)- segments....

    [...]

  • ...To understand their security, a number of probabilistic guessing models, such as probabilistic context-free grammars (PCFG) [27] and Markov n-grams [18], have been suggested....

    [...]

References
More filters
Journal ArticleDOI
Lawrence R. Rabiner1
01 Feb 1989
TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.
Abstract: This tutorial provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementation of the theory along with a description of selected applications of the theory to distinct problems in speech recognition. Results from a number of original sources are combined to provide a single source of acquiring the background required to pursue further this area of research. The author first reviews the theory of discrete Markov chains and shows how the concept of hidden states, where the observation is a probabilistic function of the state, can be used effectively. The theory is illustrated with two simple examples, namely coin-tossing, and the classic balls-in-urns system. Three fundamental problems of HMMs are noted and several practical techniques for solving these problems are given. The various types of HMMs that have been studied, including ergodic as well as left-right models, are described. >

21,819 citations

Book
01 Jan 1979
TL;DR: This book is a rigorous exposition of formal languages and models of computation, with an introduction to computational complexity, appropriate for upper-level computer science undergraduates who are comfortable with mathematical arguments.
Abstract: This book is a rigorous exposition of formal languages and models of computation, with an introduction to computational complexity. The authors present the theory in a concise and straightforward manner, with an eye out for the practical applications. Exercises at the end of each chapter, including some that have been solved, help readers confirm and enhance their understanding of the material. This book is appropriate for upper-level computer science undergraduates who are comfortable with mathematical arguments.

13,779 citations

Journal ArticleDOI
TL;DR: It is found that no finite-state Markov process that produces symbols with transition from state to state can serve as an English grammar, and the particular subclass of such processes that produce n -order statistical approximations to English do not come closer, with increasing n, to matching the output of anEnglish grammar.
Abstract: We investigate several conceptions of linguistic structure to determine whether or not they can provide simple and "revealing" grammars that generate all of the sentences of English and only these. We find that no finite-state Markov process that produces symbols with transition from state to state can serve as an English grammar. Furthermore, the particular subclass of such processes that produce n -order statistical approximations to English do not come closer, with increasing n , to matching the output of an English grammar. We formalize-the notions of "phrase structure" and show that this gives us a method for describing language which is essentially more powerful, though still representable as a rather elementary type of finite-state process. Nevertheless, it is successful only when limited to a small subset of simple sentences. We study the formal properties of a set of grammatical transformations that carry sentences with phrase structure into new sentences with derived phrase structure, showing that transformational grammars are processes of the same elementary type as phrase-structure grammars; that the grammar of English is materially simplified if phrase structure description is limited to a kernel of simple sentences from which all other sentences are constructed by repeated transformations; and that this view of linguistic structure gives a certain insight into the use and understanding of language.

2,140 citations


"Password Cracking Using Probabilist..." refers background in this paper

  • ...Context-free grammars have long been used in the study of natural languages [12, 13, 14], where they are used to generate (or parse) strings with particular structures....

    [...]

Journal ArticleDOI
TL;DR: A probabilistic method is presented which cryptanalyzes any N key cryptosystem in N 2/3 operational with N2/3 words of memory after a precomputation which requires N operations, and works in a chosen plaintext attack and can also be used in a ciphertext-only attack.
Abstract: A probabilistic method is presented which cryptanalyzes any N key cryptosystem in N^{2/3} operational with N^{2/3} words of memory (average values) after a precomputation which requires N operations. If the precomputation can be performed in a reasonable time period (e.g, several years), the additional computation required to recover each key compares very favorably with the N operations required by an exhaustive search and the N words of memory required by table lookup. When applied to the Data Encryption Standard (DES) used in block mode, it indicates that solutions should cost between 1 and 100 each. The method works in a chosen plaintext attack and, if cipher block chaining is not used, can also be used in a ciphertext-only attack.

761 citations


"Password Cracking Using Probabilist..." refers background in this paper

  • ...To estimate the risk of password-guessing attacks, it has been proposed that administrators pro-actively attempt to crack passwords in their systems [4]....

    [...]

Journal ArticleDOI
01 Sep 2004
TL;DR: To determine how to help users choose good passwords, the authors performed a controlled trial of the effects of giving users different kinds of advice.
Abstract: Users rarely choose passwords that are both hard to guess and easy to remember. To determine how to help users choose good passwords, the authors performed a controlled trial of the effects of giving users different kinds of advice. Some of their results challenge the established wisdom.

678 citations


"Password Cracking Using Probabilist..." refers background in this paper

  • ...Index Terms — Computer security, Data security, Computer crime...

    [...]