Proceedings Article

A study of probabilistic password models

26 Mar 2014-pp 12
TL;DR: This paper finds that Markov models, when done correctly, perform significantly better than the Probabilistic Context-Free Grammar model proposed in Weir et al., which has been used as the state-of-the-art password model in recent research.
Abstract: A probabilistic password model assigns a probability value to each string. Such models are useful for research into understanding what makes users choose more (or less) secure passwords, and for constructing password strength meters and password cracking utilities. Guess number graphs generated from password models are a widely used method in password research. In this paper, we show that probability-threshold graphs have important advantages over guess-number graphs. They are much faster to compute, and at the same time provide information beyond what is feasible in guess-number graphs. We also observe that research in password modeling can benefit from the extensive literature in statistical language modeling. We conduct a systematic evaluation of a large number of probabilistic password models, including Markov models using different normalization and smoothing methods, and find that, among other things, Markov models, when done correctly, perform significantly better than the Probabilistic Context-Free Grammar model proposed in Weir et al., which has been used as the state-of-the-art password model in recent research.
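As a rough illustration of the ideas in this abstract, the sketch below trains a first-order character Markov model with additive (Laplace) smoothing and an explicit end symbol, then reports the fraction of test passwords whose probability meets a threshold. It is not the authors' implementation: the names `train_bigram`, `log_prob`, `fraction_above_threshold`, `ALPHABET`, the start/end markers, and the smoothing constant `delta` are all hypothetical, and reading a probability-threshold graph as "fraction of test passwords at or above a probability threshold" is an assumption made here for illustration.

```python
import math
import string
from collections import Counter, defaultdict

# Hypothetical choices for this sketch (not taken from the paper).
START, END = "\x02", "\x03"
ALPHABET = string.ascii_lowercase + string.digits


def train_bigram(passwords):
    """Count character transitions, including START -> first and last -> END."""
    counts = defaultdict(Counter)
    for pw in passwords:
        seq = START + pw + END
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def log_prob(counts, alphabet, password, delta=1.0):
    """Natural-log probability of a password under the smoothed model.

    Assumes every character of `password` belongs to `alphabet`.
    """
    vocab = len(alphabet) + 1  # every alphabet symbol plus the END symbol
    seq = START + password + END
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        ctx = counts.get(prev, Counter())
        numerator = ctx[nxt] + delta
        denominator = sum(ctx.values()) + delta * vocab
        total += math.log(numerator / denominator)
    return total


def fraction_above_threshold(counts, alphabet, test_passwords, log_threshold):
    """One point of a threshold curve: share of test passwords with
    log-probability greater than or equal to `log_threshold`."""
    hits = sum(
        1 for pw in test_passwords
        if log_prob(counts, alphabet, pw) >= log_threshold
    )
    return hits / len(test_passwords)
```

Sweeping `log_threshold` over a range of values and plotting the returned fractions would give a curve of this kind without enumerating any guesses, which is consistent with the abstract's remark that such graphs are faster to compute than guess-number graphs.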
Citations
Dissertation
05 Sep 2017
TL;DR: This thesis addresses several of these aspects of information security and network intrusion, from a security and cryptography viewpoint, by introducing new cryptographic algorithms, new protocols, improved algorithms, and an efficient integer multiplication algorithm.
Abstract: Information security relies on the correct interaction of several abstraction layers: hardware, operating systems, algorithms, and networks. However, protecting each component of the technological stack has a cost; for this reason, many devices are left unprotected or under-protected. This thesis addresses several of these aspects, from a security and cryptography viewpoint. To that effect we introduce new cryptographic algorithms (such as extensions of the Naccache–Stern encryption scheme), new protocols (including a distributed zero-knowledge identification protocol), improved algorithms (including a new error-correcting code, and an efficient integer multiplication algorithm), as well as several contributions relevant to information security and network intrusion. Furthermore, several of these contributions address the performance of existing and newly-introduced constructions.

21 citations

Dissertation
05 Jun 2018
TL;DR: It is shown that more than 16 million password pairs (including 30% of the modified passwords) can be cracked within just 10 guesses, and a new training-based guessing algorithm is developed to quantify the security risks.
Abstract: Leaked passwords from data breaches can pose a serious threat if users reuse or slightly modify the passwords for other services. With more services getting breached today, there is still a lack of a quantitative understanding of this risk. In this paper, we perform the first large-scale empirical analysis of password reuse and modification patterns using a ground-truth dataset of 28.8 million users and their 61.5 million passwords in 107 services over 8 years. We find that password reuse and modification is very common (observed on 52% of the users). Sensitive online services such as shopping websites and email services received the most reused and modified passwords. We also observe that users would still reuse the already-leaked passwords for other online services for years after the initial data breach. Finally, to quantify the security risks, we develop a new training-based guessing algorithm. We show that more than 16 million password pairs (including 30% of the modified passwords) can be cracked within just 10 guesses.

12 citations


Cites methods from "A study of probabilistic password m..."

  • ...Over the last decades, a number of guessing methods have been proposed, including Markov Model [16, 20], Mangled Wordlist method [35], Probabilistic Context-Free Grammars Method (PCFGs) [12, 20, 36, 41], and Deep Neural Networks [19]....

    [...]

Posted Content
TL;DR: A multisketch over tenprints is designed, preventing an attacker from learning the biometric data of a user in the event of a breach, but enabling derivation of user-specific secret keys upon successful user authentication.
Abstract: Biometric authentication is increasingly being used for large scale human authentication and identification, creating the risk of leaking the biometric secrets of millions of users in the case of database compromise. Powerful "fuzzy" cryptographic techniques for biometric template protection, such as secure sketches, could help in principle, but go unused in practice. This is because they would require new biometric matching algorithms with potentially much diminished accuracy. We introduce a new primitive called a multisketch that generalizes secure sketches. Multisketches can work with existing biometric matching algorithms to generate strong cryptographic keys from biometric data reliably. A multisketch works on a biometric database containing multiple biometrics --- e.g., multiple fingerprints --- of a moderately large population of users (say, thousands). It conceals the correspondence between users and their biometric templates, preventing an attacker from learning the biometric data of a user in the event of a breach, but enabling derivation of user-specific secret keys upon successful user authentication. We design a multisketch over tenprints --- fingerprints of ten fingers --- called TenSketch. We report on a prototype implementation of TenSketch, showing its feasibility in practice. We explore several possible attacks against a TenSketch database and show, via simulations with real tenprint datasets, that an attacker must perform a large amount of computation to learn any meaningful information from a stolen TenSketch database.

9 citations


Cites methods from "A study of probabilistic password m..."

  • ...To generate high confidence — according to the classifiers — message tuples, the attacker can use hill-climbing approach as described below, or an approach similar to generating passwords from a Markov model [47]....

    [...]

Dissertation
06 Feb 2020
TL;DR: HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not, for teaching and research institutions in France or abroad.
Abstract: HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. Etudes en sécurité informatique (Studies in computer security). Patrick Lacharme

6 citations

Journal ArticleDOI
TL;DR: This paper proposes a context-based password strength meter and investigates its effectiveness on users' password generating behavior, finding that it is significantly effective and suggesting that simply incorporating contextual information into password strength meters could be one of the potential methods for promoting secure behaviors among end users.
Abstract: Encouraging users to create stronger passwords has always been one of the key issues in password-based authentication. It is particularly important as passwords are still the most common user authentication method. Furthermore, prior works have highlighted that most passwords are significantly weak. In this paper, we seek to mitigate such an issue by proposing a context-based password strength meter and investigating its effectiveness on users' password generating behavior. We conduct a randomized experiment on Amazon MTurk involving hypothetical account creating scenarios. We observe the change in users' behavior in terms of the number of occasions where users change their password after seeing the warning message, the number of occasions where users want to learn more about creating stronger passwords, and the changes in password strength. We find that our proposed password strength meter is significantly effective. Users exposed to our password strength meter are more likely to change their password, and those new passwords are stronger. Furthermore, if the information is readily available, users are willing to invest their time to learn about creating a stronger password, even in a traditional password strength meter setting. Our findings suggest that simply incorporating contextual information into password strength meters could be one of the potential methods for promoting secure behaviors among end users.

4 citations


Cites background or methods from "A study of probabilistic password m..."

  • ...For example, the use of probabilistic context-free grammar (e.g., Weir et al., 2009; Veras et al., 2014), and Markov models (e.g., Castelluccia et al., 2012; Ma et al., 2014) have been proposed....

    [...]

  • ...The rank is calculated by first generating a list of passwords using the Backoff Markov Model described earlier....

    [...]

  • ...Accordingly, we reviewed the literature that proposes different probabilistic models and found that the Backoff Markov Model is consistently cited as the best model for our type of experiment (e.g., Ma et al., 2014; Ur et al., 2015)....

    [...]

  • ...Our meter displays one of the following strength labels based on the results generated by the Backoff Markov Model from the password entered: Weak, Medium, Strong....

    [...]

  • ...Electronic copy available at: https://ssrn.com/abstract=2800499 D. Markov Model and Markov Model with Backoff D.1 Markov Model 𝑁-gram models, i.e., Markov chains, have been applied to passwords (Castelluccia et al., 2012)....

    [...]

References
Journal ArticleDOI
S. Katz
TL;DR: The model offers, via a nonlinear recursive procedure, a computation and space efficient solution to the problem of estimating probabilities from sparse data, and compares favorably to other proposed methods.
Abstract: The description of a novel type of m-gram language model is given. The model offers, via a nonlinear recursive procedure, a computation and space efficient solution to the problem of estimating probabilities from sparse data. This solution compares favorably to other proposed methods. While the method has been developed for and successfully implemented in the IBM Real Time Speech Recognizers, its generality makes it applicable in other areas where the problem of estimating probabilities from sparse data arises.

2,038 citations


"A study of probabilistic password m..." refers background or methods in this paper

  • ...Such techniques include Laplace smoothing, Good-Turing smoothing [13], and backoff [16]....

    [...]

  • ...A whole-string model, on the other hand, does not divide a password into segments....

    [...]

Proceedings ArticleDOI
08 May 2007
TL;DR: The study involved half a million users over a three-month period and gets extremely detailed data on password strength, the types and lengths of passwords chosen, and how they vary by site.
Abstract: We report the results of a large scale study of password use and password re-use habits. The study involved half a million users over a three-month period. A client component on users' machines recorded a variety of password strength, usage and frequency metrics. This allows us to measure or estimate such quantities as the average number of passwords and average number of accounts each user has, how many passwords she types per day, how often passwords are shared among sites, and how often they are forgotten. We get extremely detailed data on password strength, the types and lengths of passwords chosen, and how they vary by site. The data is the first large scale study of its kind, and yields numerous other insights into the role the passwords play in users' online experience.

1,068 citations


"A study of probabilistic password m..." refers background in this paper

  • ...8 for other scenarios, because of space limitation....

    [...]

Proceedings ArticleDOI
20 May 2012
TL;DR: It is estimated that passwords provide fewer than 10 bits of security against an online, trawling attack, and only about 20 bits of security against an optimal offline dictionary attack, when compared with a uniform distribution which would provide equivalent security against different forms of guessing attack.
Abstract: We report on the largest corpus of user-chosen passwords ever studied, consisting of anonymized password histograms representing almost 70 million Yahoo! users, mitigating privacy concerns while enabling analysis of dozens of subpopulations based on demographic factors and site usage characteristics. This large data set motivates a thorough statistical treatment of estimating guessing difficulty by sampling from a secret distribution. In place of previously used metrics such as Shannon entropy and guessing entropy, which cannot be estimated with any realistically sized sample, we develop partial guessing metrics including a new variant of guesswork parameterized by an attacker's desired success rate. Our new metric is comparatively easy to approximate and directly relevant for security engineering. By comparing password distributions with a uniform distribution which would provide equivalent security against different forms of guessing attack, we estimate that passwords provide fewer than 10 bits of security against an online, trawling attack, and only about 20 bits of security against an optimal offline dictionary attack. We find surprisingly little variation in guessing difficulty; every identifiable group of users generated a comparably weak password distribution. Security motivations such as the registration of a payment card have no greater impact than demographic factors such as age and nationality. Even proactive efforts to nudge users towards better password choices with graphical feedback make little difference. More surprisingly, even seemingly distant language communities choose the same weak passwords and an attacker never gains more than a factor of 2 efficiency gain by switching from the globally optimal dictionary to population-specific lists.

711 citations


"A study of probabilistic password m..." refers background in this paper

  • ...8 for other scenarios, because of space limitation....

    [...]

Proceedings ArticleDOI
17 May 2009
TL;DR: This paper discusses a new method that generates password structures in highest probability order by automatically creating a probabilistic context-free grammar based upon a training set of previously disclosed passwords, and then generating word-mangling rules to be used in password cracking.
Abstract: Choosing the most effective word-mangling rules to use when performing a dictionary-based password cracking attack can be a difficult task. In this paper we discuss a new method that generates password structures in highest probability order. We first automatically create a probabilistic context-free grammar based upon a training set of previously disclosed passwords. This grammar then allows us to generate word-mangling rules, and from them, password guesses to be used in password cracking. We will also show that this approach seems to provide a more effective way to crack passwords as compared to traditional methods by testing our tools and techniques on real password sets. In one series of experiments, training on a set of disclosed passwords, our approach was able to crack 28% to 129% more passwords than John the Ripper, a publicly available standard password cracking program.

491 citations


"A study of probabilistic password m..." refers background or methods in this paper

  • ...In this paper, we considered 3 instantiations of PCFGW : the first uses the dictionary used in [25]; the second uses the OpenWall dictionary; and the third generates the dictionary from the training set....

    [...]

  • ...One such approach is NIST’s recommended scheme [7] for estimating the entropy of one password, which is mainly based on their length....

    [...]

  • ...This is clearly not the case in practice....

    [...]

  • ...A template-based model divides a password into several segments, often by grouping consecutive characters of the same category (e.g., lower-case letters, digits, etc.) into one segment, and then generates the probability for each segment independently, e.g., [21], [25]....

    [...]

  • ...From Table II(c), we can see that the CSDN dataset includes 9.78% of passwords of length 11, whereas the PhpBB dataset includes only 2.1%, resulting in a ratio of 4.66 to 1, and this ratio keeps increasing....

    [...]
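The template-based segmentation mentioned in the citation contexts above (grouping consecutive characters of the same category into one segment) can be sketched as follows. This is only an illustrative sketch: the category labels `L`, `U`, `D`, `S` and the function names `char_class`, `segment`, and `template` are hypothetical and are not claimed to match the exact scheme used in the cited models.

```python
from itertools import groupby


def char_class(ch):
    """Map a character to a coarse category label (labels are assumptions)."""
    if ch.islower():
        return "L"  # lower-case letter
    if ch.isupper():
        return "U"  # upper-case letter
    if ch.isdigit():
        return "D"  # digit
    return "S"      # anything else is treated as a symbol


def segment(password):
    """Split a password into maximal runs of same-category characters."""
    return [
        (cls, "".join(run))
        for cls, run in groupby(password, key=char_class)
    ]


def template(password):
    """Summarise the segment structure as category/length pairs."""
    return "".join(f"{cls}{len(text)}" for cls, text in segment(password))
```

A template-based model would then estimate a probability for the template and for each segment separately; that estimation step is not shown here.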

Proceedings ArticleDOI
20 May 2012
TL;DR: An efficient distributed method is developed for calculating how effectively several heuristic password-guessing algorithms guess passwords, and the relationship between guessability, as measured with password-cracking algorithms, and entropy estimates is investigated.
Abstract: Text-based passwords remain the dominant authentication method in computer systems, despite significant advancement in attackers' capabilities to perform password cracking. In response to this threat, password composition policies have grown increasingly complex. However, there is insufficient research defining metrics to characterize password strength and using them to evaluate password-composition policies. In this paper, we analyze 12,000 passwords collected under seven composition policies via an online study. We develop an efficient distributed method for calculating how effectively several heuristic password-guessing algorithms guess passwords. Leveraging this method, we investigate (a) the resistance of passwords created under different conditions to guessing, (b) the performance of guessing algorithms under different training sets, (c) the relationship between passwords explicitly created under a given composition policy and other passwords that happen to meet the same requirements, and (d) the relationship between guessability, as measured with password-cracking algorithms, and entropy estimates. Our findings advance understanding of both password-composition policies and metrics for quantifying password security.

464 citations


"A study of probabilistic password m..." refers background or methods in this paper

  • ...8 for other scenarios, because of space limitation....

    [...]

  • ...Researchers have studied the quality of users’ password choices under different scenarios [17], [18], [22], [23], [26]....

    [...]

  • ...This is done in the guess calculator framework in [17], [20], which is based on the PCFGW model....

    [...]