How to Attack and Generate Honeywords
01 May 2022-pp 966-983
TL;DR: This work proposes four theoretic models for characterizing the attacker $\mathcal{A}$’s best distinguishing strategies, and develops the corresponding honeyword-generation method for each type of attackers, by using various representative probabilistic password guessing models.
Abstract: Honeywords are decoy passwords associated with each user account to timely detect password leakage. The key issue lies in how to generate honeywords that are hard to be differentiated from real passwords. This security mechanism was first introduced by Juels and Rivest at CCS’13, and has been covered by hundreds of media and adopted in dozens of research domains. Existing research deals with honeywords primarily in an ad hoc manner, and it is challenging to develop a secure honeyword-generation method and well evaluate (attack) it. In this work, we tackle this problem in a principled approach. We first propose four theoretic models for characterizing the attacker $\mathcal{A}$’s best distinguishing strategies, with each model based on a different combination of information available to $\mathcal{A}$ (e.g., public datasets, the victim’s personal information and registration order). These theories guide us to design effective experiments with real-world password datasets to evaluate the goodness (flatness) of a given honeyword-generation method.Armed with the four best attacking theories, we develop the corresponding honeyword-generation method for each type of attackers, by using various representative probabilistic password guessing models. Through a series of exploratory investigations, we show the use of these password models is not straightforward, but requires creative and significant efforts. Both empirical experiments and user-study results demonstrate that our methods significantly outperform prior art. Besides, we manage to resolve several previously unexplored challenges that arise in the practical deployment of a honeyword method. We believe this work pushes the honeyword research towards statistical rigor.
Citations
More filters
TL;DR: In this article , the authors proposed a model inversion method that can reconstruct representative samples of the target model's training data based only on the output labels, which requires the least information to succeed and therefore has the best applicability.
Abstract: In a model inversion attack, an adversary attempts to reconstruct the training data records of a target model using only the model’s output. In launching a contemporary model inversion attack, the strategies discussed are generally based on either predicted confidence score vectors, i.e., black-box attacks, or the parameters of a target model, i.e., white-box attacks. However, in the real world, model owners usually only give out the predicted labels; the confidence score vectors and model parameters are hidden as a defense mechanism to prevent such attacks. Unfortunately, we have found a model inversion method that can reconstruct representative samples of the target model’s training data based only on the output labels. We believe this attack requires the least information to succeed and, therefore, has the best applicability. The key idea is to exploit the error rate of the target model to compute the median distance from a set of data records to the decision boundary of the target model. The distance is then used to generate confidence score vectors which are adopted to train an attack model to reconstruct the representative samples. The experimental results show that highly recognizable representative samples can be reconstructed with far less information than existing methods.
3 citations
01 Sep 2022
TL;DR: A new Password hardening (PH) service called PW-Hero is proposed that equips its PH service with an option to terminate its use (i.e., opt-out), and is defined as a suite of protocols that meet desirable properties and build a simple, secure, and efficient instance.
Abstract: As the most dominant authentication mechanism, password-based authentication suffers catastrophic offline password guessing attacks once the authentication server is compromised and the password database is leaked. Password hardening (PH) service, an external/third-party crypto service, has been recently proposed to strengthen password storage and reduce the damage of authentication server compromise. However, all existing schemes are unreliable in that they overlook the important restorable property: PH service opt-out. In existing PH schemes, once the authentication server has subscribed to a PH service, it must adopt this service forever, even if it wants to stop the external/third-party PH service and restore its original password storage (or subscribe to another PH service). To fill the gap, we propose a new PH service called PW-Hero that equips its PH service with an option to terminate its use (i.e., opt-out). In PW-Hero, password authentication is strengthened against offline attacks by adding external secret spices to password records. With the opt-out property, authentication servers can proactively request to end the PH service after successful authentications. Then password records can be securely migrated to their traditional salted hash state, ready for subscription to other PH services. Besides, PW-Hero achieves all existing desirable properties, such as comprehensive verifiability, rate limits against online attacks, and user privacy. We define PW-Hero as a suite of protocols that meet desirable properties and build a simple, secure, and efficient instance. Moreover, we develop a prototype implementation and evaluate its performance, establishing the practicality of our PW-Hero service.
1 citations
TL;DR: In this paper , the authors proposed a honeyword generation technique (HGT) called HoneyGAN and an evaluation metric based on representation learning for measuring the indistinguishability of fake passwords.
Abstract: Honeywords are fictitious passwords inserted into databases in order to identify password breaches. Producing honeywords that are difficult to distinguish from actual passwords automatically is a sophisticated task. We propose a honeyword generation technique (HGT) called HoneyGAN and an evaluation metric based on representation learning for measuring the indistinguishability of fake passwords, together with a novel attack model for evaluating the efficiency of HGTs. We compare HoneyGAN to state-of-the-art HGTs proposed in the literature using both evaluation metrics and a human study. Our findings indicate that HoneyGAN creates genuine-looking honeywords, leading to a low success rate for knowledgeable attackers in identifying them. We also demonstrate that our attack model is more capable of finding real passwords among sets of honeywords compared to previous works.
1 citations
TL;DR: This paper proposes to build a more secure and trustworthy authentication system that employs off-the-shelf pre-trained language models which require no further training on real passwords to produce honeywords while retaining the PII of the associated real password, therefore significantly raising the bar for attackers.
Abstract: Honeywords are fictitious passwords inserted into databases in order to identify password breaches. The major difficulty is how to produce honeywords that are difficult to distinguish from real passwords. Although the generation of honeywords has been widely investigated in the past, the majority of existing research assumes attackers have no knowledge of the users. These honeyword generating techniques (HGTs) may utterly fail if attackers exploit users’ personally identifiable information (PII) and the real passwords include users’ PII. In this paper, we propose to build a more secure and trustworthy authentication system that employs off-the-shelf pre-trained language models which require no further training on real passwords to produce honeywords while retaining the PII of the associated real password, therefore significantly raising the bar for attackers.Weconducted a pilot experiment in which individuals are asked to distinguish between authentic passwords and honeywords when the username is provided for GPT-3 and a tweaking technique. Results show that it is extremely difficult to distinguish the real passwords from the artifical ones for both techniques. We speculate that a larger sample size could reveal a significant difference between the two HGT techniques, favouring our proposed approach.
1 citations
TL;DR: Zhang et al. as discussed by the authors investigated the intrinsic characteristics of online and of-ine guessing scenarios and proposed a systematic evaluation framework that is composed of four different dimensioned criteria to rate PSM accuracy under these two guessing scenarios (as well as various guessing strategies).
Abstract: To help users create stronger passwords, nearly every respectable web service adopts a password strength meter (PSM) to provide real-time strength feedback upon user registration and password change. Recent research has found that PSMs that provide accurate feedback can indeed effectively nudge users toward choosing stronger passwords. Thus, it is imperative to systematically evaluate existing PSMs to facilitate the selection of accurate ones. In this paper, we highlight that there is no single silver bullet metric for measuring the accuracy of PSMs: For each given guessing scenario and strategy, a specific metric is necessary. We investigate the intrinsic characteristics of online and offline guessing scenarios, and for the first time , propose a systematic evaluation framework that is composed of four different dimensioned criteria to rate PSM accuracy under these two password guessing scenarios (as well as various guessing strategies). More specifically, for online guessing , the strength misjudgments of passwords with different popularity would have varied effects on PSM accuracy, and we suggest the weighted Spearman metric and consider two typical attackers: The general attacker who is unaware of the target password distribution, and the knowledgeable attacker aware of it. For offline guessing , since the cracked passwords are generally weaker than the uncracked ones, and they correspond to two disparate distributions, we adopt the Kullback-Leibler divergence metric and investigate the four most typical guessing strategies: brute-force, dictionary-based, probability-based, and a combination of above three strategies. In particular, we propose the Precision metric to measure PSM accuracy when non-binned strength feedback (e.g., probability) is transformed into easy-to-understand bins/scores (e.g., [weak, medium, strong]). We further introduce a reconciled Precision metric to characterize the impacts of strength misjudgments in different directions (e.g., weak → strong and strong → weak) on PSM accuracy. The effectiveness and practicality of our evaluation framework are demonstrated by rating 12 leading PSMs, leveraging 14 real-world password datasets. Finally, we provide three recommendations to help improve the accuracy of PSMs.
References
More filters
20 May 2012
TL;DR: It is concluded that many academic proposals to replace text passwords for general-purpose user authentication on the web have failed to gain traction because researchers rarely consider a sufficiently wide range of real-world constraints.
Abstract: We evaluate two decades of proposals to replace text passwords for general-purpose user authentication on the web using a broad set of twenty-five usability, deployability and security benefits that an ideal scheme might provide. The scope of proposals we survey is also extensive, including password management software, federated login protocols, graphical password schemes, cognitive authentication schemes, one-time passwords, hardware tokens, phone-aided schemes and biometrics. Our comprehensive approach leads to key insights about the difficulty of replacing passwords. Not only does no known scheme come close to providing all desired benefits: none even retains the full set of benefits that legacy passwords already provide. In particular, there is a wide range from schemes offering minor security benefits beyond legacy passwords, to those offering significant security benefits in return for being more costly to deploy or more difficult to use. We conclude that many academic proposals have failed to gain traction because researchers rarely consider a sufficiently wide range of real-world constraints. Beyond our analysis of current schemes, our framework provides an evaluation methodology and benchmark for future web authentication proposals.
914 citations
20 May 2012
TL;DR: It is estimated that passwords provide fewer than 10 bits of security against an online, trawling attack, and only about 20 bits ofSecurity against an optimal offline dictionary attack, when compared with a uniform distribution which would provide equivalent security against different forms of guessing attack.
Abstract: We report on the largest corpus of user-chosen passwords ever studied, consisting of anonymized password histograms representing almost 70 million Yahoo! users, mitigating privacy concerns while enabling analysis of dozens of subpopulations based on demographic factors and site usage characteristics. This large data set motivates a thorough statistical treatment of estimating guessing difficulty by sampling from a secret distribution. In place of previously used metrics such as Shannon entropy and guessing entropy, which cannot be estimated with any realistically sized sample, we develop partial guessing metrics including a new variant of guesswork parameterized by an attacker's desired success rate. Our new metric is comparatively easy to approximate and directly relevant for security engineering. By comparing password distributions with a uniform distribution which would provide equivalent security against different forms of guessing attack, we estimate that passwords provide fewer than 10 bits of security against an online, trawling attack, and only about 20 bits of security against an optimal offline dictionary attack. We find surprisingly little variation in guessing difficulty; every identifiable group of users generated a comparably weak password distribution. Security motivations such as the registration of a payment card have no greater impact than demographic factors such as age and nationality. Even proactive efforts to nudge users towards better password choices with graphical feedback make little difference. More surprisingly, even seemingly distant language communities choose the same weak passwords and an attacker never gains more than a factor of 2 efficiency gain by switching from the globally optimal dictionary to a population-specific lists.
711 citations
17 May 2009
TL;DR: This paper discusses a new method that generates password structures in highest probability order by automatically creating a probabilistic context-free grammar based upon a training set of previously disclosed passwords, and then generating word-mangling rules to be used in password cracking.
Abstract: Choosing the most effective word-mangling rules to use when performing a dictionary-based password cracking attack can be a difficult task In this paper we discuss a new method that generates password structures in highest probability order We first automatically create a probabilistic context-free grammar based upon a training set of previously disclosed passwords This grammar then allows us to generate word-mangling rules, and from them, password guesses to be used in password cracking We will also show that this approach seems to provide a more effective way to crack passwords as compared to traditional methods by testing our tools and techniques on real password sets In one series of experiments, training on a set of disclosed passwords, our approach was able to crack 28% to 129% more passwords than John the Ripper, a publicly available standard password cracking program
491 citations
24 Oct 2016
TL;DR: TarGuess, a framework that systematically characterizes typical targeted guessing scenarios with seven sound mathematical models, each of which is based on varied kinds of data available to an attacker, is proposed to design novel and efficient guessing algorithms.
Abstract: While trawling online/offline password guessing has been intensively studied, only a few studies have examined targeted online guessing, where an attacker guesses a specific victim's password for a service, by exploiting the victim's personal information such as one sister password leaked from her another account and some personally identifiable information (PII). A key challenge for targeted online guessing is to choose the most effective password candidates, while the number of guess attempts allowed by a server's lockout or throttling mechanisms is typically very small. We propose TarGuess, a framework that systematically characterizes typical targeted guessing scenarios with seven sound mathematical models, each of which is based on varied kinds of data available to an attacker. These models allow us to design novel and efficient guessing algorithms. Extensive experiments on 10 large real-world password datasets show the effectiveness of TarGuess. Particularly, TarGuess I~IV capture the four most representative scenarios and within 100 guesses: (1) TarGuess-I outperforms its foremost counterpart by 142% against security-savvy users and by 46% against normal users; (2) TarGuess-II outperforms its foremost counterpart by 169% on security-savvy users and by 72% against normal users; and (3) Both TarGuess-III and IV gain success rates over 73% against normal users and over 32% against security-savvy users. TarGuess-III and IV, for the first time, address the issue of cross-site online guessing when given the victim's one sister password and some PII.
304 citations
TL;DR: Li et al. as discussed by the authors proposed two Zipf-like models (i.e., PDF-Zipf and CDF-ZipF) to characterize the distribution of passwords and proposed a new metric for measuring the strength of password data sets.
Abstract: Despite three decades of intensive research efforts, it remains an open question as to what is the underlying distribution of user-generated passwords. In this paper, we make a substantial step forward toward understanding this foundational question. By introducing a number of computational statistical techniques and based on 14 large-scale data sets, which consist of 113.3 million real-world passwords, we, for the first time, propose two Zipf-like models (i.e., PDF-Zipf and CDF-Zipf) to characterize the distribution of passwords. More specifically, our PDF-Zipf model can well fit the popular passwords and obtain a coefficient of determination larger than 0.97; our CDF-Zipf model can well fit the entire password data set, with the maximum cumulative distribution function (CDF) deviation between the empirical distribution and the fitted theoretical model being 0.49%~4.59% (on an average 1.85%). With the concrete knowledge of password distributions, we suggest a new metric for measuring the strength of password data sets. Extensive experimental results show the effectiveness and general applicability of the proposed Zipf-like models and security metric.
300 citations