Superword: A honeyword system for achieving higher security goals
TL;DR: This study proposes a “matching attack” model and finds that although Erguler's honeyword system can achieve perfect flatness, the success rate of the attacker is 100% under matching attack, and proposes a new honeyword approach named Superword that isolates the direct relationship between username and the corresponding hashed password in password files.
Abstract: Generating honeywords for each user’s account is an effective way to detect whether password databases are compromised. However, there are several underlying security issues associated with honeyword techniques that need to be addressed, for example, (1) How to make it more difficult for an attacker to find an accurate match of “username-real password”? (2) How to prevent the intersection attack in multiple systems caused by password reuse without reducing usability? (3) How to reduce the success rate of targeted password guessing? In this study, we first propose a “matching attack” model and find that although Erguler’s honeyword system can achieve perfect flatness, the success rate of the attacker is 100% under matching attack. Secondly, we propose a new honeyword approach named Superword that isolates the direct relationship between username and the corresponding hashed password in password files. Additional honeypots are mixed with real accounts to detect online guessing attacks. The analysis reveals that our approach makes a matching attacker difficult to find a real password from N password hashes. Since there is no connection between the username and password in password files, our honeyword system also alleviates the multiple systems intersection attack and targeted password guessing.
Citations
More filters
24 Oct 2016
TL;DR: This paper found that people do tend to re-use each password on 1.7-3.4 different websites, they reuse passwords that are more complex, and mostly they tend to use passwords that they have to enter frequently.
Abstract: From email to online banking, passwords are an essential component of modern internet use. Yet, users do not always have good password security practices, leaving their accounts vulnerable to attack. We conducted a study which combines self-report survey responses with measures of actual online behavior gathered from 134 participants over the course of six weeks. We find that people do tend to re-use each password on 1.7-3.4 different websites, they reuse passwords that are more complex, and mostly they tend to re-use passwords that they have to enter frequently. We also investigated whether self-report measures are accurate indicators of actual behavior, finding that though people understand password security, their self-reported intentions have only a weak correlation with reality. These findings suggest that users manage the challenge of having many passwords by choosing a complex password on a website where they have to enter it frequently in order to memorize that password, and then re-using that strong password across other websites.
30 citations
TL;DR: Zhang et al. as discussed by the authors designed a secure remote user authentication scheme, SecFHome, which supports secure communication at the edge of the network and remote authentication in fog-enabled smart home systems.
Abstract: Fog computing is the best solution for IoT applications with low latency and real-time interaction. Fog can endow smart home with many smart functions and services. One of the most important services is that users can remotely access and control smart devices. Since remote users and smart homes communicate through insecure channels, it is necessary to design a secure and effective remote authentication scheme to guarantee secure communications. The existing authentication schemes designed for smart homes have some security issues and are not suitable for fog-enabled smart home environments. Therefore, this paper designs a secure remote user authentication scheme, SecFHome. It supports secure communication at the edge of the network and remote authentication in fog-enabled smart home systems. Specifically, We present an efficient authentication mode in the fog-enabled environment, which includes the edge negotiation phase and the authentication phase. SecFHome adds updated information to the authenticator, which can verify the message synchronization simultaneously with the authentication, thus improving the authentication efficiency. In addition, SecFHome does not store sensitive information of users and smart devices in the memory of the smart gateway, which can avoid various attacks caused by the compromised gateway. The formal security proof and informal security analysis show that the SecFHome is secure and can resist known attacks. Compared with the related authentication schemes, SecFHome only needs fewer communication costs and computation costs, and achieves more security features.
21 citations
Proceedings Article•
26 Mar 2014TL;DR: This paper finds that Markov models, when done correctly, perform significantly better than the Probabilistic Context-Free Grammar model proposed in Weir et al., which has been used as the state-of-the-art password model in recent research.
Abstract: A probabilistic password model assigns a probability value to each string. Such models are useful for research into understanding what makes users choose more (or less) secure passwords, and for constructing password strength meters and password cracking utilities. Guess number graphs generated from password models are a widely used method in password research. In this paper, we show that probability-threshold graphs have important advantages over guess-number graphs. They are much faster to compute, and at the same time provide information beyond what is feasible in guess-number graphs. We also observe that research in password modeling can benefit from the extensive literature in statistical language modeling. We conduct a systematic evaluation of a large number of probabilistic password models, including Markov models using different normalization and smoothing methods, and found that, among other things, Markov models, when done correctly, perform significantly better than the Probabilistic Context-Free Grammar model proposed in Weir et al., which has been used as the state-of-the-art password model in recent research.
16 citations
24 May 2021
TL;DR: In this article, the authors propose HoneyGen, a practical and highly robust HGT that produces realistic looking honeywords by leveraging representation learning techniques to learn useful and explanatory representations from a massive collection of unstructured data, i.e., each operator's password database.
Abstract: Honeywords are false passwords injected in a database for detecting password leakage. Generating honeywords is a challenging problem due to the various assumptions about the adversary's knowledge as well as users' password-selection behaviour. The success of a Honeywords Generation Technique (HGT) lies on the resulting honeywords; the method fails if an adversary can easily distinguish the real password. In this paper, we propose HoneyGen, a practical and highly robust HGT that produces realistic looking honeywords. We do this by leveraging representation learning techniques to learn useful and explanatory representations from a massive collection of unstructured data, i.e., each operator's password database. We perform both a quantitative and qualitative evaluation of our framework using the state-of-the-art metrics. Our results suggest that HoneyGen generates high-quality honeywords that cause sophisticated attackers to achieve low distinguishing success rates.
12 citations
01 May 2022
TL;DR: This work proposes four theoretic models for characterizing the attacker $\mathcal{A}$’s best distinguishing strategies, and develops the corresponding honeyword-generation method for each type of attackers, by using various representative probabilistic password guessing models.
Abstract: Honeywords are decoy passwords associated with each user account to timely detect password leakage. The key issue lies in how to generate honeywords that are hard to be differentiated from real passwords. This security mechanism was first introduced by Juels and Rivest at CCS’13, and has been covered by hundreds of media and adopted in dozens of research domains. Existing research deals with honeywords primarily in an ad hoc manner, and it is challenging to develop a secure honeyword-generation method and well evaluate (attack) it. In this work, we tackle this problem in a principled approach. We first propose four theoretic models for characterizing the attacker $\mathcal{A}$’s best distinguishing strategies, with each model based on a different combination of information available to $\mathcal{A}$ (e.g., public datasets, the victim’s personal information and registration order). These theories guide us to design effective experiments with real-world password datasets to evaluate the goodness (flatness) of a given honeyword-generation method.Armed with the four best attacking theories, we develop the corresponding honeyword-generation method for each type of attackers, by using various representative probabilistic password guessing models. Through a series of exploratory investigations, we show the use of these password models is not straightforward, but requires creative and significant efforts. Both empirical experiments and user-study results demonstrate that our methods significantly outperform prior art. Besides, we manage to resolve several previously unexplored challenges that arise in the practical deployment of a honeyword method. We believe this work pushes the honeyword research towards statistical rigor.
11 citations
References
More filters
Proceedings Article•
13 Aug 2004TL;DR: Honeyd is presented, a framework for virtual honeypots that simulates virtual computer systems at the network level and shows how the Honeyd framework helps in many areas of system security, e.g. detecting and disabling worms, distracting adversaries, or preventing the spread of spam email.
Abstract: A honeypot is a closely monitored network decoy serving several purposes: it can distract adversaries from more valuable machines on a network, provide early warning about new attack and exploitation trends, or allow in-depth examination of adversaries during and after exploitation of a honeypot. Deploying a physical honeypot is often time intensive and expensive as different operating systems require specialized hardware and every honeypot requires its own physical system. This paper presents Honeyd, a framework for virtual honeypots that simulates virtual computer systems at the network level. The simulated computer systems appear to run on unallocated network addresses. To deceive network fingerprinting tools, Honeyd simulates the networking stack of different operating systems and can provide arbitrary routing topologies and services for an arbitrary number of virtual systems. This paper discusses Honeyd's design and shows how the Honeyd framework helps in many areas of system security, e.g. detecting and disabling worms, distracting adversaries, or preventing the spread of spam email.
729 citations
20 May 2012
TL;DR: It is estimated that passwords provide fewer than 10 bits of security against an online, trawling attack, and only about 20 bits ofSecurity against an optimal offline dictionary attack, when compared with a uniform distribution which would provide equivalent security against different forms of guessing attack.
Abstract: We report on the largest corpus of user-chosen passwords ever studied, consisting of anonymized password histograms representing almost 70 million Yahoo! users, mitigating privacy concerns while enabling analysis of dozens of subpopulations based on demographic factors and site usage characteristics. This large data set motivates a thorough statistical treatment of estimating guessing difficulty by sampling from a secret distribution. In place of previously used metrics such as Shannon entropy and guessing entropy, which cannot be estimated with any realistically sized sample, we develop partial guessing metrics including a new variant of guesswork parameterized by an attacker's desired success rate. Our new metric is comparatively easy to approximate and directly relevant for security engineering. By comparing password distributions with a uniform distribution which would provide equivalent security against different forms of guessing attack, we estimate that passwords provide fewer than 10 bits of security against an online, trawling attack, and only about 20 bits of security against an optimal offline dictionary attack. We find surprisingly little variation in guessing difficulty; every identifiable group of users generated a comparably weak password distribution. Security motivations such as the registration of a payment card have no greater impact than demographic factors such as age and nationality. Even proactive efforts to nudge users towards better password choices with graphical feedback make little difference. More surprisingly, even seemingly distant language communities choose the same weak passwords and an attacker never gains more than a factor of 2 efficiency gain by switching from the globally optimal dictionary to a population-specific lists.
711 citations
17 Aug 2003
TL;DR: A new way of precalculating the data is proposed which reduces by two the number of calculations needed during cryptanalysis and it is shown that the gain could be even much higher depending on the parameters used.
Abstract: In 1980 Martin Hellman described a cryptanalytic time-memory trade-off which reduces the time of cryptanalysis by using precalculated data stored in memory. This technique was improved by Rivest before 1982 with the introduction of distinguished points which drastically reduces the number of memory lookups during cryptanalysis. This improved technique has been studied extensively but no new optimisations have been published ever since. We propose a new way of precalculating the data which reduces by two the number of calculations needed during cryptanalysis. Moreover, since the method does not make use of distinguished points, it reduces the overhead due to the variable chain length, which again significantly reduces the number of calculations. As an example we have implemented an attack on MS-Windows password hashes. Using 1.4GB of data (two CD-ROMs) we can crack 99.9% of all alphanumerical passwords hashes (237) in 13.6 seconds whereas it takes 101 seconds with the current approach using distinguished points. We show that the gain could be even much higher depending on the parameters used.
524 citations
17 May 2009
TL;DR: This paper discusses a new method that generates password structures in highest probability order by automatically creating a probabilistic context-free grammar based upon a training set of previously disclosed passwords, and then generating word-mangling rules to be used in password cracking.
Abstract: Choosing the most effective word-mangling rules to use when performing a dictionary-based password cracking attack can be a difficult task In this paper we discuss a new method that generates password structures in highest probability order We first automatically create a probabilistic context-free grammar based upon a training set of previously disclosed passwords This grammar then allows us to generate word-mangling rules, and from them, password guesses to be used in password cracking We will also show that this approach seems to provide a more effective way to crack passwords as compared to traditional methods by testing our tools and techniques on real password sets In one series of experiments, training on a set of disclosed passwords, our approach was able to crack 28% to 129% more passwords than John the Ripper, a publicly available standard password cracking program
491 citations
20 May 2012
TL;DR: An efficient distributed method is developed for calculating how effectively several heuristic password-guessing algorithms guess passwords, and the relationship between guess ability, as measured with password-cracking algorithms, and entropy estimates is investigated.
Abstract: Text-based passwords remain the dominant authentication method in computer systems, despite significant advancement in attackers' capabilities to perform password cracking. In response to this threat, password composition policies have grown increasingly complex. However, there is insufficient research defining metrics to characterize password strength and using them to evaluate password-composition policies. In this paper, we analyze 12,000 passwords collected under seven composition policies via an online study. We develop an efficient distributed method for calculating how effectively several heuristic password-guessing algorithms guess passwords. Leveraging this method, we investigate (a) the resistance of passwords created under different conditions to guessing, (b) the performance of guessing algorithms under different training sets, (c) the relationship between passwords explicitly created under a given composition policy and other passwords that happen to meet the same requirements, and (d) the relationship between guess ability, as measured with password-cracking algorithms, and entropy estimates. Our findings advance understanding of both password-composition policies and metrics for quantifying password security.
464 citations