Proceedings ArticleDOI

Beyond Credential Stuffing: Password Similarity Models Using Neural Networks

TL;DR: This work recasts one of the core technical challenges underlying targeted attacks as the task of modeling the similarity of human-chosen passwords, and proposes the first-ever defense against such targeted attacks, by way of personalized password strength meters (PPSMs).
Abstract: Attackers increasingly use passwords leaked from one website to compromise associated accounts on other websites. Such targeted attacks work because users reuse, or pick similar, passwords for different websites. We recast one of the core technical challenges underlying targeted attacks as the task of modeling similarity of human-chosen passwords. We show how to learn good password similarity models using a compilation of 1.4 billion leaked email–password pairs. Using our trained models of password similarity, we exhibit the most damaging targeted attack to date. Simulations indicate that our attack compromises more than 16% of user accounts in less than a thousand guesses, should one of their other passwords be known to the attacker and despite the use of state-of-the-art countermeasures. We show via a case study involving a large university authentication service that the attacks are also effective in practice. We go on to propose the first-ever defense against such targeted attacks, by way of personalized password strength meters (PPSMs). These are password strength meters that can warn users when they are picking passwords that are vulnerable to attacks, including targeted ones that take advantage of the user’s previously compromised passwords. We design and build a PPSM that can be compressed to less than 3 MB, making it easy to deploy and able to accurately estimate the strength of a password against all known guessing attacks.
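
As a rough, hypothetical illustration of why such targeted attacks work (the paper's actual similarity model is a neural network trained on the leaked pairs, not hand-written rules), a credential-tweaking guesser might enumerate variants of one known password like this:

```python
# Illustrative only: a hand-written approximation of "similar password"
# guessing. The paper instead *learns* similarity from leaked email-password
# pairs; these rules just show the kind of variants users actually pick.

def tweak_guesses(known_password: str):
    """Yield candidate passwords derived from one leaked password."""
    p = known_password
    yield p                      # plain reuse (classic credential stuffing)
    yield p.capitalize()
    yield p + "1"
    yield p + "!"
    # bump a trailing digit, e.g. "hunter2" -> "hunter3"
    if p and p[-1].isdigit():
        yield p[:-1] + str((int(p[-1]) + 1) % 10)
    # common character substitutions
    for a, b in [("a", "@"), ("s", "$"), ("o", "0")]:
        if a in p:
            yield p.replace(a, b)

print(list(tweak_guesses("hunter2")))
```
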
Citations
Proceedings ArticleDOI
06 Nov 2019
TL;DR: A framework for empirically analyzing the leakage of compromised credential checking protocols is provided, showing that in some contexts knowing the hash prefixes leads to a 12x increase in the efficacy of remote guessing attacks.
Abstract: To prevent credential stuffing attacks, industry best practice now proactively checks if user credentials are present in known data breaches. Recently, some web services, such as HaveIBeenPwned (HIBP) and Google Password Checkup (GPC), have started providing APIs to check for breached passwords. We refer to such services as compromised credential checking (C3) services. We give the first formal description of C3 services, detailing different settings and operational requirements, and we give relevant threat models. One key security requirement is the secrecy of a user's passwords that are being checked. Current widely deployed C3 services have the user share a small prefix of a hash computed over the user's password. We provide a framework for empirically analyzing the leakage of such protocols, showing that in some contexts knowing the hash prefixes leads to a 12x increase in the efficacy of remote guessing attacks. We propose two new protocols that provide stronger protection for users' passwords, implement them, and show experimentally that they remain practical to deploy.
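
For context, the hash-prefix scheme analyzed here is the one behind HIBP's public range API: the client reveals only the first five hex characters of a SHA-1 hash and matches suffixes locally. A minimal sketch of a client-side check (endpoint and response format follow HIBP's documented API):

```python
import hashlib
import urllib.request

def hibp_pwned_count(password: str) -> int:
    """k-anonymity range query: only a 5-char hash prefix leaves the client."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    url = f"https://api.pwnedpasswords.com/range/{prefix}"
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode("utf-8")
    # Response lines look like "SUFFIX:COUNT"; match the suffix locally.
    for line in body.splitlines():
        cand, _, count = line.partition(":")
        if cand == suffix:
            return int(count)
    return 0

print(hibp_pwned_count("password123"))  # nonzero for breached passwords
```

The leakage the paper quantifies comes from exactly this prefix: an attacker who observes it can restrict guessing to passwords whose hashes share it.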

30 citations

Proceedings ArticleDOI
23 May 2021
TL;DR: In this article, a deep generative model representation learning approach for password guessing is introduced, which can generate passwords with arbitrary biases and dynamically adapt the estimated password distribution to match the distribution of the attacked password set.
Abstract: Learning useful representations from unstructured data is one of the core challenges, as well as a driving force, of modern data-driven approaches. Deep learning has demonstrated the broad advantages of learning and harnessing such representations. In this paper, we introduce a deep generative model representation learning approach for password guessing. We show that an abstract password representation naturally offers compelling and versatile properties that open new directions in the extensively studied, and yet presently active, password guessing field. These properties can establish novel password generation techniques that are neither feasible nor practical with the existing probabilistic and non-probabilistic approaches. Based on these properties, we introduce: (1) A general framework for conditional password guessing that can generate passwords with arbitrary biases; and (2) an Expectation Maximization-inspired framework that can dynamically adapt the estimated password distribution to match the distribution of the attacked password set.

26 citations

Posted Content
TL;DR: It is shown that an abstract password representation naturally offers compelling and versatile properties that can be used to open new directions in the extensively studied, and yet presently active, password guessing field.
Abstract: Learning useful representations from unstructured data is one of the core challenges, as well as a driving force, of modern data-driven approaches. Deep learning has demonstrated the broad advantages of learning and harnessing such representations. In this paper, we introduce a deep generative model representation learning approach for password guessing. We show that an abstract password representation naturally offers compelling and versatile properties that can be used to open new directions in the extensively studied, and yet presently active, password guessing field. These properties can establish novel password generation techniques that are neither feasible nor practical with the existing probabilistic and non-probabilistic approaches. Based on these properties, we introduce: (1) A general framework for conditional password guessing that can generate passwords with arbitrary biases; and (2) an Expectation Maximization-inspired framework that can dynamically adapt the estimated password distribution to match the distribution of the attacked password set.

21 citations


Additional excerpts

  • ...Similarly to our SSPG framework, different works have focused on creating password variations for a given starting password [24, 46], primarily with the aim of modeling credential tweaking attacks....


Proceedings Article
26 Mar 2014
TL;DR: This paper finds that Markov models, when done correctly, perform significantly better than the Probabilistic Context-Free Grammar model proposed in Weir et al., which has been used as the state-of-the-art password model in recent research.
Abstract: A probabilistic password model assigns a probability value to each string. Such models are useful for research into understanding what makes users choose more (or less) secure passwords, and for constructing password strength meters and password cracking utilities. Guess number graphs generated from password models are a widely used method in password research. In this paper, we show that probability-threshold graphs have important advantages over guess-number graphs. They are much faster to compute, and at the same time provide information beyond what is feasible in guess-number graphs. We also observe that research in password modeling can benefit from the extensive literature in statistical language modeling. We conduct a systematic evaluation of a large number of probabilistic password models, including Markov models using different normalization and smoothing methods, and find that, among other things, Markov models, when done correctly, perform significantly better than the Probabilistic Context-Free Grammar model proposed in Weir et al., which has been used as the state-of-the-art password model in recent research.
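
To make the modeling concrete, here is a minimal sketch of an order-1 character-level Markov password model with add-one (Laplace) smoothing; the paper evaluates many orders, normalization schemes, and smoothing methods beyond this toy version, and the alphabet size below is an assumption:

```python
from collections import defaultdict

START, END = "\x02", "\x03"  # sentinel characters marking password ends

class MarkovPasswordModel:
    """Order-1 character Markov model with add-one (Laplace) smoothing."""

    def __init__(self, alphabet_size: int = 95):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)
        self.V = alphabet_size + 1  # +1 for the END symbol

    def train(self, passwords):
        for pw in passwords:
            seq = START + pw + END
            for prev, cur in zip(seq, seq[1:]):
                self.counts[prev][cur] += 1
                self.totals[prev] += 1

    def probability(self, pw: str) -> float:
        """Assign a probability to any string, as the abstract describes."""
        prob, seq = 1.0, START + pw + END
        for prev, cur in zip(seq, seq[1:]):
            prob *= (self.counts[prev][cur] + 1) / (self.totals[prev] + self.V)
        return prob

model = MarkovPasswordModel()
model.train(["password", "password1", "letmein"])
print(model.probability("password2"))
```
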

16 citations

Journal ArticleDOI
TL;DR: A threshold MFA key exchange protocol built on top of a threshold oblivious pseudorandom function and an authenticated key exchange protocol that achieves the “highest-attainable” security against all attacking attempts in the context of parties/factors being compromised/corrupted.
Abstract: Multi-factor authentication (MFA) has been widely used to safeguard high-value assets. Unlike single-factor authentication (e.g., password-only login), $t$-factor authentication ($t$FA) requires a user always to carry and present $t$ specified factors so as to strengthen the security of login. Nevertheless, this may restrict user experience by limiting the flexibility of factor usage, e.g., the user may prefer to choose any factors at hand for login authentication. To bring back usability and flexibility without loss of security, we introduce a new notion of authentication, called $(t,n)$ threshold MFA, that allows a user to actively choose $t$ factors out of $n$ based on preference. We further define the “most-rigorous” multi-factor security model for the new notion, allowing attackers to control public channels, launch active/passive attacks, and compromise/corrupt any subset of parties as well as factors. We state that the model can capture the most practical security needs in the literature. We design a threshold MFA key exchange (T-MFAKE) protocol built on top of a threshold oblivious pseudorandom function and an authenticated key exchange protocol. Our protocol achieves the “highest-attainable” security against all attacking attempts in the context of parties/factors being compromised/corrupted. As for efficiency, our design requires only $4+t$ exponentiations, 2 multi-exponentiations, and 2 communication rounds. Compared with existing $t$FA schemes, even the degenerated $(t,t)$ version of our protocol achieves the strongest security (stronger than most schemes) with higher computational and communication efficiency. We instantiate our design on a real-world platform to highlight its practicability and efficiency.

13 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won first place in the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
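
The core idea is to learn a residual F(x) and output F(x) + x, so the shortcut lets gradients bypass the stacked layers. A minimal PyTorch-style basic block, simplified from the paper's (which also handles downsampling and projection shortcuts):

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """y = F(x) + x: the identity shortcut eases optimization of deep nets."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # add the input back in (skip connection)

x = torch.randn(1, 64, 32, 32)
print(BasicResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```
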

123,388 citations

Proceedings Article
01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
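
The update rule itself is compact. A plain-NumPy sketch of a single Adam step for one parameter vector, using the paper's default hyperparameters:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: adaptive moment estimates, then bias correction."""
    m = b1 * m + (1 - b1) * grad          # 1st moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad**2       # 2nd moment (uncentered variance)
    m_hat = m / (1 - b1**t)               # bias correction (t starts at 1)
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
theta, m, v = adam_step(theta, np.array([0.1, -0.2, 0.3]), m, v, t=1)
print(theta)
```
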

111,197 citations


"Beyond Credential Stuffing: Passwor..." refers methods in this paper

  • ...For training, we used stochastic gradient descent (SGD) using the Adam’s optimizer [49] to minimize the cross-entropy loss [50] between the predicted output of the network and the expected output....


Journal ArticleDOI
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Abstract: Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
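
In the now-standard formulation (the forget gate was added in follow-up work after this 1997 paper), the gating equations are:

```latex
% Modern LSTM cell; \odot is elementwise product, \sigma the logistic sigmoid.
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)          % input gate
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)          % forget gate
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)          % output gate
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)   % candidate cell state
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t    % constant-error carousel
h_t = o_t \odot \tanh(c_t)
```
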

72,897 citations


Additional excerpts

  • ...The embedded character is then fed to a LSTM cell with three hidden layers, each of dimension 128....


  • ...A LSTM outputs two vectors, the first one is ignored for the encoder, and the second one, called state, is fed to the next LSTM cell, along with the next character of the password....


  • ...The architecture of the decoder is identical to the encoder except that we consider the first output of the LSTM layer, which is projected to a vector of size |T |....


  • ...A variant of RNN, called long short-term memory (LSTM) [45] was shown to be effective in avoiding vanishing and exploding gradient problems [46]....


  • ...We used LSTM blocks wrapped in residual cells....

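Read together, the excerpts above describe a character-level encoder-decoder. A simplified PyTorch sketch of that shape (character embedding into a 3-layer LSTM of width 128; the encoder keeps only the state, the decoder projects its per-step outputs to |T| logits; the residual wrapping around LSTM blocks is omitted here, and the vocabulary size is an assumption):

```python
import torch
import torch.nn as nn

VOCAB = 128      # assumed character-vocabulary size |T|
EMBED = 128
HIDDEN = 128     # "three hidden layers, each of dimension 128"
LAYERS = 3

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMBED)
        self.lstm = nn.LSTM(EMBED, HIDDEN, num_layers=LAYERS, batch_first=True)

    def forward(self, chars):            # chars: (batch, seq_len) int tensor
        x = self.embed(chars)
        _, state = self.lstm(x)          # per-step outputs ignored; keep state
        return state                     # (h_n, c_n), handed to the decoder

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMBED)
        self.lstm = nn.LSTM(EMBED, HIDDEN, num_layers=LAYERS, batch_first=True)
        self.proj = nn.Linear(HIDDEN, VOCAB)   # project to |T| logits

    def forward(self, chars, state):
        x = self.embed(chars)
        out, state = self.lstm(x, state)       # here the outputs are used
        return self.proj(out), state           # per-character logits over |T|

enc, dec = Encoder(), Decoder()
pw = torch.randint(0, VOCAB, (1, 8))           # a toy 8-character password
logits, _ = dec(pw, enc(pw))
print(logits.shape)                            # torch.Size([1, 8, 128])
```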

Proceedings Article
12 Jun 2017
TL;DR: This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single state-of-the-art model by 0.7 BLEU, achieving a BLEU score of 41.1.
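
Its core operation is scaled dot-product attention over query, key, and value matrices:

```latex
% Scaled dot-product attention; d_k is the key dimension, and the
% 1/\sqrt{d_k} factor keeps the softmax inputs in a trainable range.
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
```
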

52,856 citations

Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
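
In practice this is a one-line layer. A quick PyTorch illustration of the train/test asymmetry (inverted dropout scales survivors by 1/(1-p) during training so that evaluation is simply the identity):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # each unit is zeroed with probability 0.5
x = torch.ones(1, 8)

drop.train()
print(drop(x))   # roughly half the units zeroed, survivors scaled to 2.0

drop.eval()
print(drop(x))   # identity at test time: the single "unthinned" network
```
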

33,597 citations


"Beyond Credential Stuffing: Passwor..." refers background in this paper

  • ...The dropout rate controls removal of neural network units (neurons) randomly during training, which is useful to prevent overfitting [52]....


Trending Questions (2)
What are the interesting theories about credential stuffing attacks and authentication security?

The paper discusses targeted attacks and proposes personalized password strength meters (PPSMs) as a defense against credential stuffing attacks.

What are the major themes for credential stuffing?

The provided paper does not explicitly mention the major themes for credential stuffing.