Think Harder! Investigating the Effect of Password Strength on Cognitive Load during Password Creation

TLDR
In this paper, the relation between password creation and cognitive load inferred from eye pupil diameter was investigated, and the results showed that passwords with different strengths affect the pupil diameter, thereby giving an indication of the user's cognitive state.
Abstract
Strict password policies can frustrate users, reduce their productivity, and lead them to write their passwords down. This paper investigates the relation between password creation and cognitive load inferred from eye pupil diameter. We use a wearable eye tracker to monitor the user’s pupil size while creating passwords with different strengths. To assess how creating passwords of different strength (namely weak and strong) influences users’ cognitive load, we conducted a lab study (N = 15). We asked the participants to create and enter 6 weak and 6 strong passwords. The results showed that passwords with different strengths affect the pupil diameter, thereby giving an indication of the user’s cognitive state. Our initial investigation shows the potential for new applications in the field of cognition-aware user interfaces. For example, future systems can use our results to determine whether the user created a strong password based on their gaze behavior, without the need to reveal the characteristics of the password.


Abdrabou, Y., Abdelrahman, Y., Khamis, M. and Alt, F. (2021) Think Harder!
Investigating the Effect of Password Strength on Cognitive Load during Password
Creation. In: 2021 ACM CHI Virtual Conference on Human Factors in Computing
Systems, 08-13 May 2021, p. 259. ISBN 9781450380959.
There may be differences between this version and the published version. You are
advised to consult the publisher’s version if you wish to cite from it.
© Association for Computing Machinery 2021. This is the author's version of the
work. It is posted here for your personal use. Not for redistribution. The definitive
Version of Record was published in 2021 ACM CHI Virtual Conference on Human
Factors in Computing Systems, 08-13 May 2021, p. 259. ISBN 9781450380959.
http://dx.doi.org/10.1145/3411763.3451636.
http://eprints.gla.ac.uk/236283/
Deposited on: 10 March 2021
Enlighten Research publications by members of the University of Glasgow
http://eprints.gla.ac.uk

Think Harder! Investigating the Effect of Password Strength on
Cognitive Load during Password Creation
Yasmeen Abdrabou
yasmeen.essam@unibw.de
Bundeswehr University Munich
Germany
Yomna Abdelrahman
yomna.abdelrahman@unibw.de
Bundeswehr University Munich
Germany
Mohamed Khamis
mohamed.khamis@glasgow.ac.uk
University of Glasgow
Glasgow, United Kingdom
Florian Alt
florian.alt@unibw.de
Bundeswehr University Munich
Germany
ABSTRACT
Strict password policies can frustrate users, reduce their productivity, and lead them to write their passwords down. This paper investigates the relation between password creation and cognitive load inferred from eye pupil diameter. We use a wearable eye tracker to monitor the user’s pupil size while creating passwords with different strengths. To assess how creating passwords of different strength (namely weak and strong) influences users’ cognitive load, we conducted a lab study (N = 15). We asked the participants to create and enter 6 weak and 6 strong passwords. The results showed that passwords with different strengths affect the pupil diameter, thereby giving an indication of the user’s cognitive state. Our initial investigation shows the potential for new applications in the field of cognition-aware user interfaces. For example, future systems can use our results to determine whether the user created a strong password based on their gaze behavior, without the need to reveal the characteristics of the password.
CCS CONCEPTS
• Human-centered computing → Human computer interaction (HCI); • Security and privacy → Human and societal aspects of security and privacy.
KEYWORDS
Eye Tracking, Cognitive Load, Pupillometry, Cognition-Aware User
Interfaces, Passwords Strength
ACM Reference Format:
Yasmeen Abdrabou, Yomna Abdelrahman, Mohamed Khamis, and Florian Alt. 2021. Think Harder! Investigating the Effect of Password Strength on Cognitive Load during Password Creation. In CHI Conference on Human Factors in Computing Systems Extended Abstracts (CHI ’21 Extended Abstracts), May 8–13, 2021, Yokohama, Japan. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3411763.3451636
1 INTRODUCTION
Passwords are the most popular authentication mechanism [25]. Ideally, password selection complies with strong password heuristics and finds the best match between a password that is easy to remember and at the same time hard to guess [25]. Weak passwords that can be cracked might allow unauthorized access to an organization’s information assets. Thus, many organizations enforce password changes at frequent intervals to address password leakage [5]. At the same time, research showed that strict password policies decrease employees’ productivity [27] and can even result in less security as employees work around rules to easily remember their passwords [40].
Password meters are used in many interfaces to help users create strong and secure passwords [40]. Ur et al. [38] found that participants had misconceptions about the impact of basing passwords on common phrases and including digits and keyboard patterns in their passwords. However, they also found that in most cases, users’ perceptions of what characteristics make a strong, secure password were consistent with password meter tools. The fact that users’ perceptions of what makes a strong password are largely accurate motivated us to explore whether systems can learn about the strength of created passwords through the users rather than by examining the passwords themselves. Doing so has a security advantage: no third-party application would need to examine the created password to evaluate its strength. It also has a usability advantage: if we are able to determine password strength through the user’s cognitive load (e.g., as estimated via an eye tracker), then users can consciously learn about their password’s strength, even if the interface in use does not measure the password’s strength.
In this work, we contribute an investigation of the relationship between perceived password strength and cognitive load and how it affects the pupil diameter. We use a wearable eye tracker to monitor users’ pupil size while creating passwords with different strengths. We found that the pupil dilates while creating strong passwords and contracts while creating weak passwords. To the best of our knowledge, we are the first to investigate the relation between password strength and cognitive load. Unlike password strength meters that estimate the password strength based on the password characters, our work allows systems to determine the perceived strength of a password without revealing its characteristics. Our findings allow for new applications in the field of cognition-aware interfaces, for example, suggesting verbal, visual or spatial cues to help the user create unique, memorable passwords [3].
2 RELATED WORK
Our work builds on prior research on utilizing eye tracking for
cognitive load state estimation and password strength.
2.1 Pupillometry and Cognitive Load
Three types of cognitive load measures have been introduced in the literature: subjective, physiological and performance measures [28]. Subjective measures reflect the user’s subjective assessment of cognitive load. The NASA-TLX questionnaire [14] is a frequently used assessment tool for subjective cognitive load. However, such a tool cannot account for rapid changes in cognitive load that may result from changes in the experiment. Physiological measures include pupil dilation, heart-rate variability, and galvanic skin response [6, 17, 19]. Changes in these measures have been shown to correlate with different levels of cognitive load [15, 41]. However, physiological measures depend on many factors, including other aspects of the user’s cognitive state such as anxiety [7] and arousal [21], the user’s physical activity [33], and environmental variables such as light [32]. Hence, researchers should pay attention to the study conditions and the user’s state. Finally, performance measures capture how efficiently the user performs a given task. The method is based on the standardization of raw scores for mental effort and task performance to z scores, which are displayed in a cross of axes [29]. In our work, we use the second type, physiological measures, as they are captured without requiring participants to reflect on their performance during password creation or to fill in a questionnaire.
In the last decades, researchers have investigated the pupillary response for different types of tasks [8, 9, 16, 23]. Pupil dilation was found to be higher for more challenging tasks [11, 26]. Not only task demands have been found to influence the pupil diameter, but also factors like anxiety [7], stress [10], and fatigue [37]. A study by Just and Carpenter [20] showed that pupil responses can be an indicator of the effort required to understand and process information. They conducted an experiment in which participants were given two sentences of different complexities to read while their pupil diameters were measured. They found that the pupillary dilation was larger while readers processed the complicated sentence and smaller while they read the simpler one. It was also shown that pupil size correlates with the difficulty of a cognitive task [15]. Over the years, researchers have encountered some challenges in pupillometry, such as luminance. One way to improve validity is to strictly control the luminance of the experimental stimuli, but this limits the potential of pupillometry. While cognitive load can be affected by a large number of factors, pupillometry offers a responsive signal that can potentially provide approximate real-time feedback on the users’ arousal and potentially their cognitive load. We expect that creating stronger passwords is more difficult and thus more cognitively demanding. This motivated us to study the relation between cognitive load and password creation.
2.2 Password Strength
Passwords are the most popular authentication mechanism [25]. Passwords might be vulnerable to different types of attacks, e.g., brute force and guessing attacks [31]. Hence, system administrators started employing password-composition policies to counter such attacks [13, 39]. To help users create strong passwords, password meters are integrated into interfaces to give users an estimate of how strong their passwords are and hence how easily they can be cracked [13]. Researchers found that the design, color and feedback messages of password meters influence the strength of the created passwords [12, 13, 34, 39]. Although prior work has shown that password-composition policies requiring more characters or more character classes can improve resistance to automated guessing attacks, many passwords that meet common policies remain vulnerable [22, 42]. Furthermore, strict policies can frustrate users, reduce their productivity, and lead users to write their passwords down [1, 18, 35].
Ur et al. [38] found that users are aware of what makes a password strong. This suggests that putting more effort into creating a password might be an indication that it is a strong one. This motivated us to study the relation between password strength and cognitive load during password creation. If such a connection exists, future systems can then determine the strength of a password based on the user’s cognitive load, alleviating the need for systems to access the password characteristics.
Hence, there is a need to study the relation between password creation and cognitive load. Therefore, in this paper, we introduce the use of pupillometry to detect users’ cognitive load while creating weak and strong passwords.
3 CONCEPT AND METHODOLOGY
In this section, we describe our concept and approach for evaluating cognitive load from pupil diameter. Since the relation between pupil diameter and cognitive load has already been established (see Section 2.1), in this work we look at how users’ cognitive load changes during the creation of weak and strong passwords (RQ). Bafna et al. [4] showed that there is an increase in cognitive load when participants were asked to memorize and type difficult vs. easy sentences. Inspired by them, we hypothesize that creating strong passwords will induce higher cognitive load compared to creating weak passwords.
We ran a lab study to answer our research question. In the following, we highlight how we analyzed the collected data. First, we compared the strength of the collected passwords against the zxcvbn password meter [43] to see if participants’ ratings match the system rating. Second, we extracted the pupil diameter variance between weak and strong passwords and tested it for statistical significance. Third, we calculated the mean pupil diameter change (MPDC) as a means to estimate the cognitive load while creating passwords of different strengths.
3.1 Password Strength Meter
We analyzed and compared user-rated password strength against the zxcvbn password strength meter [43] (details in Section 5.2). In addition, we statistically analyzed the rated strength of weak and strong passwords using repeated measures ANOVA, as well as the entropy generated by the zxcvbn meter for weak and strong passwords. Finally, we analyzed the post-study questions and report their results. We used a cut-off score of 2.5 to differentiate between weak and strong passwords: a score from 1 to 2.5 is considered a weak password, and a score above 2.5 up to 5 is considered a strong password.
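The cut-off rule above can be sketched in a few lines of Python. This is only an illustration of the stated threshold; the function name `label_rating` and the sample ratings are hypothetical and not taken from the study.

```python
CUTOFF = 2.5  # cut-off score stated in the text


def label_rating(rating: float) -> str:
    """Map a 1-5 strength score to 'weak' or 'strong' using the 2.5 cut-off."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must lie between 1 and 5")
    # Scores from 1 to 2.5 (inclusive) count as weak, anything above as strong.
    return "weak" if rating <= CUTOFF else "strong"


# Hypothetical scores, for illustration only
ratings = [1, 2, 2.5, 3, 5]
labels = [label_rating(r) for r in ratings]
print(labels)
```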
3.2 Mean Pupil Diameter Change Calculation
We analyze the average pupil diameter and the commonly used mean pupil diameter change (MPDC) as a cognitive load metric [2, 24]. The MPDC calculation can be found in Equation 1, where MPD_p represents the mean pupil diameter for a specific password, MPD_a represents the participant’s mean pupil diameter while entering all passwords, and N is the overall number of passwords, which in our case is 12. The overall mean is subtracted from the password mean in order to compare results between subjects with different pupil sizes [30]. The MPDC has an advantage over the MPD: it corrects for fluctuations in the baseline pupil diameter and compensates for any structural temporal trends that might exist. Hence, the use of MPDC is appropriate compared to other types of measures such as dilation percentage since, as pointed out by Beatty et al. [6], “the pupillary dilation evoked by cognitive processing is independent of baseline pupillary diameter over a wide range of baseline values”. On the other hand, the MPDC allows us to determine whether the baseline itself differed as a function of the password strength.
$$
MPDC = \sum_{i=0}^{N} \frac{MPD_p - MPD_a}{N} \tag{1}
$$
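As a rough illustration of Equation 1, the following Python sketch computes the MPDC for one participant. It rests on an assumption that the text does not spell out: the deviations from the participant's overall mean are averaged separately over the weak and the strong passwords (averaged over all 12 passwords together, they would cancel to zero). The function name `mpdc` and all pupil values are hypothetical.

```python
from statistics import mean


def mpdc(password_mpds, all_mpds):
    """Average of (MPD_p - MPD_a) over the given passwords.

    password_mpds: mean pupil diameter of each password in one condition
    all_mpds:      mean pupil diameter of every password of the participant
    """
    mpd_a = mean(all_mpds)  # participant's overall mean, MPD_a
    return mean(mpd_p - mpd_a for mpd_p in password_mpds)


# Hypothetical per-password mean pupil diameters in mm (not study data)
weak = [3.4, 3.5, 3.5, 3.4, 3.5, 3.5]
strong = [3.6, 3.6, 3.7, 3.5, 3.6, 3.6]
all_passwords = weak + strong

mpdc_weak = mpdc(weak, all_passwords)
mpdc_strong = mpdc(strong, all_passwords)
print(round(mpdc_weak, 3), round(mpdc_strong, 3))
```

Because each value is expressed relative to the participant's own mean, a positive MPDC indicates dilation relative to that participant's baseline and a negative one indicates contraction, which makes participants with different absolute pupil sizes comparable.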
4 EVALUATION
We conducted a user study in which we recorded the participants’
eye gaze data while creating weak and strong passwords on laptops.
4.1 Study Design
We applied a repeated-measures design, where all participants did
all conditions. Overall, participants were asked to create 12 pass-
words (6 weak and 6 strong). The order of which password they
should enter was counterbalanced using a Latin Square. Partici-
pants were advised not to reuse a password they already entered.
We collected the entered passwords, password ratings, and gaze data including pupil size as dependent variables. Password strength (weak vs. strong) acted as the independent variable. The screen brightness and the room light were kept constant throughout the whole experiment.
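The text does not state which Latin square construction was used for counterbalancing. As a minimal sketch, assuming a simple cyclic Latin square over the 12 password prompts, presentation orders could be generated as follows; the prompt labels and the helper `order_for` are hypothetical.

```python
def latin_square(conditions):
    """Return a cyclic Latin square: row r is the condition list rotated by r."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


# Hypothetical labels for the 12 password prompts (6 weak, 6 strong)
prompts = [f"weak{i}" for i in range(1, 7)] + [f"strong{i}" for i in range(1, 7)]
square = latin_square(prompts)


def order_for(participant_index):
    """Pick the presentation order for a participant (rows are reused cyclically)."""
    return square[participant_index % len(square)]


print(order_for(0)[:3])
```

In such a square every prompt appears exactly once in every serial position across the rows, so position effects are spread evenly over prompts.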
4.2 Participants and Apparatus
We invited 15 participants (5 males) to our lab through the university mailing list. Their age varied from 22 to 31 (Mean = 24.27; SD = 2.91). Participants came from different backgrounds (Computer Science, Engineering, Landscape Design) and different nationalities (Spain, China, Bangladesh, Pakistan, Egypt, Germany). Participants had basic to average knowledge of eye tracking, and none of them wore glasses.
As shown in Figure 1, our experimental setup consisted of a Tobii Pro Glasses 2¹ with 120 fps running on a Lenovo T440s² along with the Tobii Glasses Controller³. We implemented a simple web page that shows the question and an empty field for entering the password.
4.3 Procedure
After arriving in the lab, participants were asked to sign a consent form and received an explanation of the purpose of the study. After that, we calibrated the eye tracker using Tobii’s one-point calibration⁴. We instructed the participants to change the keyboard layout to the one they normally use and to change the language as well if needed. We gave the participants the device and asked them to create and enter a set of passwords (6 weak and 6 strong), one at a time, in a randomized order. Participants were requested to enter passwords of more than 8 characters, but we neither gave any hints on how to create a strong password nor imposed any other requirements. After each password, we asked the participants to rate the password strength on a Likert scale from 1 to 5 (very weak to very strong). At the end of the study, we asked the participants "What makes a strong password?" to understand whether they know the basic password policies. Overall, the study lasted approximately 10 minutes, and participants were rewarded with 5 EUR.
5 RESULTS
5.1 Data Cleaning and Reprocessing
To analyze the collected pupil size data, we first removed the missing data. Then, we averaged the left and right eye pupil sizes into one value. After that, we plotted the data to check for outliers. The data of two participants were considered outliers due to excessive talking and asking questions during the study, which strongly affects the cognitive load [36]. Therefore, the following analysis covers only 13 participants.
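The first two cleaning steps can be sketched as below. The sketch makes assumptions the text leaves open: a missing reading is encoded as `None`, and a sample is dropped if either eye is missing. The function name and the sample values are hypothetical, and the outlier check, which was done visually, is not covered.

```python
def clean_pupil_samples(samples):
    """Drop samples with a missing eye and average left/right into one value.

    samples: list of (left_mm, right_mm) tuples; None marks a missing reading.
    """
    cleaned = []
    for left, right in samples:
        if left is None or right is None:
            continue  # assumption: discard the sample if either eye is missing
        cleaned.append((left + right) / 2)
    return cleaned


# Hypothetical raw samples in mm (not study data)
samples = [(3.4, 3.6), (None, 3.5), (3.5, 3.5), (3.6, None)]
pupil = clean_pupil_samples(samples)
print(len(pupil))
```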
5.2 Rated Password Strength
To get a better idea of how our participants perceived their passwords’ strength, we compared their rated password strength to the password strength given by the zxcvbn meter. Figure 2 shows the average rating for all the passwords entered per participant against the results from the zxcvbn meter. As seen, the password ratings vary; however, the difference between the users’ ratings and the zxcvbn meter ratings is not statistically significant (χ²(1) = 3.769, P = .0521), as found by a Friedman test. We also compared the entropy of the weak and strong passwords calculated by the zxcvbn meter and found a significant difference between the entropy of the weak (M = 14.45; SD = 3.59) and the strong passwords (M = 60.75; SD = 9.21), (F(1, 14) = 268.760, P < .001). This assures that the entered passwords are valid for further analysis [13] and that participants’ perception of weak and strong passwords matches the password meter rating.
¹ Tobii Pro Glasses: https://www.tobiipro.com/product-listing/tobii-pro-glasses-2/
² Lenovo T440s: https://www.lenovo.com/gb/en/laptops/thinkpad/t-series/t440s/
³ Tobii Glasses Controller: https://www.tobiipro.com/learn-and-support/learn/steps-in-an-eye-tracking-study/setup/installing-tobii-glasses-controller/
⁴ One Point Calibration: https://www.tobiipro.com/learn-and-support/learn/steps-in-an-eye-tracking-study/run/running-a-monocular-calibration-with-the-Tobii-pro-spectrum/

Figure 1: Experiment study setup consisting of a laptop and a wearable eye tracker. Top left: gaze monitoring while creating passwords, viewed from the Tobii Pro Glasses Controller.
Figure 2: Password strength comparison between participants’
rating and the zxcvbn password meter rating. Showing similar
ratings between the zxcvbn meter and users ratings
Figure 3: (Left) shows the MPD across the 13 participants. (Right) shows the MPD per created password
5.3 Post Study Question Analysis
At the end of the study, we asked the participants what makes a strong password. Special characters came in first place (22%), followed by adding numbers (18%) and upper/lower cases (18%), and finally increasing the length (14%), adding numbers (14%) and adding random characters (14%). While metrics like password length have a stronger positive impact on security than special characters [25], the responses still show that participants knew what makes passwords stronger.
5.4 Pupil Diameter and Password Strength
Figure 3 (left) shows the MPD across the 13 participants. As seen in the figure, the pupil is more dilated when creating strong passwords than when creating weak passwords, except for participants 7 and 11. Repeated measures ANOVA showed a statistically significant difference between the MPD for weak (M = 3.47, SD = .4) and strong passwords (M = 3.60, SD = .41), (F(1, 12) = 29.497, P < .001). This means that password strength has a statistically significant effect on the MPD. Furthermore, we looked into the MPD difference between creating strong and weak passwords for all participants (see Table 1) and found that the mean difference is M = .14, SD = .09, and the smallest difference is 0.03 mm. This means that even when we cannot draw a threshold due to the different pupil size responses across participants, the difference still exists, indicating that strong passwords induce higher cognitive load.
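For a within-subject factor with only two levels (weak vs. strong), a repeated measures ANOVA is equivalent to a paired t-test, with F = t² and degrees of freedom (1, n − 1); with 13 participants this gives the (1, 12) reported above. The sketch below illustrates that relationship on made-up numbers. It is not the authors' analysis script, and the function name `paired_f` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_f(weak, strong):
    """F statistic of a two-level repeated measures design via the paired t-test."""
    diffs = [s - w for w, s in zip(weak, strong)]  # per-participant difference
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))     # paired t statistic
    return t * t, (1, n - 1)                       # F = t^2 with df (1, n - 1)


# Hypothetical per-participant MPD values in mm (not study data)
weak = [3.1, 3.5, 3.9, 3.3, 3.6]
strong = [3.2, 3.7, 4.0, 3.5, 3.7]

f_value, df = paired_f(weak, strong)
print(round(f_value, 2), df)
```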
Looking at the MPD per created password, we can see in Figure 3 (right) that for all 6 passwords participants had a wider pupil diameter, which can indicate higher cognitive load while creating strong
