How can the authors use pupil diameter to assess password strength?

Since the authors found that password strength is reflected in pupil diameter response, pupil diameter can be integrated in interfaces to assess password strength without revealing the actual password to the system.

What factors influence the strength of passwords?

Researchers found that password meter design, color, and feedback messages have an influence on the strength of the created passwords [12, 13, 34, 39].

What was the setup of the experiment?

As shown in Figure 1, their experimental setup consisted of a Tobii Pro Glasses 2 with 120 fps running on a Lenovo T440s along with the Tobii Glasses Controller.

What is the password for the study?

While metrics like password length have a stronger positive impact on security than special characters [25], the responses still show that participants knew what makes passwords stronger.

(Open Access) Think Harder! Investigating the Effect of Password Strength on Cognitive Load during Password Creation (2021) | Yasmeen Abdrabou

Q: What future works have the authors mentioned in the paper "Think harder! investigating the effect of password strength on cognitive load during password creation" ?

For future work, it is valuable to investigate the effect of reusing passwords and whether it is consistent with their findings. The authors will also investigate how their approach would distinguish between a low cognitive load due to a weak password and a low cognitive load due to the user adopting a password strategy.

Q: What is the advantage of using a password meter?

It also has a usability advantage: if the authors are able to determine password strength through the user’s cognitive load (e.g., as estimated via an eye tracker), then users can consciously learn about their password’s strength, even if the used interface does not measure the password’s strength.

Q: How did the authors differentiate between weak and strong passwords?

The authors used a cut-off score of 2.5 for differentiating between weak and strong passwords, where a rating from 1 to 2.5 is considered a weak password and a rating above 2.5 up to 5 is considered a strong password.

Abdrabou, Y., Abdelrahman, Y., Khamis, M. and Alt, F. (2021) Think Harder! Investigating the Effect of Password Strength on Cognitive Load during Password Creation. In: 2021 ACM CHI Virtual Conference on Human Factors in Computing Systems, 08-13 May 2021, p. 259. ISBN 9781450380959.

There may be differences between this version and the published version. You are advised to consult the publisher’s version if you wish to cite from it.

This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in 2021 ACM CHI Virtual Conference on Human Factors in Computing Systems, 08-13 May 2021, p. 259. ISBN 9781450380959. http://dx.doi.org/10.1145/3411763.3451636.

http://eprints.gla.ac.uk/236283/

Deposited on: 10 March 2021

Enlighten – Research publications by members of the University of Glasgow

http://eprints.gla.ac.uk

Think Harder! Investigating the Eect of Password Strength on

Cognitive Load during Password Creation

Yasmeen Abdrabou

yasmeen.essam@unibw.de

Bundeswehr University Munich

Germany

Yomna Abdelrahman

yomna.abdelrahman@unibw.de

Bundeswehr University Munich

Germany

Mohamed Khamis

mohamed.khamis@glasgow.ac.uk

University of Glasgow

Glasgow, United Kingdom

Florian Alt

florian.alt@unibw.de

Bundeswehr University Munich

Germany

ABSTRACT

Strict password policies can frustrate users, reduce their productivity, and lead them to write their passwords down. This paper investigates the relation between password creation and cognitive load inferred from eye pupil diameter. We use a wearable eye tracker to monitor the user’s pupil size while creating passwords with different strengths. To assess how creating passwords of different strength (namely weak and strong) influences users’ cognitive load, we conducted a lab study (N = 15). We asked the participants to create and enter 6 weak and 6 strong passwords. The results showed that passwords with different strengths affect the pupil diameter, thereby giving an indication of the user’s cognitive state. Our initial investigation shows the potential for new applications in the field of cognition-aware user interfaces. For example, future systems can use our results to determine whether the user created a strong password based on their gaze behavior, without the need to reveal the characteristics of the password.

CCS CONCEPTS

• Human-centered computing → Human computer interaction (HCI); • Security and privacy → Human and societal aspects of security and privacy.

KEYWORDS

Eye Tracking, Cognitive Load, Pupillometry, Cognition-Aware User

Interfaces, Passwords Strength

ACM Reference Format:

Yasmeen Abdrabou, Yomna Abdelrahman, Mohamed Khamis, and Florian Alt. 2021. Think Harder! Investigating the Effect of Password Strength on Cognitive Load during Password Creation. In CHI Conference on Human Factors in Computing Systems Extended Abstracts (CHI ’21 Extended Abstracts), May 8–13, 2021, Yokohama, Japan. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3411763.3451636

CHI ’21 Extended Abstracts, May 8–13, 2021, Yokohama, Japan

This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in CHI Conference on Human Factors in Computing Systems Extended Abstracts (CHI ’21 Extended Abstracts), May 8–13, 2021, Yokohama, Japan, https://doi.org/10.1145/3411763.3451636.

1 INTRODUCTION

Passwords are the most popular authentication mechanism [ ]. Ideally, the password selection process is achieved by complying with strong password heuristics and finding the best match between an easy-to-remember password that is at the same time hard to guess [ ]. Weak passwords that can be cracked might cause unauthorized access to an organization’s information assets. Thus, many organizations enforce password changes at frequent intervals to address password leakage [ ]. At the same time, research showed that strict password policies decrease employees’ productivity [ ] and can even result in less security as employees work around rules to easily remember their passwords [40].

Password meters are used in many interfaces to help users create strong and secure passwords [ ]. Ur et al. [ ] found that participants had misconceptions about the impact of basing passwords on common phrases and including digits and keyboard patterns in their passwords. However, they also found that in most cases, users’ perceptions of what characteristics make a strong, secure password were consistent with password meter tools. The fact that users’ perceptions of what characteristics make a strong password are accurate motivated us to explore whether systems can learn about the strength of created passwords through the users rather than by examining the passwords themselves. Doing so has a security advantage: no third-party applications would need to examine the created password to evaluate its strength. It also has a usability advantage: if we are able to determine password strength through the user’s cognitive load (e.g., as estimated via an eye tracker), then users can consciously learn about their password’s strength, even if the used interface does not measure the password’s strength.

In this work, we contribute an investigation of the relationship between perceived password strength and cognitive load and how it affects the pupil diameter. We use a wearable eye tracker to monitor users’ pupil size while creating passwords with different strengths. We found that the pupil dilates while creating strong passwords and contracts while creating weak passwords. To the best of our knowledge, we are the first to investigate the relation between password strength and cognitive load. Unlike password strength meters that estimate the password strength based on the password characters, our work allows systems to determine the perceived strength of a password without revealing its characteristics. Our findings allow for new applications in the field of cognition-aware interfaces, for example, suggesting verbal, visual, or spatial cues to help the user create unique, memorable passwords [3].

2 RELATED WORK

Our work builds on prior research on utilizing eye tracking for

cognitive load state estimation and password strength.

2.1 Pupillometry and Cognitive Load

Three types of cognitive load measures were introduced in the literature: subjective, physiological, and performance measures [ ]. Subjective measures reflect the user’s subjective assessment of cognitive load. The NASA-TLX questionnaire [ ] is a frequently used assessment tool for subjective cognitive load. However, such a tool cannot account for rapid changes in the cognitive load that may be the result of changes in the experiment. Physiological measures include pupil dilation, heart-rate variability, and galvanic skin response [ ]. Changes in these measures have been shown to correlate with different levels of cognitive load [ ]. However, physiological measures depend on many factors, including other aspects of the user’s cognitive state such as anxiety [ ] and arousal [ ], the user’s physical activity [ ], and environmental variables such as light [ ]. Hence, researchers should pay attention to the study conditions and the user’s state. Finally, performance measures capture how efficiently the user is performing a given task. The method is based on the standardization of raw scores for mental effort and task performance to z scores, which are displayed in a cross of axes [ ]. In our work, we use the second type, physiological measures, as they are captured without requiring participants to reflect on their performance during password creation or fill in a questionnaire.

In the last decades, researchers have investigated the pupillary response for different types of tasks [ ]. Pupil dilation was found to be higher for more challenging tasks [ ]. Not only task demands have been found to influence the pupil diameter, but also factors like anxiety [ ], stress [ ], and fatigue [ ]. A study by Just and Carpenter [ ] showed that pupil responses can be an indicator of the effort to understand and process information. They conducted an experiment in which participants were given two sentences of different complexities to read while their pupil diameters were measured. They found that the pupillary dilation was larger while readers processed the complicated sentence and more subtle while they read the simpler one. It was also shown that pupil size correlates with the difficulty of a cognitive task [ ]. Over the years, researchers have encountered some challenges in pupillometry, such as luminance. One way to improve validity is to strictly control the luminance of the experimental stimuli, but this limits the potential of pupillometry. While cognitive load can be affected by a large number of factors, pupillometry offers a responsive signal that can potentially provide approximate real-time feedback on the users’ arousal and potentially their cognitive load. We expect that creating stronger passwords is more difficult and thus more cognitively demanding. This motivated us to study the relation between cognitive load and password creation.

2.2 Password Strength

Passwords are the most popular authentication mechanism [ ]. There are different types of attacks that passwords might be vulnerable to, e.g., brute force and guessing attacks [ ]. Hence, system administrators started employing password-composition policies to eliminate attacks [ ]. To help users create strong passwords, password meters are integrated into interfaces to give users an estimate of how strong their passwords are and hence how easily they can be cracked [ ]. Researchers found that password meter design, color, and feedback messages have an influence on the strength of the created passwords [12, 13, 34, 39]. Although prior work has shown that password-composition policies requiring more characters or more character classes can improve resistance to automated guessing attacks, many passwords that meet common policies remain vulnerable [ ]. Furthermore, strict policies can frustrate users, reduce their productivity, and lead users to write their passwords down [1, 18, 35].

Ur et al. [ ] found that users are aware of what makes a password strong. This suggests that putting more effort into creating a password might be an indication that it is a strong one. This motivated us to study the relation between password strength and cognitive load during password creation. If such a connection exists, future systems can then determine the strength of a password based on the user’s cognitive load, alleviating the need for systems to access the password characteristics.

Hence, studying the relation between password creation and cognitive load is necessary. Therefore, in this paper, we introduce the use of pupillometry to detect users’ cognitive load while creating weak and strong passwords.

3 CONCEPT AND METHODOLOGY

In this section, we describe our concept and approach of evaluating cognitive load from pupil diameter. Since the relation between pupil diameter and cognitive load has already been proven (see subsection 2.1), in this work we look at how the users’ cognitive load changes during the creation of weak and strong passwords ( ). Bafna et al. [ ] showed that there is an increase in cognitive load when participants were asked to memorize and type difficult vs. easy sentences. Inspired by them, we hypothesize that creating strong passwords will induce higher cognitive load compared to creating weak passwords.

We ran a lab study to answer our research question. In the following, we highlight how we analyzed the collected data. First, we analyzed the collected passwords’ strength against the zxcvbn password meter [ ] to see if participants’ ratings match the system rating. Second, we extracted the pupil diameter variance between weak and strong passwords and tested its statistical significance. Third, we calculated the mean pupil diameter change (MPDC) as a means to calculate the cognitive load while creating passwords of different strengths.

3.1 Password Strength Meter

We analyzed and compared user-rated password strength against the zxcvbn password strength meter [ ] (details in Section 5.2). In addition, we statistically analyzed the rated strength of the weak and strong passwords using repeated measures ANOVA, as well as the entropy generated for weak and strong passwords by the zxcvbn meter. Finally, we further analyzed the post-study questions and reported their results. We used a cut-off score of 2.5 for differentiating between weak and strong passwords, where a rating from 1 to 2.5 is considered a weak password and a rating above 2.5 up to 5 is considered a strong password.
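The cut-off rule can be written down as a small helper. This is a minimal sketch, not the authors' code: the function name `classify_rating` and the choice to reject ratings outside the 1 to 5 scale are assumptions made for illustration.

```python
def classify_rating(rating: float, cutoff: float = 2.5) -> str:
    """Label a 1-5 strength rating as 'weak' or 'strong' (hypothetical helper)."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    # Ratings from 1 up to and including the cut-off count as weak;
    # anything above the cut-off (up to 5) counts as strong.
    return "weak" if rating <= cutoff else "strong"
```

A rating that falls exactly on the cut-off is treated as weak, following the wording "from 1 to 2.5".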

3.2 Mean Pupil Diameter Change Calculation

We analyze the average pupil diameter and the commonly used mean pupil diameter change (MPDC) as a cognitive load metric [ ]. The MPDC calculation can be found in Equation 1, where $MPD_p$ represents the mean pupil diameter for a specific password, $MPD_a$ represents the mean pupil diameter for the participant while entering all passwords, and $N$ is the overall number of passwords (12 in our case). The overall mean is subtracted from the password mean in order to compare results between subjects with different pupil sizes [ ]. The MPDC has an advantage over the MPD in that it corrects for fluctuations in the baseline pupil diameter and compensates for any structural temporal trends that might exist. Hence, the use of MPDC is appropriate compared to other types of measures such as dilation percentage; as pointed out by Beatty et al. [ ], “the pupillary dilation evoked by cognitive processing is independent of baseline pupillary diameter over a wide range of baseline values”. On the other hand, the MPDC allows us to determine whether the baseline itself differed as a function of the password strength.

\[ MPDC = \frac{\sum_{i=0}^{N} \left( MPD_p - MPD_a \right)}{N} \qquad (1) \]
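The MPDC computation for a single participant can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `mpdc_by_strength` and the input layout are hypothetical, $MPD_a$ is taken to be the mean of the per-password means, and the example values are made up. Because the differences cancel to zero over all of a participant's passwords under that reading of $MPD_a$, the sketch averages them within each strength condition; this grouping is our assumption and is not stated in the text.

```python
from statistics import mean


def mpdc_by_strength(password_mpds):
    """password_mpds: list of (strength_label, mean_pupil_diameter_mm) for one participant."""
    # MPD_a: the participant's mean pupil diameter over all passwords (assumed
    # here to be the mean of the per-password means).
    mpd_a = mean(mpd for _, mpd in password_mpds)
    changes = {}
    for label, mpd_p in password_mpds:
        # Per-password change relative to the participant's own overall mean.
        changes.setdefault(label, []).append(mpd_p - mpd_a)
    # Average the changes within each strength condition (assumed grouping).
    return {label: mean(values) for label, values in changes.items()}


# Made-up values for illustration only, not data from the study.
example = [("weak", 3.4), ("weak", 3.5), ("strong", 3.6), ("strong", 3.7)]
result = mpdc_by_strength(example)
```

With these illustrative values, the weak condition comes out below the participant's overall mean and the strong condition above it by the same amount.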

4 EVALUATION

We conducted a user study in which we recorded the participants’

eye gaze data while creating weak and strong passwords on laptops.

4.1 Study Design

We applied a repeated-measures design, where all participants did all conditions. Overall, participants were asked to create 12 passwords (6 weak and 6 strong). The order in which the passwords should be entered was counterbalanced using a Latin Square. Participants were advised not to reuse a password they had already entered. We collected the entered passwords, password ratings, and gaze data including pupil size as dependent variables. Password strength (weak vs. strong) acted as an independent variable, and the screen brightness, as well as the room light, was kept the same throughout the whole experiment.
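A Latin Square ordering can be generated with a simple cyclic construction. The paper does not say how its square was built or what the rows and columns were, so the sketch below is only one common option: the function name `latin_square` and the condition labels are hypothetical.

```python
def latin_square(conditions):
    """Return a cyclic Latin square: each condition appears once per row and column."""
    n = len(conditions)
    # Row r is the condition list rotated left by r positions.
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


# Hypothetical labels; participant k would follow row k % len(square).
square = latin_square(["A", "B", "C", "D"])
```

Each row is one presentation order, and across rows every condition occupies every position exactly once.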

4.2 Participants and Apparatus

We invited 15 participants (5 males) to our lab by the university

mailing list. The age varied from 22 to 31 (

𝑀𝑒𝑎𝑛 =

27;

𝑆𝐷 =

91).

Participants came from dierent backgrounds (Computer science,

Engineering, Landscape Design), and dierent nationalities (Spain,

China, Bangladesh, Pakistan, Egypt, Germany). Participants had

from basic to average knowledge of eye-tracking and none of them

had glasses on.

As shown in Figure 1, our experimental setup consisted of a Tobii Pro Glasses 2 with 120 fps running on a Lenovo T440s along with the Tobii Glasses Controller. We implemented a simple web page interface that shows the question and an empty field in which to write the password.

4.3 Procedure

After arriving in the lab, participants were asked to sign a consent form and received an explanation of the purpose of the study. After that, we calibrated the eye tracker using Tobii’s one-point calibration. We instructed the participants to change the keyboard style to the one they are using and to change the language as well if needed. We gave the participants the device and asked them to create and enter a set of passwords (6 weak and 6 strong) one at a time in a randomized order. Participants were requested to enter passwords longer than 8 characters, but we neither gave any hints on how to create a strong password nor imposed any requirements. After each password, we asked the participants to rate the password strength on a Likert scale from 1 to 5 (very weak to very strong). At the end of the study, we asked the participants "What makes a strong password?" to understand whether they know the basic password policies. Overall, the study lasted approximately 10 minutes and participants were rewarded with 5 EUR.

5 RESULTS

5.1 Data Cleaning and Reprocessing

To start analyzing the collected pupil size data, we first removed the missing data. Then, we averaged the left and right eye pupil sizes into one value. After that, we plotted the data to check for outliers. The data of two participants were considered outliers due to excessive talking and asking questions during the study, which strongly affects the cognitive load [ ]. Therefore, the following analysis is done only on 13 participants.
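The first two cleaning steps can be sketched as below. This is a minimal illustration, not the authors' pipeline: the function name `clean_pupil_samples`, the use of `None` for a missing reading, and the decision to drop a sample when either eye is missing are all assumptions, since the text only says that missing data were removed.

```python
def clean_pupil_samples(samples):
    """samples: iterable of (left_mm, right_mm); None marks a missing reading."""
    cleaned = []
    for left, right in samples:
        # Drop samples where either eye's reading is missing (assumed rule).
        if left is None or right is None:
            continue
        # Collapse the two eyes into a single value by averaging.
        cleaned.append((left + right) / 2)
    return cleaned
```

Outlier handling is left out on purpose: in the study it was a per-participant judgment made after plotting the data, which a short function cannot reproduce.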

5.2 Rated Password Strength

To get a better idea of how our participants perceived their passwords’ strength, we compared their rated password strength to the zxcvbn meter password strength. Figure 2 shows the average rating for all the passwords entered per participant against the results from the zxcvbn meter. As seen, there is a variance between the password ratings; however, the difference between users’ ratings and the zxcvbn meter rating is not statistically significant (χ²(1) = 3.769, P = .0521), as found by a Friedman test. We also compared the entropy of the weak and strong passwords calculated by the zxcvbn meter and we found a significant difference between the entropy for the

weak (

𝑀 =

45;

𝑆𝐷 =

59) and the strong passwords

(𝑀 =

75;

𝑆𝐷 =

21), (

𝐹

1,14

268

760,

𝑃 < .

001) which assures that the

entered passwords are valid to be used for further analysis [

]

and that participants’ perception of weak and strong passwords

matches the password meter rating.

Tobii Pro Glasses: https://www.tobiipro.com/product-listing/tobii-pro-glasses-2/

Lenovo T440s: https://www.lenovo.com/gb/en/laptops/thinkpad/t-series/t440s/

Tobii Glasses Controller: https://www.tobiipro.com/learn-and-support/learn/steps-in-an-eye-tracking-study/setup/installing-tobii-glasses-controller/

One Point Calibration: https://www.tobiipro.com/learn-and-support/learn/steps-in-an-eye-tracking-study/run/running-a-monocular-calibration-with-the-Tobii-pro-spectrum/


Figure 1: Experiment study setup consisting of a laptop and a wearable eye tracker. Top left: gaze monitoring while creating passwords, viewed from the Tobii Pro Glasses Controller.

Figure 2: Password strength comparison between participants’ ratings and the zxcvbn password meter rating, showing similar ratings between the zxcvbn meter and users’ ratings.

Figure 3: (Left) shows the MPD across the 13 participants. (Right) shows the MPD per created password

5.3 Post Study Question Analysis

At the end of the study, we asked the participants what makes a strong password. Special characters came in first place (22%), then adding numbers (18%) and upper/lower cases (18%), and finally increasing the length (14%), adding numbers (14%), and adding random characters (14%). While metrics like password length have a stronger positive impact on security than special characters [25], the responses still show that participants knew what makes passwords stronger.

5.4 Pupil Diameter and Password Strength

Figure 3 (left) shows the MPD across the 13 participants. As seen in the figure, the MPD is larger when creating strong passwords than when creating weak passwords, except for participants 7 and 11. Repeated measures ANOVA showed a statistically significant difference between the MPD for weak (

𝑀 =

47,

𝑆𝐷 = .

4) and strong passwords

(

𝑀 =

60,

𝑆𝐷 = .

41), (

𝐹

1,12

497,

𝑃 < .

001). This means that password strength has a statistically significant effect on the MPD. Furthermore, we looked into the MPD difference between creating strong and weak passwords for all participants (see Table 1) and found that the mean difference is (M = .14, SD = .09) and the smallest difference is

𝑀 =

𝑚𝑚

. This means that even when we cannot draw a threshold due to different pupil size responses across participants, the difference still exists, indicating that strong passwords induce higher cognitive load.
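For a within-subject factor with only two levels, the repeated-measures ANOVA F statistic with degrees of freedom (1, n − 1) equals the square of the paired t statistic. The sketch below uses that identity, assuming one MPD value per participant and condition; the function name `paired_f_statistic` is hypothetical, the numbers are made up for illustration, and it is not a reproduction of the authors' analysis.

```python
from math import sqrt
from statistics import mean, stdev


def paired_f_statistic(weak, strong):
    """Return (F, (df1, df2)) for a two-condition within-subject comparison."""
    if len(weak) != len(strong) or len(weak) < 2:
        raise ValueError("need two equally long lists with at least two participants")
    # Per-participant difference between the two conditions.
    diffs = [s - w for w, s in zip(weak, strong)]
    n = len(diffs)
    # Paired t statistic; raises ZeroDivisionError if all differences are identical.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    # With two levels, F(1, n - 1) is the squared paired t statistic.
    return t * t, (1, n - 1)


# Made-up per-participant MPD values in mm, not data from the study.
f_value, dof = paired_f_statistic([3.0, 3.2, 3.1, 3.3], [3.2, 3.3, 3.4, 3.4])
```

With 13 participants the second degree of freedom is 12, which agrees with the subscript on the F statistic reported above.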

Looking at the MPD per created password, we can see in Figure 3 (right) that for all 6 passwords participants had a wider pupil diameter, which can indicate higher cognitive load, while creating strong


References

Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research

Pupil Size in Relation to Mental Activity during Simple Problem-Solving

The efficiency of instructional conditions: An approach to combine mental-effort and performance measures

The pupillary system.

Workload assessment methodology.
