
What is the funding for the research of Ricardo J. Rodríguez?

The research of Ricardo J. Rodríguez is supported in part by the Spanish Ministry of Economy, Industry and Competitiveness project CyCriSec (grant number TIN2014-58457-R).


Survivability Model for Security and Dependability Analysis of a Vulnerable Critical System (2018) | Xiaolin Chang


Survivability Model for Security and Dependability Analysis of a Vulnerable Critical System

Xiaolin Chang, Shaohua Lv, Ricardo J. Rodríguez, Kishor Trivedi

Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, Beijing Jiaotong University, P. R. China

Centro Universitario de la Defensa, Academia General Militar, Zaragoza, Spain

Department of Electrical and Computer Engineering, Duke University, USA

Email: {xlchang, 16120401}@bjtu.edu.cn, rjrodriguez@unizar.es, ktrivedi@duke.edu

Abstract— This paper aims to analyze the transient security and dependability of a vulnerable critical system under vulnerability-related attacks and two reactive defense strategies, from the announcement of a severe vulnerability until the vulnerability is fully removed from the system. By severe, we mean that the vulnerability-based malware could cause significant damage to the infected system in terms of security and dependability while infecting more and more new vulnerable computer systems. We propose a Markov chain-based survivability model that captures the behavior of the vulnerable critical system during the vulnerability elimination process. A high-level formalism based on Stochastic Reward Nets is applied to automatically generate and solve the survivability model. Survivability metrics are defined to quantify system attributes. The proposed model and metrics not only enable us to quantitatively assess the system survivability in terms of security risk and dependability, but also provide insights into system investment decisions. Numerical experiments are conducted to study the impact of key parameters on system security, dependability, and profit.

Keywords— Reactive defense strategy; Quantitative analysis; Stochastic Reward Nets; Survivability; Security

I. INTRODUCTION

A software vulnerability is a defect in software that can be exploited by attackers or malware (malicious software) to compromise a system for their own benefit. A typical piece of malware goes through the following four phases: (1) gaining access to a targeted system by means of the disclosed vulnerabilities, (2) trying various methods to make itself persistent in the system, (3) looking for data of interest to be stolen or modified, and (4) damaging security by making unauthorized modifications to data, exfiltrating sensitive data, or infecting new vulnerable computer systems (hereafter referred to as hosts). In addition, the various activities undertaken during the attack may crash the system and thus degrade system dependability. The security and/or dependability damage may lead to loss of customer confidence and to other possible long-term consequences due to the loss and theft of information.

In this paper, we assess the survivability of a vulnerable critical system from the announcement of a severe vulnerability until the vulnerability is fully removed from the system. We define survivability as a transient measure of a system's capability to withstand vulnerability-related malicious attacks and to execute a pre-specified mission even when parts of the system are damaged. By severe, we mean that the vulnerability-based malware could cause significant damage to the infected system in terms of security and dependability while infecting new vulnerable computer systems.

We develop a survivability model to capture the system behavior under reactive defense strategies and the actions performed by an attacker exploiting the vulnerability. All relevant event times are assumed to be exponentially distributed, so the model is a homogeneous continuous-time Markov chain (CTMC). In this paper, the generation and solution of the proposed Markov model are automated using a variant of stochastic Petri nets called Stochastic Reward Nets (SRNs), which can easily represent common characteristics of computer systems such as concurrency, synchronization, conditional branches, looping, and sequencing.
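Because every event time is exponential, the transient behavior of such a CTMC is given by $\pi(t) = \pi(0)e^{Qt}$, where $Q$ is the generator matrix. The following is a minimal sketch, not the SPNP solver used in the paper, that computes $\pi(t)$ by uniformization in plain Python. The three-state chain and its rates are hypothetical and only loosely inspired by TABLE I.

```python
import math


def transient_probs(Q, p0, t, tol=1e-12, max_terms=100_000):
    """Transient distribution pi(t) = p0 * exp(Q t) of a CTMC, via uniformization.

    Q is the generator matrix (list of rows, each row summing to 0),
    p0 is the initial distribution, and t is the time (same unit as the rates).
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # uniformization rate
    if t == 0.0 or lam == 0.0:
        return list(p0)
    if lam * t > 700.0:
        # exp(-lam * t) would underflow; a production solver would rescale.
        raise ValueError("lam * t too large for this simple sketch")
    # Transition matrix of the uniformized chain: P = I + Q / lam
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    v = list(p0)                 # p0 * P^k, starting with k = 0
    weight = math.exp(-lam * t)  # Poisson probability of 0 jumps
    result = [weight * x for x in v]
    total = weight
    k = 0
    while 1.0 - total > tol and k < max_terms:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k    # Poisson probability of k jumps
        total += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result


# Hypothetical 3-state chain (rates per day): Vulnerable -> Exploited at 1/4,
# Vulnerable -> Patched and Exploited -> Patched at 1/20; Patched is absorbing.
Q = [
    [-(0.25 + 0.05), 0.25, 0.05],
    [0.0, -0.05, 0.05],
    [0.0, 0.0, 0.0],
]
pi_10 = transient_probs(Q, [1.0, 0.0, 0.0], 10.0)
print([round(x, 4) for x in pi_10])
```

For this chain, the probability of still being in the first state is $e^{-0.3t}$ and the probability of having reached the absorbing state is $1 - e^{-0.05t}$, which gives a quick analytic check of the solver. Uniformization is used here because it needs only matrix-vector products and keeps every term non-negative.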

This paper is closely related to the work in [1]. The key differences from [1] are detailed in Section II. We summarize the major contributions of this paper as follows:

(1) We investigate the scenario where two reactive defense strategies are deployed to reduce or prevent the security damage caused by malware. Moreover, we investigate not only the attacking activity that affects the local system security, but also the attacking activity of infecting new hosts.

(2) We develop a survivability model using Stochastic Reward Nets to capture the system behavior. To the best of our knowledge, we are the first to apply a state-space analytic model to analyze the survivability of such a vulnerable critical system. We also define survivability metrics and propose the corresponding calculation methods.

(3) Numerical experiments are conducted to study the impact of key parameters on system security, dependability, and profit.

Fig. 1 Flowchart depicting events in a system under an attack, after a vulnerability is announced and during the implementation and deployment of the reactive defense strategies

The paper is organized as follows. Section II presents related work. Section III presents the system model and survivability measures. In Section IV, we present evaluation results. Conclusions are drawn in Section V.

II. RELATED WORK

Many efforts have been made to improve the dependability and security of various infrastructure systems, including communication networks, transportation, and power and water distribution. However, undesired events such as natural disasters, security attacks, and hardware/software failures still occur in these systems. Timely recovery from such unexpected events is critical to infrastructure systems. Note that the occurrence of an undesired event may only degrade system performance instead of crashing the system.

When multiple actions must be taken to recover the system, the recovery process can be modeled as either a single-phase or a multi-phase recovery model. In the latter, each recovery action or set of parallel actions is modeled as a phase, and the phase input determines the sequence of the phases. Compared with a single-phase recovery model, a multi-phase recovery model can capture the fine-grained characteristics of the restoration process [2][5].

Survivability, a transient measure, is defined as the ability of a system to recover a predefined service in a timely manner after the occurrence of undesired events [2]. Its quantitative analysis can help improve a system's capability to provide critical services when part or all of the system is damaged.

The tremendous increase in the number of discovered and disclosed vulnerabilities, together with the severity of the damage they cause, has prompted a variety of research on survivability modeling and analysis in various fields and from different perspectives (see [1][6][7] and the references therein). Recently, the authors of [1] carried out a quantitative assessment of system secure survivability. There are three major differences between [1] and this paper:

(1) Only one mitigation strategy, namely patch implementation, is considered in [1]. Besides the patch strategy, this paper also considers an isolation strategy that separates the vulnerable part of the infected system. This avoids vulnerability-related damage but may degrade system dependability and performance.

(2) In [1], the security loss is quantified in terms of sojourn time. This paper proposes a new calculation method based on the number of times sensitive information is successfully stolen or modified.

(3) The model proposed in this paper captures the activities of infecting new vulnerable hosts.

III. SYSTEM DESCRIPTION AND MODEL

This section first gives an overview of the system of interest in this paper. Then a Stochastic Reward Net model for the survivability analysis of this system is presented.

A. System Description

We now describe the system considered in this paper, shown in Fig. 1. It can be regarded as an extension of the system in [1]. There are nine system states: Vulnerable, Isolating, Patching, Fixing, Failing, Crashing, Infected, Lmoved, and Exfiltrated. Isolating and Patching correspond to the two reactive defense strategies considered in this paper. To highlight the differences between this paper and [1], all the assumptions made in [1] also apply here. Additional assumptions are given below.

When a vulnerability is fully disclosed, the system is in the Vulnerable state. Meanwhile, the attacker starts the exploit implementation, and the defender designs and deploys the two reactive defense strategies. Thus, there are three rectangles in the second row of Fig. 1, labeled Isolation system, Patch implementation, and Exploit code implementation, respectively.

The shaded part labeled System describes the system state changes under the attack actions and the two reactive defense strategies. After the isolation strategy is deployed, the attack cannot degrade the system security, but the system performance is degraded. When the patch is ready, it must be deployed into the system immediately, and the system is recovered to a secure state. When the attacked system is in the Lmoved or Exfiltrated state, the malware can also infect new vulnerable hosts that have not been infected before. The system may fail or crash due to attacker behavior or software bugs such as Mandelbugs [8]. If the system crashes or fails, it must be fixed immediately, even if the isolation or patching strategy is ready to be deployed. During the fixing process, neither the defender nor the attacker can act on the system.

The metrics used to quantify survivability vary according to the system and the system attributes of interest. We assume that revenue is generated as long as the system service is provided, but revenue decreases after the isolation strategy is deployed or the malware enters the vulnerable system. When the system service cannot be provided, the service provider may suffer economic loss due to the pre-defined SLA (Service Level Agreement) with customers. In addition, each successful infection of a new vulnerable host and each successful theft or modification of sensitive information could result in some loss to the service provider. We define profit as total revenue minus total cost.

The metrics considered in this paper include:

• Metric $m_1$. Mean security loss of the local system at time $t$.
• Metric $m_2$. Mean number of newly infected hosts at time $t$.
• Metric $m_3$. Mean accumulated security loss of the local system in the interval $[0, t]$.
• Metric $m_4$. Mean accumulated number of newly infected hosts in the interval $[0, t]$.
• Metric $m_5$. Mean accumulated cost in the interval $[0, t]$.
• Metric $m_6$. Mean accumulated revenue in the interval $[0, t]$.
• Metric $m_7$. Mean accumulated profit in the interval $[0, t]$.

Note that although the definitions of some metrics are the same as in [3], the computation formulas are different. The first two metrics ($m_1$ and $m_2$) are transient metrics that capture the state of the system at time $t$ after the occurrence of an undesired event. The remaining metrics are cumulative metrics, which are expected accumulated rewards in the interval $(0, t]$. Note that survivability metrics are computed after the announcement of a vulnerability. In the remainder of this paper, time $t$ is measured in days from the moment a severe vulnerability is announced.
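The distinction between transient and cumulative metrics can be illustrated with a hypothetical two-state example. Here the transient metric is the expected throughput of a transition at time $t$, and the cumulative metric is the integral of that throughput over $(0, t]$. The sketch below assumes that the state probabilities are known in closed form and integrates them numerically with Simpson's rule. SPNP computes such accumulated rewards internally, and the rates `A` and `B` are illustrative assumptions.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3


def throughput(pi_t, enabled_states, rate):
    """Instantaneous throughput of a timed transition: rate times P(enabled at t)."""
    return rate * sum(pi_t[s] for s in enabled_states)


# Hypothetical two-state example (rates per day): an attack transition with
# rate B is enabled only in state 0, which is left for good at rate A.
A, B = 1 / 20, 1 / 7


def pi(t):
    """Closed-form state probabilities of the two-state example."""
    return [math.exp(-A * t), 1.0 - math.exp(-A * t)]


def inst(t):
    """Transient metric: expected throughput at time t."""
    return throughput(pi(t), [0], B)


def accumulated(t):
    """Cumulative metric: expected number of firings in (0, t]."""
    return simpson(inst, 0.0, t)


print(inst(10.0), accumulated(30.0))
```

In this example, the accumulated value has the closed form $B(1 - e^{-At})/A$, so the numerical integral can be checked directly.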

B. Stochastic Reward Net Model

There are two major challenges in modeling the system:

(1) How to model two attack activities that occur simultaneously, namely damaging the security of the infected system and infecting new vulnerable hosts.

(2) How to model the priority of the patch-based defense strategy over the isolation-based defense strategy when both strategies are ready to be deployed.

Fig. 2 describes an SRN model for the survivability analysis. The shaded part is the extension to the model proposed in [1]. As in [1], survivability analysis focuses on capturing the evolution of the system after an unexpected event occurs. Thus, the model in Fig. 2 does not include the vulnerability detection process. TABLE I and TABLE II show the variable definitions and guard definitions, respectively. The following explanation focuses on the shaded part. For the remaining part, the reader is referred to [1].

Fig. 2 Stochastic Reward Net model

When a software vulnerability is identified, one token is removed from $P_{vulfound}$ with rate $\delta$ and one token is put in each of $P_{vul\_s}$, $P_{vul}$, $P_{prepare}$, and $P_{preisolate}$. This means that system failure, exploit code implementation, patch implementation, and the implementation of vulnerability-related service isolation proceed in parallel. A token in place $P_{preisolate}$ denotes that the isolation strategy is under implementation. When $T_{isolate}$ fires, one token is taken from $P_{preisolate}$ and one token is put in $P_{startisolate}$, representing that the isolation strategy is ready for deployment. When there is a token in both $P_{startisolate}$ and $P_{vul}$ ($P_{repair}$, $P_{exploit}$, $P_{infect}$, $P_{lmov}$, or $P_{efil}$), the immediate transition $t_{c1}$ ($t_{c2}$, $t_{c3}$, $t_{c4}$, $t_{c5}$, or $t_{c6}$) fires. Then a token is taken from $P_{startisolate}$ and from $P_{vul}$ ($P_{repair}$, $P_{exploit}$, $P_{infect}$, $P_{lmov}$, or $P_{efil}$), and a token is deposited in place $P_{finishisolate}$. This represents that the system is isolated from the malicious software. In this situation, the system may fail or crash. As long as there is a token in $P_{ready}$, the system enters the state of deploying the patch.
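The token flow just described can be mimicked with a small token-game sketch. The sketch assumes that a marking is a multiset of place names and ignores firing delays, rates, priorities, and guards. The name `T_vulfound` for the transition that consumes the token in $P_{vulfound}$ is hypothetical, because the text gives only its rate $\delta$.

```python
from collections import Counter

# Hypothetical encoding of the isolation fragment: name -> (input places, output places).
TRANSITIONS = {
    "T_vulfound": (["P_vulfound"], ["P_vul_s", "P_vul", "P_prepare", "P_preisolate"]),
    "T_isolate": (["P_preisolate"], ["P_startisolate"]),
    "t_c1": (["P_startisolate", "P_vul"], ["P_finishisolate"]),
}


def is_enabled(marking, name):
    """A transition is enabled when every input place holds enough tokens."""
    inputs, _ = TRANSITIONS[name]
    return all(marking[p] >= k for p, k in Counter(inputs).items())


def fire(marking, name):
    """Return the new marking after firing `name` (timing is ignored)."""
    if not is_enabled(marking, name):
        raise ValueError(f"{name} is not enabled")
    inputs, outputs = TRANSITIONS[name]
    new = Counter(marking)
    new.subtract(inputs)   # remove one token per input arc
    new.update(outputs)    # add one token per output arc
    return +new            # drop places left with zero tokens


marking = Counter({"P_vulfound": 1})
for name in ["T_vulfound", "T_isolate", "t_c1"]:
    marking = fire(marking, name)
print(dict(marking))
```

After the three firings, the tokens from $P_{vul}$ and $P_{startisolate}$ have been merged into $P_{finishisolate}$, while $P_{vul\_s}$ and $P_{prepare}$ still hold the tokens that drive failure and patch implementation.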

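The priorities of the competing immediate transitions are set to achieve the following goals: whenever the patch strategy is available, the patch must be deployed immediately, followed by the service isolation strategy and then the exploit code. Similarly, in each of the other states where patch deployment and isolation compete, the patch-deployment transition is given priority over the corresponding isolation transition ($t_{c2}$, $t_{c3}$, $t_{c4}$, $t_{c5}$, or $t_{c6}$).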
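One way to picture this priority rule is to resolve a set of simultaneously enabled immediate transitions by selecting the one with the highest priority. In this sketch, the names `t_patch` and `t_exploit` and the numeric priority levels are hypothetical placeholders for the patch-deployment and exploit transitions. Only $t_{c1}$ is named in the text.

```python
# Hypothetical priority levels: a larger number means a higher priority.
# Order taken from the text: patch deployment, then isolation, then exploit code.
PRIORITY = {"t_patch": 3, "t_c1": 2, "t_exploit": 1}


def select_immediate(enabled_names):
    """Return the enabled immediate transition that fires first, or None."""
    if not enabled_names:
        return None
    top = max(PRIORITY[name] for name in enabled_names)
    winners = [name for name in enabled_names if PRIORITY[name] == top]
    if len(winners) != 1:
        # Equal priorities would need a probabilistic choice, not modeled here.
        raise ValueError(f"ambiguous priority among {winners}")
    return winners[0]


print(select_immediate({"t_patch", "t_c1", "t_exploit"}))  # patch wins
print(select_immediate({"t_c1", "t_exploit"}))             # isolation wins
```

Strict numeric levels make the outcome deterministic whenever the enabled transitions have distinct priorities. This matches the stated intent that a ready patch always preempts isolation.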

The activity of infecting a new vulnerable host is modeled by $T_{infectm}$ and $P_{infects}$. The guard of $T_{infectm}$ (see TABLE II) ensures that a new vulnerable host may be infected only when there is a token in $P_{lmov}$ or $P_{efil}$. Before defining the metrics, we first define some variables. We assign a reward/loss to each place in Fig. 2 to represent the service revenue/loss at that place per day. $Place_{vul}$ denotes the unit revenue/loss at $P_{vul}$. The other places have similar revenue and loss definitions.

$Trans_{lmov}$ and $Trans_{infectm}$ denote the unit loss associated with the throughput of $T_{lmov}$ and $T_{infectm}$, respectively. We then use the SPNP software package [9] to calculate the above metrics as follows:

• $m_1$: the throughput of $T_{lmov}$ at time $t$.
• $m_2$: the throughput of $T_{infectm}$ at time $t$.
• $m_3$: the expected accumulated rate (throughput) of $T_{lmov}$ in the interval $[0, t]$.
• $m_4$: the expected accumulated rate (throughput) of $T_{infectm}$ in the interval $[0, t]$.
• $m_5 = m_3 \cdot Trans_{lmov} + m_4 \cdot Trans_{infectm}$ + the sum of the mean accumulated loss of each place in the interval $[0, t]$.
• $m_6$: the sum of the mean accumulated reward of each place in the interval $[0, t]$.
• $m_7 = m_6 - m_5$.
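Once SPNP has produced $m_3$, $m_4$, and the per-place accumulated rewards and losses, the last three formulas reduce to plain arithmetic. The sketch below assumes that those values are already available. All numbers and place keys in the example are hypothetical, not results from the paper.

```python
def accumulated_cost(m3, m4, trans_lmov, trans_infectm, place_losses):
    """m5 = m3 * Trans_lmov + m4 * Trans_infectm + sum of accumulated place losses."""
    return m3 * trans_lmov + m4 * trans_infectm + sum(place_losses.values())


def accumulated_profit(place_rewards, m5):
    """m7 = m6 - m5, where m6 is the sum of accumulated place rewards."""
    m6 = sum(place_rewards.values())
    return m6 - m5


# Hypothetical values for illustration only (not results from the paper).
m5 = accumulated_cost(m3=2.5, m4=1.5, trans_lmov=100.0, trans_infectm=40.0,
                      place_losses={"place_a": 30.0})
m7 = accumulated_profit({"place_b": 1000.0, "place_c": 200.0}, m5)
print(m5, m7)
```

Keeping $m_5$ and $m_6$ as separate functions mirrors the definition of profit as total revenue minus total cost.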

IV. NUMERICAL ANALYSIS AND DISCUSSIONS

This section evaluates the effectiveness of the proposed model. We solve the SRN model with the SPNP software package [9] and evaluate the solutions in terms of the metrics described in Section III.B. Parameter values are set as in [1] and are also given in TABLE I.

We first investigate the effect of $T_{prepare}$, the mean time for implementing a patch, on security loss. The other parameter values are fixed as in TABLE I. Fig. 3 to Fig. 6 plot these results. P10, P12, P16, and P20 represent the results for $T_{prepare}$ = 10 days, 12 days, 16 days, and 20 days, respectively.

We observe:

• Fig. 3 indicates that, for each value of $T_{prepare}$, the throughput of damaging the local system security first increases and then decreases. The increase is due to the growing probability that $P_{lmov}$ holds a token, but this growth stops at some point. The decrease is due to the growing probability that the isolation and/or patch-based defense strategies are ready for deployment. A similar explanation applies to the changes in the throughput of infecting new hosts, shown in Fig. 5.

• As the mean patch implementation time ($T_{prepare}$) increases, the probability that the patch-based defense strategy is ready for deployment grows more slowly, so more security damage is caused. Fig. 3 indicates that, at the same time instant, the throughput increases from P10 to P12, P16, and P20.

• As the mean patch implementation time ($T_{prepare}$) increases, much more local security damage is caused and more new hosts are infected, as shown in Fig. 4 and Fig. 6, respectively.

We also conduct experiments with $T_{prepare}$ fixed at 20 days and $T_{isolate}$ varied. Due to space limitations, Fig. 7 presents only the number of times the local system security is successfully damaged at time $t$. "i8" and "i16" represent the results for $T_{isolate}$ = 8 days and 16 days, respectively, and "i0" represents the case in which no isolation strategy is deployed. We observe that as the mean isolation implementation time increases, the malware has a longer mean sojourn time in which to launch attacks on the local system, so more security damage is generated.

TABLE I. PARAMETER DEFINITION

Symbol | Definition | Mean value
- | Mean time until the discovered vulnerability is known to all | 30 mins
$T_{prepare}$ | Mean time for implementing a patch | 20 days
$T_{deploy}$ | Mean time for deploying the patch | 12 days
$T_{vuln}$ | Mean time for an attacker to generate the exploit code | 4 days
$T_{fail}$ | Mean time until the computer system fails | 365 days
- | Mean time for the computer system to complete fixing a failure or crash | 2 days
$T_{exploit}$ | Mean time for injecting the exploit code into the system | 7 days
$T_{inf}$ | Mean time for the exploit code to become persistent | 1 day
$T_{lmov}$ | Mean time for the attacker to find sensitive data of interest | 7 days
$T_{efil}$ | Mean time for the attacker to obtain the desired information | 2 days
$T_{isolate}$ | Mean time for shutting down the services related to the detected vulnerability | 8 days
$T_{infectm}$ | Mean time for the attacker to inject the exploit code into another vulnerable host | 7 days
- | Probability that the exploit code works in the system and that it is persistent, respectively | 0.9, 0.9
- | Probability that the attacker finds its target and the desired information, respectively | 0.9, 0.9
- | Probability that the attacker successfully infects a new host | 0.9
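Because all event times are exponentially distributed and time is measured in days, each mean value in TABLE I corresponds to a firing rate equal to its reciprocal, in events per day. For example, 30 mins is 1/48 of a day, which gives a rate of 48 per day. The dictionary keys below are shorthand labels chosen for this sketch, including `known_to_all` and `fix` for the two time rows whose symbols are not recoverable.

```python
MINUTES_PER_DAY = 24 * 60

# Mean values from TABLE I, in days; the keys are shorthand labels for this sketch.
MEAN_DAYS = {
    "known_to_all": 30 / MINUTES_PER_DAY,  # 30 mins
    "prepare": 20, "deploy": 12, "vuln": 4, "fail": 365, "fix": 2,
    "exploit": 7, "inf": 1, "lmov": 7, "efil": 2,
    "isolate": 8, "infectm": 7,
}


def rates_per_day(mean_days):
    """Exponential rate = 1 / mean time, in events per day."""
    return {name: 1.0 / mean for name, mean in mean_days.items()}


RATES = rates_per_day(MEAN_DAYS)
print(round(RATES["known_to_all"], 6), RATES["prepare"])
```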

TABLE II. GUARD FUNCTIONS FOR THE SRN MODEL

Guard | Value
vul | if (#(P_vul_s)==1) then 1 else 0
- | if (#(P_vul)==1) then 1 else 0
infectm | if (#(P_lmov)==1 || #(P_efil)==1) then 1 else 0
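The guard of $T_{infectm}$ in TABLE II can be written as an ordinary predicate over a marking. This sketch assumes that a marking maps place names to token counts, stored in a `Counter` so that missing places count as zero. The function name `g_infectm` is hypothetical.

```python
from collections import Counter


def g_infectm(marking):
    """Guard of T_infectm: 1 if #(P_lmov) == 1 or #(P_efil) == 1, else 0."""
    return 1 if marking["P_lmov"] == 1 or marking["P_efil"] == 1 else 0


print(g_infectm(Counter({"P_lmov": 1})), g_infectm(Counter({"P_vul": 1})))
```

SRN tools evaluate such guards on every marking and treat a transition as enabled only when its guard returns 1 and its input places hold enough tokens.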

V. CONCLUSIONS

This paper presents a CTMC model for the survivability analysis of a critical system under a severe vulnerability. Stochastic Reward Nets were used to facilitate the generation and solution of the Markov model. We defined survivability metrics in terms of system dependability and security. In addition, numerical results were presented to study the impact of the underlying parameters on system survivability. These results also provide insights into investment efforts in various system recovery strategies, including reactive defense strategies.


References

SPNP: stochastic Petri net package

Fighting bugs: remove, retry, replicate, and rejuvenate

Quantification of system survivability

Modeling and Analysis of High Availability Techniques in a Virtualized System

Survivability analysis of a two-tier infrastructure-based wireless network
