Open Access Proceedings Article

Survivability Model for Security and Dependability Analysis of a Vulnerable Critical System

Xiaolin Chang (a), Shaohua Lv (a), Ricardo J. Rodríguez (b), Kishor Trivedi (c)
(a) Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, Beijing Jiaotong University, P. R. China
(b) Centro Universitario de la Defensa, Academia General Militar, Zaragoza, Spain
(c) Department of Electrical and Computer Engineering, Duke University, USA
Email: {xlchang, 16120401}@bjtu.edu.cn, rjrodriguez@unizar.es, ktrivedi@duke.edu
Abstract— This paper analyzes the transient security and dependability of a vulnerable critical system under vulnerability-related attacks and two reactive defense strategies, from the announcement of a severe vulnerability until the vulnerability is fully removed from the system. By severe, we mean that malware exploiting the vulnerability could cause significant damage to the infected system in terms of security and dependability while infecting more and more new vulnerable computer systems. We propose a Markov chain-based survivability model that captures the behavior of the vulnerable critical system during the vulnerability elimination process. A high-level formalism based on Stochastic Reward Nets is used to automatically generate and solve the survivability model. Survivability metrics are defined to quantify system attributes. The proposed model and metrics not only enable us to quantitatively assess system survivability in terms of security risk and dependability, but also provide insights into system investment decisions. Numerical experiments are conducted to study the impact of key parameters on system security, dependability, and profit.
Keywords: Reactive defense strategy; Quantitative analysis; Stochastic Reward Nets; Survivability; Security
I. INTRODUCTION
A software vulnerability is a defect in software that can be exploited by attackers or malware (malicious software) to compromise a system for their own benefit. A typical malware goes through the following four phases: (1) gaining access to a targeted system by means of the disclosed vulnerabilities, (2) trying various methods to make itself persistent in the system, (3) looking for data of interest to be stolen or modified, and (4) damaging security by modifying data without authorization, exfiltrating sensitive data, or infecting new vulnerable computer systems (hereafter called hosts). In addition, the various activities undertaken during the attack may crash the system and thereby degrade system dependability. Security and/or dependability damage may lead to loss of customer confidence and to other long-term consequences due to the loss and theft of information.
In this paper, we assess the survivability of a vulnerable critical system from the announcement of a severe vulnerability until the vulnerability is fully removed from the system. We define survivability as a transient measure of a system's capability to withstand vulnerability-related malicious attacks and to execute a pre-specified mission even when parts of the system are damaged. By severe, we mean that the vulnerability-based malware could cause significant damage to the infected system in terms of security and dependability while infecting new vulnerable computer systems.
We develop a survivability model that captures the system behavior under reactive defense strategies, together with the actions an attacker performs to exploit the vulnerability. All relevant event times are assumed to be exponentially distributed, so the model is a homogeneous continuous-time Markov chain (CTMC). The generation and solution of this Markov model are automated using Stochastic Reward Nets (SRNs), a variant of stochastic Petri nets that can readily represent common characteristics of computer systems such as concurrency, synchronization, conditional branches, looping, and sequencing.
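Because all delays are exponential, the transient state probabilities $\pi(t)$ of the CTMC satisfy $\pi(t) = \pi(0)e^{Qt}$, where $Q$ is the generator matrix. Tools such as SPNP compute these internally; the sketch below is only a minimal, hypothetical illustration of one standard way to do it (uniformization) in plain Python. The function name `transient_probs` and the two-state example chain are assumptions for illustration, not part of the paper's model.

```python
import math

def transient_probs(Q, p0, t, eps=1e-12, max_terms=100_000):
    """Return pi(t) = p0 * exp(Q t) for a small CTMC using uniformization.

    Q  : generator matrix as a list of rows (off-diagonal rates >= 0, rows sum to 0)
    p0 : initial probability vector (list of floats summing to 1)
    t  : time horizon (same unit as the rates, e.g. days)
    """
    n = len(Q)
    q = max(-Q[i][i] for i in range(n))  # uniformization rate >= every exit rate
    if q == 0.0:                         # no transitions at all: nothing changes
        return list(p0)
    if q * t > 700.0:                    # exp(-q t) would underflow in this naive sketch
        raise ValueError("q*t too large for this simple sketch")
    # P = I + Q/q is the transition matrix of the uniformized discrete-time chain.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)] for i in range(n)]
    result = [0.0] * n
    v = list(p0)                  # v holds p0 * P^k
    weight = math.exp(-q * t)     # Poisson probability of 0 jumps in [0, t]
    mass, k = 0.0, 0
    while mass < 1.0 - eps and k < max_terms:
        for i in range(n):
            result[i] += weight * v[i]
        mass += weight
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k       # Poisson probability of k jumps
    return result

# Hypothetical two-state chain: state 0 -> 1 at rate 1.0, state 1 -> 0 at rate 0.5.
Q = [[-1.0, 1.0], [0.5, -0.5]]
p = transient_probs(Q, [1.0, 0.0], 1.0)
```

The example can be checked against the closed form of a two-state chain, $\pi_0(t) = 1/3 + (2/3)e^{-1.5t}$; for a realistic SRN-generated CTMC, a sparse-matrix solver would be the better choice.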
This paper is closely related to the work in [1]; the key differences from [1] are detailed in Section II. The major contributions of this paper are as follows:
(1) We investigate a scenario in which two reactive defense strategies are deployed to reduce or prevent the security damage caused by malware. Moreover, we investigate not only the attack activity that affects local system security, but also the attack activity of infecting new hosts.
(2) We develop a survivability model using Stochastic Reward Nets to capture the system behavior. To the best of our knowledge, we are the first to apply a state-space analytic model to analyze the survivability of such a vulnerable critical system. We also define survivability metrics and propose methods for computing them.
(3) Numerical experiments are conducted to study the impact of key parameters on system security, dependability, and profit.
978-1-5386-5156-8/18/$31.00 ©2018 IEEE

Fig. 1 Flowchart depicting events in a system under an attack, after a vulnerability is announced and during the implementation and deployment of the reactive defense strategies
The paper is organized as follows. Section II presents related work. Section III presents the system model and survivability measures. Section IV presents evaluation results. Section V concludes the paper.
II. RELATED WORK
Many efforts have been made to improve the dependability and security of infrastructure systems such as communication networks, transportation, and power and water distribution. However, undesired events, for example natural disasters, security attacks, and hardware/software failures, still occur in these systems. Timely recovery from such unexpected events is critical for infrastructure systems. Note that the occurrence of an undesired event may only degrade system performance rather than crash the system.
When multiple actions must be taken to recover the system, the recovery process can be modeled as either a single-phase or a multi-phase recovery model. In the latter, each recovery action, or each set of parallel actions, is modeled as a phase, and the phase input determines the sequence of the phases. Compared with a single-phase recovery model, a multi-phase recovery model can capture fine-grained characteristics of the restoration process [2][5].
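To make the single-phase versus multi-phase distinction concrete, the hypothetical sketch below assumes that every recovery action takes an independent, exponentially distributed time. A phase made of parallel actions ends when its slowest action ends, so its mean is $E[\max_i X_i]$, computed here by inclusion-exclusion; phases run in sequence, so their means add. The function names and rates are illustrative assumptions, not values from the cited works.

```python
from itertools import combinations

def mean_parallel_phase(rates):
    """Mean time until all independent exponential actions in one phase finish.

    Inclusion-exclusion: E[max X_i] = sum over non-empty subsets S of
    (-1)^(|S|+1) / (sum of the rates in S).
    """
    total = 0.0
    for size in range(1, len(rates) + 1):
        for subset in combinations(rates, size):
            total += (-1) ** (size + 1) / sum(subset)
    return total

def mean_multi_phase_recovery(phases):
    """Phases run one after another, so their mean durations add up."""
    return sum(mean_parallel_phase(rates) for rates in phases)

# Hypothetical recovery: phase 1 = two parallel actions (rate 1 per hour each),
# phase 2 = one action (rate 0.5 per hour).
print(mean_multi_phase_recovery([[1.0, 1.0], [0.5]]))
```

In this example the mean recovery time is 1.5 + 2 = 3.5 hours. A single-phase model with one exponential of mean 3.5 hours would match the mean but not the shape of the recovery-time distribution, which is the fine-grained information a multi-phase model retains.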
Survivability, a transient measure, is defined as the ability of a system to recover a predefined service in a timely manner after the occurrence of undesired events [2]. Its quantitative analysis can help improve a system's capability to provide critical services when part or all of the system is damaged.
The tremendous increase in the number of discovered and disclosed vulnerabilities, and the severity of their damage, have prompted considerable research on survivability modeling and analysis in various fields and from different perspectives (see [1][6][7] and the references therein). Recently, the authors of [1] carried out a quantitative assessment of system secure survivability.
There are three major differences between [1] and this paper:
(1) Only one mitigation strategy, namely patch implementation, is considered in [1]. Besides the patch strategy, this paper also considers the isolation strategy to
separate the vulnerable part of the infected system, which avoids vulnerability-related damage but may degrade system dependability and performance.
(2) In [1], the security loss is quantified in terms of sojourn time. This paper proposes a new calculation method based on the number of times sensitive information is successfully stolen or modified.
(3) The model proposed in this paper also captures the activity of infecting new vulnerable hosts.
III. SYSTEM DESCRIPTION AND MODEL
This section first gives an overview of the system of interest in this paper and then presents a Stochastic Reward Net model for the survivability analysis of this system.
A. System Description
We now describe the system considered in this paper, shown in Fig. 1; it can be regarded as an extension of the system in [1]. There are nine system states: Vulnerable, Isolating, Patching, Fixing, Failing, Crashing, Infected, Lmoved, and Exfiltrated. Isolating and Patching correspond to the two reactive defense strategies considered in this paper. We adopt all the assumptions made in [1] in order to highlight the differences between this paper and [1]. Additional assumptions are given below.
When a vulnerability is fully disclosed, the system is in the Vulnerable state. Meanwhile, the attacker starts implementing the exploit, and the defender designs and deploys the two reactive defense strategies. Hence, the second row of Fig. 1 contains three rectangles, labeled Isolation system, Patch implementation, and Exploit code implementation, respectively.
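Right after the announcement, exploit code implementation, patch implementation, and isolation implementation race against each other. Under the exponential assumption, the probability that a given activity finishes first is its rate divided by the sum of the competing rates. The sketch below uses the mean times from TABLE I ($1/\lambda_{vuln}$ = 4 days, $1/\lambda_{prepare}$ = 20 days, $1/\lambda_{isolate}$ = 8 days); it is a hypothetical simplification that ignores system failure and everything that happens after the first completion, and the function name is illustrative.

```python
def first_to_finish(mean_times):
    """P(each independent exponential activity completes first).

    mean_times maps an activity name to its mean duration; rate = 1 / mean.
    For competing exponentials, P(i first) = rate_i / (sum of all rates).
    """
    rates = {name: 1.0 / mean for name, mean in mean_times.items()}
    total = sum(rates.values())
    return {name: rate / total for name, rate in rates.items()}

# Mean times in days, taken from TABLE I.
p = first_to_finish({"exploit": 4.0, "patch": 20.0, "isolation": 8.0})
for name, prob in p.items():
    print(f"{name}: {prob:.2f}")
```

With these values, the exploit is ready first with probability 0.25/0.425 ≈ 0.59, the isolation strategy with ≈ 0.29, and the patch with ≈ 0.12, which illustrates why the system is likely to spend time exposed before any defense is available.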
The shaded part labeled System describes how the system state changes under the attack actions and the two reactive defense strategies. After the isolation strategy is deployed, the attack can no longer degrade system security, but system performance is degraded. When the patch is ready, it must be deployed into the system immediately, and the system is recovered to a secure state. When the attacked system is in the Lmoved or Exfiltrated state, the malware can also infect new vulnerable hosts that have not been infected before. The system may fail or crash due to attacker behavior or software bugs such as Mandelbugs [8]. If the system crashes or fails, it must be fixed immediately, even if the isolation or patching strategy is ready to be deployed. During the fixing process, neither the defender nor the attacker can act on the system.
The metrics used to quantify survivability vary according to the system and the system attributes of interest. We assume that revenue is earned as long as the system service is provisioned, but revenue decreases after the isolation strategy is deployed or after the malware enters the vulnerable system. When the system service cannot be provisioned, the service provider may incur economic loss due to the pre-defined SLA (Service Level Agreement) with customers. In addition, each successful infection of a new vulnerable host and each successful theft or modification of sensitive information could result in some loss to the service provider. We define profit as total revenue minus total cost.
The metrics considered in this paper include:
• Metric $m_1$. Mean security loss of the local system at time $t$.
• Metric $m_2$. Mean number of newly infected hosts at time $t$.
• Metric $m_3$. Mean accumulated security loss of the local system in the interval $[0, t]$.
• Metric $m_4$. Mean accumulated number of newly infected hosts in the interval $[0, t]$.
• Metric $m_5$. Mean accumulated cost in the interval $[0, t]$.
• Metric $m_6$. Mean accumulated revenue in the interval $[0, t]$.
• Metric $m_7$. Mean accumulated profit in the interval $[0, t]$.
Note that although the definitions of some metrics are the same as in [3], the computation formulas are different. The first two metrics are transient metrics that capture the state of the system at time $t$ after the occurrence of an undesired event. The remaining metrics are cumulative metrics, i.e., expected accumulated rewards in the interval $[0, t]$. Note that the survivability metrics are computed after the announcement of a vulnerability. In the remainder of this paper, time $t$ is measured in days from the moment immediately after a severe vulnerability announcement.
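The distinction between transient and cumulative metrics follows the standard Markov reward formulation, sketched below for reference. The notation ($S$ for the CTMC state space, $\pi_i(t)$ for the transient probability of state $i$, $r_i$ for its reward or loss rate, and a constant rate $\lambda_T$ for a timed transition $T$) is introduced here for illustration and is not taken from the paper.

```latex
% Transient (instantaneous) expected reward rate at time t
E[X(t)] = \sum_{i \in S} r_i \, \pi_i(t)
% Cumulative expected reward over [0, t]
E[Y(t)] = \int_0^t E[X(u)] \, du = \sum_{i \in S} r_i \int_0^t \pi_i(u) \, du
% Throughput of timed transition T, and its expected number of firings in [0, t]
\mathrm{thr}_T(t) = \lambda_T \sum_{i \in S:\ T \text{ enabled in } i} \pi_i(t), \qquad
E[N_T(t)] = \int_0^t \mathrm{thr}_T(u) \, du
```

The last line shows why counting successful events differs from measuring sojourn time: the expected number of firings weights the time spent with $T$ enabled by its rate $\lambda_T$.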
B. Stochastic Reward Net Model
There are two major challenges in modeling the system:
(1) How to model two attack activities that occur simultaneously, namely damaging the security of the infected system and infecting new vulnerable hosts.
(2) How to model the priority of the patch-based defense strategy over the isolation-based defense strategy when both strategies are ready to be deployed.
Fig. 2 describes the SRN model for the survivability analysis. The shaded part is the extension to the model proposed in [1]. As in [1], survivability focuses on capturing the evolution of the system after an unexpected event occurs; thus, the model in Fig. 2 does not include the vulnerability detection process. TABLE I and TABLE II give the parameter definitions and the guard definitions, respectively. The following explains the shaded part; for an explanation of the remaining part, the reader is referred to [1].

Fig. 2 Stochastic Reward Net model
When a software vulnerability is identified, one token is removed from $P_{vulfound}$ with rate $\delta$, and one token is put in each of $P_{vul\_s}$, $P_{vul}$, $P_{prepare}$, and $P_{preisolate}$. This means that system failure, exploit code implementation, patch implementation, and the vulnerability-related service isolation implementation proceed in parallel. A token in place $P_{preisolate}$ denotes that the isolation strategy is under implementation. When $T_{isolate}$ fires, one token is taken from $P_{preisolate}$ and one token is put in $P_{startisolate}$, representing that the isolation strategy is ready for deployment.
When there is a token in both $P_{startisolate}$ and $P_{vul}$ (respectively $P_{repair}$, $P_{exploit}$, $P_{infect}$, $P_{lmov}$, or $P_{efil}$), the immediate transition $t_{c1}$ (respectively $t_{c2}$, $t_{c3}$, $t_{c4}$, $t_{c5}$, or $t_{c6}$) fires. A token is then taken from $P_{startisolate}$ and from $P_{vul}$ (respectively $P_{repair}$, $P_{exploit}$, $P_{infect}$, $P_{lmov}$, or $P_{efil}$) and deposited in place $P_{finishisolate}$, which represents that the system is isolated from the malicious software. In this situation, the system may still fail or crash. As soon as there is a token in ready, the system enters the state of deploying the patch.
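The token flow just described follows the usual Petri net firing rule: a transition is enabled when each of its input places holds a token, and firing it removes one token from every input place and adds one to every output place. The minimal sketch below replays these three firings on a dictionary marking. The place names follow Fig. 2, but the helper functions `enabled` and `fire` are hypothetical, and the sketch deliberately ignores rates, guards, priorities, and inhibitor arcs.

```python
def enabled(marking, inputs):
    """A transition is enabled when every input place holds at least one token."""
    return all(marking.get(p, 0) >= 1 for p in inputs)

def fire(marking, inputs, outputs):
    """Return the new marking after firing: -1 token per input, +1 per output."""
    if not enabled(marking, inputs):
        raise ValueError("transition is not enabled")
    new = dict(marking)
    for p in inputs:
        new[p] -= 1
    for p in outputs:
        new[p] = new.get(p, 0) + 1
    return new

# Initial marking: only the vulnerability-found place is marked.
m = {"P_vulfound": 1}
# Vulnerability identified (timed, rate delta): fork into four concurrent activities.
m = fire(m, ["P_vulfound"], ["P_vul_s", "P_vul", "P_prepare", "P_preisolate"])
# T_isolate fires: the isolation strategy is ready for deployment.
m = fire(m, ["P_preisolate"], ["P_startisolate"])
# Immediate transition t_c1: tokens in P_startisolate and P_vul move to P_finishisolate.
m = fire(m, ["P_startisolate", "P_vul"], ["P_finishisolate"])
print(m)
```

The final marking keeps tokens in $P_{vul\_s}$ and $P_{prepare}$, so the failure and patch-implementation activities continue, and holds one token in $P_{finishisolate}$.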
The priorities of $t_1$, $t_{c2}$, and $t_7$ are set as $t_1 > t_{c2} > t_7$ to achieve the following goal: whenever the patch strategy is available, the patch must be deployed immediately; the service isolation strategy comes next, followed by the exploit code. Similarly, we set the priorities $t_2 > t_{c1}$, $t_3 > t_{c3}$, $t_4 > t_{c4}$, $t_5 > t_{c5}$, and $t_6 > t_{c6}$.
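In an SRN, priorities decide which of several simultaneously enabled immediate transitions may fire: only those at the highest priority level are candidates. The hypothetical helper below encodes $t_1 > t_{c2} > t_7$ with arbitrary integer levels; it only illustrates the rule and is not SPNP's API.

```python
def fireable(enabled_transitions, priority):
    """Among enabled immediate transitions, keep only those with the highest priority.

    priority maps a transition name to an integer (larger = higher priority).
    """
    if not enabled_transitions:
        return []
    top = max(priority[t] for t in enabled_transitions)
    return [t for t in enabled_transitions if priority[t] == top]

# Encodes t_1 > t_c2 > t_7; the integer values themselves are arbitrary.
priority = {"t_1": 3, "t_c2": 2, "t_7": 1}
print(fireable(["t_c2", "t_7"], priority))
print(fireable(["t_1", "t_c2", "t_7"], priority))
```

When isolation and the exploit are ready at the same time, `t_c2` is the only candidate; once `t_1` is also enabled, the patch deployment takes precedence.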
The activity of infecting a new vulnerable host is modeled by $P_{infectm}$, $T_{infectm}$, $P_{infects}$, $t_{infects}$, and $g_{infectm}$. The guard $g_{infectm}$ ensures that a new vulnerable host may be infected only when there is a token in $P_{lmov}$ or $P_{efil}$.
Before defining how each metric is computed, we define some variables. We assign a reward/loss to each place in Fig. 2 to represent the service revenue/loss per day in that place. $r_{vul}$ / $c^{Place}_{vul}$ denotes the unit revenue/loss at $P_{vul}$; the other places have similar revenue and loss definitions.
$c^{Trans}_{lmov}$ and $c^{Trans}_{infectm}$ denote the unit loss per firing (i.e., per unit of throughput) of $T_{lmov}$ and $T_{infectm}$, respectively. We use the SPNP software package [9] to calculate the above metrics as follows:
• $m_1$: throughput of $T_{lmov}$ at time $t$.
• $m_2$: throughput of $T_{infectm}$ at time $t$.
• $m_3$: the expected accumulated throughput of $T_{lmov}$ in the interval $[0, t]$.
• $m_4$: the expected accumulated throughput of $T_{infectm}$ in the interval $[0, t]$.
• $m_5$: $m_3 \cdot c^{Trans}_{lmov} + m_4 \cdot c^{Trans}_{infectm}$ + the sum of the mean accumulated loss of each place in the interval $[0, t]$.
• $m_6$: the sum of the mean accumulated reward of each place in the interval $[0, t]$.
• $m_7 = m_6 - m_5$.
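SPNP computes these measures directly from the SRN. As a hedged sketch of the arithmetic only, the function below assumes that transient probabilities are already available on a time grid (for example, from a CTMC solver), treats each throughput as rate × probability that the transition is enabled, and approximates the accumulated quantities with the trapezoidal rule rather than an exact cumulative solution. All function and parameter names, and the numbers in the example, are hypothetical.

```python
def trapezoid(ts, ys):
    """Trapezoidal approximation of the integral of ys over the time grid ts."""
    return sum((ts[k + 1] - ts[k]) * (ys[k] + ys[k + 1]) / 2.0
               for k in range(len(ts) - 1))

def survivability_metrics(ts, p_lmov, p_infectm, lam_lmov, lam_infectm,
                          c_trans_lmov, c_trans_infectm, place_probs, r_place, c_place):
    """Compute m1..m7 at t = ts[-1] from transient probabilities on a time grid.

    p_lmov[k], p_infectm[k] : probability that T_lmov / T_infectm is enabled at ts[k]
    place_probs[name][k]     : probability that place `name` holds a token at ts[k]
    r_place, c_place         : per-day reward and loss of each place
    """
    thr_lmov = [lam_lmov * p for p in p_lmov]            # throughput of T_lmov
    thr_infectm = [lam_infectm * p for p in p_infectm]   # throughput of T_infectm
    m1, m2 = thr_lmov[-1], thr_infectm[-1]
    m3 = trapezoid(ts, thr_lmov)       # expected number of T_lmov firings in [0, t]
    m4 = trapezoid(ts, thr_infectm)    # expected number of T_infectm firings in [0, t]
    # Expected time each place holds a token during [0, t].
    sojourn = {name: trapezoid(ts, probs) for name, probs in place_probs.items()}
    m5 = (m3 * c_trans_lmov + m4 * c_trans_infectm
          + sum(c_place.get(name, 0.0) * s for name, s in sojourn.items()))
    m6 = sum(r_place.get(name, 0.0) * s for name, s in sojourn.items())
    m7 = m6 - m5
    return {"m1": m1, "m2": m2, "m3": m3, "m4": m4, "m5": m5, "m6": m6, "m7": m7}

# Hypothetical inputs: constant probabilities of 0.5 over 14 days.
ts = [0.0, 7.0, 14.0]
half = [0.5, 0.5, 0.5]
metrics = survivability_metrics(ts, half, half, 1 / 7, 1 / 7, 2.0, 3.0,
                                {"P_vul": half, "P_lmov": half},
                                {"P_vul": 10.0, "P_lmov": 4.0},
                                {"P_vul": 0.0, "P_lmov": 1.0})
```

With constant probabilities of 0.5 over 14 days and rates of 1/7 per day, each transition fires once on average ($m_3 = m_4 = 1$), each place is marked for 7 days on average, so $m_5 = 2 + 3 + 7 = 12$, $m_6 = 98$, and $m_7 = 86$.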
IV. NUMERICAL ANALYSIS AND DISCUSSIONS
This section evaluates the effectiveness of the proposed model. We solve the SRN model with the SPNP software package [9] and evaluate the solutions in terms of the metrics described in Section III.B. Parameter values are set as in [1] and are also given in TABLE I.
We first investigate the effect of $\lambda_{prepare}$ on security loss, with the other parameter values fixed as in TABLE I. Fig. 3 to Fig. 6 plot the results. P10, P12, P16, and P20 denote the results for $1/\lambda_{prepare}$ = 10, 12, 16, and 20 days, respectively. We observe the following:
• Fig. 3 indicates that, for each value of $1/\lambda_{prepare}$, the throughput of damaging local system security first increases and then decreases. The increase is due to the growing probability that $P_{lmov}$ holds a token, but this growth stops at some point. The decrease is due to the growing probability that the isolation and/or patch-based defense strategies are ready for deployment. A similar explanation applies to the changes in the throughput of infecting new hosts, shown in Fig. 5.
• As the mean time $1/\lambda_{prepare}$ for patch implementation increases, the probability that the patch-based defense strategy is ready for deployment grows more slowly, so more security damage is caused. Fig. 3 indicates that the throughputs of P20, P16, P12, and P10 at the same time instant are increasing.
• As the mean time $1/\lambda_{prepare}$ for patch implementation increases, more local security damage is caused and more new hosts are infected, as shown in Fig. 4 and Fig. 6, respectively.
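The second observation rests on the exponential distribution of the patch implementation time: the probability that the patch is ready by day $t$ is $1 - e^{-\lambda_{prepare} t}$. The short sketch below evaluates this for the four settings P10 to P20; it is a hypothetical illustration that ignores deployment time and all other events in the model.

```python
import math

def prob_ready_by(t_days, mean_days):
    """P(an exponentially distributed activity with this mean is done by t_days)."""
    return 1.0 - math.exp(-t_days / mean_days)

# P10, P12, P16, P20: mean patch implementation times used in the experiments.
for mean in (10, 12, 16, 20):
    print(f"P{mean}: P(patch ready by day 10) = {prob_ready_by(10, mean):.2f}")
```

At day 10, this probability falls from about 0.63 for P10 to about 0.39 for P20, which is consistent with the longer exposure window observed for larger $1/\lambda_{prepare}$.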
We also conducted experiments with $1/\lambda_{prepare}$ fixed at 20 days and $\lambda_{isolate}$ varied. Due to space limitations, we only present the number of times the local system security is successfully damaged at time $t$, in Fig. 7. “i8” and “i16” denote the results for $1/\lambda_{isolate}$ = 8 days and 16 days, respectively, and “i0” denotes the case in which no isolation strategy is deployed. We observe that as the mean isolation implementation time increases, the malware has a longer mean sojourn time in which to attack the local system, and thus more security damage is generated.
TABLE I. PARAMETER DEFINITIONS
Symbol | Definition | Mean value
$1/\delta$ | Mean time until the discovered vulnerability is known to all | 30 mins
$1/\lambda_{prepare}$ | Mean time for implementing a patch | 20 days
$1/\lambda_{deploy}$ | Mean time for deploying the patch | 12 days
$1/\lambda_{vuln}$ | Mean time for the attacker to generate the exploit code | 4 days
$1/\lambda_{fail}$ | Mean time until the computer system fails | 365 days
$1/\lambda_{fix}$ | Mean time for the computer system to complete failure or crash fixing | 2 days
$1/\lambda_{exploit}$ | Mean time for injecting the exploit code into the system | 7 days
$1/\lambda_{inf}$ | Mean time until the exploit code becomes persistent | 1 day
$1/\lambda_{lmov}$ | Mean time until the attacker finds sensitive data of interest | 7 days
$1/\lambda_{efil}$ | Mean time until the attacker obtains the desired information | 2 days
$1/\lambda_{isolate}$ | Mean time for shutting down the services related to the detected vulnerability | 8 days
$1/\lambda_{infectm}$ | Mean time for the attacker to inject the exploit code into another vulnerable host | 7 days
$\rho_1$, $\rho_2$ | Probabilities that the exploit code works in the system and is persistent, respectively | 0.9, 0.9
$\rho_3$, $\rho_4$ | Probabilities that the attacker finds its target and the desired information, respectively | 0.9, 0.9
$\rho_5$ | Probability that the attacker infects a new host successfully | 0.9
TABLE II. GUARD FUNCTIONS FOR THE SRN MODEL
Guard | Value
$g_{vul}$ | if (#($P_{vul\_s}$)==1) then 1 else 0
$g_{f5}$ | if (#($P_{vul}$)==1) then 1 else 0
$g_{infectm}$ | if (#($P_{lmov}$)==1 || #($P_{efil}$)==1) then 1 else 0
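The guards in TABLE II are Boolean functions of the marking. A minimal Python rendering, assuming a marking is a dictionary from place name to token count (missing places count as zero) and using hypothetical function names that mirror the guard symbols, is:

```python
def g_vul(m):
    """g_vul: 1 if P_vul_s holds exactly one token, else 0."""
    return 1 if m.get("P_vul_s", 0) == 1 else 0

def g_f5(m):
    """g_f5: 1 if P_vul holds exactly one token, else 0."""
    return 1 if m.get("P_vul", 0) == 1 else 0

def g_infectm(m):
    """g_infectm: new hosts can be infected only from the Lmoved or Exfiltrated state."""
    return 1 if m.get("P_lmov", 0) == 1 or m.get("P_efil", 0) == 1 else 0

print(g_infectm({"P_lmov": 0, "P_efil": 1}))
```

Returning 0 or 1 rather than a Boolean mirrors the "then 1 else 0" form used in the table.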
V. CONCLUSIONS
This paper presents a CTMC model for the survivability analysis of a critical system under a severe vulnerability. Stochastic Reward Nets were used to facilitate the generation and solution of the Markov model. We defined survivability metrics in terms of system dependability and security. In addition, numerical results were presented to study the impact of the underlying parameters on system survivability. These results also provide insights into investment efforts in various system recovery strategies, including reactive defense strategies.

References
SPNP: stochastic Petri net package (proceedings article)
Fighting bugs: remove, retry, replicate, and rejuvenate (journal article)
Quantification of system survivability (journal article)
Modeling and Analysis of High Availability Techniques in a Virtualized System (journal article)
Survivability analysis of a two-tier infrastructure-based wireless network (journal article)
ACKNOWLEDGMENT
The research of Ricardo J. Rodríguez is supported in part by the Spanish Ministry of Economy, Industry and Competitiveness project CyCriSec (grant number TIN201458457-R).
