Know Abnormal, Find Evil: Frequent Pattern Mining for Ransomware Threat Hunting and Intelligence

TLDR
In this article, the authors used sequential pattern mining to find Maximal Frequent Patterns (MFP) of activities within different ransomware families as candidate features for classification using J48, Random Forest, Bagging and MLP algorithms.
Abstract
The emergence of crypto-ransomware has significantly changed the cyber threat landscape. Crypto-ransomware removes a data custodian's access by encrypting valuable data on victims' computers and requests a ransom payment to reinstate that access by decrypting the data. Timely detection of ransomware depends heavily on how quickly and accurately system logs can be mined to hunt abnormalities and stop the evil. In this paper we first set up an environment to collect activity logs of 517 Locky ransomware samples, 535 Cerber ransomware samples and 572 samples of TeslaCrypt ransomware. We utilize Sequential Pattern Mining to find Maximal Frequent Patterns (MFP) of activities within different ransomware families as candidate features for classification using J48, Random Forest, Bagging and MLP algorithms. We achieved 99 percent accuracy in distinguishing ransomware instances from goodware samples and 96.5 percent accuracy in detecting the family of a given ransomware sample. Our results indicate the usefulness and practicality of applying pattern mining techniques to detect good features for ransomware hunting. Moreover, we showed the existence of distinctive frequent patterns within different ransomware families, which can be used to identify the family of a ransomware sample and so build intelligence about threat actors and the threat profile of a given target.


This is a repository copy of Know abnormal, find evil: frequent pattern mining for
ransomware threat hunting and intelligence.
White Rose Research Online URL for this paper:
http://eprints.whiterose.ac.uk/128370/
Version: Accepted Version
Article:
Homayoun, S., Dehghantanha, A., Ahmadzadeh, M. et al. (2 more authors) (2020) Know
abnormal, find evil: frequent pattern mining for ransomware threat hunting and intelligence.
IEEE Transactions on Emerging Topics in Computing, 8 (2). pp. 341-351. ISSN 2168-6750
https://doi.org/10.1109/TETC.2017.2756908
© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be
obtained for all other users, including reprinting/ republishing this material for advertising or
promotional purposes, creating new collective works for resale or redistribution to servers
or lists, or reuse of any copyrighted components of this work in other works. Reproduced
in accordance with the publisher's self-archiving policy.
eprints@whiterose.ac.uk
https://eprints.whiterose.ac.uk/

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING 1
Know Abnormal, Find Evil: Frequent Pattern
Mining for Ransomware Threat Hunting and
Intelligence
Sajad Homayoun, Ali Dehghantanha, Marzieh Ahmadzadeh, Sattar Hashemi, Raouf Khayami
Abstract—Emergence of crypto-ransomware has significantly
changed the cyber threat landscape. Crypto-ransomware removes a data custodian's access by encrypting valuable data on victims' computers and requests a ransom payment to reinstate that access by decrypting the data. Timely detection of ransomware depends heavily on how quickly and accurately system logs can be mined to hunt abnormalities and stop the evil. In this paper we first set up an environment to collect activity logs of 517 Locky ransomware samples, 535 Cerber ransomware samples and 572 samples of TeslaCrypt ransomware. We utilize Sequential Pattern Mining to find Maximal Frequent Patterns (MFP) of activities within different ransomware families as candidate features for classification using J48, Random Forest, Bagging and MLP algorithms. We achieved 99% accuracy in distinguishing ransomware instances from goodware samples and 96.5% accuracy in detecting the family of a given ransomware sample. Our results indicate the usefulness and practicality of applying pattern mining techniques to detect good features for ransomware hunting. Moreover, we showed the existence of distinctive frequent patterns within different ransomware families, which can be used to identify the family of a ransomware sample and so build intelligence about threat actors and the threat profile of a given target.
Index Terms—Malware, ransomware, crypto ransomware, ran-
somware detection, ransomware family detection.
I. INTRODUCTION
CYBERCRIMINALS pose a real and persistent threat to
business, government and financial institutions all around
the globe [1]. The volume, scope and cost of cybercrime all
remain on an upward trend [2]. Malicious programs have always been an important tool in cybercriminals' portfolios, and new malware variants are detected almost every day [3]. The development and wide adoption of e-currencies such as Bitcoin led to many changes in cybercriminal activities, including the development of a new type of malware called ransomware [4]. Ransomware is a type of malware that removes a custodian's access to her data and requests a ransom payment to reinstate data access [5]. There are two main types of ransomware, namely Locker and Crypto ransomware. The former locks a system and denies users' access without making any changes to the data stored on the system, while the latter encrypts all or selected data, usually using a strong cryptographic algorithm such as AES or RSA [6].
S. Homayoun, M. Ahmadzadeh and Raouf Khayami are with the Department of IT and Computer Engineering, Shiraz University of Technology, Shiraz, Iran. e-mail: S.Homayoun@sutech.ac.ir.
A. Dehghantanha is with the Department of Computer Science, School of Computing, Science and Engineering, University of Salford, Salford, U.K.
S. Hashemi is with the Department of Computer Engineering, Shiraz University, Shiraz, Iran.
Ransomware dominated the threat landscape in 2016, with an annual increase rate of 267% [7]. It is estimated that in 2014 alone, cybercriminals made more than $3 million in profit using ransomware programs [8]. These days, ransomware programs indiscriminately target all industries, ranging from healthcare to the banking sector and even power grids [2]. Crypto-ransomware programs are much more popular than Lockers because security engineers can almost always find ways to unlock a system without paying the ransom, while the only viable solution for decrypting strongly encrypted data is to pay the ransom and receive the decryption key [9]. Therefore, this paper focuses only on crypto-ransomware, and in the rest of the paper the word "ransomware" refers to "crypto-ransomware" only. It has already been reported that cyber security training and employee awareness reduce the risk of ransomware attacks [10]. However, automated tools and techniques are required to detect ransomware applications before they are launched [11] or within a short period after their execution [12]. The growing danger of ransomware attacks requires new solutions for preventing, detecting and removing ransomware programs.
In this paper, we use a sequential pattern mining technique to detect the best features for distinguishing ransomware applications from benign apps as well as for identifying the family of a ransomware sample. We investigate the usefulness of the detected features by applying them in J48, Random Forest, Bagging and MLP classification algorithms against a dataset that contains 517 Locky ransomware samples, 535 Cerber ransomware samples, 572 samples of TeslaCrypt ransomware and 220 standalone Windows Portable Executable (PE32) benign applications. We not only achieved 99% accuracy in detecting ransomware samples and 96.5% in detecting their families, but also reduced the detection time to less than 10 seconds after launching a ransomware application, a third of the time reported by earlier studies, i.e. [13]. Our results not only indicate the usefulness of pattern mining techniques in identifying the best features for hunting ransomware applications, but also show how patterns of different ransomware families can help in detecting the family of a ransomware sample, which assists in building intelligence about threats applicable to a given target. To the best of the authors' knowledge, this is the first paper to apply sequential pattern mining to detect frequent features of ransomware applications and to build vectored datasets of ransomware application logs. Our created datasets contain logs of Dynamic Link Library (DLL) activities, file system activities and registry activities of 1624 ransomware samples from three different families and 220 benign applications.
We use widely accepted criteria, namely True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN), to evaluate our model [14]–[16]. TP is the number of samples that are correctly identified. FP is the number of incorrectly identified samples. TN is the number of correctly rejected samples, while FN is the number of incorrectly rejected samples. The Precision of a classification algorithm is a measure of the relevancy of its results and is calculated by dividing TP by the total of FP and TP predicted by a classifier, as shown in equation (1). Recall reflects the proportion of positives that are correctly identified by the classification technique and is calculated by dividing TP by the total of TP and FN, as shown in equation (2). F-measure summarizes the performance of a classification algorithm and is calculated as the harmonic mean of precision and recall, as shown in equation (3).
$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$
$$\text{F-measure} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$
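Equations (1) to (3) can be illustrated with a minimal Python sketch. The function names and the confusion-matrix counts used in the usage note are hypothetical and chosen only for illustration; they are not results from this paper.

```python
def precision(tp: int, fp: int) -> float:
    """Equation (1): TP divided by all samples predicted as positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Equation (2): TP divided by all samples that are actually positive."""
    return tp / (tp + fn)


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Equation (3): harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * (p * r) / (p + r)
```

For example, with the made-up counts TP = 90, FP = 10 and FN = 30, `precision(90, 10)` is 0.9 and `recall(90, 30)` is 0.75. The functions assume non-degenerate counts (no zero denominators).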
We also report the Receiver Operating Characteristic (ROC), a potentially powerful metric for comparing different classifiers because it is invariant to class skew in the dataset. In a ROC curve the true positive rate is plotted as a function of the false positive rate for different thresholds. In addition to ROC, the Area Under the Curve (AUC) is a measure of how well a parameter can be used to distinguish between two classes. AUC is a single value that summarizes the ROC by calculating the area of the shape below the ROC curve. AUC can be between 0 and 1, where a value of 1 corresponds to perfect prediction.
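As a rough illustration of how a ROC curve and its AUC can be obtained, the sketch below sweeps a decision threshold over classifier scores and integrates the resulting curve with the trapezoidal rule. The function names, labels and scores are hypothetical; the paper does not state how its ROC values were computed, so this is only one common way of doing it.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier confidence that a sample is positive.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Use each distinct score, from highest to lowest, as a threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

A classifier that ranks every positive sample above every negative one yields an AUC of 1, in line with the description above.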
The Matthews Correlation Coefficient (MCC) [17] provides another measure of quality for comparing different classifiers [18]. The MCC value is between −1 and +1, where perfect prediction gives +1. A coefficient of −1 shows total disagreement between prediction and observation, while a coefficient of 0 indicates that the classifier works no better than random prediction. MCC is also a useful measure of classifier performance on imbalanced datasets. While Precision, Recall or F-measure values for random guessing can be higher than 0.5 on an imbalanced dataset, the MCC value would be around 0 for random guessing. Therefore, to make sure that our classifiers are far from random classifiers, we compute MCC values for each classifier. The values can be computed using equation (7), which is composed of equations (4), (5) and (6), where N is the total number of samples.
$$N = TP + FP + TN + FN \tag{4}$$
$$S = \frac{TP + FN}{N} \tag{5}$$
$$P = \frac{TP + FP}{N} \tag{6}$$
$$MCC = \frac{TP/N - S \times P}{\sqrt{P\,S\,(1 - S)(1 - P)}} \tag{7}$$
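The following sketch evaluates MCC through equations (4) to (7) and, as a sanity check, through the equivalent count-based form (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Function names and the counts in the usage note are hypothetical and not taken from the paper's experiments.

```python
import math


def mcc_from_rates(tp: int, fp: int, tn: int, fn: int) -> float:
    """MCC computed through equations (4)-(7)."""
    n = tp + fp + tn + fn            # equation (4)
    s = (tp + fn) / n                # equation (5)
    p = (tp + fp) / n                # equation (6)
    # Equation (7); undefined when P or S is exactly 0 or 1.
    return (tp / n - s * p) / math.sqrt(p * s * (1 - s) * (1 - p))


def mcc_direct(tp: int, fp: int, tn: int, fn: int) -> float:
    """Equivalent count-based form, used here only as a cross-check."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator
```

With the made-up counts TP = 90, FP = 10, TN = 80 and FN = 20, both functions return the same value (about 0.70). A perfect classifier gives +1 and a classifier that is always wrong gives −1.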
The remainder of this paper is organized as follows. Section II reviews related research, while Section III explains our method for collecting and preprocessing data in a controlled environment. We describe feature extraction and vectorization in Section IV. Section V introduces our approach for ransomware detection, followed by Section VI, which describes our performance in detecting ransomware families. Finally, Section VII discusses the achievements of this paper and concludes it.
II. RELATED WORK
Ransomware programs are reportedly becoming a dominant tool for cybercriminals and a growing threat to our ICT infrastructure [4], [19], [20]. The possibility of using encryption techniques to encrypt users' data as part of a Denial of Service (DoS) attack has been known for a very long time [21]. However, the recent adoption of e-currencies such as Bitcoin provided many new opportunities for attackers, including receiving a ransom payment for decrypting users' data [21]. In spite of their simplicity and primitive utilization of cryptographic techniques [22], ransomware programs are becoming a major tool in the cybercriminal toolset [23]. For any cyber threat, prevention is ideal but detection is a must, and ransomware is not an exception [3], [24].
Situational cyber security awareness plays an important role in preventing cyber-attacks [25]. An educational framework tailored to ransomware threats [10] as well as a tool which mimicked ransomware attacks [26] proved useful in reducing ransomware infections. Moreover, technical countermeasures such as verifying an application's trustworthiness when it calls a crypto library [27] or minimizing the attack surface by limiting end-users' privileges proved effective in preventing ransomware attacks [9].
Most ransomware detection solutions rely on filesystem [28]–[30] and registry events [31] to identify malicious behaviors. An investigation of 1359 ransomware samples showed that the majority of ransomware samples use similar APIs and generate similar logs of filesystem activities [29]. For example, using 20 types of filesystem and registry events as features of a Bayesian Network model against 20 Windows ransomware samples resulted in accurate ransomware detection with an F-Measure of 0.93 [31]. UNVEIL [29], a ransomware classification system, utilized filesystem events to distinguish 13,637 ransomware samples from a dataset of 148,223 malware samples with an accuracy of 96.3%. CloudRPS [32] was a cloud-based ransomware detection system which relied on abnormal behaviors, such as conversion of large quantities of files in a short interval, to detect ransomware samples. EldeRan [13] utilized associations between different operating system events to build a matrix of application activities and to detect ransomware samples within 30 seconds of their execution with an AUC of 0.995. Timely detection of ransomware upon its execution is crucial, and systems that fail to detect ransomware in less than 10 seconds are not considered effective [5]. Moreover, timely identification of a ransomware family would assist in building intelligence about applicable threat actors and the threat profile for a given target.
III. DATA CREATION
We downloaded 1624 Windows Portable Executable (PE32) ransomware samples from virustotal.com which were active in the period of February 2016 to March 2017 as reported by RansomwareTracker.abuse.ch. The collected samples belong to three families of ransomware, namely 517 Locky samples, 535 Cerber samples and 572 samples of TeslaCrypt. The best type of goodware counterpart for malware applications is portable and standalone benign apps [25]. Therefore, we collected all 220 available portable Windows PE32 benign applications from portableapps.com (footnote 1) in April 2017 to serve as the goodware counterpart of our dataset.
We set up the environment shown in Fig. 2 to collect logs of the runtime activities of ransomware and goodware samples. The Controller application on the host machine randomly selects a ransomware or goodware sample and passes it through an FTP server to the Virtual Machine (VM). When the sample is successfully transferred, the Controller notifies the Launcher app to run the ProcessMonitor application and execute the given sample. Similar to previous research [5], the log of the first 10 seconds of runtime activities of ransomware and benign applications is collected, and the created log file is uploaded to the Log repository on the host machine. Since the majority of benign applications require human interaction to run (e.g. clicking on a button), we developed an application called PyWinMonkey which automates user interactions with an application. When the log file is successfully stored on the host machine, the Controller application reverts the VM back to its original copy and passes the next sample. It is notable that PyWinMonkey is similar to the Monkey (footnote 2) Android app, which was utilized in many previous Android malware research papers [33] for mimicking human interactions. We used Python 3.6.1 to develop the Controller, Launcher and PyWinMonkey apps (accessible at https://github.com/sajadhomayoun/PyWinMonkey) and ran ProcessMonitor V3.31 on Windows 10 build number 10240 on a computer with a Core i7 CPU with 8 cores at 4GHz and 16GB of RAM. For each and every process, ProcessMonitor records loaded Dynamic Link Libraries (DLLs), file system activities and registry activities. Therefore, we have three sets of events, namely Registry_Events_Set, which includes all registry events, DLL_Events_Set, which includes all DLL events, and FileSystem_Events_Set, which contains all Filesystem events, as listed in Table I. Moreover, EventType(E) is a procedure that returns the type of a given event (R for Registry events, F for Filesystem events, and D for DLL events), as shown in Fig. 1.
As we will be using a sequential pattern mining technique (MG-FSM) to detect candidate features for the classification task, we should convert our data into a sequential dataset, which is a collection of sequences such as $D = \{S_1, S_2, ..., S_n\}$ where $S_i$ represents a sequentially ordered set of events. We have created a sequence of runtime events for each and every
Footnote 1: https://portableapps.com/apps
Footnote 2: https://developer.android.com/studio/test/monkey.html
procedure EVENTTYPE(Event E)
    if E ∈ Registry_Events_Set return R
    if E ∈ FileSystem_Events_Set return F
    if E ∈ DLL_Events_Set return D
end procedure
Fig. 1. Determining the Event Type of a given event.
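The EventType procedure of Fig. 1 could be written in Python roughly as follows. This is only a sketch: the three sets below hold a small, abbreviated selection of the operation names from Table I, and the names `event_type`, `REGISTRY_EVENTS`, `FILESYSTEM_EVENTS` and `DLL_EVENTS` are hypothetical rather than taken from the authors' code.

```python
# Abbreviated event sets; Table I lists the full membership of each set.
REGISTRY_EVENTS = frozenset(
    {"RegQueryKey", "RegOpenKey", "RegQueryValue", "RegCloseKey", "RegSetValue"}
)
FILESYSTEM_EVENTS = frozenset(
    {"CreateFile", "ReadFile", "WriteFile", "CloseFile", "SetRenameInformationFile"}
)
DLL_EVENTS = frozenset({"LoadImage"})


def event_type(operation: str) -> str:
    """Map an operation name to R (Registry), F (Filesystem) or D (DLL)."""
    if operation in REGISTRY_EVENTS:
        return "R"
    if operation in FILESYSTEM_EVENTS:
        return "F"
    if operation in DLL_EVENTS:
        return "D"
    # Fig. 1 does not say what happens for unknown events; raising an error
    # is a choice made for this sketch.
    raise ValueError(f"unknown operation: {operation}")
```

For instance, `event_type("LoadImage")` returns `"D"` and `event_type("ReadFile")` returns `"F"`.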
TABLE I
LIST OF ACTIVITIES THAT CAN BE CAPTURED BY PROCESS MONITOR
Activity Type | List
Registry | RegQueryKey, RegOpenKey, RegQueryValue, RegCloseKey, RegCreateKey, RegSetInfoKey, RegEnumKey, RegQueryKeySecurity, RegEnumValue, RegSetValue, RegDeleteValue, RegQueryMultipleValueKey, RegDeleteKey, RegLoadKey, RegFlushKey
File | QueryNameInformationFile, ReadFile, CreateFile, QueryBasicInformationFile, CloseFile, QueryStandardInformationFile, CreateFileMapping, QuerySizeInformationVolume, FileSystemControl, QueryDirectory, WriteFile, QueryNetworkOpenInformationFile, QueryRemoteProtocolInformation, QuerySecurityFile, LockFile, UnlockFileSingle, DeviceIoControl, SetEndOfFileInformationFile, FlushBuffersFile, SetAllocationInformationFile, SetBasicInformationFile, QueryAttributeTagFile, QueryFileInternalInformationFile, QueryInformationVolume, QueryAttributeInformationVolume, SetRenameInformationFile, QueryNormalizedNameInformationFile, NotifyChangeDirectory, QueryFullSizeInformationVolume, SetSecurityFile, QueryStreamInformationFile, SetDispositionInformationFile, QueryEaInformationFile, QueryAllInformationFile, QueryIdInformation, SetPositionInformationFile, QueryPositionInformationFile, SetValidDataLengthInformationFile
DLL | LoadImage
ransomware and benign application. $S_i$ represents a sequence of all events E caused by launching an application i, ordered by time as follows:
$S_i = \{E_{1,i}(argE_1), E_{2,i}(argE_2), ..., E_{n,i}(argE_n)\}$ where $E_{x,y}(argE_x)$ represents event x for an application y and $argE_x$ is the argument passed to the event $E_x$.
For example, {LoadImage(C:\system32\gdi32.dll), ReadFile(C:\Windows\SysWOW64\wininet.dll)} shows a sequence of two events where the first event loads gdi32.dll into the memory of the calling process (hence C:\system32\gdi32.dll is the parameter for this event) and the second event reads the wininet.dll file located at C:\Windows\SysWOW64. The size of each sequence depends on the number of events that are called by an application and varies between different apps.
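A sequence of this form could be built from an exported log along the following lines. This is a sketch under several assumptions: the log is a CSV file with `Operation` and `Path` columns, rows are already in chronological order, and the sample rows, timestamps and the helper name `build_sequence` are invented for illustration.

```python
import csv
import io

# Hypothetical excerpt of a CSV log; column names are assumed, not confirmed
# by the paper, and the rows mirror the two-event example in the text.
SAMPLE_LOG = r'''"Time of Day","Process Name","PID","Operation","Path"
"10:00:00.001","sample.exe","1234","LoadImage","C:\system32\gdi32.dll"
"10:00:00.002","sample.exe","1234","ReadFile","C:\Windows\SysWOW64\wininet.dll"
'''


def build_sequence(csv_text: str) -> list:
    """Return the time-ordered list of (event, argument) pairs for one run.

    Rows are assumed to be in chronological order already, so file order is
    kept as the sequence order.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["Operation"], row["Path"]) for row in reader]


sequence = build_sequence(SAMPLE_LOG)
```

Here `sequence` holds one pair per logged event, so its length varies with the number of events an application triggers, as noted above.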
Once all sequences were created, we utilized the Outlier Factor [34] technique to remove any outlier sequences from our dataset, similar to [35]. The Outlier Factor technique first extracts all frequent patterns from a dataset and then detects

[Figure 2 diagram; labels: Controller, FTP Server, Launcher, FTP Client, Ransomware/Goodware Samples, Logs Repository, Process Monitor, *.csv, Reverting Snapshot, Host OS (Win10), Virtual Machine (Win 10)]
Fig. 2. Environment Setup to Capture Malware and Goodware Activities Log
TABLE II
CREATED DATASETS
Dataset | Number of Sequences
D_Locky | 450
D_Cerber | 470
D_TeslaCrypt | 507
D_Goodware | 200
D_OF | 174
outlier sequences as those that contain the least frequent
patterns in a given dataset.
Table II shows the final datasets with the number of sequences in each dataset. D_Locky represents sequences of Locky ransomware samples, D_Cerber contains Cerber ransomware sequences and D_TeslaCrypt includes sequences of TeslaCrypt ransomware samples. D_Ransomware represents the combined sequences of all ransomware samples, while D_Goodware includes sequences of events of all benign applications. We also randomly collected 52 Locky, 50 Cerber, 52 TeslaCrypt and 20 benign application sequences in a separate dataset (D_OF) for over-fitting tests.
IV. FEATURE EXTRACTION AND VECTORIZATION
To detect the best features for the classification task, we first need to define detectable patterns of events and then utilize a pattern mining algorithm to find Maximal Sequential Pattern (MSP) collections within each dataset. Afterwards, every sequence within every relevant dataset is traversed based on a given MSP collection to provide features for training classifiers.
Sequential pattern mining techniques discover all subsequences (Sequential Patterns) that appear in a given sequential dataset with a frequency of no less than a user-specified threshold ($min\_sup$) [36]. A sequence $\alpha = \{a_1, a_2, ..., a_n\}$ is called a subsequence of another sequence $\beta = \{b_1, b_2, ..., b_m\}$, and $\beta$ is a super-sequence of $\alpha$, denoted as $\alpha \sqsubseteq \beta$, if there exist integers $1 \le j_1 < j_2 < ... < j_n \le m$ such that $a_1 \subseteq b_{j_1}, a_2 \subseteq b_{j_2}, ..., a_n \subseteq b_{j_n}$. A sequence $\alpha$ is said to be frequent and is called a Sequential Pattern (SP) in a sequential dataset D if $sup_\alpha \ge min\_sup$, where $sup_\alpha$ (the support of $\alpha$) denotes the frequency of occurrence of $\alpha$ in the given sequential dataset D. Moreover, if a Sequential Pattern SP is not contained in any other sequential pattern, it is called a Maximal Sequential Pattern (MSP). The collection of all MSPs within a given sequential dataset D is denoted as a Maximal Sequential Pattern Collection ($MC_D$). Members of an MC are in the format $(P, sup_P)$, where P is an MSP and $sup_P$ is the frequency of occurrence of P in the given dataset D.
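The subsequence relation and the support of a pattern can be sketched in Python as below. The sketch assumes every sequence element is a single event label, so the element-wise containment in the definition reduces to equality, and it measures support as the fraction of sequences that contain the pattern. The function names and the toy dataset are hypothetical, and the gap constraint of MG-FSM is not modelled.

```python
def is_subsequence(alpha, beta) -> bool:
    """True if alpha can be obtained from beta by deleting elements.

    The shared iterator guarantees that matches are found in order,
    i.e. with strictly increasing positions j_1 < j_2 < ... < j_n.
    """
    remaining = iter(beta)
    return all(item in remaining for item in alpha)


def support(alpha, dataset) -> float:
    """Fraction of sequences in the dataset that contain alpha."""
    hits = sum(1 for sequence in dataset if is_subsequence(alpha, sequence))
    return hits / len(dataset)


# Toy dataset of three sequences over event types R, F and D.
toy_dataset = [["R", "F", "D"], ["R", "D"], ["F", "D", "R"]]
```

With a min_sup of 50%, the pattern `["R", "D"]` is frequent in `toy_dataset` because it occurs in two of the three sequences.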
There are two major types of sequential pattern mining algorithms for extracting MSPs, namely Apriori-based and frequent pattern growth. Apriori-based algorithms detect MSPs based on the fact that any subset of a frequent pattern must be frequent. However, the recursive nature of Apriori-based algorithms increases the complexity and running time of the algorithm [37]. On the other hand, frequent pattern growth algorithms use divide-and-conquer techniques to narrow down the search space of MSPs. To detect MSPs in this study, we utilize a widely used frequent pattern growth algorithm [38] called "Mind the Gap: Frequent Sequence Mining (MG-FSM)" [39] with a $min\_sup$ of 50%. Applying MG-FSM to our datasets generates four MSP collections, namely $MC_{D\_Locky}$, $MC_{D\_Cerber}$, $MC_{D\_TeslaCrypt}$ and $MC_{D\_Ransomware}$.
$$MC_D = \{(P_x, sup_{P_x}) \mid sup_{P_x} \ge min\_sup \wedge \nexists P_y\,(P_x \sqsubseteq P_y)\}$$
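Given a set of frequent patterns with their supports, the maximal collection can be obtained by discarding every pattern that is contained in a different frequent pattern. The sketch below does this by brute force; it is not MG-FSM, and the function names and the toy pattern supports are hypothetical values chosen for illustration.

```python
def is_subsequence(alpha, beta) -> bool:
    """True if alpha appears in beta in order (gaps allowed)."""
    remaining = iter(beta)
    return all(item in remaining for item in alpha)


def maximal_collection(frequent: dict, min_sup: float) -> dict:
    """Keep (pattern, support) pairs that are frequent and maximal.

    frequent maps a pattern (tuple of events) to its support.
    A pattern is dropped if another frequent pattern contains it.
    """
    kept = {p: s for p, s in frequent.items() if s >= min_sup}
    return {
        p: s
        for p, s in kept.items()
        if not any(p != q and is_subsequence(p, q) for q in kept)
    }


# Toy input: supports are made up for illustration.
toy_patterns = {("R",): 1.0, ("F",): 0.8, ("R", "F"): 0.6, ("D",): 0.5, ("D", "F"): 0.3}
toy_mc = maximal_collection(toy_patterns, min_sup=0.5)
```

In this toy input, `("R",)` and `("F",)` are dropped because both are contained in the frequent pattern `("R", "F")`, while `("D", "F")` is dropped for falling below the 50% threshold.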
We can distinguish three types of atomic MSPs and six types of single step transition MSPs within our sequential datasets, as shown in Table III. Atomic MSPs represent continuous events of the same type, e.g. the atomic MSP F represents continuous Filesystem events. Single step transition MSPs represent a transition from one atomic MSP to another. For example, the MSP RD represents a sequence of registry events (R atomic MSP) followed by a sequence of DLL events (D atomic MSP). It is notable that we only define these two kinds of MSP, atomic and single step transition, to avoid sparsity in the extracted features.
An MSP $P = \{E_1, ..., E_n\}$ is atomic if
$\forall E_x, E_y \in P,\ E_x \ne E_y\ (EventType(E_x) == EventType(E_y))$.
An MSP $P = \{E_1, ..., E_n\}$ is a single step transition if
$\exists E_x, E_y \in P,\ E_x \ne E_y\ (EventType(E_x) \ne EventType(E_y))$.
We can define a set that contains all MSP types (MSP_Type_Set) and a procedure (MSPType(MSP P) in Fig. 3) that returns the type of a given sequence S as follows:
MSP_Type_Set = {R, F, D, RF, RD, FR, FD, DR, DF}.
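One way to assign a type from MSP_Type_Set to a pattern is to collapse consecutive events of the same type into runs and read off the resulting letters. The sketch below takes a sequence of event type letters (R, F or D) as input; the function name `msp_type` is hypothetical, and returning `None` for patterns with more than one transition is a choice made for this sketch, not something specified in the text above.

```python
from itertools import groupby

MSP_TYPE_SET = {"R", "F", "D", "RF", "RD", "FR", "FD", "DR", "DF"}


def msp_type(event_types):
    """Return the MSP type of a pattern given its event type letters.

    Consecutive identical letters are merged, so "RRRDD" becomes "RD".
    One run is an atomic MSP and two runs are a single step transition.
    """
    runs = "".join(key for key, _ in groupby(event_types))
    return runs if runs in MSP_TYPE_SET else None
```

For example, `msp_type("RRR")` gives `"R"` (atomic) and `msp_type(["R", "R", "D"])` gives `"RD"` (single step transition), whereas `msp_type("RFR")` gives `None` because it contains two transitions.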
Support Ratio (SR) of a MSP is a value in the range of
[0,1] that shows the possibility of occurrence of the MSP in
a given dataset of ransomware and is calculated by dividing

References
Proceedings ArticleDOI

Mining sequential patterns

TL;DR: Three algorithms are presented to solve the problem of mining sequential patterns over databases of customer transactions, and empirically evaluating their performance using synthetic data shows that two of them have comparable performance.
Posted Content

Evaluation: from Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation

TL;DR: Elegant connections between the concepts of Informedness, Markedness, Correlation and Significance as well as their intuitive relationships with Recall and Precision are demonstrated.
Journal ArticleDOI

Comparison of the predicted and observed secondary structure of T4 phage lysozyme.

TL;DR: Although empirical predictions based on larger numbers of known protein structure tend to be more accurate than those based on a limited sample, the improvement in accuracy is not dramatic, suggesting that the accuracy of current empirical predictive methods will not be substantially increased simply by the inclusion of more data from additional protein structure determinations.
Journal ArticleDOI

A systematic analysis of performance measures for classification tasks

TL;DR: This paper presents a systematic analysis of twenty four performance measures used in the complete spectrum of Machine Learning classification tasks, i.e., binary, multi-class,multi-labelled, and hierarchical, to produce a measure invariance taxonomy with respect to all relevant label distribution changes in a classification problem.