Behavioral Malware Detection using Deep Graph Convolutional Neural Networks

TL;DR: Experimental results show that the DGCNN models achieve similar Area Under the ROC Curve (AUC-ROC) and F1-Score to Long-Short Term Memory (LSTM) networks, thus indicating that the models can effectively learn to distinguish between malicious and benign temporal patterns through convolution operations on graphs.
Abstract: Malware behavioral graphs provide a rich source of information that can be leveraged for detection and classification tasks. In this paper, we propose a novel behavioral malware detection method based on Deep Graph Convolutional Neural Networks (DGCNNs) to learn directly from API call sequences and their associated behavioral graphs. In order to train and evaluate the models, we created a new public domain dataset of more than 40,000 API call sequences resulting from the execution of malware and goodware instances in a sandboxed environment. Experimental results show that our models achieve similar Area Under the ROC Curve (AUC-ROC) and F1-Score to Long-Short Term Memory (LSTM) networks, widely used as the base architecture for behavioral malware detection methods, thus indicating that the models can effectively learn to distinguish between malicious and benign temporal patterns through convolution operations on graphs. To the best of our knowledge, this is the first paper that investigates the applicability of DGCNN to behavioral malware detection using API call sequences.

Summary (4 min read)

1 Introduction

  • According to a report published by AV-TEST [1], 9.74 million new malware specimens were released just in September of 2019, totaling 948 million known specimens in the wild.
  • In order to collect dynamic analysis data, it is often necessary to run the program in a sandbox environment [5].
  • The authors propose a novel behavioral malware detection method that exploits yet another structure of the dynamic analysis data, the graph structure of the API call sequences.
  • To accomplish this task, their method is based on a state-of-the-art Deep Learning architecture designed for graph classification; more specifically, the Deep Graph Convolutional Neural Network [15].
  • The rest of the paper is organized as follows.

3 Background on Deep Graph Convolutional Neural Networks

  • DGCNN is a state-of-the-art neural network architecture that can directly accept graphs of arbitrary structures to learn a graph classification function [15].
  • The augmented diagonal degree matrix of G, $\tilde{D}_{i,i} = \sum_j \tilde{A}_{i,j}$, is used for row-wise normalization.
  • Then, the graph convolution operation can be written as follows [15]: $Z = f(\tilde{D}^{-1}\tilde{A}XW)$ (1). The graph convolution operation defined by Equation 1 aggregates local substructure information by considering the nodes’ immediate neighborhoods.
  • 4) The ordered graph data is flattened and passed to a standard 1-dimensional CNN layer followed by a fully connected layer to learn a classification function.
  • For a more comprehensive review, please refer to [15].

4 Proposed Method

  • As illustrated in Figure 1, their method has eight sequential steps from data gathering to detection.
  • At this point, the authors have tracked the temporal behavioral information from the PE files and the ordered set of all possible API calls.
  • If multiple graph convolutional layers are stacked together to form a deep network, it is necessary to concatenate their results in order to consider multi-scale substructure features.
  • Finally, the learned representations are passed to a fully connected layer (7), followed by a sigmoid layer (8) for binary classification.
  • In the next sections, a more in-depth description of the method is presented.

4.1 Data Collection and Post-Processing

  • The authors introduced a new public domain dataset of 42,797 malware API call sequences and 1,079 goodware API call sequences [30].
  • On the other hand, the authors were motivated by the desire to provide an open dataset that the research community could further utilize and extend.
  • 3) The authors built the list of unique API calls, considering all the samples, and then converted each API call name into a unique integer identifier equal to the index of the API call name in the list.
  • The last column contains the label of the sample, 0 for goodware, and 1 for malware.
  • The authors' Cuckoo sandbox environment was based on an Intel Xeon D-1540, 8 cores, 16 threads, 2.6 GHz, 64 GB RAM, and 2 TB SSD running Ubuntu Server 16.04 as the Cuckoo host and eight 32-bit Windows 7 Ultimate VirtualBox virtual machines running in parallel as Cuckoo analysis guests.

4.2 API Call Sequences and Behavioral Graphs Generation

  • On the one hand, API call sequences represent the most important part of the program behavior through time [13].
  • On the other hand, graph structures encode spatial relations, such as adjacency and connectivity, between API calls.
  • The authors' method leverages both temporal and spatial information for malware detection.
  • In order to accomplish that, it is necessary to extract the graph structure from the API call sequences to generate their associated behavioral graphs.
  • Figure 2 step I shows the behavioral graph G resulting from the adjacency matrix generated by Equation 3 applied to the API call sequence x = (0, 1, 2, 0, 2, 3).
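Equation 3 is not reproduced in this summary, so the following is only a sketch of how a behavioral graph might be derived from the example sequence. It assumes the common construction in which a directed edge i → j exists whenever API call j immediately follows API call i; the function name `behavioral_adjacency` is hypothetical.

```python
def behavioral_adjacency(sequence, num_api_calls):
    """Build a binary adjacency matrix from an API call sequence.

    Assumption (not taken from the paper's Equation 3): A[i][j] = 1 when
    call j immediately follows call i at least once in the sequence.
    """
    adjacency = [[0] * num_api_calls for _ in range(num_api_calls)]
    for src, dst in zip(sequence, sequence[1:]):
        adjacency[src][dst] = 1
    return adjacency


# Example sequence from the text: x = (0, 1, 2, 0, 2, 3)
x = (0, 1, 2, 0, 2, 3)
A = behavioral_adjacency(x, num_api_calls=4)
for row in A:
    print(row)
```

Under this assumption the edges are 0→1, 1→2, 2→0, 0→2, and 2→3, so node 3 has no outgoing edge.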

4.3 Deep Graph Convolutional Neural Networks and Graph Convolutional Layers

  • In order to take advantage of the DGCNN architecture, let us define the node feature matrix $X \in \{0, 1\}^{|N| \times L}$ of G as the result of one-hot encoding each $x_i$ in the API call sequence x.
  • For the sake of clarity, let us take the product AX. (Footnote: the reader may forgive a little abuse of notation here.)
  • Also, notice that the rows of AX represent ordered nodes, and the columns of X represent the behavior of the program in time given by the API call sequence x.
  • Moreover, since the nodes of G are already sorted by their natural order, their model does not require the SortPooling layer introduced in [15], thus reducing its execution time.
  • Finally, the term $\tilde{D}^{-1}\tilde{A}X$ is multiplied by the weight matrix W, allowing the model to learn higher-level representations.
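To make the shapes concrete, here is a small NumPy sketch of the one-hot node feature matrix and the normalized product $\tilde{D}^{-1}\tilde{A}X$ for the example sequence x = (0, 1, 2, 0, 2, 3). The adjacency matrix is an assumption for illustration (edge i → j when j directly follows i), the helper names are hypothetical, and the learnable weight matrix W is left out.

```python
import numpy as np


def one_hot_features(sequence, num_api_calls):
    """X[n, t] = 1 when the t-th call in the sequence is API call n."""
    X = np.zeros((num_api_calls, len(sequence)), dtype=float)
    X[list(sequence), np.arange(len(sequence))] = 1.0
    return X


def normalized_propagation(A, X):
    """Compute D~^{-1} A~ X with A~ = A + I and D~ the row sums of A~."""
    A_tilde = A + np.eye(A.shape[0])
    D_inv = 1.0 / A_tilde.sum(axis=1, keepdims=True)  # row sums are >= 1
    return D_inv * (A_tilde @ X)


x = (0, 1, 2, 0, 2, 3)
# Adjacency assumed for illustration: edge i -> j when j directly follows i.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
X = one_hot_features(x, num_api_calls=4)   # rows: nodes, columns: time steps
H = normalized_propagation(A, X)
print(X.shape, H.shape)
print(H[1])  # node 1 averages its own time profile with that of node 2
```

Each row of the result is the average of a node's own time profile and the profiles of its out-neighbors, which is how the temporal and graph information get mixed before W is applied.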

4.5 The Method

  • In summary, without considering the data collection and post-processing steps, their method can be implemented using Algorithm 2.
  • According to the principles of Deep Learning [7], Algorithm 2 can be extended by stacking the graph convolutional layers or fully connected layers, followed by the sigmoid layer for binary classification or a softmax layer for multi-class classification.
  • Furthermore, the authors included a Dropout [34] layer after each graph convolutional layer in order to prevent overfitting and used ReLU [35] as the activation function to perform non-linear transformations while preventing the vanishing gradient problem. Algorithm 2 (The Model) takes the API call sequence x as input.
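Algorithm 2 itself is not reproduced in this summary. The sketch below only illustrates the layer order described above (graph convolution with ReLU, dropout, fully connected layer, sigmoid) with random, untrained weights. All sizes, names, and the adjacency matrix are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def graph_conv(A_tilde, D_inv, H, W):
    """One graph convolutional layer with ReLU: max(0, D~^{-1} A~ H W)."""
    return np.maximum(0.0, D_inv * (A_tilde @ H @ W))


def dropout(H, rate, rng):
    """Inverted dropout: zero entries with probability `rate`, rescale the rest."""
    mask = rng.random(H.shape) >= rate
    return H * mask / (1.0 - rate)


def forward(A, X, W_conv, w_fc, b_fc, rate=0.0, rng=None):
    """Return the predicted probability that the sample is malware."""
    A_tilde = A + np.eye(A.shape[0])
    D_inv = 1.0 / A_tilde.sum(axis=1, keepdims=True)
    Z = graph_conv(A_tilde, D_inv, X, W_conv)
    if rate > 0.0:  # dropout is meant for training only; needs an rng
        Z = dropout(Z, rate, rng)
    logit = Z.reshape(-1) @ w_fc + b_fc   # fully connected layer
    return 1.0 / (1.0 + np.exp(-logit))   # sigmoid layer


rng = np.random.default_rng(0)
n, L, c_out = 4, 6, 3                     # toy sizes, not the paper's
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)  # assumed toy behavioral graph
X = np.zeros((n, L))
X[[0, 1, 2, 0, 2, 3], np.arange(L)] = 1.0  # one-hot encoding of x
W_conv = rng.normal(size=(L, c_out))
w_fc = rng.normal(size=n * c_out)
p = forward(A, X, W_conv, w_fc, b_fc=0.0)
print(float(p))
```

Because the node order is fixed by the API call identifiers, the convolution output can be flattened directly, which is consistent with the remark that no SortPooling layer is needed.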

5 Performance Evaluation

  • First, to measure the performance of their method in detecting malware considering a balanced dataset and the original imbalanced dataset of API call sequences.
  • Second, to establish a fair performance comparison between their models and LSTM networks on the same task.
  • Two experiments were performed for model selection, training and evaluation: Experiment 5.1 and Experiment 5.2.
  • In total, 1,296 models were defined, trained, and evaluated, resulting in 6 optimized models for malware detection using API call sequences.

5.1 Experiment 1

  • In an exhaustive grid search, the model is trained and evaluated with all the hyperparameters combinations.
  • The stratified k-fold cross-validation ensures that each training set split contains a similar proportion of positive and negative samples.
  • Then, the model is trained with k − 1 folds, and then its performance is evaluated using the fold that was left out of the training process.
  • The average of the evaluation performances is an estimate of the model’s performance on unseen data.
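The combination of exhaustive grid search and stratified k-fold cross-validation can be sketched in plain Python as follows. The value of k, the hyperparameter names, and the scoring function `toy_score` are placeholders; the paper's actual grid is not given in this summary.

```python
import itertools
import random


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin
    return folds


def cross_validate(labels, k, train_and_score, params):
    """Train on k - 1 folds, score on the held-out fold, average the scores."""
    folds = stratified_folds(labels, k)
    scores = []
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        scores.append(train_and_score(train_idx, test_idx, params))
    return sum(scores) / k


def grid_search(labels, k, train_and_score, grid):
    """Evaluate every hyperparameter combination and return the best one."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_validate(labels, k, train_and_score, params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


def toy_score(train_idx, test_idx, params):
    """Placeholder for 'train a model and return its validation score'."""
    return 1.0 - abs(params["dropout"] - 0.5)


labels = [0] * 10 + [1] * 10  # toy balanced labels
best_score, best_params = grid_search(
    labels, 5, toy_score, {"dropout": [0.1, 0.5, 0.9], "units": [16, 32]}
)
print(best_score, best_params)
```

In a real run, `train_and_score` would fit the model on `train_idx` and return a metric such as AUC-ROC on `test_idx`.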

5.2 Experiment 2

  • In the second experiment, the original imbalanced dataset of 42,797 malware API call sequences and 1,079 goodware API call sequences was considered without undersampling.
  • Then, the same procedures of Experiment 5.1 were followed.

6.1 Balanced Dataset

  • As can be seen in Table 2, their models achieve the highest AUC-ROC, F1-score, precision, recall, and accuracy.
  • A particularly important performance metric when evaluating malware detectors is the recall.
  • High precision implies a low number of false positives, which is less critical but is desired for malware detectors.
  • Ideally, both recall and precision should be high, implying a high F1-score.
  • Finally, high accuracy implies a high number of correct overall predictions.
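For reference, the metrics discussed above follow directly from the confusion matrix counts. The counts in the example are made up for illustration and are not results from the paper; the function assumes non-zero denominators.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute precision, recall, F1-score, and accuracy from raw counts."""
    precision = tp / (tp + fp)   # share of flagged samples that are malware
    recall = tp / (tp + fn)      # share of malware that was flagged
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts: 90 malware caught, 10 missed, 5 false alarms.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A missed malware sample (false negative) lowers recall, while a false alarm (false positive) lowers precision, which is why recall is treated as the more critical of the two here.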

6.2 Imbalanced Dataset

  • As can be seen in Table 3, LSTM networks achieve the best results, followed by Model-2 and Model-1, respectively; however, notice that their models are capable of learning a classification function using considerably fewer parameters and epochs.
  • AUC-ROC is the most reliable metric in this scenario [45] since even the Dummy detector achieves a relatively high F1-score and, consequently, high recall and precision.
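A short calculation shows why F1-score is misleading on the imbalanced dataset. It assumes the Dummy detector simply labels every sample as malware (the summary does not specify its strategy) and evaluates it on the full dataset rather than on a held-out fold.

```python
malware, goodware = 42_797, 1_079  # class counts reported for the dataset

# Assumed dummy detector: label every sample as malware.
tp, fp, fn, tn = malware, goodware, 0, 0

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(round(precision, 4), recall, round(f1, 4), round(accuracy, 4))
# → 0.9754 1.0 0.9876 0.9754
```

A detector that outputs the same score for every sample cannot rank malware above goodware, so its AUC-ROC is 0.5 even though its F1-score is close to 0.99.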

6.3 General Considerations

  • In general, their models achieved similar performances to LSTM networks on the proposed task.
  • As Tables 4 and 5 show, Model-1 and Model-2 dropout rates are the highest as opposed to the number of parameters.
  • In fact, their models overfitted the training set just after ten epochs on average, indicating that additional dropout layers or L2 regularization [47], as well as the addition of more examples, could further improve their performance.
  • In addition, notice that their work only took into account one kind of execution trace, the API call sequences.

6.4 Visualization

  • In an attempt to visualize the inner workings of the models, the authors applied Principal Component Analysis (PCA) [46] to the sets of activations in the hidden layer preceding the fully connected layer during the evaluation phases.
  • Deeper layers should contain high-level features that are able to be separated into classes by the fully connected layer.
  • Figure 4 (a) shows the result of PCA applied to the test set, Figure 4 (b) shows the result of PCA applied to LSTM networks, and Figures 4 (c) and (d) show the PCA visualization for Model-1 and Model-2, respectively.
  • Taking that into account, it is interesting to consider how a DGCNN-based behavioral malware classification method would behave in a multiclass classification problem.
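The PCA projection used for the visualization can be sketched with NumPy's singular value decomposition. The activations below are random placeholders standing in for the hidden-layer outputs, and `pca_2d` is a hypothetical helper name.

```python
import numpy as np


def pca_2d(activations):
    """Project row vectors onto their first two principal components."""
    centered = activations - activations.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:2].T


rng = np.random.default_rng(0)
activations = rng.normal(size=(50, 8))  # placeholder: 50 samples, 8 hidden units
points = pca_2d(activations)
print(points.shape)
```

Plotting `points` colored by label would give a scatter plot comparable in spirit to the panels of Figure 4.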

7 Conclusion

  • The authors propose a novel behavioral malware detection method based on DGCNNs to learn directly from API call sequences.
  • In order to train, evaluate, and test the models, the authors introduced a new public domain dynamic analysis dataset of more than 40k API call sequences of malware and goodware.
  • Even though DGCNNs are memory-less networks, as opposed to LSTM networks, their results show that the graph structure of the API call sequences plays an essential role in the problem of detecting whether a program is malware.

Behavioral Malware Detection Using Deep Graph
Convolutional Neural Networks
A Preprint
Angelo Oliveira
Universidade Nove de Julho, Brazil
alpha@angeloliveira.net
Renato José Sassi
Universidade Nove de Julho, Brazil
renato.sassi@ieee.org
October 24, 2019
Abstract
Malware behavioral graphs provide a rich source of information that can be leveraged for
detection and classification tasks. In this paper, we propose a novel behavioral malware
detection method based on Deep Graph Convolutional Neural Networks (DGCNNs) to learn
directly from API call sequences and their associated behavioral graphs. In order to train
and evaluate the models, we created a new public domain dataset of more than 40,000
API call sequences resulting from the execution of malware and goodware instances in a
sandboxed environment. Experimental results show that our models achieve similar Area
Under the ROC Curve (AUC-ROC) and F1-Score to Long-Short Term Memory (LSTM)
networks, widely used as the base architecture for behavioral malware detection methods,
thus indicating that the models can effectively learn to distinguish between malicious and
benign temporal patterns through convolution operations on graphs. To the best of our
knowledge, this is the first paper that investigates the applicability of DGCNN to behavioral
malware detection using API call sequences.
Keywords: malware detection · behavioral graphs · deep graph convolutional neural networks · deep learning · computer security
1 Introduction
According to a report published by AV-TEST [1], 9.74 million new malware specimens were released just in
September of 2019, totaling 948 million known specimens in the wild. Dealing with the rapid increase of
malware in number, complexity, and variability requires the research and development of new intelligent,
automatic malware detection methods capable of scaling accordingly. There are two main approaches to
detecting malware: static malware analysis and dynamic malware analysis [2]. On the one hand, static
malware analysis can be conducted quickly by comparing a set of handcrafted features of the incoming file to
previously observed malware features or signatures, which makes static analysis vulnerable to code obfuscation
techniques employed by polymorphic and metamorphic malware [3], as well as to completely new specimens of
malware or zero-days. Traditional signature-based malware detection methods are the cornerstone of the
majority of the commercial endpoint protection systems since they are relatively fast and do not depend
on any additional infrastructure to collect and analyze the data; however, they require expert knowledge to
reverse engineer malware instances and produce the features that will be used for detection. Evidently, this
approach does not scale as fast as malware production. On the other hand, dynamic malware analysis or
behavioral analysis is based on behavioral data such as API or system calls, which is harder to obfuscate [4].
In order to collect dynamic analysis data, it is often necessary to run the program in a sandbox environment
[5]. A sandbox provides a controlled and isolated environment for the guest program to run while monitoring
and tracking its activities. Once the data is collected and preprocessed, it can be used to feed behavioral
detection algorithms [6]. Deep Learning algorithms have shown unprecedented success in various domains
such as image classification, natural language processing, and speech recognition [7]. Following this trend,
Deep Learning algorithms have also been applied to malware detection and classification tasks using static
and dynamic analysis data exploiting its temporal [8, 9, 10], spatial [11, 12], or spatio-temporal [13, 14]
structure. In this work, we propose a novel behavioral malware detection method that exploits yet another
structure of the dynamic analysis data, the graph structure of the API call sequences. To accomplish this
task, our method is based on a state-of-the-art Deep Learning architecture designed for graph classification;
more specifically, the Deep Graph Convolutional Neural Network (DGCNN) [15]. Due to their capability of
learning from non-Euclidean data such as graphs, Graph Neural Networks (GNNs) [16, 17] can be applied to
problems in a vast range of domains from protein classification [18] to materials science [19]. By defining a
graph structure to represent the API call sequence of a program, we combine both the spatial and temporal
information from its behavior. Then, we introduce a simplified version of the DGCNN to learn high-level
representations that can be used by a classifier to detect whether the program is malware. Experimental
results show that the proposed method achieves similar AUC-ROC [20] and F1-Score to specialized Deep
Learning architectures for sequence learning such as LSTM networks [21], widely used as the base architecture
for behavioral malware detection methods [22]. In particular, our models achieve higher AUC-ROC, F1-Score,
Precision, Recall, and Accuracy than LSTM networks when trained and tested on a balanced dataset. To the
best of our knowledge, this is the first paper that investigates the applicability of DGCNN to behavioral
malware detection using API call sequences. The rest of the paper is organized as follows. Related work is
reviewed in Section 2. A brief background on DGCNNs is introduced in Section 3. The proposed method is
detailed in Section 4. Performance evaluation is described in Section 5. Results, discussion, limitations, and
future work are presented in Section 6. Finally, conclusions are drawn in Section 7.
2 Related Work
Classical Deep Learning algorithms such as Convolutional Neural Networks (CNNs) [23] and LSTM networks
have been successfully applied to malware detection and classification problems using both static and dynamic
analysis data [22]; however, Deep Learning on graphs has been mainly applied to data extracted employing
static analysis methods. [24] proposed a malware classification method (MAGIC) based on a modified version
of the DGCNN to learn directly from attributed control flow graphs (ACFGs) extracted from disassembled
binaries, in which each vertex summarizes code characteristics as numerical values. [25] introduced a
malware detection approach using graph embedding to map the control flow graphs (CFGs) extracted from
disassembled binaries to low-dimensional vectors as inputs for two stacked denoising autoencoders (SDAs)
that are responsible for representation learning. [26] presented a method for Android malware detection that
creates CFGs using data extracted from decompiled applications’ source codes and information from their
manifest files. Then, a Graph Convolutional Network (GCN) [27] was used to learn high-level representations
that could be used in detection tasks. [28] studied the effectiveness of DGCNNs in processing large-scale
graphs with hundreds of thousands of nodes by conducting experiments on malware detection and software
defect prediction. Our work follows a similar approach to [24] and shares the same theoretical basis on
applying DGCNNs for classification tasks [15]. However, we use the standpoint of dynamic analysis by
extracting behavioral graphs from the API call sequences and using both the API call sequences and the
behavioral graphs as inputs to a modified version of the DGCNN. In addition, the standard LSTM network
was chosen as a benchmark since it has been successfully applied as the base architecture for several behavioral
malware detection and classification methods using API call sequences data [22].
3 Background on Deep Graph Convolutional Neural Networks
DGCNN is a state-of-the-art neural network architecture that can directly accept graphs of arbitrary structures
to learn a graph classification function [15]. In other words, DGCNNs deal with the task of graph classification
as opposed to node classification [27]. Let $G$ be a directed graph of order $n \in \mathbb{N}$ and
$A \in \mathbb{Z}^{n \times n}$ its associated adjacency matrix. Now, let us define the following: The augmented
adjacency matrix of $G$, $\tilde{A} = A + I_n$, to ensure that the convolution operation takes into account
the features of each node as well as its neighbours’ features. The augmented diagonal degree matrix of $G$,
$\tilde{D}_{i,i} = \sum_j \tilde{A}_{i,j}$ for row-wise normalization. The node feature matrix
$X \in \mathbb{R}^{n \times c}$, $c \in \mathbb{N}$, where each row of $X$ is a node “feature descriptor”, and
each column of $X$ is a node “feature channel”. The matrix of learning parameters
$W \in \mathbb{R}^{c \times c'}$, where $c' \in \mathbb{N}$ is the number of output feature channels, with the
non-linear activation function $f : \mathbb{R}^{n \times c'} \to \mathbb{R}^{n \times c'}$. Then, the graph
convolution operation can be written as follows [15]:
$$Z = f(\tilde{D}^{-1} \tilde{A} X W) \qquad (1)$$
The graph convolution operation defined by Equation 1 aggregates local substructure information by
considering the nodes’ immediate neighborhoods. In order to extract multi-scale substructure features,
Equation 1 can be stacked to form a deep network, using the following recurrence relation [15]:
$$Z^{(t+1)} = f(\tilde{D}^{-1} \tilde{A} Z^{(t)} W^{(t)}), \quad \text{where } Z^{(0)} = X \text{ and } W^{(t)} \in \mathbb{R}^{c_t \times c_{t+1}},\ t \in \mathbb{N} \qquad (2)$$
In summary, standard DGCNNs have four sequential steps [15]: 1) Graph convolutional layers generalize the
convolution operation from Euclidean domains or grid-like structures such as image data to non-Euclidean
domains such as graph data by generating node representations as the aggregation of their own feature
descriptors and their neighbors’ feature descriptors. 2) Unordered graph data from each convolutional
layer are concatenated along their feature channels (or columns), resulting in matrix
$Z \in \mathbb{R}^{n \times \sum_t c_t}$. 3) A
SortPooling layer sorts the unordered graph data according to their feature descriptors or structural roles.
This step guarantees that the nodes of different graphs will be placed in similar positions, according to their
weighted feature descriptors. 4) The ordered graph data is flattened and passed to a standard 1-dimensional
CNN layer followed by a fully connected layer to learn a classification function. For a more comprehensive
review, please refer to [15].
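As an illustration of Equations 1 and 2 and the concatenation in step 2, the following NumPy sketch stacks graph convolutions and concatenates their outputs along the feature channels. The toy graph, the layer sizes, and the choice of tanh as the activation f are assumptions made for this example only; SortPooling and the 1-dimensional CNN of steps 3 and 4 are not shown.

```python
import numpy as np


def stacked_graph_conv(A, X, weights, f=np.tanh):
    """Apply Z(t+1) = f(D~^{-1} A~ Z(t) W(t)) per layer and concatenate outputs."""
    A_tilde = A + np.eye(A.shape[0])                  # add self-loops
    D_inv = 1.0 / A_tilde.sum(axis=1, keepdims=True)  # row-wise normalization
    Z, outputs = X, []
    for W in weights:
        Z = f(D_inv * (A_tilde @ Z @ W))
        outputs.append(Z)
    return np.concatenate(outputs, axis=1)            # n x (sum of layer widths)


rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)                # toy directed 3-cycle
X = rng.normal(size=(3, 2))                           # n = 3 nodes, c = 2 channels
weights = [rng.normal(size=(2, 4)), rng.normal(size=(4, 3))]
Z = stacked_graph_conv(A, X, weights)
print(Z.shape)  # (3, 7): the 4 and 3 output channels side by side
```

Each additional layer lets a node's representation depend on nodes one hop further away, which is why the concatenated matrix carries multi-scale substructure features.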
4 Proposed Method
As illustrated in Figure 1, our method has eight sequential steps from data gathering to detection. First of
all, Portable Executable (PE) files (1) are fed to a Cuckoo Sandbox [29] environment (2), which in turn runs
the PE files and generates raw JSON reports containing dynamic analysis data such as API call sequences,
generated traffic and dropped files (3). Next, the API call sequences are extracted from the reports and
post-processed in order to identify and convert the API calls into ordinal categorical values (4). At this point,
we have tracked the temporal behavioral information from the PE files and the ordered set of all possible API
calls. Behavioral graphs are then generated based on both the API call sequences and the set of API calls
(5), and both are passed to a graph convolutional layer to learn high-level representations of the spatial and
temporal relations among the API calls (6). If multiple graph convolutional layers are stacked together to
form a deep network, it is necessary to concatenate their results in order to consider multi-scale substructure
features. Finally, the learned representations are passed to a fully connected layer (7), followed by a sigmoid
layer (8) for binary classification. In the next sections, a more in-depth description of the method is presented.
[Figure 1 (flow diagram): (1) PE Files, (2) Cuckoo Sandbox, (3) Cuckoo Reports, (4) API Call Sequences, (5) Behavioral Graphs Generation, (6) Graph Convolutional Layers, (7) Fully Connected Layers, (8) Sigmoid Layer. Stage labels: Data Collection and Post-Processing; High-level Features; Binary Classification. Other labels: Temporal Data; Spatial Data; Raw PE Files; Runtime Environment; Raw JSON Files.]
Figure 1: High-level flow of the proposed method.
4.1 Data Collection and Post-Processing
We introduced a new public domain dataset of 42,797 malware API call sequences and 1,079 goodware API
call sequences [30]. Our motivation was twofold. On the one hand, we were motivated by the lack
of a public domain PE dynamic malware analysis dataset for training and evaluating our models. On the
other hand, we were motivated by the desire to provide an open dataset that the research community could
further utilize and extend. Malware samples were collected from VirusShare [31], and goodware samples were
collected from both portablepps.com [32] and a 32-bit Windows 7 Ultimate directory. Both online download
and local goodware were included to increase the variability of the dataset and decrease its imbalance. In
order to gather the API call sequences from each sample, we chose Cuckoo Sandbox, which is a widely used,
open-source automated malware analysis system capable of monitoring process behavior while running in
an isolated environment. Once the data was collected, three additional post-processing steps were performed.
1) Similar to [13], only the first 100 API calls were kept after discarding consecutive repetitions, to avoid tracking loops.
2) Since in malware detection tasks it is important to recognize malicious patterns as early as possible, the
sequences were extracted from the parent process only. 3) We built the list of unique API calls, considering
all the samples, and then converted each API call name into a unique integer identifier equal to the index of
the API call name in the list. As a result, 307 distinct API calls were identified. We produced a dataset where
the first column contains the MD5 hash of the sample. The next 100 columns contain ordinal categorical
values between 0 and 306, representing the API call sequence of the sample. The last column contains the
label of the sample, 0 for goodware, and 1 for malware. The total running time to collect the data was about
3000 hours, resulting in approximately 50,000 Cuckoo JSON report files and 1.5 TB of raw data. Our Cuckoo
sandbox environment was based on an Intel Xeon D-1540, 8 cores, 16 threads, 2.6 GHz, 64 GB RAM, and
2 TB SSD running Ubuntu Server 16.04 as the Cuckoo host and 8 32-bit Windows 7 Ultimate VirtualBox
virtual machines running in parallel as Cuckoo analysis guests.
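A minimal sketch of post-processing steps 1 and 3, assuming that step 1 means collapsing consecutive repetitions of the same call before keeping the first 100 entries, and that the list of unique API calls is sorted alphabetically (the text does not state the ordering). The function names and the sample traces are illustrative; extracting the parent process from a Cuckoo report (step 2) is not shown.

```python
from itertools import groupby


def collapse_repeats(calls, max_len=100):
    """Drop consecutive duplicates, then keep the first max_len calls."""
    deduped = [name for name, _ in groupby(calls)]
    return deduped[:max_len]


def build_vocabulary(sequences):
    """Map each unique API call name to its index in a sorted list."""
    names = sorted({name for seq in sequences for name in seq})
    return {name: idx for idx, name in enumerate(names)}


def encode(sequences):
    """Post-process raw call traces into integer-encoded sequences."""
    processed = [collapse_repeats(seq) for seq in sequences]
    vocab = build_vocabulary(processed)
    return [[vocab[name] for name in seq] for seq in processed], vocab


# Illustrative traces, not taken from the dataset.
raw = [
    ["NtOpenFile", "NtReadFile", "NtReadFile", "NtReadFile", "NtClose"],
    ["NtOpenFile", "NtClose", "NtClose"],
]
encoded, vocab = encode(raw)
print(vocab)
print(encoded)
```

With the full dataset, the same mapping would yield the identifiers 0 to 306 described above.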
Frequently Asked Questions (13)
Q1. What are the contributions mentioned in the paper "Behavioral malware detection using deep graph convolutional neural networks"?

In this paper, the authors propose a novel behavioral malware detection method based on Deep Graph Convolutional Neural Networks (DGCNNs) to learn directly from API call sequences and their associated behavioral graphs. In order to train and evaluate the models, the authors created a new public domain dataset of more than 40,000 API call sequences resulting from the execution of malware and goodware instances in a sandboxed environment. To the best of their knowledge, this is the first paper that investigates the applicability of DGCNN to behavioral malware detection using API call sequences.

Future work will explore deeper architectures as well as the problem of multiclass malware classification using API call sequences and their associated behavioral graphs. 