
Data exfiltration

TL;DR: A review of data exfiltration attack vectors and countermeasures revealed that most of the state of the art is focussed on preventive and detective countermeasures and significant research is required on developing investigative countermeasures that are equally important.
About: This article is published in Journal of Network and Computer Applications.The article was published on 2018-01-01 and is currently open access. It has received 76 citations till now.

Summary (11 min read)


1. Introduction

  • Data theft (formally referred to as data exfiltration) is one of the main motivators for cyber-attacks irrespective of whether carried out by organised crime, commercial competitors, state actors or even “hacktivists”.
  • The attack can either be network-based or physical-based.
  • This report also reveals that among all the data leaks, about 24% occurred in the financial sector, about 15% occurred in the healthcare sector, about 15% occurred in retail and accommodation sector, and about 12% occurred in public sector entities.
  • The authors have systematically analysed the countermeasures in terms of their contributions and limitations.
  • As their aim is to survey both attack vectors and a broad set of countermeasures — preventive, detective and investigative — the authors refer to this topic as “data exfiltration” rather than “data leakage prevention”, which implies a specific focus on preventive measures.

Objective

  • The authors' review differs from the existing reviews in two ways: (1) whilst all the existing reviews report challenges in preventing or mitigating data exfiltration, their review reports the attack vectors used to exfiltrate data.
  • By challenges, the authors mean factors such as the number of leakage channels, the difficulty of managing access rights, encryption, and steganography.
  • Their review does not provide any such details; rather, it provides insight into data exfiltration caused by the malicious activities of a remote attacker.
  • Therefore, the scope of this review is limited to data exfiltration from computers, web servers, databases, virtual machines, and network.
  • The authors have excluded the papers that report data exfiltration from domains such as mobile devices, IoT devices, and printers.

Included papers

  • The authors' review significantly differs from the existing reviews in terms of the included papers.
  • A major reason for such a huge difference in the pool of papers is that the existing reviews (i.e. [2], [3], and [4]) are primarily focused on the insider attacks and industrial countermeasures while their review focuses on external attackers and research-based countermeasures.
  • The review of Brindha and Shaji [5] is focussed on data exfiltration challenges and does not report any countermeasures.

Results

  • The findings from their review do not overlap with the findings from the existing reviews.
  • Similar to [2] and [3], their review also presents a classification of the countermeasures, however, their criteria and the resulting classification are quite different to the classifications presented in [2] and [3].
  • This classification overlaps with their classification to a certain degree.

Our Contributions

  • This paper provides a broad and structured overview on data exfiltration.
  • It also highlights several distinct open issues and challenges that require the immediate attention of the research community.

3. Research Methodology

  • The authors followed a structured process of identifying and selecting the relevant papers from which the relevant data was extracted and analysed to answer the research questions.
  • Table 2 shows the research questions and their respective motivators that stimulated their analysis of the reviewed papers.
  • One such motivator, for example, was to get an overview of the countermeasures designed and incorporated by the research community to fight data exfiltration attacks.

3.1. Data Source and Search Strategy

  • Seven computer science publication databases shown in Table 3 were each queried for four search terms: “data exfiltration”, “data leakage”, “data breach” and “data theft”.
  • The authors derived these search terms from a series of pilot searches, wherein various synonyms for data exfiltration were explored with the aim of finding a set of results which appeared most relevant and neither too broad nor too narrow.
  • In the rest of this survey, the authors use the terms paper, study, and document interchangeably for referring to the papers selected for this survey.
  • Table 3 lists the database sources and their URLs (e.g., ACM: http://portal.acm.org).

3.2. Selection of the Papers

  • After retrieving the documents from these databases, the authors reviewed the title of each document and made a binary decision as to whether the full text would be relevant to the study’s aims (i.e., it appeared to detail either exfiltration attack vectors or countermeasures).
  • After the selection based on the title, the full text of each of the selected papers was reviewed.
  • Some papers were discarded due to the lack of relevance of the full text to their research questions, bringing the pool of papers to its final total of 108 papers.

3.3. Data Extraction and Synthesis

  • After selecting 108 papers, the authors extracted the data using a pre-designed data exfiltration form for answering the research questions.
  • The six steps include familiarizing with the data: the data extracted from the papers and recorded in an Excel sheet were read carefully to gain a deep understanding of the data exfiltration attack vectors, the countermeasures, and the research gaps.
  • After developing this understanding of the extracted data, initial codes were assigned to the key points in the data.
  • The themes were analysed to divide the attack vectors and countermeasures into potential themes at multiple levels.
  • The themes at all levels were reviewed and the required modifications were made.

4. RQ1. Data Exfiltration Attack Vectors

  • This section reports the results of data analysis about data exfiltration attack vectors.
  • For the details on these attack vectors, readers are referred to [14-16].
  • The authors highlight only those attack vectors that are most frequently reported in the 108 included data exfiltration countermeasures.
  • Fig. 3 shows that their initial categories are network and physical.

4.1. Network-based Attack Vectors

  • Network-based attack vectors include those vectors that use existing network infrastructure for stealing data from an organization.
  • The identified network-based attack vectors are shown in Fig. 3 and described in following sub-sections.

4.1.2. Passive monitoring

  • Sniffing wireless broadcast traffic is a well-known but often overlooked data exfiltration vector.
  • With many wireless networks still insufficiently secured [20], and businesses making ever more use of wireless-connected laptops, tablets, smartphones and other devices, the threat of attackers passively listening to an organisation's traffic is very real, and many connected devices leak information [21].
  • Such broadcast interceptions are not limited to typical wireless networks.
  • A notable example is the 2009 discovery that military adversaries of the United States in Iraq were able to access the video feeds of Predator drones simply by listening on the correct channel [22].

4.1.3. Timing channels

  • This method of data exfiltration appears in the literature describing threats but rarely in the literature aiming to detect or prevent exfiltration, yet it is a plausible exfiltration vector for sophisticated attackers.
  • A timing channel is an extremely subtle form of a hidden channel which works by sending innocuous packets to an external recipient at particular times, such that the time delay between packets represents a particular byte value [15].
  • Such a vector is very difficult to detect, as any traffic could potentially be carrying a timing channel, and the communicated information is not embedded in the packets themselves, merely in the delay between them.
  • Examples include channels operating locally via network sockets [23] and even variations in keyboard usage [24].
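The encoding described in [15] can be sketched in a few lines. This is a minimal simulation only, assuming a hypothetical scheme where each secret byte maps to one inter-packet delay (delay = base + value × step); real channels ride on live traffic and must tolerate network jitter.

```python
# Sketch of a timing covert channel: each byte of the secret is encoded as an
# inter-packet delay. Purely illustrative; BASE_MS and STEP_MS are hypothetical.

BASE_MS = 10   # assumed minimum gap between packets (milliseconds)
STEP_MS = 2    # assumed milliseconds per unit of byte value

def encode_delays(secret: bytes) -> list[int]:
    """Map each secret byte to an inter-packet delay in milliseconds."""
    return [BASE_MS + b * STEP_MS for b in secret]

def decode_delays(delays: list[int]) -> bytes:
    """Recover the secret from observed inter-packet delays."""
    return bytes((d - BASE_MS) // STEP_MS for d in delays)

if __name__ == "__main__":
    delays = encode_delays(b"key")
    print(decode_delays(delays))  # b'key'
```

The detection difficulty noted above follows directly: the packets themselves are innocuous, and only the gaps carry information.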

4.1.4. Virtual machine vulnerabilities

  • Modern businesses are increasingly making use of Virtual Machines (VM) hosted by a third party.
  • These threats mostly come from co-residency, where a malicious virtual machine is set up on the same physical machine as a target virtual machine.

4.1.5. Spyware and Malware

  • Spyware is installed on a user's computer to monitor the user's activity and report back to a third party [30].
  • Such software is sometimes used legitimately by software providers, for example to send users relevant updates based on their activities.
  • Spyware includes malware, adware, cookies, web bugs, browser hijackers and key loggers [15].
  • Recently designed malware can scan a user's personal computer for personal information and send it back as an email attachment to all of the user's email contacts.

4.1.6. Phishing

  • Upon visiting the fraudulent website, the user is asked to enter a username, password, bank account number and similar details, which ultimately lands sensitive personal information in the hands of the attacker.
  • Some of the famous types of phishing attacks include Deceptive phishing, DNS-based phishing, and Search Engine phishing [32].

4.1.7. Cross Site Scripting

  • Cross Site Scripting (XSS) is another way of stealing personal information from an authenticated session by injecting a malicious script in an attacked website [33].
  • Once the malicious script executes, it gives an attacker full access to the information held by the trusted website [34].
  • XSS is a popular method among attackers for stealing information, as is evident from the recent OWASP ranking [35], which lists it as the third biggest attack vector for the leakage of personal and sensitive information.
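The mechanism and its standard defence (output encoding) can be shown in miniature. This is a generic sketch, not any surveyed system; the render functions and payload are hypothetical.

```python
# Minimal illustration of why XSS works and how output encoding defeats it:
# a page that interpolates untrusted input verbatim lets attacker script run,
# while HTML-escaping renders the same input inert.
import html

def render_comment_unsafe(comment: str) -> str:
    return f"<p>{comment}</p>"               # vulnerable: raw interpolation

def render_comment_safe(comment: str) -> str:
    return f"<p>{html.escape(comment)}</p>"  # encoded: script cannot execute

payload = "<script>steal(document.cookie)</script>"
assert "<script>" in render_comment_unsafe(payload)
assert "<script>" not in render_comment_safe(payload)
```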

4.2. Physical Attack Vectors

  • Physical attack vectors are those attacks that obtain unauthorized physical access to data and move it to a new physical location.
  • The identified physical attack vectors are shown in Fig. 3 and described in the following sub-sections.

4.2.1. Physical theft

  • It is also possible to first print data and then conceal the printed material while leaving the premises of an organization.
  • Instead of copying or printing sensitive data, an attacker may steal a physical device on which sensitive data is stored [14].
  • This may happen due to weak physical security at an organization's premises or due to the carelessness of an individual employee who leaves a device unattended.

4.2.2. Dumpster diving

  • At times, organizations adopt weak practices for destroying information (both hard copy and soft copy), such as throwing printed documents or CDs into a dustbin.
  • If an organization discards printed documents and CDs in a dustbin, an adversary may well search the dustbin for information of potential value.
  • From the attack-vector summary table: Virtual machine vulnerabilities: establishment of a covert channel for exfiltration of data between two VMs hosted on the same physical machine. Spyware and malware: software used by remote attackers to identify personal information on a computer and send it back to the attacker via some medium such as an email attachment [14, 15, 30, 36]. Phishing: an individual is invited to visit a fraudulent website, and visiting the website in turn leaks the individual's personal information [31, 37]. Physical theft: copying sensitive data to a removable device (CD, DVD, USB, etc.) and taking the device out of the organization [14-16].

5. RQ2. Countermeasures

  • This section reports results of the data analysis about data exfiltration countermeasures.
  • At the abstract level, these countermeasures can be divided into three categories based upon whether a countermeasure is preventing, detecting or investigating data exfiltration.
  • The number and percentage of the selected studies related to each of these three categories are shown in Fig.
  • It can be seen that the research community is primarily focussing on preventive and detective countermeasures, while investigative countermeasures lack sufficient exploration.
  • Data at rest is the data that is stored in a storage device (hard drive or mobile device) and is not currently under any kind of processing.

5.1. Classification of countermeasures

  • Since the number of countermeasures pertaining to each of the three basic categories (preventive, detective and investigative) was quite high, the authors have further categorized the countermeasures based on their thematic analysis as reported in Section 3.3.
  • Fig. 7 classifies the data exfiltration countermeasures; an accompanying chart (not reproduced here) plots the number and percentage of studies per data state (at rest / in transit).

5.1.1. Preventive countermeasures

  • These countermeasures are incorporated in the endpoint devices (such as PCs, Laptops, and Servers) to control access to the data resided on these devices or apply particular security tactics (such as encryption, data classification, and cyber deception) to help secure data against exfiltration attacks.
  • The classification comprises: Preventive countermeasures (data classification; access control: discrete, mandatory, and role-based; encryption; cyber deception; distributed storage; low-level snooping defence); Detective countermeasures (packet inspection: known channel inspection and deep packet inspection, the latter covering steganographic, encrypted, and normal traffic; anomaly-based detection: network-based, host-based, and network + host); and Investigative countermeasures.
  • A figure lists the papers pertaining to each category of preventive countermeasures.

5.1.1.2.1. Mandatory Access Control

  • Access to resources or data is controlled by the access policy defined by an administrator and enforced via the operating system.
  • Authors provide a detailed experimental evaluation of the proposed approach.
  • The reference monitor takes input from access control module about security policies and ensures the enforcement of these security policies in a MapReduce system.
  • At the same time, the architecture focuses on reducing the burden of key management on the client side by storing the security control information for each file in a separate security control file alongside the corresponding encrypted data file.
  • Suzuki et al. [59] develop an operating system, called Salvia, with a focus on preventing data leakage.
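None of the surveyed systems is reproduced here, but the reference-monitor idea common to them can be sketched as a Bell-LaPadula-style label check: an administrator defines a lattice of sensitivity labels, and the monitor (not the data owner) decides every access. Labels and levels below are hypothetical.

```python
# Sketch of a mandatory access control check enforced by a reference monitor.
# The label lattice is defined by an administrator, not by data owners.

LEVELS = {"public": 0, "internal": 1, "secret": 2}

def may_read(subject_label: str, object_label: str) -> bool:
    """'No read up': a subject may read only objects at or below its level."""
    return LEVELS[subject_label] >= LEVELS[object_label]

def may_write(subject_label: str, object_label: str) -> bool:
    """'No write down': prevents copying data to a lower (leakier) level."""
    return LEVELS[subject_label] <= LEVELS[object_label]

assert may_read("secret", "internal")
assert not may_read("public", "secret")
assert not may_write("secret", "public")  # blocks exfiltration to a lower level
```

The "no write down" rule is what makes MAC relevant to exfiltration: even a compromised high-level process cannot legally write into a low-level, outward-facing object.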

5.1.1.2.2. Role-based Access Control

  • Unlike mandatory access control where data objects are labelled, in role-based access control users are assigned a particular role (e.g., developer, tester, accountant) and based on the role of the user, access is granted to various resources [47].
  • The authors claim that the proposed approach introduces no performance or deployment overhead, which seems unrealistic given that introducing three security layers would inevitably affect query processing time.
  • Specified relations between DTE objects then define the access controls across a network.
  • FlowWatcher sits between the user and web application to monitor HTTP requests and responses.
  • The work of Fabian provides some theoretical guidance for the security-aware usage of USB devices in an organization: asset inventories need to be extended to cover the output ports on machines, and secure configurations must be put in place to prevent the use of output ports where not authorised.
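The role-based model described at the top of this subsection reduces to two mappings: users to roles, and roles to permissions. A minimal sketch, with hypothetical role and permission names:

```python
# Sketch of role-based access control: access decisions reference roles,
# never individual users directly [47].

ROLE_PERMISSIONS = {
    "developer":  {"read_code", "write_code"},
    "tester":     {"read_code", "run_tests"},
    "accountant": {"read_ledger"},
}

USER_ROLES = {"alice": {"developer"}, "bob": {"tester", "accountant"}}

def has_permission(user: str, permission: str) -> bool:
    """A user holds a permission if any of their roles grants it."""
    return any(permission in ROLE_PERMISSIONS[r]
               for r in USER_ROLES.get(user, set()))

assert has_permission("alice", "write_code")
assert has_permission("bob", "read_ledger")
assert not has_permission("bob", "write_code")
```

Revoking a user's access is a single dictionary update on USER_ROLES, which is the administrative advantage RBAC offers over per-object labelling.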

5.1.1.2.3. Discrete Access Control

  • In discretionary access control, the onus of regulating access to data objects falls on their respective owners [47].
  • The data owner may grant only read access to one user but another user may have both read and write access.
  • Ko et al. [68] implement a kernel-level access control mechanism to discover and notify an end user about data transmission (both authorized and unauthorized), enabling the end user to take the required action.
  • Parties wishing to gain access to some portion of the sensitive data send an autonomous agent to run locally, where it has access to the data in order to search for the particular information its owner requires.
  • The authors tie the dissemination of files via USB storage to the classification of a file.
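The discretionary model of the opening bullets (owner-managed, per-user rights) can be sketched as an object carrying its own access control list. Class and names are hypothetical, not from any surveyed system.

```python
# Sketch of discretionary access control: each object carries an ACL managed
# by its owner, who may grant different rights to different users [47].

class Document:
    def __init__(self, owner: str):
        self.owner = owner
        self.acl: dict[str, set[str]] = {owner: {"read", "write"}}

    def grant(self, grantor: str, user: str, rights: set[str]) -> None:
        if grantor != self.owner:
            raise PermissionError("only the owner may change the ACL")
        self.acl.setdefault(user, set()).update(rights)

    def allowed(self, user: str, right: str) -> bool:
        return right in self.acl.get(user, set())

doc = Document(owner="alice")
doc.grant("alice", "bob", {"read"})           # read-only for bob
doc.grant("alice", "carol", {"read", "write"})
assert doc.allowed("bob", "read") and not doc.allowed("bob", "write")
assert doc.allowed("carol", "write")
```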

5.1.2. Detective countermeasures

  • Detective countermeasures aim to detect exfiltration attempts.
  • Unlike preventive countermeasures, which are proactive in nature, detective countermeasures are reactive: they detect data exfiltration attacks and stop them where possible.
  • This figure shows the incorporation of content inspection, host-based anomaly detection, and network-based anomaly detection.
  • If network behaviour deviates from the normal behaviour, transfer of data will be either stopped or security administrator will be alerted.
  • Fig. 11 shows the papers pertaining to each category in detective countermeasures.

5.1.2.1.1. Known channel inspection

  • This is a simple approach where outgoing network traffic is monitored on some known high-risk channel.
  • The ubiquity and simplicity of email as a transfer mechanism, combined with the relative ease with which it can be monitored via a mail proxy, make it a good target for detection systems.
  • The main objective of this work is to help an ordinary user differentiate between phishing and non-phishing emails and so avoid interacting with them.
  • The proposed technique consists of two steps.
  • First, monitoring of the data transmission on the known communication channel; second, computing a relation between data observed on the channel and the confidential data.
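The two-step technique above (observe the channel, then compute a relation to the confidential data) can be sketched with one plausible relation measure: Jaccard similarity over word trigrams between outgoing text and a known confidential document. The measure, threshold, and sample data are all hypothetical, not the surveyed technique's actual metric.

```python
# Sketch of known-channel inspection: score outgoing text (e.g., an email
# body seen at a mail proxy) against known confidential data.

def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of word n-grams in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def leak_score(outgoing: str, confidential: str) -> float:
    """Jaccard similarity between the two shingle sets (0.0 to 1.0)."""
    a, b = shingles(outgoing), shingles(confidential)
    return len(a & b) / len(a | b) if a | b else 0.0

CONFIDENTIAL = "q3 revenue fell nine percent against forecast per internal memo"
assert leak_score("fyi q3 revenue fell nine percent against forecast",
                  CONFIDENTIAL) > 0.3          # overlapping: raise an alert
assert leak_score("lunch at noon on friday works for me",
                  CONFIDENTIAL) == 0.0         # unrelated traffic passes
```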

5.1.2.1.2. Deep packet inspection

  • Unlike known channel inspection, deep packet inspection monitors all outgoing traffic for an overlap with sensitive data.
  • Such an approach provides a higher level of detection capability, as it ensures that not a single data packet goes uninspected.
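In its simplest form, deep packet inspection is multi-pattern matching over every outgoing payload. The sketch below uses naive substring search with hypothetical signatures; production systems use efficient multi-pattern matchers (e.g., Aho-Corasick automata) to keep up with line rate.

```python
# Sketch of deep packet inspection: every outgoing payload is scanned against
# a set of sensitive-data signatures, not just traffic on known channels.

SIGNATURES = [b"CONFIDENTIAL", b"SSN:", b"project-roadmap"]  # hypothetical

def inspect_packet(payload: bytes) -> list[bytes]:
    """Return the signatures found in a packet payload, if any."""
    return [sig for sig in SIGNATURES if sig in payload]

assert inspect_packet(b"GET /img.png HTTP/1.1") == []
assert inspect_packet(b"...CONFIDENTIAL draft, SSN: 123-45-6789...") == \
       [b"CONFIDENTIAL", b"SSN:"]
```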

Inspecting steganographic traffic

  • One of the common techniques used by hackers is to hide the sensitive data inside other non-sensitive data so that the installed detective system cannot detect the sensitive data.
  • Modern steganography works by identifying either redundant space within innocuous files or unused fields in common communication protocols (including the ubiquitous TCP/IP) and then encoding the message into these overlooked areas [107].
  • Since the proposed approach does not stop the egress of video and only removes the hidden data from the frames, it is not clear whether the removal of such hidden data causes any damage to the quality of the video.
  • The first is application identification: allowing network administrators to identify when traffic using the same protocol (SSH, HTTP) is devoted to a particular application (webmail, video streaming, or social media).
  • If the carrier data comes under attack in transit, the secret message hidden in the reserved bits alerts the communicating parties that the carrier data is under attack.

Inspecting encrypted traffic

  • Another technique used by attackers to evade detection is to first encrypt the data and then steal it.
  • The reverse-proxy will decrypt and inspect the client request and then engage in a TLS-encrypted communication with the remote server on behalf of the client.
  • The data guard sends the results of the detection process to the policy authority, which checks whether the data to be exported contains any sensitive signatures before sending it to a recipient.
  • The authors do not provide any details on the detection rate achieved by the proposed approach.
  • They discuss three possible approaches to handling encrypted communications within this system: detecting misuse of the encryption protocols, altering protocols to allow packet payload analysis, and finally statistical approaches, which examine packet sizes and time intervals.
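The third (statistical) approach can be sketched without touching any payload: compare a flow's packet-size statistics with a learned profile. The baseline numbers, threshold, and example flows below are hypothetical.

```python
# Sketch of statistical inspection of encrypted traffic: payloads are opaque,
# but packet sizes (and, in fuller systems, time intervals) still carry signal.
from statistics import mean, stdev

def is_suspicious(sizes: list[int], baseline_mean: float,
                  baseline_std: float, k: float = 3.0) -> bool:
    """Flag a flow whose mean packet size sits > k std-devs from baseline."""
    return abs(mean(sizes) - baseline_mean) > k * baseline_std

# Baseline learned from ordinary TLS browsing (hypothetical numbers).
base_mean, base_std = 520.0, 40.0
assert not is_suspicious([500, 530, 510, 540], base_mean, base_std)
assert is_suspicious([1400, 1380, 1420, 1390], base_mean, base_std)  # bulk upload
```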

Inspecting unencrypted traffic

  • A variety of approaches exist that can inspect data that is neither hidden nor encrypted.
  • The proposed approach seems suitable for prevention against malware attacks that gather and send out sensitive information out of the user’s PC via a P2P channel.
  • To build the training dataset, documents (both confidential and non-confidential) are clustered using the K-means algorithm.
  • More rigorous evaluation is required since the paper does not clarify the detection accuracy of the proposed idea or its effects on the overall performance.

5.1.2.2.1. Network-based anomaly detection

  • Network-Based Anomaly (NBA) detection techniques monitor network traffic to determine whether communication flows differ from baseline conditions in terms of traffic volume, source/destination address pairs, diversity of destination addresses, and time of day or the (mis) use of particular network protocols.
  • The difficulty with this category of techniques lies in the determination of the baseline conditions for ‘normal’ user activity.
  • A large number of anomaly-based detection techniques have been proposed, and delineating all of them is challenging.
  • The network traffic under monitoring is compared with both classes (normal and abnormal) to decide which class it belongs to.
  • An Extreme Learning Machine (ELM) is used for the classification of intrusion attempts.
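One of the baseline signals listed above, the diversity of destination addresses, is easy to sketch: learn the set of destinations a host normally contacts, then score new flows by how many go somewhere unseen. Hosts, addresses, and the threshold are hypothetical.

```python
# Sketch of one network-based anomaly signal: destination-address diversity.
# A host suddenly contacting many never-before-seen destinations is flagged.

def baseline_destinations(training_flows: list[str]) -> set[str]:
    """'Normal' destinations observed during a training window."""
    return set(training_flows)

def anomaly_ratio(flows: list[str], known: set[str]) -> float:
    """Fraction of flows going to destinations never seen in training."""
    if not flows:
        return 0.0
    return sum(dst not in known for dst in flows) / len(flows)

known = baseline_destinations(["10.0.0.5", "10.0.0.7", "mail.corp", "crm.corp"])
normal = ["10.0.0.5", "mail.corp", "crm.corp"]
odd = ["198.51.100.9", "203.0.113.4", "10.0.0.5"]
assert anomaly_ratio(normal, known) == 0.0
assert anomaly_ratio(odd, known) > 0.5   # mostly unknown destinations: alert
```

The hard part flagged above, determining the baseline, shows up here as the choice of training window: too short and legitimate new destinations alarm; too long and slow exfiltration gets baked into "normal".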

Semi-supervised mode

  • Data gathered from system level events through runtime monitors is represented in the form of quantitative data flow graph.
  • They evaluate a number of implementations of the Local Outlier Factor (LOF) algorithm, with the greatest reduction in processing time coming from the combination of a kd-tree index of neighbours and the Approximated k-Nearest Neighbours algorithm.
  • The system works in three steps: (1) a parser ensures that traffic using the IEC 60870-5-104 protocol is compatible with the Bro framework; (2) a learning component categorizes packets into whitelists and records timing statistics; (3) a detector compares each packet with the three whitelists and, if it matches none of them, the packet is considered abnormal network traffic and an alarm is raised.
  • The authors test the developed anomaly detection system with a number of attacks such as malware, man-in-the-middle, and spoofing attack.
  • The collected data is analysed to identify the behaviour patterns of data stealers so that organizations can be alerted to those patterns.
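The learn-then-detect whitelist scheme above reduces to set membership once packets are abstracted to tuples. The packet fields and sample values below are simplified and hypothetical, not the actual Bro/IEC 60870-5-104 representation.

```python
# Sketch of whitelist-based anomaly detection: a learning phase records which
# (source, destination, message type) tuples occur in benign traffic; the
# detector then alarms on any packet outside the whitelist.

Packet = tuple[str, str, str]  # (source, destination, message type)

def learn_whitelist(benign_packets: list[Packet]) -> set[Packet]:
    return set(benign_packets)

def detect(packet: Packet, whitelist: set[Packet]) -> bool:
    """Return True (raise alarm) if the packet matches no whitelist entry."""
    return packet not in whitelist

benign = [("plc1", "scada", "measurement"), ("scada", "plc1", "command")]
wl = learn_whitelist(benign)
assert not detect(("plc1", "scada", "measurement"), wl)
assert detect(("plc1", "attacker.example", "measurement"), wl)  # alarm
```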

Unsupervised mode

  • DBMSs have their own known vulnerabilities when it comes to data exfiltration, particularly with regard to their common deployment as part of the web applications.
  • They suggest that a “protective shell” be introduced into the DBMS which learns about the legal and illegal query strings for a given application.
  • It is very likely that the introduction of this new layer affects query performance; the paper neither talks about specific performance issues nor provides any evaluation details.
  • Flood and Keane [155] propose a similar approach in the context of cloud services.
  • In their version, a Finite State Machine is built from observations of training data.

Supervised mode

  • Berlin et al. [144] present a malicious behaviour detection system using Windows audit logs.
  • Logs are collected from enterprise users and from a sandboxed virtual machine.
  • Labelling is done using VirusTotal, which runs around 55 anti-malware engines over the samples to determine whether a sample is malware.
  • The proposed approach detects phishing and malware attacks launched to steal users’ access credentials.
  • Whilst users do not typically emit system calls themselves, their typical use of programs is captured by these patterns; at the same time, a model of system calls, rather than of selected program executions, would capture maliciously injected code that a user would be unaware of.
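The system-call-pattern idea above can be sketched with call n-grams: n-grams seen during normal use form the model, and traces containing unseen n-grams (e.g., from injected code) score as anomalous. The traces and the scoring are illustrative assumptions, not the surveyed system's method.

```python
# Sketch of host-based detection from system-call n-grams.

def ngrams(trace: list[str], n: int = 3) -> set[tuple[str, ...]]:
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def anomaly_score(trace: list[str], model: set, n: int = 3) -> float:
    """Fraction of the trace's n-grams never seen during normal operation."""
    grams = ngrams(trace, n)
    if not grams:
        return 0.0
    return sum(g not in model for g in grams) / len(grams)

normal = ["open", "read", "write", "close", "open", "read", "write", "close"]
model = ngrams(normal)
injected = ["open", "read", "socket", "connect", "sendto", "close"]
assert anomaly_score(normal, model) == 0.0
assert anomaly_score(injected, model) > 0.5   # unseen network-call patterns
```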

NBA

  • In transit: packet arrival time is used for detection; any packet that deviates from the expected arrival time indicates malicious activity. Addresses malware, man-in-the-middle, and spoofing attacks.
  • Wüchner et al. [147], in use: monitors database activity more effectively using density-based outlier detection to detect data exfiltration. Addresses SQL injection and XSS attacks.
  • Yang et al. [153], in use: a technique for collecting news stories and other reports on data stealers from online sources and performing statistical analysis to identify the behaviour patterns of such data stealers, helping organizations guard against those behaviours.

HBA

  • In use: a technique for training a database to protect itself from SQL injection attacks. Addresses SQL injection.
  • Flood & Keane [155], in use/in transit: correlates CPU activity with network activity; any large data transfer that does not correlate with host CPU activity is considered data exfiltration. Addresses SQL injection, XSS, malware, and phishing attacks.
  • Myers et al. [157], in use/in transit: a combination of network and host-based analysis to detect exfiltration. Addresses malware attacks.
  • In use/in transit: a framework for gathering network and host data and analysing it to detect data exfiltration.

5.1.3. Investigative countermeasures

  • After successful data exfiltration, it is usually not possible to reverse its impact.
  • Investigating a data exfiltration incident can help in mitigating the effects of an attack.
  • Information gathered during such investigation can also be useful for other enterprises.
  • Of course, at times it may not be possible for an enterprise to share such sensitive information with the broader community for confidentiality reasons.
  • Fig. 12 shows the papers pertaining to each category in investigative countermeasures.

5.2. Mapping of Attack Vectors to Countermeasures

  • Fig. 13 shows the mapping of the attack vectors identified in Section 4 onto the countermeasures reported in Section 5.
  • This mapping provides a reader with an understanding of which attack vectors are addressed by which countermeasures.
  • It is imperative to mention here that some of the preventive and detective countermeasures adopt an aggressive and proactive approach (instead of reactive) for protecting data.
  • Zhuang et al. [90] propose to distribute data smartly among multiple clouds which would reduce the risk of leaking all data in a single attack.
  • The countermeasures that address attack vectors other than those reported in Section 3 (such as brute-force attack and flooding attack) are also not shown in Fig. 13.

6. RQ3. Open and Future Challenges

  • According to one estimate, the total annual cost of cybercrime is around $400 billion [180].
  • Data exfiltration is the main motivator for these attacks.
  • These systems include Intrusion Detection System, Intrusion Prevention Systems, Security Information and Event Management (SIEM) system, Anti-malware, and Firewalls.
  • Whilst these data exfiltration countermeasures have recently attracted the attention of the research community, there exist several open and challenging issues.

6.1. Performance

  • The in-depth critical analysis of the 108 countermeasures reveals that performance is one of the most critical qualities for systems designed to prevent, detect, or investigate data exfiltration.
  • The major reason for such poor performance is the large size, high speed, and heterogeneous nature of data dealt with by these systems.
  • It is imperative to collect and analyse this big data in real-time and without causing significant delay in any data transmission process.
  • Furthermore, feature selection over security event data is another potential approach for improving performance; it needs to be investigated how feature selection tools and techniques can be developed and incorporated into the defence against data exfiltration attacks.

6.2. Evaluation

  • The authors strongly emphasise that a generic framework needs to be developed for the assessment of data exfiltration countermeasures, one that can guide researchers and practitioners on how to evaluate their systems.
  • These limitations make the datasets unable to reflect the actual strengths and weaknesses of the proposed countermeasure.
  • The authors assert that such an evaluation framework should provide guidance on evaluating a countermeasure against Advanced Persistent Threats (APTs).
  • For improving the standard of evaluation, the authors also encourage close collaboration between academia and industry.

6.3. Automation

  • A lack of automation impacts performance of the overall system in terms of deployment and response time.
  • The involvement of network administrators and investigative experts increases cost and makes the incorporation of such systems quite challenging for enterprises.
  • Similarly, personal users are often reluctant to pay attention to security alerts and approvals during data transmission.
  • The dependency on a dedicated human should be reduced to a minimum to make these systems more acceptable for enterprises and personal use.

6.4. Privacy, Encrypted Traffic, and Accuracy

  • With respect to data exfiltration countermeasures, the three terms (i.e. Privacy, Encrypted Traffic, and Accuracy) are closely related.
  • These countermeasures directly monitor the outgoing network traffic generated by users and scan it for detecting sensitive information.
  • To address the privacy and security concerns, the approach of encrypting data before sending it out is broadly adopted.
  • To address this issue, countermeasures have been developed as reported in their review ([111-115]) that can examine the encrypted traffic to detect data exfiltration.

6.5. Investigative Countermeasures

  • In their review, the authors analysed and categorized data exfiltration countermeasures into preventive, detective, and investigative categories.
  • As is evident from Fig. 2, the research community remains primarily focussed on preventive and detective countermeasures.
  • Whilst the preventive and detective countermeasures are quite crucial for fighting data exfiltration, the authors believe investigative countermeasures are equally important.
  • Furthermore, identifying and prosecuting attackers sends a very strong message to other potential attackers that there are systems in place to track and catch them.

6.6. High Cost

  • Cost is one of the primary concerns both for individuals and especially enterprises while deciding upon the incorporation of a particular system, tool, or technology in their infrastructure.
  • The authors assert that the incorporation of specialized hardware should be discouraged in the design of data exfiltration countermeasures, as deploying and maintaining hardware on a large scale would be largely infeasible for enterprises.
  • Similarly, a thorough investigation is required to explore ways of reducing the cost for storing and maintaining negative data in cyber deception approaches.

7. Limitations

  • There are two reasons for this limitation: (1) there exists a large number of attack vectors, as reported in [14-16], and covering all of them is quite challenging; (2) the authors' review is focussed on countermeasures rather than on attack vectors.
  • The motivation for including attack vectors is to contextualize the discussion on the countermeasures.
  • Similarly, a wide range of literature exists that directly or indirectly addresses data exfiltration in various domains (mobile computing, IoT devices, printers); it may not be possible for a single review like theirs to cover all such literature.
  • A wide range of papers exist on access control or encryption but it was not the intention of this review to include all those studies.
  • Similarly, extending the survey by following citations from included publications could unveil larger bodies of work on exfiltration methods which did not match their queries.

8. Conclusion

  • Data exfiltration is a serious and ongoing issue in the field of information security.
  • Another critical contribution of their review is mapping the applicability of countermeasures to particular data states (1: in use, 2: in transit, and 3: at rest), which gives an insight into which data states are most often attacked and how countermeasures protect data in each of these states.
  • The lack of such capability leads to poor performance and response time.
  • This is pertinent given the increasing concerns over a surveillance society.
  • The authors hope that the insights provided in this paper will provide academic researchers and industry practitioners with new directions and motivations for enhancing research and development efforts to devise, evaluate, and deploy new and innovative countermeasures for securing against data exfiltration attacks.

Citations
Journal ArticleDOI
TL;DR: This survey paper intends to bring all those methods and techniques that could be used to detect different stages of APT attacks, learning methods that need to be applied and where to make the threat detection framework smart and undecipherable for those adapting APT attackers.
Abstract: Threats that have been primarily targeting nation states and their associated entities have expanded the target zone to include the private and corporate sectors. This class of threats, well known as advanced persistent threats (APTs), are those that every nation and well-established organization fears and wants to protect itself against. While nation-sponsored APT attacks will always be marked by their sophistication, APT attacks that have become prominent in corporate sectors do not make it any less challenging for the organizations. The rate at which the attack tools and techniques are evolving is making any existing security measures inadequate. As defenders strive to secure every endpoint and every link within their networks, attackers are finding new ways to penetrate into their target systems. With each day bringing new forms of malware, having new signatures and behavior that is close to normal, a single threat detection system would not suffice. While it requires time and patience to perform APT, solutions that adapt to the changing behavior of APT attacker(s) are required. Several works have been published on detecting an APT attack at one or two of its stages, but very limited research exists in detecting APT as a whole from reconnaissance to cleanup, as such a solution demands complex correlation and fine-grained behavior analysis of users and systems within and across networks. Through this survey paper, we intend to bring all those methods and techniques that could be used to detect different stages of APT attacks, learning methods that need to be applied and where to make your threat detection framework smart and undecipherable for those adapting APT attackers. We also present different case studies of APT attacks, different monitoring methods, and mitigation methods to be employed for fine-grained control of security of a networked system. 
We conclude this paper with different challenges in defending against APT and opportunities for further research, ending with a note on what we learned during our writing of this paper.

200 citations

Journal ArticleDOI
10 Jan 2021-Sensors
TL;DR: In this paper, the authors compared several machine learning (ML) methods such as k-nearest neighbor (KNN), support vector machine (SVM), decision tree (DT), naive Bayes (NB), random forest (RF), artificial neural network (ANN), and logistic regression (LR) for both binary and multi-class classification on Bot-IoT dataset.
Abstract: In recent years, there has been a massive increase in the number of Internet of Things (IoT) devices as well as the data generated by such devices. The participating devices in IoT networks can be problematic due to their resource-constrained nature, and integrating security on these devices is often overlooked. This has resulted in attackers having an increased incentive to target IoT devices. As the number of attacks possible on a network increases, it becomes more difficult for traditional intrusion detection systems (IDS) to cope with these attacks efficiently. In this paper, we highlight several machine learning (ML) methods such as k-nearest neighbour (KNN), support vector machine (SVM), decision tree (DT), naive Bayes (NB), random forest (RF), artificial neural network (ANN), and logistic regression (LR) that can be used in IDS. In this work, ML algorithms are compared for both binary and multi-class classification on the Bot-IoT dataset. Based on several parameters such as accuracy, precision, recall, F1 score, and log loss, we experimentally compared the aforementioned ML algorithms. In the case of an HTTP distributed denial-of-service (DDoS) attack, the accuracy of RF is 99%. Furthermore, other simulation results based on precision, recall, F1 score, and log loss metrics reveal that RF outperforms on all types of attacks in binary classification. However, in multi-class classification, KNN outperforms other ML algorithms with an accuracy of 99%, which is 4% higher than RF.
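The comparison described in this abstract can be sketched with scikit-learn; this is an illustrative snippet, not the cited study's code, and it substitutes synthetic data for the Bot-IoT dataset (the model subset, split, and parameters are arbitrary choices):

```python
# Hedged sketch: comparing classifiers for intrusion detection, in the
# spirit of the cited study. Synthetic data stands in for Bot-IoT.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for a labelled traffic dataset (benign vs. attack).
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    # Report the same kinds of metrics the study compares.
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"f1={f1_score(y_te, pred):.3f}")
```

On real traffic data the relative ranking of the classifiers would of course depend on the dataset and features, which is exactly the comparison the cited paper performs.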

67 citations

Journal ArticleDOI
TL;DR: A Multivocal Literature Review that has systematically selected and reviewed both academic and grey (blogs, web pages, white papers) literature on different aspects of security orchestration published from January 2007 until July 2017 is reported.
Abstract: Organizations use diverse types of security solutions to prevent cyber-attacks. Multiple vendors provide security solutions developed using heterogeneous technologies and paradigms. Hence, it is challenging, if not impossible, to easily make security solutions work in an integrated fashion. Security orchestration aims at smoothly integrating multivendor security tools that can effectively and efficiently interoperate to support the security staff of a Security Operation Centre (SOC). Given the increasing role and importance of security orchestration, there has been an increasing amount of literature on different aspects of security orchestration solutions. However, there has been no effort to systematically review and analyze the reported solutions. We report a Multivocal Literature Review that has systematically selected and reviewed both academic and grey (blogs, web pages, white papers) literature on different aspects of security orchestration published from January 2007 until July 2017. The review has enabled us to provide a working definition of security orchestration and classify the main functionalities of security orchestration into three main areas—unification, orchestration, and automation. We have also identified the core components of a security orchestration platform and categorized the drivers of security orchestration based on technical and socio-technical aspects. We also provide a taxonomy of security orchestration based on the execution environment, automation strategy, deployment type, mode of task, and resource type. This review has helped us to reveal several areas of further research and development in security orchestration.

50 citations

Journal ArticleDOI
TL;DR: This study proposes to use the bio-inspired method of particle swarm optimization (PSO), which automatically selects the exclusive features that contain the novel android debug bridge (ADB), to enhance the machine learning prediction that detects unknown root exploits.
Abstract: The increasing demand for Android mobile devices and blockchain has motivated malware creators to develop mobile malware to compromise the blockchain. Although the blockchain is secure, attackers have managed to gain access to the blockchain as legal users, thereby compromising important and crucial information. Examples of mobile malware include root exploit, botnets, and Trojans, and root exploit is one of the most dangerous malware. It compromises the operating system kernel in order to gain root privileges which are then used by attackers to bypass the security mechanisms, to gain complete control of the operating system, to install other possible types of malware to the devices, and finally, to steal victims' private keys linked to the blockchain. For the purpose of maximizing the security of the blockchain-based medical data management (BMDM), it is crucial to investigate the novel features and approaches contained in root exploit malware. This study proposes to use the bio-inspired method of particle swarm optimization (PSO), which automatically selects the exclusive features that contain the novel android debug bridge (ADB). This study also adopts boosting (adaboost, realadaboost, logitboost, and multiboost) to enhance the machine learning prediction that detects unknown root exploit, and scrutinizes three categories of features, including (1) system command, (2) directory path and (3) code-based. The evaluation gathered from this study suggests a marked accuracy value of 93% with Logitboost in the simulation. Logitboost also helped to predict all the root exploit samples in our developed system, the root exploit detection system (RODS).

48 citations


Cites background from "Data exfiltration"

  • ...the common types of malware available such as root exploit, botnet, spyware, worm, and Trojan, the most dangerous is root exploit, also known as rootkit [24, 25]....

Journal ArticleDOI
TL;DR: Not only can confidential terms be accurately detected, but sophisticated rephrased confidential contents are also detected during the experiments, while redundant and noise terms are removed.
Abstract: Early data leakage protection methods for smart mobile devices usually focus on confidential terms and their context, which truly prevent some kinds of data leakage events. However, with the high dimensionality and redundancy of text data, it is difficult to detect the documents which contain confidential contents accurately. Our approach updates cluster graph structure based on CBDLP (Data Leakage Protection Based on Context) model by computing the importance of confidential terms and the terms within the range of their context. By applying CBDLP with pruning procedure which has been validated, we further remove the redundancy terms and noise terms. Actually, not only can confidential terms be accurately detected but also the sophisticated rephrased confidential contents are detected during the experiments.

40 citations

References
Journal ArticleDOI
TL;DR: Thematic analysis is a poorly demarcated, rarely acknowledged, yet widely used qualitative analytic method within psychology that offers an accessible and theoretically flexible approach to analysing qualitative data.
Abstract: Thematic analysis is a poorly demarcated, rarely acknowledged, yet widely used qualitative analytic method within psychology. In this paper, we argue that it offers an accessible and theoretically flexible approach to analysing qualitative data. We outline what thematic analysis is, locating it in relation to other qualitative analytic methods that search for themes or patterns, and in relation to different epistemological and ontological positions. We then provide clear guidelines to those wanting to start thematic analysis, or conduct it in a more deliberate and rigorous way, and consider potential pitfalls in conducting thematic analysis. Finally, we outline the disadvantages and advantages of thematic analysis. We conclude by advocating thematic analysis as a useful and flexible method for qualitative research in and beyond psychology.

103,789 citations


"Data exfiltration" refers methods in this paper

  • ...The extracted attack vectors and countermeasures from the primary studies were analysed using qualitative analysis technique, namely thematic analysis [13]....

  • ...We categorize the included attack vectors based on the guidelines of thematic analysis [13]....

  • ...We followed the six-step process developed by Braun and Clarke [13] to produce the results presented in Section 4, 5, and 6....

Journal ArticleDOI
Jeffrey Dean1, Sanjay Ghemawat1
06 Dec 2004
TL;DR: This paper presents MapReduce, a programming model and associated implementation for processing and generating large data sets, which runs on a large cluster of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
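The map/reduce contract described in this abstract can be illustrated with a minimal single-process word count; the distribution, partitioning, and fault handling that the paper describes are omitted here:

```python
# Illustrative sketch of the MapReduce programming model (single-process
# word count); a real deployment distributes these phases over a cluster.
from collections import defaultdict

def map_fn(doc):
    # Map: emit an intermediate (word, 1) pair for every word.
    for word in doc.split():
        yield word, 1

def reduce_fn(word, counts):
    # Reduce: merge all intermediate values sharing the same key.
    return word, sum(counts)

def mapreduce(docs):
    intermediate = defaultdict(list)
    for doc in docs:                      # map phase
        for key, value in map_fn(doc):
            intermediate[key].append(value)
    return dict(reduce_fn(k, v)           # reduce phase
                for k, v in intermediate.items())

print(mapreduce(["to be or not to be"]))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

The user-visible surface is exactly the two functions; everything else (here a dict of lists, in the real system shuffling across machines) belongs to the runtime.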

20,309 citations

Journal ArticleDOI
Jeffrey Dean1, Sanjay Ghemawat1
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.

17,663 citations

Journal ArticleDOI
TL;DR: A new learning algorithm called ELM is proposed for feedforward neural networks (SLFNs) which randomly chooses hidden nodes and analytically determines the output weights of SLFNs which tends to provide good generalization performance at extremely fast learning speed.
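The ELM idea — randomly chosen hidden nodes with analytically determined output weights — can be sketched in a few lines of NumPy; this is an illustrative toy, with the hidden-layer size, activation, and task chosen arbitrarily:

```python
# Minimal sketch of an Extreme Learning Machine for a single-hidden-layer
# feedforward network: input weights are random and never trained; output
# weights are solved analytically by least squares.
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=50):
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                        # hidden-layer outputs
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # analytic output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression: learn y = sin(x) on a 1-D grid.
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel()
W, b, beta = elm_fit(X, y)
mse = np.mean((elm_predict(X, W, b, beta) - y) ** 2)
print(f"training MSE: {mse:.4f}")
```

Because the only "training" is one linear solve, fitting is extremely fast — the property the cited paper emphasizes.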

10,217 citations


"Data exfiltration" refers methods in this paper

  • ...[125] present a network intrusion detection approach that is based on Extreme Learning Machine (ELM) [126, 127]....

Journal ArticleDOI
TL;DR: This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.
Abstract: Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and more succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.
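As an example of a basic statistical technique in the survey's sense, the following sketch flags points that lie far from the mean in units of standard deviation; the data and threshold are invented for illustration:

```python
# Basic statistical anomaly detection: a point is anomalous if it lies
# more than `threshold` standard deviations from the mean.
import statistics

def zscore_anomalies(values, threshold=3.0):
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > threshold * stdev]

# One reading (55.0) departs sharply from an otherwise stable series.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 55.0]
print(zscore_anomalies(readings, threshold=2.0))
# [55.0]
```

The key assumption the survey highlights for such techniques — that normal data cluster around a statistical summary — is exactly what makes this simple rule work here and fail on multimodal data.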

9,627 citations


"Data exfiltration" refers background in this paper

  • ...Such unexpected pattern or behaviour is referred in different ways such as anomaly, exception, surprise, outliers, aberrations, and peculiarities [7, 123]....

  • ...For example, there are a number of reviews on network anomaly detection [7-11]....

Frequently Asked Questions (13)
Q1. What have the authors contributed in "Data exfiltration: a review of external attack vectors and countermeasures" ?

One of the main targets of cyber-attacks is data exfiltration, which is the leakage of sensitive or private data to an unauthorized entity. This paper is aimed at identifying and critically analysing data exfiltration attack vectors and countermeasures for reporting the state of the art and determining gaps for future research. The authors have followed a structured process for selecting 108 papers from seven publication databases. This review has revealed that (a) most of the state of the art is focussed on preventive and detective countermeasures, and significant research is required on developing investigative countermeasures that are equally important; (b) several data exfiltration countermeasures are not able to respond in real-time, which indicates that research efforts need to be invested to enable them to respond in real-time; (c) a number of data exfiltration countermeasures do not take privacy and ethical concerns into consideration, which may become an obstacle to their full adoption; (d) existing research is primarily focussed on protecting data in the 'in use' state, therefore, future research needs to be directed towards securing data in the 'at rest' and 'in transit' states; (e) there is no standard or framework for the evaluation of data exfiltration countermeasures. Furthermore, the authors have explored the applicability of various countermeasures for different states of data (i.e., in use, in transit, or at rest).

Fig. 14. Future Research Challenges in Defence against Data Exfiltration 

Perhaps the most direct method of data exfiltration for a remote attacker is manipulating a public-facing server into disclosing non-public information, such as through the well-known category of SQL injection attacks. 
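The SQL injection category mentioned above can be illustrated with a small, self-contained sqlite3 example (the table and values are hypothetical): concatenating attacker input into the query text discloses every row, while a parameterized query does not:

```python
# Hypothetical illustration of SQL-injection-based data exfiltration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute(
    "INSERT INTO users VALUES ('alice', 's3cret'), ('bob', 'hunter2')")

malicious = "alice' OR '1'='1"

# Vulnerable: attacker-controlled input is spliced into the SQL text, so
# the injected OR clause matches (and discloses) every row.
leaked = conn.execute(
    f"SELECT secret FROM users WHERE name = '{malicious}'").fetchall()
print("vulnerable query leaked:", leaked)

# Safe: the '?' placeholder binds the input strictly as a literal value,
# so the crafted string matches no row.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (malicious,)).fetchall()
print("parameterized query returned:", safe)
```

Parameterized queries are the standard preventive countermeasure for this attack vector, precisely because the server never interprets user input as SQL.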

Physical attack vectors include those attacks that get unauthorized and illegal physical access to data and move it to a new physical location. 

It is important to enforce authentication and authorization mechanisms for ensuring that only legitimate users with the required credentials can access the data. 

Intelligent and planned outsourcing of data to several clouds seems a good idea for reducing the risk of data leakage in cloud environments; however, organizations may be reluctant to adopt such an approach due to the extra storage cost and the complexity of data management.

They discuss three possible approaches to handling encrypted communications within this system: detecting misuse of the encryption protocols, altering protocols to allow packet payload analysis, and finally statistical approaches, which examine packet sizes and time intervals. 
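The statistical approach mentioned last — examining packet sizes and time intervals rather than payloads — can be sketched as follows; the flows, baseline, and threshold are invented for illustration:

```python
# Sketch: since encrypted payloads cannot be inspected, flows are compared
# on packet-size and inter-arrival statistics instead.
import statistics

def flow_features(packets):
    """packets: list of (timestamp, size) tuples for one flow."""
    sizes = [size for _, size in packets]
    gaps = [t2 - t1 for (t1, _), (t2, _) in zip(packets, packets[1:])]
    return statistics.fmean(sizes), statistics.fmean(gaps)

def is_suspicious(packets, baseline_size=600.0, max_ratio=2.0):
    mean_size, _ = flow_features(packets)
    # Flag flows whose average packet size departs far from the baseline,
    # e.g. sustained large outbound packets suggesting bulk exfiltration.
    return mean_size > baseline_size * max_ratio

normal = [(0.0, 550), (0.1, 620), (0.2, 580)]
bulky = [(0.0, 1400), (0.05, 1500), (0.1, 1450)]
print(is_suspicious(normal), is_suspicious(bulky))
# False True
```

A production detector would learn the baseline per host or protocol and combine several such features, but the principle — classifying encrypted flows by their observable statistics — is the same.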

The proposed approach can help prevent a hacker, who has stolen the credentials (username and password) of a user or website admin using an attack vector such as phishing, spyware, or XSS, from accessing personal information residing at rest in a cloud.

Apart from their wide-scale adoption, the authors also believe that it is quite risky not to adopt the encrypted traffic transmission approach because it leaves open the option of data exfiltration via passive monitoring.

The heavy dependency on human experts and hardware devices makes these countermeasures very expensive for enterprises to incorporate.

Due to the unavailability of the required SGX hardware, even the authors could not evaluate the efficiency of the proposed approach. 

The high processing and storage requirements (an 8-core processor and 32 GB of main memory) may hinder the adoption of the proposed approach for ensuring controlled access in a system.

This labelling is done using VirusTotal, which runs around 55 anti-malware engines over the samples to determine whether or not a sample is malware.