scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Advanced Persistent Threat intelligent profiling technique: A survey

01 Oct 2022-Computers & Electrical Engineering-Vol. 103, pp 108261-108261
TL;DR: A systematic review of intelligent threat profiling techniques for APT attacks, covering three aspects: data, methods, and applications, is provided in this paper , which summarizes the latest research in applications, proposes the research framework and technical architecture, and provides insights into future research trends.
About: This article is published in Computers & Electrical Engineering.The article was published on 2022-10-01. It has received 4 citations till now. The article focuses on the topics: Profiling (computer programming) & Computer science.
Citations
More filters
Proceedings ArticleDOI
30 Jan 2023
TL;DR: TTPHunter as discussed by the authors fine-tunes linear classifiers, which take input as BERT (Bidirectional Encoder Representations from Transformers) embeddings of sentences, to extract TTPs from APT reports.
Abstract: With the proliferation of attacks from various Advanced Persistent Threats (APT) groups, it is essential to comprehend the threat actor’s attack patterns to accelerate threat detection and response. The MITRE ATT&CK framework’s Tactics, Techniques, and Procedures (TTPs) help to decipher attack patterns. The APT reports, published by security firms, contain rich information on tools and techniques used by threat actors. These reports are available in unstructured and natural language texts. There is a need for an automated tool to extract TTPs present in natural language text. However, there are few tools available in the literature, but their performance is not very satisfactory. In this work, we propose TTPHunter, to extract TTPs from APT reports by mapping sentence context to relevant TTPs. We fine-tune linear classifiers, which take input as BERT (Bidirectional Encoder Representations from Transformers) embeddings of sentences. We create two datasets: sentence-based (8,387 sentence samples) and document-based (50 threat reports) to validate TTPHunter. TTPHunter achieves the F1-score of 88% and 75% for both datasets, respectively. We compare the TTPHunter with rcATT and AttacKG baseline models, and it outperforms both baselines.

1 citations

Journal ArticleDOI
TL;DR: In this paper , the authors describe different standards for IOC representation and discuss the associated challenges that restrict security investigators from developing IOCs in the industrial sectors, and also discuss the potential IOCs against cyber-attacks in ICS systems.
Abstract: Numerous sophisticated and nation-state attacks on Industrial Control Systems (ICSs) have increased in recent years, exemplified by Stuxnet and Ukrainian Power Grid. Measures to be taken post-incident are crucial to reduce damage, restore control, and identify attack actors involved. By monitoring Indicators of Compromise (IOCs), the incident responder can detect malicious activity triggers and respond quickly to a similar intrusion at an earlier stage. However, to implement IOCs in critical infrastructures, we need to understand their contexts and requirements. Unfortunately, there is no survey paper in the literature on IOC in the ICS environment, and only limited information is provided in research articles. In this article, we describe different standards for IOC representation and discuss the associated challenges that restrict security investigators from developing IOCs in the industrial sectors. We also discuss the potential IOCs against cyber-attacks in ICS systems. Furthermore, we conduct a critical analysis of existing works and available tools in this space. We evaluate the effectiveness of identified IOCs’ by mapping these indicators to the most frequently targeted attacks in the ICS environment. Finally, we highlight the lessons to be learned from the literature and the future problems in the domain along with the approaches that might be taken.
Journal ArticleDOI
TL;DR: In this article , a combination of deep graph networks and contrastive learning is proposed to enhance the effectiveness of the APT malware detection, and the experimental results demonstrate that the proposed model, whose effectiveness ranges from 88% to 94% across all performance metrics, is not only scientifically effective but also practically significant.
Abstract: Advanced persistent threat (APT) attacking campaigns have been a common method for cyber-attackers to attack and exploit end-user computers (workstations) in recent years. In this study, to enhance the effectiveness of the APT malware detection, a combination of deep graph networks and contrastive learning is proposed. The idea is that several deep graph networks such as Graph Convolution Networks (GCN), Graph Isomorphism Networks (GIN), are combined with some popular contrastive learning models like N-pair Loss, Contrastive Loss, and Triplet Loss, in order to optimize the process of APT malware detection and classification in endpoint workstations. The proposed approach consists of three main phases as follows. First, the behaviors of APT malware are collected and represented as graphs. Second, GIN and GCN networks are used to extract feature vectors from the graphs of APT malware. Finally, different contrastive learning models, i.e. N-pair Loss, Contrastive Loss, and Triplet Loss are applied to determine which feature vectors belong to APT malware, and which ones belong to normal files. This combination of deep graph networks and contrastive learning algorithm is a novel approach, that not only enhances the ability to accurately detect APT malware but also reduces false alarms for normal behaviors. The experimental results demonstrate that the proposed model, whose effectiveness ranges from 88% to 94% across all performance metrics, is not only scientifically effective but also practically significant. Additionally, the results show that the combination of GIN and N-pair Loss performs better than other combined models. This provides a base malware detection system with flexible parameter selection and mathematical model choices for optimal real-world applications.
Journal ArticleDOI
TL;DR: In this article , a conceptual framework is proposed to incorporate the situational awareness model in line with game theory as an AI technique used to generate APT-based tactics, techniques, and procedures (TTPs) and normal TTPs and cognitive decision making.
Abstract: Advanced persistent threat (APT) refers to a specific form of targeted attack used by a well-organized and skilled adversary to remain undetected while systematically and continuously exfiltrating sensitive data. Various APT attack vectors exist, including social engineering techniques such as spear phishing, watering holes, SQL injection, and application repackaging. Various sensors and services are essential for a smartphone to assist in user behavior that involves sensitive information. Resultantly, smartphones have become the main target of APT attacks. Due to the vulnerability of smartphone sensors, several challenges have emerged, including the inadequacy of current methods for detecting APTs. Nevertheless, several existing APT solutions, strategies, and implementations have failed to provide comprehensive solutions. Detecting APT attacks remains challenging due to the lack of attention given to human behavioral factors contributing to APTs, the ambiguity of APT attack trails, and the absence of a clear attack fingerprint. In addition, there is a lack of studies using game theory or fuzzy logic as an artificial intelligence (AI) strategy for detecting APT attacks on smartphone sensors, besides the limited understanding of the attack that may be employed due to the complex nature of APT attacks. Accordingly, this study aimed to deliver a systematic review to report on the extant research concerning APT detection for mobile sensors, applications, and user behavior. The study presents an overview of works performed between 2012 and 2023. In total, 1351 papers were reviewed during the primary search. Subsequently, these papers were processed according to their titles, abstracts, and contents. The resulting papers were selected to address the research questions. A conceptual framework is proposed to incorporate the situational awareness model in line with adopting game theory as an AI technique used to generate APT-based tactics, techniques, and procedures (TTPs) and normal TTPs and cognitive decision making. This framework enhances security awareness and facilitates the detection of APT attacks on smartphone sensors, applications, and user behavior. It supports researchers in exploring the most significant papers on APTs related to mobile sensors, services, applications, and detection techniques using AI.
References
More filters
Journal ArticleDOI
TL;DR: This article provides a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields and proposes a new taxonomy to divide the state-of-the-art GNNs into four categories, namely, recurrent GNNS, convolutional GNN’s, graph autoencoders, and spatial–temporal Gnns.
Abstract: Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications, where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on the existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this article, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art GNNs into four categories, namely, recurrent GNNs, convolutional GNNs, graph autoencoders, and spatial–temporal GNNs. We further discuss the applications of GNNs across various domains and summarize the open-source codes, benchmark data sets, and model evaluation of GNNs. Finally, we propose potential research directions in this rapidly growing field.

4,584 citations

Journal ArticleDOI
TL;DR: This article provides a systematic review of existing techniques of Knowledge graph embedding, including not only the state-of-the-arts but also those with latest trends, based on the type of information used in the embedding task.
Abstract: Knowledge graph (KG) embedding is to embed components of a KG including entities and relations into continuous vector spaces, so as to simplify the manipulation while preserving the inherent structure of the KG. It can benefit a variety of downstream tasks such as KG completion and relation extraction, and hence has quickly gained massive attention. In this article, we provide a systematic review of existing techniques, including not only the state-of-the-arts but also those with latest trends. Particularly, we make the review based on the type of information used in the embedding task. Techniques that conduct embedding using only facts observed in the KG are first introduced. We describe the overall framework, specific model design, typical training procedures, as well as pros and cons of such techniques. After that, we discuss techniques that further incorporate additional information besides facts. We focus specifically on the use of entity types, relation paths, textual descriptions, and logical rules. Finally, we briefly introduce how KG embedding can be applied to and benefit a wide variety of downstream tasks such as KG completion, relation extraction, question answering, and so forth.

1,905 citations

Journal ArticleDOI
17 Jul 2019
TL;DR: Zhang et al. as discussed by the authors proposed a Text Graph Convolutional Network (Text GCN) for text classification, which jointly learns the embeddings for both words and documents, as supervised by the known class labels for documents.
Abstract: Text classification is an important and classical problem in natural language processing. There have been a number of studies that applied convolutional neural networks (convolution on regular grid, e.g., sequence) to classification. However, only a limited number of studies have explored the more flexible graph convolutional neural networks (convolution on non-grid, e.g., arbitrary graph) for the task. In this work, we propose to use graph convolutional networks for text classification. We build a single text graph for a corpus based on word co-occurrence and document word relations, then learn a Text Graph Convolutional Network (Text GCN) for the corpus. Our Text GCN is initialized with one-hot representation for word and document, it then jointly learns the embeddings for both words and documents, as supervised by the known class labels for documents. Our experimental results on multiple benchmark datasets demonstrate that a vanilla Text GCN without any external word embeddings or knowledge outperforms state-of-the-art methods for text classification. On the other hand, Text GCN also learns predictive word and document embeddings. In addition, experimental results show that the improvement of Text GCN over state-of-the-art comparison methods become more prominent as we lower the percentage of training data, suggesting the robustness of Text GCN to less training data in text classification.

1,215 citations

Journal ArticleDOI
TL;DR: A comprehensive review of the knowledge graph covering overall research topics about: 1) knowledge graph representation learning; 2) knowledge acquisition and completion; 3) temporal knowledge graph; and 4) knowledge-aware applications and summarize recent breakthroughs and perspective directions to facilitate future research.
Abstract: Human knowledge provides a formal understanding of the world. Knowledge graphs that represent structural relations between entities have become an increasingly popular research direction toward cognition and human-level intelligence. In this survey, we provide a comprehensive review of the knowledge graph covering overall research topics about: 1) knowledge graph representation learning; 2) knowledge acquisition and completion; 3) temporal knowledge graph; and 4) knowledge-aware applications and summarize recent breakthroughs and perspective directions to facilitate future research. We propose a full-view categorization and new taxonomies on these topics. Knowledge graph embedding is organized from four aspects of representation space, scoring function, encoding models, and auxiliary information. For knowledge acquisition, especially knowledge graph completion, embedding methods, path inference, and logical rule reasoning are reviewed. We further explore several emerging topics, including metarelational learning, commonsense reasoning, and temporal knowledge graphs. To facilitate future research on knowledge graphs, we also provide a curated collection of data sets and open-source libraries on different tasks. In the end, we have a thorough outlook on several promising research directions.

1,025 citations

Proceedings ArticleDOI
20 Jul 2017
TL;DR: A novel reinforcement learning framework for learning multi-hop relational paths is described, which uses a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector-space by sampling the most promising relation to extend its path.
Abstract: We study the problem of learning to reason in large scale knowledge graphs (KGs). More specifically, we describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector-space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes the accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking based algorithm and knowledge graph embedding methods on Freebase and Never-Ending Language Learning datasets.

510 citations