scispace - formally typeset
Search or ask a question
Journal ArticleDOI

A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection

TL;DR: The complexity of ML/DM algorithms is addressed, discussion of challenges for using ML/ DM for cyber security is presented, and some recommendations on when to use a given method are provided.
Abstract: This survey paper describes a focused literature survey of machine learning (ML) and data mining (DM) methods for cyber analytics in support of intrusion detection. Short tutorial descriptions of each ML/DM method are provided. Based on the number of citations or the relevance of an emerging method, papers representing each method were identified, read, and summarized. Because data are so important in ML/DM approaches, some well-known cyber data sets used in ML/DM are described. The complexity of ML/DM algorithms is addressed, discussion of challenges for using ML/DM for cyber security is presented, and some recommendations on when to use a given method are provided.
Citations
More filters
Journal ArticleDOI
TL;DR: The experimental results show that RNN-IDS is very suitable for modeling a classification model with high accuracy and that its performance is superior to that of traditional machine learning classification methods in both binary and multiclass classification.
Abstract: Intrusion detection plays an important role in ensuring information security, and the key technology is to accurately identify various attacks in the network. In this paper, we explore how to model an intrusion detection system based on deep learning, and we propose a deep learning approach for intrusion detection using recurrent neural networks (RNN-IDS). Moreover, we study the performance of the model in binary classification and multiclass classification, and the number of neurons and different learning rate impacts on the performance of the proposed model. We compare it with those of J48, artificial neural network, random forest, support vector machine, and other machine learning methods proposed by previous researchers on the benchmark data set. The experimental results show that RNN-IDS is very suitable for modeling a classification model with high accuracy and that its performance is superior to that of traditional machine learning classification methods in both binary and multiclass classification. The RNN-IDS model improves the accuracy of the intrusion detection and provides a new research method for intrusion detection.

1,123 citations


Cites methods from "A Survey of Data Mining and Machine..."

  • ...RELEVANT WORK In prior studies, a number of approaches based on traditional machine learning, including SVM [10], [11], K-Nearest Neighbour (KNN) [12], ANN [13], Random Forest (RF) [14], [15] and others [16], [17], have been proposed and have achieved success for an intrusion detection system....

    [...]

Journal ArticleDOI
TL;DR: This paper bridges the gap between deep learning and mobile and wireless networking research, by presenting a comprehensive survey of the crossovers between the two areas, and provides an encyclopedic review of mobile and Wireless networking research based on deep learning, which is categorize by different domains.
Abstract: The rapid uptake of mobile devices and the rising popularity of mobile applications and services pose unprecedented demands on mobile and wireless networking infrastructure. Upcoming 5G systems are evolving to support exploding mobile traffic volumes, real-time extraction of fine-grained analytics, and agile management of network resources, so as to maximize user experience. Fulfilling these tasks is challenging, as mobile environments are increasingly complex, heterogeneous, and evolving. One potential solution is to resort to advanced machine learning techniques, in order to help manage the rise in data volumes and algorithm-driven applications. The recent success of deep learning underpins new and powerful tools that tackle problems in this space. In this paper, we bridge the gap between deep learning and mobile and wireless networking research, by presenting a comprehensive survey of the crossovers between the two areas. We first briefly introduce essential background and state-of-the-art in deep learning techniques with potential applications to networking. We then discuss several techniques and platforms that facilitate the efficient deployment of deep learning onto mobile systems. Subsequently, we provide an encyclopedic review of mobile and wireless networking research based on deep learning, which we categorize by different domains. Drawing from our experience, we discuss how to tailor deep learning to mobile environments. We complete this survey by pinpointing current challenges and open future directions for research.

975 citations

Journal ArticleDOI
TL;DR: A taxonomy of contemporary IDS is presented, a comprehensive review of notable recent works, and an overview of the datasets commonly used for evaluation purposes are presented, and evasion techniques used by attackers to avoid detection are presented.
Abstract: Cyber-attacks are becoming more sophisticated and thereby presenting increasing challenges in accurately detecting intrusions. Failure to prevent the intrusions could degrade the credibility of security services, e.g. data confidentiality, integrity, and availability. Numerous intrusion detection methods have been proposed in the literature to tackle computer security threats, which can be broadly classified into Signature-based Intrusion Detection Systems (SIDS) and Anomaly-based Intrusion Detection Systems (AIDS). This survey paper presents a taxonomy of contemporary IDS, a comprehensive review of notable recent works, and an overview of the datasets commonly used for evaluation purposes. It also presents evasion techniques used by attackers to avoid detection and discusses future research challenges to counter such techniques so as to make computer systems more secure.

684 citations


Cites background or methods from "A Survey of Data Mining and Machine..."

  • ...AIDS methods can be categorized into three main groups: Statistics-based (Chao et al., 2015), knowledgebased (Elhag et al., 2015; Can & Sahingoz, 2015), and machine learning-based (Buczak & Guven, 2016; Meshram & Haas, 2017)....

    [...]

  • ...Existing review articles (e.g., such as (Buczak & Guven, 2016; Axelsson, 2000; Ahmed et al., 2016; Lunt, 1988; Agrawal & Agrawal, 2015)) focus on intrusion detection techniques or dataset issue or type of computer attack and IDS evasion....

    [...]

  • ...Prior studies such as (Sadotra & Sharma, 2016; Buczak & Guven, 2016) have not completely reviewed IDSs in term of the datasets, challenges and techniques....

    [...]

Journal ArticleDOI
TL;DR: This survey delineates the limitations, give insights, research challenges and future opportunities to advance ML in networking, and jointly presents the application of diverse ML techniques in various key areas of networking across different network technologies.
Abstract: Machine Learning (ML) has been enjoying an unprecedented surge in applications that solve problems and enable automation in diverse domains. Primarily, this is due to the explosion in the availability of data, significant improvements in ML techniques, and advancement in computing capabilities. Undoubtedly, ML has been applied to various mundane and complex problems arising in network operation and management. There are various surveys on ML for specific areas in networking or for specific network technologies. This survey is original, since it jointly presents the application of diverse ML techniques in various key areas of networking across different network technologies. In this way, readers will benefit from a comprehensive discussion on the different learning paradigms and ML techniques applied to fundamental problems in networking, including traffic prediction, routing and classification, congestion control, resource and fault management, QoS and QoE management, and network security. Furthermore, this survey delineates the limitations, give insights, research challenges and future opportunities to advance ML in networking. Therefore, this is a timely contribution of the implications of ML for networking, that is pushing the barriers of autonomic network operation and management.

677 citations


Cites background from "A Survey of Data Mining and Machine..."

  • ...More recently, [82] looked at the application of Data Mining and ML for cyber-security intrusion detection....

    [...]

  • ...[82], both state-of-theart surveys, have a specialized treatment of ML to specific problems in networking....

    [...]

  • ...Previous surveys [82, 161, 447] looked at the application of ML for cyber-security....

    [...]

  • ...Though there are various surveys on ML in networking [18, 61, 82, 142, 246, 339], this survey is purposefully different....

    [...]

Journal ArticleDOI
TL;DR: This survey report describes key literature surveys on machine learning (ML) and deep learning (DL) methods for network analysis of intrusion detection and provides a brief tutorial description of each ML/DL method.
Abstract: With the development of the Internet, cyber-attacks are changing rapidly and the cyber security situation is not optimistic. This survey report describes key literature surveys on machine learning (ML) and deep learning (DL) methods for network analysis of intrusion detection and provides a brief tutorial description of each ML/DL method. Papers representing each method were indexed, read, and summarized based on their temporal or thermal correlations. Because data are so important in ML/DL methods, we describe some of the commonly used network datasets used in ML/DL, discuss the challenges of using ML/DL for cybersecurity and provide suggestions for research directions.

676 citations

References
More filters
Proceedings ArticleDOI
16 Mar 2008
TL;DR: This work hypothesizes that this latency can be shortened by creating signatures via anomaly-based algorithms via clustering, and investigates a modified density-based clustering algorithm as an IDS, with its strengths and weaknesses identified.
Abstract: Current practices for combating cyber attacks typically use Intrusion Detection Systems (IDSs) to detect and block multistage attacks. Because of the speed and impacts of new types of cyber attacks, current IDSs are limited in providing accurate detection while reliably adapting to new attacks. In signature-based IDS systems, this limitation is made apparent by the latency from day zero of an attack to the creation of an appropriate signature. This work hypothesizes that this latency can be shortened by creating signatures via anomaly-based algorithms. A hybrid supervised and unsupervised clustering algorithm is proposed for new signature creation. These new signatures created in real-time would take effect immediately, ideally detecting new attacks. This work first investigates a modified density-based clustering algorithm as an IDS, with its strengths and weaknesses identified. A signature creation algorithm leveraging the summarizing abilities of clustering is investigated. Lessons learned from the supervised signature creation are then leveraged for the development of unsupervised real-time signature classification. Automating signature creation and classification via clustering is demonstrated as satisfactory but with limitations.

30 citations


"A Survey of Data Mining and Machine..." refers background in this paper

  • ...[50], real-time signature generation by an anomaly detector can be an important asset....

    [...]

Journal Article
TL;DR: An approach to network risk assessment that is novel in a number of ways, where the risk level of a network is determined as the composition of the risks of individual hosts, providing a more precise, fine-grained model.
Abstract: Security-oriented risk assessment tools are used to determine the impact of certain events on the security status of a network. Most existing approaches are generally limited to manual risk evaluations that are not suitable for real-time use. In this paper, we introduce an approach to network risk assessment that is novel in a number of ways. First of all, the risk level of a network is determined as the composition of the risks of individual hosts, providing a more precise, fine-grained model. Second, we use Hidden Markov models to represent the likelihood of transitions between security states. Third, we tightly integrate our risk assessment tool with an existing framework for distributed, large-scale intrusion detection, and we apply the results of the risk assessment to prioritize the alerts produced by the intrusion detection sensors. We also evaluate our approach on both simulated and real-world data.

25 citations


"A Survey of Data Mining and Machine..." refers methods in this paper

  • ...An example of an HMM for host intrusion detection is shown in Figure 5 [81]....

    [...]

Proceedings ArticleDOI
24 Oct 2007
TL;DR: This paper proposes a prototype system based on SVG technology and emphasize the method of similarity measurement, also covers the implementation of the principles, which provides CBIR services.
Abstract: Vector images are becoming more advanced and diffused on the Internet due to their unique characteristics over traditional raster images. As a W3C standard, SVG is a brand-new vector-based image format built on readable/searchable plain-text source code which inherited from XML technology, thus is suitable for Content-based image retrieval. CBIR is the application of computer vision to the image retrieval problem, which analyzes the actual contents of the images themselves rather than relying on annotations or keywords generated by human labor. Without textual descriptions or label information, successful retrieval of relevant images from large- scale image collections is a very interesting yet challenging task, so in this paper, we propose a prototype system based on SVG technology and emphasize the method of similarity measurement, also covers the implementation of the principles, which provides CBIR services.

22 citations