A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection

doi:10.1109/COMST.2015.2494502

Journal Article•DOI•

Anna L. Buczak¹, Erhan Guven¹•Institutions (1)

Johns Hopkins University Applied Physics Laboratory¹

22 Jan 2016-IEEE Communications Surveys and Tutorials (IEEE)-Vol. 18, Iss: 2, pp 1153-1176

TL;DR: This survey addresses the complexity of ML/DM algorithms, discusses the challenges of using ML/DM for cyber security, and provides recommendations on when to use a given method.

Abstract: This survey paper describes a focused literature survey of machine learning (ML) and data mining (DM) methods for cyber analytics in support of intrusion detection. Short tutorial descriptions of each ML/DM method are provided. Based on the number of citations or the relevance of an emerging method, papers representing each method were identified, read, and summarized. Because data are so important in ML/DM approaches, some well-known cyber data sets used in ML/DM are described. The complexity of ML/DM algorithms is addressed, discussion of challenges for using ML/DM for cyber security is presented, and some recommendations on when to use a given method are provided.

Citations

Journal Article•DOI•

A Deep Learning Approach for Intrusion Detection Using Recurrent Neural Networks

Chuanlong Yin, Yuefei Zhu, Jinlong Fei, Xinzheng He

12 Oct 2017-IEEE Access

TL;DR: The experimental results show that RNN-IDS is very suitable for modeling a classification model with high accuracy and that its performance is superior to that of traditional machine learning classification methods in both binary and multiclass classification.

Abstract: Intrusion detection plays an important role in ensuring information security, and the key technology is to accurately identify various attacks in the network. In this paper, we explore how to model an intrusion detection system based on deep learning, and we propose a deep learning approach for intrusion detection using recurrent neural networks (RNN-IDS). Moreover, we study the performance of the model in binary classification and multiclass classification, and the number of neurons and different learning rate impacts on the performance of the proposed model. We compare it with those of J48, artificial neural network, random forest, support vector machine, and other machine learning methods proposed by previous researchers on the benchmark data set. The experimental results show that RNN-IDS is very suitable for modeling a classification model with high accuracy and that its performance is superior to that of traditional machine learning classification methods in both binary and multiclass classification. The RNN-IDS model improves the accuracy of the intrusion detection and provides a new research method for intrusion detection.

1,123 citations

Cites methods from "A Survey of Data Mining and Machine..."

...RELEVANT WORK In prior studies, a number of approaches based on traditional machine learning, including SVM [10], [11], K-Nearest Neighbour (KNN) [12], ANN [13], Random Forest (RF) [14], [15] and others [16], [17], have been proposed and have achieved success for an intrusion detection system....

Journal Article•DOI•

Deep Learning in Mobile and Wireless Networking: A Survey

Chaoyun Zhang¹, Paul Patras¹, Hamed Haddadi²•Institutions (2)

University of Edinburgh¹, Imperial College London²

13 Mar 2019-IEEE Communications Surveys and Tutorials

TL;DR: This paper bridges the gap between deep learning and mobile and wireless networking research by presenting a comprehensive survey of the crossovers between the two areas, and provides an encyclopedic review of mobile and wireless networking research based on deep learning, categorized by domain.

Abstract: The rapid uptake of mobile devices and the rising popularity of mobile applications and services pose unprecedented demands on mobile and wireless networking infrastructure. Upcoming 5G systems are evolving to support exploding mobile traffic volumes, real-time extraction of fine-grained analytics, and agile management of network resources, so as to maximize user experience. Fulfilling these tasks is challenging, as mobile environments are increasingly complex, heterogeneous, and evolving. One potential solution is to resort to advanced machine learning techniques, in order to help manage the rise in data volumes and algorithm-driven applications. The recent success of deep learning underpins new and powerful tools that tackle problems in this space. In this paper, we bridge the gap between deep learning and mobile and wireless networking research, by presenting a comprehensive survey of the crossovers between the two areas. We first briefly introduce essential background and state-of-the-art in deep learning techniques with potential applications to networking. We then discuss several techniques and platforms that facilitate the efficient deployment of deep learning onto mobile systems. Subsequently, we provide an encyclopedic review of mobile and wireless networking research based on deep learning, which we categorize by different domains. Drawing from our experience, we discuss how to tailor deep learning to mobile environments. We complete this survey by pinpointing current challenges and open future directions for research.

975 citations

Journal Article•DOI•

Survey of intrusion detection systems: techniques, datasets and challenges

Ansam Khraisat¹, Iqbal Gondal¹, Peter Vamplew¹, Joarder Kamruzzaman¹•Institutions (1)

Federation University Australia¹

01 Jul 2019-Cybersecurity

TL;DR: This survey presents a taxonomy of contemporary IDS, a comprehensive review of notable recent works, an overview of the datasets commonly used for evaluation, and the evasion techniques attackers use to avoid detection.

Abstract: Cyber-attacks are becoming more sophisticated and thereby presenting increasing challenges in accurately detecting intrusions. Failure to prevent the intrusions could degrade the credibility of security services, e.g. data confidentiality, integrity, and availability. Numerous intrusion detection methods have been proposed in the literature to tackle computer security threats, which can be broadly classified into Signature-based Intrusion Detection Systems (SIDS) and Anomaly-based Intrusion Detection Systems (AIDS). This survey paper presents a taxonomy of contemporary IDS, a comprehensive review of notable recent works, and an overview of the datasets commonly used for evaluation purposes. It also presents evasion techniques used by attackers to avoid detection and discusses future research challenges to counter such techniques so as to make computer systems more secure.

684 citations

Cites background or methods from "A Survey of Data Mining and Machine..."

...AIDS methods can be categorized into three main groups: Statistics-based (Chao et al., 2015), knowledge-based (Elhag et al., 2015; Can & Sahingoz, 2015), and machine learning-based (Buczak & Guven, 2016; Meshram & Haas, 2017)....

...Existing review articles (e.g., such as (Buczak & Guven, 2016; Axelsson, 2000; Ahmed et al., 2016; Lunt, 1988; Agrawal & Agrawal, 2015)) focus on intrusion detection techniques or dataset issue or type of computer attack and IDS evasion....

...Prior studies such as (Sadotra & Sharma, 2016; Buczak & Guven, 2016) have not completely reviewed IDSs in term of the datasets, challenges and techniques....

Journal Article•DOI•

A comprehensive survey on machine learning for networking: evolution, applications and research opportunities

Raouf Boutaba¹, Mohammad A. Salahuddin¹, Noura Limam¹, Sara Ayoubi¹, Nashid Shahriar¹, Felipe Estrada-Solano¹,², Oscar Mauricio Caicedo²•Institutions (2)

University of Waterloo¹, University of Cauca²

21 Jun 2018-Journal of Internet Services and Applications

TL;DR: This survey jointly presents the application of diverse ML techniques in various key areas of networking across different network technologies, and delineates the limitations, insights, research challenges, and future opportunities for advancing ML in networking.

Abstract: Machine Learning (ML) has been enjoying an unprecedented surge in applications that solve problems and enable automation in diverse domains. Primarily, this is due to the explosion in the availability of data, significant improvements in ML techniques, and advancement in computing capabilities. Undoubtedly, ML has been applied to various mundane and complex problems arising in network operation and management. There are various surveys on ML for specific areas in networking or for specific network technologies. This survey is original, since it jointly presents the application of diverse ML techniques in various key areas of networking across different network technologies. In this way, readers will benefit from a comprehensive discussion on the different learning paradigms and ML techniques applied to fundamental problems in networking, including traffic prediction, routing and classification, congestion control, resource and fault management, QoS and QoE management, and network security. Furthermore, this survey delineates the limitations, give insights, research challenges and future opportunities to advance ML in networking. Therefore, this is a timely contribution of the implications of ML for networking, that is pushing the barriers of autonomic network operation and management.

677 citations

Cites background from "A Survey of Data Mining and Machine..."

...More recently, [82] looked at the application of Data Mining and ML for cyber-security intrusion detection....

...[82], both state-of-the-art surveys, have a specialized treatment of ML to specific problems in networking....

...Previous surveys [82, 161, 447] looked at the application of ML for cyber-security....

...Though there are various surveys on ML in networking [18, 61, 82, 142, 246, 339], this survey is purposefully different....

Journal Article•DOI•

Machine Learning and Deep Learning Methods for Cybersecurity

Yang Xin¹, Lingshuang Kong², Liu Zhi³, Yuling Chen³, Yanmiao Li¹, Hongliang Zhu¹, Mingcheng Gao¹, Haixia Hou¹, Chunhua Wang•Institutions (3)

Beijing University of Posts and Telecommunications¹, Shandong University², Guizhou University³

15 May 2018-IEEE Access

TL;DR: This survey report describes key literature surveys on machine learning (ML) and deep learning (DL) methods for network analysis of intrusion detection and provides a brief tutorial description of each ML/DL method.

Abstract: With the development of the Internet, cyber-attacks are changing rapidly and the cyber security situation is not optimistic. This survey report describes key literature surveys on machine learning (ML) and deep learning (DL) methods for network analysis of intrusion detection and provides a brief tutorial description of each ML/DL method. Papers representing each method were indexed, read, and summarized based on their temporal or thermal correlations. Because data are so important in ML/DL methods, we describe some of the commonly used network datasets used in ML/DL, discuss the challenges of using ML/DL for cybersecurity and provide suggestions for research directions.

676 citations

References

Journal Article•DOI•

Random Forests

Leo Breiman¹•Institutions (1)

University of California, Berkeley¹

01 Oct 2001

TL;DR: Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the splitting, and the ideas are also applicable to regression.

Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, aaa, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.

79,257 citations

"A Survey of Data Mining and Machine..." refers methods in this paper

...The Random Forest classifier [61] is an ML method that combines the decision trees and ensemble learning....

Book•

Fuzzy sets

Lotfi A. Zadeh

01 Aug 1996

TL;DR: A separation theorem for convex fuzzy sets is proved without requiring that the fuzzy sets be disjoint.

Abstract: A fuzzy set is a class of objects with a continuum of grades of membership. Such a set is characterized by a membership (characteristic) function which assigns to each object a grade of membership ranging between zero and one. The notions of inclusion, union, intersection, complement, relation, convexity, etc., are extended to such sets, and various properties of these notions in the context of fuzzy sets are established. In particular, a separation theorem for convex fuzzy sets is proved without requiring that the fuzzy sets be disjoint.

52,705 citations
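The operations named in the abstract above can be illustrated with a minimal sketch. It assumes a finite universe and the standard pointwise max/min/complement operators usually attributed to Zadeh; the host names and membership grades are hypothetical values chosen only for illustration.

```python
# Fuzzy sets over a finite universe, stored as {element: grade in [0, 1]}.
# Elements missing from a dict are treated as having grade 0.0.

def fuzzy_union(a, b):
    """Pointwise maximum of the two membership functions."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}

def fuzzy_intersection(a, b):
    """Pointwise minimum of the two membership functions."""
    return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}

def fuzzy_complement(a, universe):
    """One minus the membership grade, for every element of the universe."""
    return {x: 1.0 - a.get(x, 0.0) for x in universe}

def is_included(a, b):
    """a is included in b when a's grade never exceeds b's grade."""
    return all(grade <= b.get(x, 0.0) for x, grade in a.items())

# Hypothetical grades: how strongly each host belongs to each fuzzy class.
high_traffic = {"h1": 0.75, "h2": 0.5, "h3": 0.0}
failed_logins = {"h1": 0.25, "h2": 1.0, "h3": 0.5}

either = fuzzy_union(high_traffic, failed_logins)
both = fuzzy_intersection(high_traffic, failed_logins)
not_high = fuzzy_complement(high_traffic, {"h1", "h2", "h3"})
```

With these grades, the intersection is included in each of the two original sets, as expected from the min operator.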

Journal Article•DOI•

Maximum likelihood from incomplete data via the EM algorithm

Arthur P. Dempster¹, Nan M. Laird¹, Donald B. Rubin¹•Institutions (1)

Harvard University¹

01 Sep 1977-Journal of the royal statistical society series b-methodological

49,597 citations

Book•

The Nature of Statistical Learning Theory

Vladimir Vapnik¹•Institutions (1)

Bell Labs¹

01 Jan 1995

TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

40,147 citations

"A Survey of Data Mining and Machine..." refers methods in this paper

...The approach is based on a minimized classification risk [95] rather than on optimal classi-...

Journal Article•DOI•

Collective dynamics of small-world networks

Duncan J. Watts¹, Steven H. Strogatz¹•Institutions (1)

Cornell University¹

04 Jun 1998-Nature

TL;DR: Simple models of networks that can be tuned through this middle ground: regular networks ‘rewired’ to introduce increasing amounts of disorder are explored, finding that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs.

Abstract: Networks of coupled dynamical systems have been used to model biological oscillators, Josephson junction arrays, excitable media, neural networks, spatial games, genetic control networks and many other self-organizing systems. Ordinarily, the connection topology is assumed to be either completely regular or completely random. But many biological, technological and social networks lie somewhere between these two extremes. Here we explore simple models of networks that can be tuned through this middle ground: regular networks 'rewired' to introduce increasing amounts of disorder. We find that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs. We call them 'small-world' networks, by analogy with the small-world phenomenon (popularly known as six degrees of separation). The neural network of the worm Caenorhabditis elegans, the power grid of the western United States, and the collaboration graph of film actors are shown to be small-world networks. Models of dynamical systems with small-world coupling display enhanced signal-propagation speed, computational power, and synchronizability. In particular, infectious diseases spread more easily in small-world networks than in regular lattices.

39,297 citations

"A Survey of Data Mining and Machine..." refers background in this paper

...The local clustering coefficient of a node is defined as the ratio of the number of sub-graphs with three edges and three vertices that the node is part of to the number of triples which the node is part of [49]....
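
The definition quoted above can be sketched in a few lines. This is a minimal illustration, not code from either paper: it assumes an undirected graph without self-loops stored as a dictionary of neighbour sets, counts the "triples" as pairs of neighbours centred on the node, and returns 0.0 for nodes with fewer than two neighbours (a common convention, assumed here). The function name and the toy graph are hypothetical.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Ratio of triangles through `node` to neighbour pairs centred on it.

    adj maps each node to the set of its neighbours (undirected graph).
    """
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: no triples, so coefficient is 0
    # Each pair of neighbours forms one triple centred on `node`.
    triples = k * (k - 1) // 2
    # A triple closes into a triangle when the two neighbours are linked.
    triangles = sum(1 for u, w in combinations(neighbours, 2) if w in adj[u])
    return triangles / triples

# Toy graph: a triangle a-b-c plus a pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
```

In this toy graph, node `a` has three neighbour pairs of which only `b`-`c` is connected, so its coefficient is one third, while `b` and `c` each sit in a fully closed neighbourhood.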