scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization

TL;DR: A reliable dataset is produced that contains benign and seven common attack network flows, which meets real world criteria and is publicly avaliable and evaluates the performance of a comprehensive set of network traffic features and machine learning algorithms to indicate the best set of features for detecting the certain attack categories.
Abstract: With exponential growth in the size of computer networks and developed applications, the significant increasing of the potential damage that can be caused by launching attacks is becoming obvious. Meanwhile, Intrusion Detection Systems (IDSs) and Intrusion Prevention Systems (IPSs) are one of the most important defense tools against the sophisticated and ever-growing network attacks. Due to the lack of adequate dataset, anomaly-based approaches in intrusion detection systems are suffering from accurate deployment, analysis and evaluation. There exist a number of such datasets such as DARPA98, KDD99, ISC2012, and ADFA13 that have been used by the researchers to evaluate the performance of their proposed intrusion detection and intrusion prevention approaches. Based on our study over eleven available datasets since 1998, many such datasets are out of date and unreliable to use. Some of these datasets suffer from lack of traffic diversity and volumes, some of them do not cover the variety of attacks, while others anonymized packet information and payload which cannot reflect the current trends, or they lack feature set and metadata. This paper produces a reliable dataset that contains benign and seven common attack network flows, which meets real world criteria and is publicly avaliable. Consequently, the paper evaluates the performance of a comprehensive set of network traffic features and machine learning algorithms to indicate the best set of features for detecting the certain attack categories.
Citations
More filters
Journal ArticleDOI
TL;DR: A highly scalable and hybrid DNNs framework called scale-hybrid-IDS-AlertNet is proposed which can be used in real-time to effectively monitor the network traffic and host-level events to proactively alert possible cyberattacks.
Abstract: Machine learning techniques are being widely used to develop an intrusion detection system (IDS) for detecting and classifying cyberattacks at the network-level and the host-level in a timely and automatic manner. However, many challenges arise since malicious attacks are continually changing and are occurring in very large volumes requiring a scalable solution. There are different malware datasets available publicly for further research by cyber security community. However, no existing study has shown the detailed analysis of the performance of various machine learning algorithms on various publicly available datasets. Due to the dynamic nature of malware with continuously changing attacking methods, the malware datasets available publicly are to be updated systematically and benchmarked. In this paper, a deep neural network (DNN), a type of deep learning model, is explored to develop a flexible and effective IDS to detect and classify unforeseen and unpredictable cyberattacks. The continuous change in network behavior and rapid evolution of attacks makes it necessary to evaluate various datasets which are generated over the years through static and dynamic approaches. This type of study facilitates to identify the best algorithm which can effectively work in detecting future cyberattacks. A comprehensive evaluation of experiments of DNNs and other classical machine learning classifiers are shown on various publicly available benchmark malware datasets. The optimal network parameters and network topologies for DNNs are chosen through the following hyperparameter selection methods with KDDCup 99 dataset. All the experiments of DNNs are run till 1,000 epochs with the learning rate varying in the range [0.01-0.5]. The DNN model which performed well on KDDCup 99 is applied on other datasets, such as NSL-KDD, UNSW-NB15, Kyoto, WSN-DS, and CICIDS 2017, to conduct the benchmark. Our DNN model learns the abstract and high-dimensional feature representation of the IDS data by passing them into many hidden layers. Through a rigorous experimental testing, it is confirmed that DNNs perform well in comparison with the classical machine learning classifiers. Finally, we propose a highly scalable and hybrid DNNs framework called scale-hybrid-IDS-AlertNet which can be used in real-time to effectively monitor the network traffic and host-level events to proactively alert possible cyberattacks.

847 citations


Cites background from "Toward Generating a New Intrusion D..."

  • ...The details of various IDS datasets are discussed in [66], [59]....

    [...]

  • ...Recently, to provide benchmark dataset to the research community, [59] generated reliable dataset....

    [...]

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a new dataset, called Bot-IoT, which incorporates legitimate and simulated IoT network traffic, along with various types of attacks, and evaluated the reliability of the dataset using different statistical and machine learning methods for forensics purposes.

736 citations

Journal ArticleDOI
TL;DR: A taxonomy of contemporary IDS is presented, a comprehensive review of notable recent works, and an overview of the datasets commonly used for evaluation purposes are presented, and evasion techniques used by attackers to avoid detection are presented.
Abstract: Cyber-attacks are becoming more sophisticated and thereby presenting increasing challenges in accurately detecting intrusions. Failure to prevent the intrusions could degrade the credibility of security services, e.g. data confidentiality, integrity, and availability. Numerous intrusion detection methods have been proposed in the literature to tackle computer security threats, which can be broadly classified into Signature-based Intrusion Detection Systems (SIDS) and Anomaly-based Intrusion Detection Systems (AIDS). This survey paper presents a taxonomy of contemporary IDS, a comprehensive review of notable recent works, and an overview of the datasets commonly used for evaluation purposes. It also presents evasion techniques used by attackers to avoid detection and discusses future research challenges to counter such techniques so as to make computer systems more secure.

684 citations


Cites background from "Toward Generating a New Intrusion D..."

  • ...tack, Infiltration, Botnet and DDoS (Sharafaldin et al., 2018)....

    [...]

  • ...This dataset contains network traffic traces from Distributed Denial-of-Service (DDoS) attacks, and was collected in 2007 (Hick et al., 2007)....

    [...]

  • ...CICIDS 2017 CICIDS2017 dataset comprises both benign behaviour and also details of new malware attacks: such as Brute Force FTP, Brute Force SSH, DoS, Heartbleed, Web Attack, Infiltration, Botnet and DDoS (Sharafaldin et al., 2018)....

    [...]

Journal ArticleDOI
TL;DR: This survey classifies the IoT security threats and challenges for IoT networks by evaluating existing defense techniques and provides a comprehensive review of NIDSs deploying different aspects of learning techniques for IoT, unlike other top surveys targeting the traditional systems.
Abstract: Pervasive growth of Internet of Things (IoT) is visible across the globe. The 2016 Dyn cyberattack exposed the critical fault-lines among smart networks. Security of IoT has become a critical concern. The danger exposed by infested Internet-connected Things not only affects the security of IoT but also threatens the complete Internet eco-system which can possibly exploit the vulnerable Things (smart devices) deployed as botnets. Mirai malware compromised the video surveillance devices and paralyzed Internet via distributed denial of service attacks. In the recent past, security attack vectors have evolved bothways, in terms of complexity and diversity. Hence, to identify and prevent or detect novel attacks, it is important to analyze techniques in IoT context. This survey classifies the IoT security threats and challenges for IoT networks by evaluating existing defense techniques. Our main focus is on network intrusion detection systems (NIDSs); hence, this paper reviews existing NIDS implementation tools and datasets as well as free and open-source network sniffing software. Then, it surveys, analyzes, and compares state-of-the-art NIDS proposals in the IoT context in terms of architecture, detection methodologies, validation strategies, treated threats, and algorithm deployments. The review deals with both traditional and machine learning (ML) NIDS techniques and discusses future directions. In this survey, our focus is on IoT NIDS deployed via ML since learning algorithms have a good success rate in security and privacy. The survey provides a comprehensive review of NIDSs deploying different aspects of learning techniques for IoT, unlike other top surveys targeting the traditional systems. We believe that, this paper will be useful for academia and industry research, first, to identify IoT threats and challenges, second, to implement their own NIDS and finally to propose new smart techniques in IoT context considering IoT limitations. Moreover, the survey will enable security individuals differentiate IoT NIDS from traditional ones.

494 citations


Cites methods from "Toward Generating a New Intrusion D..."

  • ...Profile [94] approach to outline the behavior on HTTP, HTTPS, FTP, SSH and e-mail protocols....

    [...]

Journal ArticleDOI
01 Feb 2020
TL;DR: A survey of deep learning approaches for cyber security intrusion detection, the datasets used, and a comparative study to evaluate the efficiency of several methods are presented.
Abstract: In this paper, we present a survey of deep learning approaches for cybersecurity intrusion detection, the datasets used, and a comparative study. Specifically, we provide a review of intrusion detection systems based on deep learning approaches. The dataset plays an important role in intrusion detection, therefore we describe 35 well-known cyber datasets and provide a classification of these datasets into seven categories; namely, network traffic-based dataset, electrical network-based dataset, internet traffic-based dataset, virtual private network-based dataset, android apps-based dataset, IoT traffic-based dataset, and internet-connected devices-based dataset. We analyze seven deep learning models including recurrent neural networks, deep neural networks, restricted Boltzmann machines, deep belief networks, convolutional neural networks, deep Boltzmann machines, and deep autoencoders. For each model, we study the performance in two categories of classification (binary and multiclass) under two new real traffic datasets, namely, the CSE-CIC-IDS2018 dataset and the Bot-IoT dataset. In addition, we use the most important performance indicators, namely, accuracy, false alarm rate, and detection rate for evaluating the efficiency of several methods.

464 citations


Cites background from "Toward Generating a New Intrusion D..."

  • ...[6], which implements attacks include Brute Force SSH, DoS, Heartbleed, Web Attack, Infiltration, Botnet and DDoS, and Brute Force FTP....

    [...]

  • ...In order to test the efficiency of such mechanisms, reliable datasets are needed that (i) contain both benign and several attacks, (ii) meet real world criteria, and (iii) are publicly available [6]....

    [...]

References
More filters
Proceedings ArticleDOI
08 Jul 2009
TL;DR: A new data set is proposed, NSL-KDD, which consists of selected records of the complete KDD data set and does not suffer from any of mentioned shortcomings.
Abstract: During the last decade, anomaly detection has attracted the attention of many researchers to overcome the weakness of signature-based IDSs in detecting novel attacks, and KDDCUP'99 is the mostly widely used data set for the evaluation of these systems. Having conducted a statistical analysis on this data set, we found two important issues which highly affects the performance of evaluated systems, and results in a very poor evaluation of anomaly detection approaches. To solve these issues, we have proposed a new data set, NSL-KDD, which consists of selected records of the complete KDD data set and does not suffer from any of mentioned shortcomings.

3,300 citations


"Toward Generating a New Intrusion D..." refers background or methods in this paper

  • ...KDD’99 (University of California, Irvine 1998-99): This dataset is an updated version of the DARPA98, by processing the tcpdump portion....

    [...]

  • ...This dataset has a large number of redundant records and is studded by data corruptions that led to skewed testing results (Tavallaee et al., 2009)....

    [...]

  • ...NSL-KDD was created using KDD (Tavallaee et al., 2009) to address some of the KDD’s shortcomings (McHugh, 2000)....

    [...]

Journal ArticleDOI
TL;DR: The purpose of this article is to attempt to identify the shortcomings of the Lincoln Lab effort in the hope that future efforts of this kind will be placed on a sounder footing.
Abstract: In 1998 and again in 1999, the Lincoln Laboratory of MIT conducted a comparative evaluation of intrusion detection systems (IDSs) developed under DARPA funding. While this evaluation represents a significant and monumental undertaking, there are a number of issues associated with its design and execution that remain unsettled. Some methodologies used in the evaluation are questionable and may have biased its results. One problem is that the evaluators have published relatively little concerning some of the more critical aspects of their work, such as validation of their test data. The appropriateness of the evaluation techniques used needs further investigation. The purpose of this article is to attempt to identify the shortcomings of the Lincoln Lab effort in the hope that future efforts of this kind will be placed on a sounder footing. Some of the problems that the article points out might well be resolved if the evaluators were to publish a detailed description of their procedures and the rationale that led to their adoption, but other problems would clearly remain./par>

1,346 citations


"Toward Generating a New Intrusion D..." refers background or methods in this paper

  • ...Moreover, it lacks actual attack data records (McHugh, 2000) (Brown et al....

    [...]

  • ...Moreover, it lacks actual attack data records (McHugh, 2000) (Brown et al., 2009)....

    [...]

  • ..., 2009) to address some of the KDD’s shortcomings (McHugh, 2000)....

    [...]

  • ...NSL-KDD was created using KDD (Tavallaee et al., 2009) to address some of the KDD’s shortcomings (McHugh, 2000)....

    [...]

Journal ArticleDOI
TL;DR: The intent for this dataset is to assist various researchers in acquiring datasets of this kind for testing, evaluation, and comparison purposes, through sharing the generated datasets and profiles.

1,050 citations

Proceedings ArticleDOI
01 Jan 2017
TL;DR: A time analysis on Tor traffic flows is presented, captured between the client and the entry node, to detect the application type: Browsing, Chat, Streaming, Mail, Voip, P2P or File Transfer.
Abstract: Traffic classification has been the topic of many research efforts, but the quick evolution of Internet services and the pervasive use of encryption makes it an open challenge. Encryption is essential in protecting the privacy of Internet users, a key technology used in the different privacy enhancing tools that have appeared in the recent years. Tor is one of the most popular of them, it decouples the sender from the receiver by encrypting the traffic between them, and routing it through a distributed network of servers. In this paper, we present a time analysis on Tor traffic flows, captured between the client and the entry node. We define two scenarios, one to detect Tor traffic flows and the other to detect the application type: Browsing, Chat, Streaming, Mail, Voip, P2P or File Transfer. In addition, with this paper we publish the Tor labelled dataset we generated and used to test our classifiers.

425 citations


"Toward Generating a New Intrusion D..." refers methods in this paper

  • ...We begin to extract the 80 traffic features from the dataset using CICFlowMeter (CICFlowMeter, 2017),(Habibi Lashkari et al., 2017)....

    [...]

  • ...For extracting the network traffic features, we used the CICFlowMeter (CICFlowMeter, 2017), (Habibi Lashkari et al., 2017), which is a flow based feature extractor and can extract 80 features from a pcap file....

    [...]

  • ...The dataset is completely labelled and more than 80 network traffic features extracted and calculated for all benign and intrusive flows by using CICFlowMeter software which is publicly available in Canadian Institute for Cybersecurity website (Habibi Lashkari et al., 2017)....

    [...]

Proceedings ArticleDOI
07 Apr 2013
TL;DR: A new publicly available dataset is introduced which is representative of modern attack structure and methodology and is contrasted with the legacy datasets, and the performance difference of commonly used intrusion detection algorithms is highlighted.
Abstract: Intrusion detection systems are generally tested using datasets compiled at the end of last century, justified by the need for publicly available test data and the lack of any other alternative datasets. Prominent amongst this legacy group is the KDD project. Whilst a seminal contribution at the time of compilation, these datasets no longer represent relevant architecture or contemporary attack protocols, and are beset by data corruptions and inconsistencies. Hence, testing of new IDS approaches against these datasets does not provide an effective performance metric, and contributes to erroneous efficacy claims. This paper introduces a new publicly available dataset which is representative of modern attack structure and methodology. The new dataset is contrasted with the legacy datasets, and the performance difference of commonly used intrusion detection algorithms is highlighted.

268 citations


"Toward Generating a New Intrusion D..." refers methods in this paper

  • ...ADFA (University of New South Wales 2013): This dataset includes normal training and validating data and 10 attacks per vector (Creech and Hu, 2013)....

    [...]