Journal Article

An Ensemble Model for Classification of Attacks with Feature Selection based on KDD99 and NSL-KDD Data Set

20 Aug 2014, International Journal of Computer Applications (Foundation of Computer Science (FCS)), Vol. 99, Iss. 15, pp. 8-13
TL;DR: This research paper proposes the ANN-Bayesian Net-GR technique, an ensemble of an Artificial Neural Network (ANN) and a Bayesian Net combined with Gain Ratio (GR) feature selection; this ensemble model achieves the highest accuracy among the compared models.
Abstract: Information security is a critical issue for every organization that must protect information from unauthorized access. Intrusion detection systems play an important role in protecting data and information from malicious behaviour. Essentially, an intrusion detection system is a classifier that labels data as either normal or an attack. In this research paper, we propose the ANN-Bayesian Net-GR technique, an ensemble of an Artificial Neural Network (ANN) and a Bayesian Net combined with Gain Ratio (GR) feature selection. We applied various individual classification techniques and their ensemble models to the KDD99 and NSL-KDD data sets to check the robustness of the model. Because the data sets contain irrelevant features, we also applied Gain Ratio feature selection to the best model. Finally, our proposed model achieves the highest accuracy compared with the other models.
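A minimal sketch of Gain Ratio feature ranking, assuming discrete (already discretized) feature values: the gain ratio of a feature is its information gain about the class label divided by its split information, i.e. the entropy of the feature's own value distribution. The function names and the toy records below (which borrow the KDD99 feature names `protocol_type` and `flag`) are hypothetical illustrations, not the paper's implementation or data.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(feature_values, labels):
    """Information gain of a discrete feature divided by its split information."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    # Expected class entropy after splitting on the feature.
    conditional = sum(len(part) / total * entropy(part) for part in partitions.values())
    info_gain = entropy(labels) - conditional
    # Split information is the entropy of the feature's own value distribution;
    # dividing by it penalises features with many distinct values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


def rank_features(rows, labels, names):
    """Return (name, gain ratio) pairs sorted from most to least informative."""
    scores = [(name, gain_ratio([row[j] for row in rows], labels))
              for j, name in enumerate(names)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy records; real KDD99/NSL-KDD records have 41 features.
names = ["protocol_type", "flag"]
rows = [("tcp", "SF"), ("tcp", "S0"), ("udp", "SF"),
        ("icmp", "SF"), ("tcp", "S0"), ("udp", "SF")]
labels = ["normal", "attack", "normal", "attack", "attack", "normal"]

ranking = rank_features(rows, labels, names)
print([name for name, _ in ranking])  # ['flag', 'protocol_type']
```

Keeping only the top-ranked features and retraining the classifiers on them corresponds to the feature-selection step the abstract describes; how many features to keep is a separate choice that the abstract does not specify.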


Citations
01 Jan 2002

9,314 citations

Journal Article
TL;DR: A new hybrid model estimates the intrusion scope threshold degree from the optimal features of the network transaction data made available for training; the results show that the hybrid approach significantly reduces the computational and time complexity of determining the feature association impact scale.

484 citations


Cites methods from "An Ensemble Model for Classificatio..."

  • ...[35] proposed using an ensemble of Bayes net and artificial neural network (ANN) to classify attacks and normal data for NSL-KDD data sets....


Journal Article
Gao Xianwei, Chun Shan, Changzhen Hu, Zequn Niu, Liu Zhen
TL;DR: The results show that the ensemble model effectively improves detection accuracy and that the quality of data features is an important factor in determining detection performance.
Abstract: In recent years, advanced threat attacks have been increasing, but traditional network intrusion detection systems based on feature filtering have drawbacks that make it difficult to detect new attacks in time. Taking the NSL-KDD data set as the research object, this paper analyses the latest progress and open problems in intrusion detection technology and proposes an adaptive ensemble learning model. By adjusting the proportion of training data and building multiple decision trees, we construct a MultiTree algorithm. To improve overall detection performance, we choose several base classifiers, including decision tree, random forest, kNN, and DNN, and design an adaptive ensemble voting algorithm. We use NSL-KDD Test+ to verify our approach: the accuracy of the MultiTree algorithm is 84.2%, while the final accuracy of the adaptive voting algorithm reaches 85.2%. Comparison with other research papers shows that our ensemble model effectively improves detection accuracy. In addition, our analysis of the data shows that the quality of data features is an important factor in determining detection performance. In future work, the feature selection and preprocessing of intrusion detection data should be optimized to achieve better results.
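The abstract does not spell out the adaptive voting rule. As a hedged illustration only, one common way to let stronger base classifiers count for more is to weight each classifier's vote by its accuracy on held-out validation data. The helper names, classifier names, and predictions below are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


def fit_vote_weights(val_predictions, val_truth):
    """Weight each base classifier by its accuracy on a validation set.

    `val_predictions` maps a classifier name to its list of predicted labels.
    """
    return {name: accuracy(preds, val_truth) for name, preds in val_predictions.items()}


def weighted_vote(sample_predictions, weights):
    """Combine one sample's per-classifier labels by accuracy-weighted voting."""
    scores = defaultdict(float)
    for name, label in sample_predictions.items():
        scores[label] += weights[name]
    # The label with the largest total weight wins; ties go to the label seen first.
    return max(scores, key=scores.get)


# Hypothetical validation results for three base classifiers.
val_truth = ["normal", "attack", "attack", "normal"]
val_predictions = {
    "tree": ["normal", "attack", "normal", "normal"],  # 3/4 correct
    "knn": ["attack", "attack", "attack", "normal"],   # 3/4 correct
    "dnn": ["normal", "attack", "attack", "normal"],   # 4/4 correct
}
weights = fit_vote_weights(val_predictions, val_truth)
print(weighted_vote({"tree": "normal", "knn": "normal", "dnn": "attack"}, weights))  # normal
```

Here two moderately accurate classifiers (total weight 1.5) outvote the single most accurate one (weight 1.0); an adaptive scheme could instead recompute or rescale the weights as the data changes.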

238 citations


Cites methods from "An Ensemble Model for Classificatio..."

  • ...Shrivas [13] proposed the ANN-Bayesian Net-GR technique, an ensemble of an Artificial Neural Network (ANN) and a Bayesian Net with Gain Ratio (GR) feature selection....


Journal Article
TL;DR: The proposed work combines filter- and wrapper-based feature selection, using the firefly algorithm in the wrapper, and shows that 10 features are sufficient to detect intrusions with improved accuracy.

215 citations

Journal Article
TL;DR: A new method to binarize a continuous pigeon inspired optimizer is proposed and compared with the traditional way of binarizing continuous swarm intelligence algorithms.
Abstract: Feature selection plays a vital role in building machine learning models. Irrelevant features in data affect the accuracy of the model and increase the training time needed to build it. Feature selection is therefore an important step in building an Intrusion Detection System (IDS). In this paper, a wrapper feature selection algorithm for IDS is proposed. This algorithm uses the pigeon inspired optimizer to drive the selection process. A new method to binarize a continuous pigeon inspired optimizer is proposed and compared with the traditional way of binarizing continuous swarm intelligence algorithms. The proposed algorithm was evaluated on three popular datasets: KDDCUP99, NSL-KDD and UNSW-NB15. It outperformed several state-of-the-art feature selection algorithms from related work in terms of TPR, FPR, accuracy, and F-score. In addition, the proposed cosine similarity method for binarizing the algorithm converges faster than the sigmoid method.
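For context, the traditional sigmoid binarization that the abstract uses as its baseline can be sketched as follows: each continuous coordinate of an optimizer's position is squashed through a logistic function, and the corresponding feature is kept with that probability. This is a sketch of the sigmoid baseline only, under that assumption; the paper's cosine similarity method is not reproduced here, and the function names and the choice of feature names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function mapping a real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize_position(position, rng):
    """Map a continuous position vector to a 0/1 feature mask (sigmoid method).

    Dimension i is set to 1 with probability sigmoid(position[i]).
    """
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


def selected_features(mask, names):
    """Names of the features whose bit is set in the mask."""
    return [name for name, bit in zip(names, mask) if bit]


# Hypothetical 4-dimensional position for one pigeon (candidate solution).
rng = random.Random(0)
position = [2.5, -3.0, 0.1, 8.0]
mask = binarize_position(position, rng)
print(selected_features(mask, ["duration", "src_bytes", "dst_bytes", "count"]))
```

In a wrapper method, each mask produced this way would be scored by training a classifier on the selected features, and that score would guide the optimizer's next update.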

206 citations

References
Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving, and there is a constant need for new techniques and tools that can help transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. The third edition of Data Mining: Concepts and Techniques not only continues the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining streams, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges.
* Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects.
* Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields.
* Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

01 Jan 2002

9,314 citations


"An Ensemble Model for Classificatio..." refers background in this paper

  • ...Keywords Intrusion Detection System, Artificial Neural Network (ANN), Ensemble Model, Feature Selection (FS), Gain Ratio (GR)....


  • ...In this proposed model, we have combined two techniques, Artificial Neural Network (ANN) and Bayesian Net, into an ensemble....


  • ...An Artificial Neural Network [3] is composed of a set of elementary computational units, called neurons, connected together through weighted connections....


  • ...In this research paper, we have proposed the ANN-Bayesian Net-GR technique, an ensemble of an Artificial Neural Network (ANN) and a Bayesian Net with Gain Ratio (GR) feature selection....


  • ...3 Bayesian Net: A Bayesian Net [3] is a statistical classifier that can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class....

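The snippets above do not state how the ANN and Bayesian Net outputs are combined in the ensemble. A minimal sketch, assuming soft voting: average the two classifiers' class-membership probabilities and predict the most probable class. The function names and probability values are hypothetical, not the paper's implementation or results.

```python
def average_probabilities(prob_dists):
    """Average several classifiers' class-probability dictionaries (soft voting)."""
    labels = set().union(*prob_dists)
    n = len(prob_dists)
    return {label: sum(d.get(label, 0.0) for d in prob_dists) / n for label in labels}


def soft_vote(prob_dists):
    """Predict the class with the highest averaged probability."""
    averaged = average_probabilities(prob_dists)
    return max(averaged, key=averaged.get)


# Hypothetical class-membership probabilities for one network connection record.
ann_probs = {"normal": 0.6, "attack": 0.4}    # ANN output, normalised to sum to 1
bayes_probs = {"normal": 0.2, "attack": 0.8}  # Bayesian Net posterior P(class | record)
print(soft_vote([ann_probs, bayes_probs]))  # attack
```

Averaging lets a confident classifier outweigh a hesitant one; with only two members, hard majority voting would need a separate tie-breaking rule whenever the two disagree.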

Book
01 Jan 2001
TL;DR: This chapter discusses the design and analysis of experiments in the context of response surface methodology, and some of the techniques used in this work were new to the literature at the time.
Abstract: Funkenbusch, P. (2005), Practical Guide to Designed Experiments, New York: Marcel Dekker. Grice, J. (2000), Review of Design and Analysis of Experiments (4th ed.), by D. Montgomery, Technometrics, 42, 208–209. Myers, R., and Montgomery, D. (2002), Response Surface Methodology (2nd ed.), New York: Wiley. Ziegel, E. (2001), Editor’s Report on Design and Analysis of Experiments (5th ed.), by R. Myers and D. Montgomery, Technometrics, 43, 245. (2002), Editor’s Report on Response Surface Methodology (2nd ed.), by R. Myers and D. Montgomery, Technometrics, 44, 298–299.

1,294 citations

Journal Article
TL;DR: With the combination of clustering method, ant colony algorithm and support vector machine, an efficient and reliable classifier is developed to judge a network visit to be normal or not.
Abstract: The efficiency of intrusion detection depends mainly on the dimensionality of the data features. Using a gradual feature removal method, 19 critical features are chosen to represent the various network visits. By combining a clustering method, an ant colony algorithm and a support vector machine (SVM), an efficient and reliable classifier is developed to judge whether a network visit is normal. Moreover, the accuracy reaches 98.6249% in 10-fold cross validation and the average Matthews correlation coefficient (MCC) reaches 0.861161.
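The two evaluation measures quoted in this abstract can be sketched as follows: the Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), computed from a binary confusion matrix, and train/test index splits for k-fold cross validation with k = 10. The helper names and counts are hypothetical, and the folds are contiguous, without the shuffling or stratification a real experiment might use.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally defined as 0 when any row or column total is zero.
    return numerator / denominator if denominator else 0.0


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical counts: 90 attacks caught, 80 normals passed, 20 false alarms, 10 misses.
print(round(matthews_corrcoef(tp=90, tn=80, fp=20, fn=10), 4))  # 0.7035
```

Unlike plain accuracy, MCC stays informative when normal and attack records are heavily imbalanced, which is one reason to report it alongside the 10-fold accuracy.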

332 citations


"An Ensemble Model for Classificatio..." refers methods in this paper

  • ...[6] applied various feature reduction methods to the KDD99 data set....
