Enhancing MLP Performance in Intrusion Detection using Optimal Feature Subset Selection based on Genetic Principal Components
About: This article was published in Applied Mathematics & Information Sciences on 2014-03-01 and is currently open access. It has received 3 citations to date. The article focuses on the topics: Feature (computer vision) & Selection (genetic algorithm).
Summary (2 min read)
- A security breach in a network has become one of the major problems today, since a single breach may cause significant loss or damage to information systems.
- In response, a number of previous intrusion detection techniques have focused on feature extraction and classification, yet unfortunately less attention has been given to the serious issue of feature selection.
- In past work, a subset of features was selected using PCA based on some percentage of the top principal components.
- The standard KDD cup dataset was used to validate the proposed model.
- The experimental results illustrate significant performance improvements.
3 Proposed Model
- The proposed model consists of four sections: dataset, feature selection, classification, and training & testing.
- Dataset selection is an important issue in intrusion detection, since an accurate dataset can lead to accurate results.
- Dataset collection can be conducted in several ways: real-time, simulated, and test-bed, each of which has its own issues.
- Feature selection, meanwhile, was accomplished through GA and PCA due to their proven ability in feature selection.
- The selected feature sets were presented to the classifier to determine their sensitivity and importance.
- This work used KDD cup dataset, considered as a standard in the evaluation of intrusion detection techniques.
- From the dataset, 20,000 connections were randomly selected; each connection of the raw dataset consisted of 41 features.
- After preprocessing, 38 features remained in each record.
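The PCA step described in the summary can be sketched as follows. This is an illustrative sketch only: the random matrix stands in for the 38-feature connection records, and keeping ten components is an assumed figure, not the paper's exact configuration.

```python
# Sketch: project 38 raw features onto principal components and keep the
# top ones by eigenvalue magnitude. Random data stands in for KDD-cup records.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 38))      # 200 connections x 38 features (toy data)

pca = PCA(n_components=10)          # keep the 10 most sensitive components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)              # reduced feature matrix
print(pca.explained_variance_)      # eigenvalues, sorted in descending order
```

The eigenvalue magnitudes (`explained_variance_`) are what the paper uses to rank component sensitivity before the GA searches over subsets of them.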
3.2 Feature selection
- A suitable feature set simplified the classifier architecture as well as improving its overall performance.
- The feature selection process and its applied techniques, PCA and GA, are explained in turn below.
- Assuming a population of size N, the offspring doubled the population size, and the best top 10 percent of individuals were selected from the combined parent-offspring population.
- For one-point crossover, the parent chromosomes were divided at a common, randomly chosen point and the resulting sub-chromosomes were swapped.
- The fitness evaluation therefore contains two terms: (i) accuracy and (ii) the number of selected features.
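The GA loop described above can be sketched roughly as below. The fitness function here is a toy stand-in (a real run would train the classifier to obtain the accuracy term), and the population size, penalty weight, and elitist truncation to the top individuals are all illustrative assumptions.

```python
# Illustrative GA for feature-subset selection over 38 binary mask bits.
import random

random.seed(1)
N_FEATURES, POP_SIZE = 38, 20

def fitness(mask):
    # Two terms, as in the paper: (i) an accuracy stand-in rewarding some
    # "useful" features, and (ii) a penalty on the number of selected features.
    toy_accuracy = sum(i % 3 == 0 for i, bit in enumerate(mask) if bit) / 13
    return toy_accuracy - 0.01 * sum(mask)

def one_point_crossover(p1, p2):
    cut = random.randrange(1, N_FEATURES)   # common randomly chosen cut point
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(POP_SIZE)]
for _ in range(30):
    offspring = []
    while len(offspring) < POP_SIZE:        # offspring double the population
        a, b = random.sample(pop, 2)
        offspring.extend(one_point_crossover(a, b))
    # Elitist truncation of the combined parent-offspring population
    # (the paper keeps the best top fraction of the combined pool).
    pop = sorted(pop + offspring, key=fitness, reverse=True)[:POP_SIZE]

best = pop[0]
print(sum(best), round(fitness(best), 3))   # subset size and its fitness
```

The penalty term is what drives the search toward the smaller feature subsets reported in the results.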
- The selected features were presented to MLP for classification.
- First, its processing elements (PEs) or neurons are nonlinear.
- Second, they are massively interconnected such that any element of a given layer feeds all the elements of the next layer.
- The MLP architecture used consists of three layers: input, hidden, and output.
- The training algorithm:

```
Algorithm
Input:  training-examples, η, φ, net
Output: trained network

Initialize all weights of net;
for each pair <x, t> ∈ training-examples do
    Step 1: Forward phase: ...
```
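The backpropagation loop above can be sketched in NumPy as follows. The layer sizes and learning rate are illustrative assumptions; the sigmoid plays the role of the nonlinearity φ and `eta` is the learning rate η.

```python
# Minimal backpropagation sketch: forward phase, backward phase, weight update.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy network: 4 inputs -> 3 hidden -> 1 output (sizes are illustrative).
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(3, 1))
eta = 0.5

def train_step(x, t):
    global W1, W2
    # Step 1: forward phase
    h = sigmoid(x @ W1)
    y = sigmoid(h @ W2)
    # Step 2: backward phase (delta rule for sigmoid units)
    delta_out = (y - t) * y * (1 - y)
    delta_hid = (delta_out @ W2.T) * h * (1 - h)
    # Step 3: weight update with learning rate eta
    W2 -= eta * np.outer(h, delta_out)
    W1 -= eta * np.outer(x, delta_hid)
    return float(((y - t) ** 2).sum())       # squared error for this pair

x, t = np.array([1.0, 0.0, 1.0, 0.0]), np.array([1.0])
errs = [train_step(x, t) for _ in range(100)]
print(errs[0], errs[-1])
```

Repeating the step on a training pair drives the squared error down, which is the sense in which the weights are "trained" in the next section.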
3.4 Training and Testing
- In the training phase, input patterns and the desired output for each input vector are given.
- To achieve this goal, weights are updated by carrying out certain steps known as training.
- Testing of the trained system involves two steps: (i) a verification step and (ii) a generalization step.
- The parametric specification used for the MLP architecture during the testing phase is given in Table 2.
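The two testing steps can be sketched as below: verification scores the trained network on the training patterns themselves, while generalization scores it on held-out patterns. The toy data, network size, and split are assumptions, not the paper's Table 2 parameters.

```python
# Sketch of verification vs. generalization testing for a trained MLP.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # toy normal/intrusive labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X_train, y_train)

print("verification:  ", net.score(X_train, y_train))   # seen patterns
print("generalization:", net.score(X_test, y_test))     # unseen patterns
```

A large gap between the two scores would indicate overfitting, which is one reason the paper argues for a smaller, more sensitive feature subset.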
- Thus, this research work achieved that objective by using GA and PCA, which made the classifier simpler as well as more efficient in performance.
4 Experimental Results
- The MLP based intrusion analysis engine was evaluated on different feature subsets.
- This section presents MLP results and their sensitivity analysis in different scenarios.
- First of all, MLP was tested on the original dataset, consisting of 38 features, without using PCA and GA.
- Five thousand exemplars, or input samples, were randomly selected from the twenty-thousand-connection dataset.
- The five thousand exemplars contained two types of connections, normal and intrusive: 3,223 were normal and 1,777 were intrusive.
4.1 Comparison with existing Approaches
- The performance of the developed system is compared with other intrusion detection approaches introduced in the related work section.
- The SVM converges to the optimal solution in 1000 epochs, while the MLP converges in 173 epochs.
- Therefore, MLP is considered a good classifier for intrusion analysis due to its proven ability to handle large data such as network traffic, its smaller number of epochs, and its shorter training time.
- This process reduced the number of features to ten, compared with a previous approach which used twelve.
- Table 7 shows the comparative analysis of applied approach with other approaches.
- A performance enhancement model is proposed for intrusion detection systems, based on optimal feature subset selection using several genetic principal components.
- The feature selection has been accomplished using the techniques of PCA and GA.
- The selected principal components called genetic principal components are the basis of feature subsets.
- The KDD-cup dataset used is a benchmark for evaluating security detection mechanisms.
- The performance of the applied approach was then assessed.
Frequently Asked Questions (1)
Q1. What are the contributions mentioned in the paper "Enhancing mlp performance in intrusion detection using optimal feature subset selection based on genetic principal components" ?
To overcome this issue, Principal Component Analysis (PCA) has been used to project a number of raw features onto the principal feature space and to select the features based on their sensitivity, determined by the magnitude of the eigenvalues. The focus of this research is to observe a space of principal features to find a subset of features sensitive to the classifier, which can optimize the detection accuracy.