Author

I. Sumaiya Thaseen

Bio: I. Sumaiya Thaseen is an academic researcher from VIT University. The author has contributed to research in topics: Feature selection & Intrusion detection system. The author has an h-index of 8 and has co-authored 23 publications receiving 190 citations.

Papers
Proceedings ArticleDOI
01 Nov 2014
TL;DR: A novel method of integrating principal component analysis (PCA) and support vector machine (SVM) by optimizing the kernel parameters using an automatic parameter selection technique is proposed, which reduces the training and testing time to identify intrusions, thereby improving the accuracy.
Abstract: Intrusion detection systems (IDS) play a major role in detecting attacks that occur in computers or networks. Anomaly intrusion detection models detect new attacks by observing deviations from a profile. However, there are many problems in traditional IDS, such as a high false alarm rate, low detection capability against new network attacks and insufficient analysis capacity. The use of machine learning for intrusion models allows performance to improve automatically with experience. This paper proposes a novel method of integrating principal component analysis (PCA) and support vector machine (SVM) by optimizing the kernel parameters using an automatic parameter selection technique. This technique reduces the training and testing time to identify intrusions, thereby improving the accuracy. The proposed method was tested on the KDD data set. The datasets were carefully divided into training and testing sets so that minority attacks such as U2R and R2L are present in the testing set, in order to identify the occurrence of unknown attacks. The results indicate that the proposed method is successful in identifying intrusions. The experimental results show that the classification accuracy of the proposed method outperforms other classification techniques using SVM as the classifier and other dimensionality reduction or feature selection techniques. Minimum resources are consumed because the classifier input requires a reduced feature set, thereby minimizing training and testing overhead time.

57 citations
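
The dimensionality reduction step named in the entry above can be illustrated with a small sketch. It covers only the PCA part (finding the first principal component by power iteration and projecting onto it); the SVM and the automatic kernel parameter selection are not shown. The function names and the toy data are assumptions made for illustration and do not come from the paper.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (feature means, unit-length first principal component).

    Hypothetical helper: power iteration on the sample covariance matrix.
    Assumes the starting vector is not orthogonal to the top component.
    """
    n, d = len(rows), len(rows[0])
    # Center each feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: multiply by the covariance matrix and renormalize.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # every feature is constant, so there is no direction to find
        v = [x / norm for x in w]
    return means, v


def project(rows, means, component):
    """Project centered rows onto one component (one value per row)."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(component)))
            for r in rows]


# Toy data lying exactly on the line y = 2x (illustrative values only).
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
means, component = first_principal_component(rows)
scores = project(rows, means, component)
print(component, scores)
```

In a full pipeline the projected values would replace the original features as classifier input, which is where the reduction in training and testing time would come from.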

Journal ArticleDOI
TL;DR: The aim of this paper is to identify the critical features required in the construction of an intrusion detection model, thereby achieving the maximum accuracy, and to utilize an ensemble approach of classifiers with minimum complexity to overcome the issues in the existing ensemble-based intrusion detection models.
Abstract: An intrusion detection system (IDS) is a device or software application that monitors a network of systems to identify any malicious activity or policy violations. In order to distinguish intrusions from normal activity, an IDS considers different network-related features such as source address, protocol and flag. The major challenge for any intrusion detection model is to achieve maximum accuracy with minimal false alarms. The aim of this paper is to identify the critical features required in the construction of an intrusion detection model, thereby achieving the maximum accuracy. The model utilizes an ensemble approach of classifiers with minimum complexity to overcome the issues in the existing ensemble-based intrusion detection models. In this paper, Chi-square feature selection and an ensemble of classifiers, namely support vector machine (SVM), modified Naive Bayes (MNB) and LPBoost, are utilized to develop an intrusion detection model. The motivation for selecting Chi-square feature selection is that it ranks the features based on a statistical significance test and considers only those features that are dependent on the class label. Supervised classifiers are highly consistent and produce precise results, as the use of training data improves the ability to distinguish between classes with similar features. Experimental results indicate that the LPBoost ensemble achieves high accuracy in comparison with the base classifiers. As there is a huge class imbalance present in network traffic, predicting the class label by a majority vote of SVM, MNB and LPBoost is preferable to relying on a single classifier.

56 citations
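
The Chi-square ranking described in the entry above can be sketched as follows. This is a minimal illustration for categorical features, assuming the standard Pearson chi-square statistic between each feature and the class label; the feature names, toy records and helper functions are hypothetical, and the ensemble of SVM, MNB and LPBoost is not shown.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic between a categorical feature and the label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    score = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def rank_features(features, labels):
    """Return feature names ordered from most to least label-dependent."""
    return sorted(features,
                  key=lambda name: chi_square_score(features[name], labels),
                  reverse=True)


# Toy records (illustrative values only, not taken from any dataset).
labels = ["attack", "attack", "normal", "normal", "attack", "normal"]
features = {
    "flag": ["SF", "REJ", "SF", "REJ", "SF", "REJ"],
    "protocol": ["tcp", "tcp", "udp", "udp", "tcp", "udp"],
}
ranked = rank_features(features, labels)
print(ranked)
```

Features with a low score are close to independent of the class label and are the candidates for removal before training.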

Journal ArticleDOI
01 Feb 2021
TL;DR: A correlation-based feature selection method integrated with a neural network is proposed for identifying anomalies; the results show that the proposed model is superior in terms of accuracy, sensitivity, and specificity in comparison with some of the state-of-the-art techniques.
Abstract: Serious concerns regarding vulnerability and security have been raised as a result of the constant growth of computer networks. Intrusion detection systems (IDS) have been adopted by netwo...

38 citations
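
Since the abstract above is truncated, the following is only a generic sketch of what a correlation-based filter can look like: each feature is kept if the absolute value of its Pearson correlation with the label reaches a threshold. This is a simplification and not necessarily the paper's method; merit-based correlation feature selection also penalizes correlation between features, and the neural network stage is not shown. The feature names, the threshold and the toy values are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_by_correlation(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target meets the threshold."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) >= threshold]


# Toy data: 1 marks an anomalous record, 0 a normal one (illustrative only).
target = [0, 0, 1, 1]
features = {
    "duration": [1.0, 2.0, 8.0, 9.0],
    "src_bytes": [5.0, 1.0, 5.0, 1.0],
}
selected = select_by_correlation(features, target)
print(selected)
```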

Proceedings ArticleDOI
27 Mar 2014
TL;DR: A hybrid technique integrating Latent Dirichlet Allocation and a genetic algorithm, namely the G-LDA process, is proposed; it has better accuracy for detecting known and unknown attacks and a low false positive rate.
Abstract: Anomaly detection is one of the important challenges in network security today. We present a novel hybrid technique called G-LDA to identify anomalies in network traffic. The technique integrates Latent Dirichlet Allocation and a genetic algorithm. Furthermore, feature selection plays an important role in identifying the subset of attributes for determining the anomalous packets. The proposed method is evaluated by carrying out experiments on the KDDCUP'99 dataset. The experimental results reveal that the hybrid technique has better accuracy for detecting known and unknown attacks and a low false positive rate.

27 citations

Journal ArticleDOI
TL;DR: A major aim of this paper is to design a robust technique for extracting and transforming Landsat images into numerical data and pre-processing the data for classifying the soil property.
Abstract: The traditional technique for determining soil texture and other soil properties is performed in a laboratory, which is a time-consuming task. In this paper, machine learning algorithms are deployed to classify the soil texture and its properties without any laboratory equipment, using the satellite images recorded by Landsat 8. These images are used to extract the terrain properties of the region, which are integrated with weather data for the specific region and the vegetation index; these are the major factors affecting the soil condition. A major aim of this paper is to design a robust technique for extracting and transforming Landsat images into numerical data and pre-processing the data for classifying the soil property. Minimum Noise Fraction (MNF) is utilized to segregate and remove noise from the Landsat images for subsequent processing. A significant amount of noise is present in the raw data, which affects the accuracy of the analysis. Terrain features are extracted after noise removal from the MNF-transformed images and merged with the weather data and vegetation index for a period of time, and then classified using an ensemble voting classifier for analysis of the soil texture of the region. The voting is performed by integrating the results of logistic regression, support vector machine and decision tree. With this study, the consolidated dependence of the soil texture on the environmental factors is analyzed and a cross-validation accuracy of 94.44% is obtained.

23 citations
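
The voting step mentioned in the entry above, which combines the outputs of logistic regression, support vector machine and decision tree, can be sketched as a plain majority (hard) vote. The abstract does not say whether hard or soft voting was used, so hard voting is an assumption here; the soil texture labels and the per-classifier predictions are made up for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` holds one list of labels per classifier, all of the same
    length. On a tie, the label seen first among the tied ones is returned.
    """
    voted = []
    for sample_votes in zip(*predictions):
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


# Hypothetical predictions from three already-trained classifiers.
logistic_regression = ["clay", "sand", "loam"]
support_vector_machine = ["clay", "loam", "loam"]
decision_tree = ["sand", "loam", "loam"]

combined = majority_vote(
    [logistic_regression, support_vector_machine, decision_tree]
)
print(combined)
```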


Cited by
Journal ArticleDOI
TL;DR: Issues and challenges that hinder the performance of FDSs, such as concept drift, support for real-time detection, skewed distribution and large amounts of data, are presented in this survey paper.

403 citations

Journal ArticleDOI
TL;DR: The main idea behind this model is to construct a multi-class SVM, which has not been adopted for IDS so far, to decrease the training and testing time and increase the individual classification accuracy of the network attacks.

321 citations

Journal ArticleDOI
TL;DR: In this article, the authors provide an overview of unsupervised learning in the domain of networking, and provide a comprehensive review of the current state of the art in this area, by synthesizing insights from previous survey papers.
Abstract: While machine learning and artificial intelligence have long been applied in networking research, the bulk of such works has focused on supervised learning. Recently, there has been a rising trend of employing unsupervised machine learning using unstructured raw network data to improve network performance and provide services, such as traffic engineering, anomaly detection, Internet traffic classification, and quality of service optimization. The growing interest in applying unsupervised learning techniques in networking stems from their great success in other fields, such as computer vision, natural language processing, speech recognition, and optimal control (e.g., for developing autonomous self-driving cars). In addition, unsupervised learning can unconstrain us from the need for labeled data and manual handcrafted feature engineering, thereby facilitating flexible, general, and automated methods of machine learning. The focus of this survey paper is to provide an overview of applications of unsupervised learning in the domain of networking. We provide a comprehensive survey highlighting recent advancements in unsupervised learning techniques, and describe their applications in various learning tasks, in the context of networking. We also provide a discussion on future directions and open research issues, while identifying potential pitfalls. While a few survey papers focusing on applications of machine learning in networking have previously been published, a survey of similar scope and breadth is missing in the literature. Through this timely review, we aim to advance the current state of knowledge, by carefully synthesizing insights from previous survey papers, while providing contemporary coverage of the recent advances and innovations.

182 citations

Journal ArticleDOI
TL;DR: A network intrusion detection algorithm combining hybrid sampling with a deep hierarchical network is proposed, which uses a convolutional neural network to extract spatial features and bi-directional long short-term memory to extract temporal features, forming a deep hierarchical network model.
Abstract: An intrusion detection system (IDS) plays an important role in network security by discovering and preventing malicious activities. Due to the complex and time-varying network environment, the network intrusion samples are submerged in a large number of normal samples, which leads to insufficient samples for model training and detection results with a high false detection rate. To address the problem of data imbalance, we propose a network intrusion detection algorithm that combines hybrid sampling with a deep hierarchical network. Firstly, we use one-side selection (OSS) to reduce the noise samples in the majority category, and then increase the minority samples with the Synthetic Minority Over-sampling Technique (SMOTE). In this way, a balanced dataset can be established so that the model fully learns the features of minority samples and the model training time is greatly reduced. Secondly, we use a convolutional neural network (CNN) to extract spatial features and bi-directional long short-term memory (BiLSTM) to extract temporal features, which forms a deep hierarchical network model. The proposed network intrusion detection algorithm was verified by experiments on the NSL-KDD and UNSW-NB15 datasets, and the classification accuracy reaches 83.58% and 77.16%, respectively.

173 citations
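
The oversampling half of the hybrid sampling described in the entry above can be sketched in a few lines. This is a simplified SMOTE-style routine, assuming random selection of the base sample and Euclidean nearest neighbours; the one-side selection step and the CNN/BiLSTM model are not shown. The function signature, parameter names and toy points are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating towards neighbours.

    Simplified sketch: needs at least two minority samples, and picks the
    base sample at random rather than cycling through all of them.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Toy minority-class samples (illustrative values only).
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [3.0, 3.0]]
new_points = smote(minority, n_synthetic=6)
print(len(new_points))
```

Each synthetic point lies on a line segment between two existing minority samples, so the new samples stay inside the region the minority class already occupies.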

Journal ArticleDOI
TL;DR: An analysis of the UNSW-NB15 intrusion detection dataset is presented and a filter-based feature reduction technique using the XGBoost algorithm is applied, which allows methods such as the DT to increase their test accuracy from 88.13% to 90.85% for the binary classification scheme.
Abstract: Computer network intrusion detection systems (IDSs) and intrusion prevention systems (IPSs) are critical aspects that contribute to the success of an organization. Over the past years, IDSs and IPSs using different approaches have been developed and implemented to ensure that computer networks within enterprises are secure, reliable and available. In this paper, we focus on IDSs that are built using machine learning (ML) techniques. IDSs based on ML methods are effective and accurate in detecting network attacks. However, the performance of these systems decreases for high-dimensional data spaces. Therefore, it is crucial to implement an appropriate feature extraction method that can prune the features that do not have a great impact on the classification process. Moreover, many ML-based IDSs suffer from an increased false positive rate and low detection accuracy when the models are trained on highly imbalanced datasets. In this paper, we present an analysis of the UNSW-NB15 intrusion detection dataset that will be used for training and testing our models. Moreover, we apply a filter-based feature reduction technique using the XGBoost algorithm. We then implement the following ML approaches using the reduced feature space: Support Vector Machine (SVM), k-Nearest-Neighbour (kNN), Logistic Regression (LR), Artificial Neural Network (ANN) and Decision Tree (DT). In our experiments, we considered both the binary and multiclass classification configurations. The results demonstrated that the XGBoost-based feature selection method allows methods such as the DT to increase their test accuracy from 88.13% to 90.85% for the binary classification scheme.

159 citations