Author

Vaibhav Khatavkar

Bio: Vaibhav Khatavkar is an academic researcher from College of Engineering, Pune. The author has contributed to research in topics: Latent semantic analysis & Context (language use). The author has an h-index of 2 and has co-authored 10 publications receiving 21 citations.

Papers
Journal ArticleDOI
TL;DR: The approach uses a supervised algorithm as a black box and then filters the unlabelled data with predicted labels for training the system; pattern-based IDS provides very few false alarms compared to heuristic/anomaly-based IDS.
Abstract: Network security is becoming increasingly important in today's internetworked systems. With the development of the internet and its use on public networks, the number and severity of security threats have increased significantly. An Intrusion Detection System (IDS) can provide a layer of security to these systems. Intrusion detection can be defined as "the act of detecting actions that attempt to compromise the confidentiality, integrity or availability of a resource". More specifically, the goal of an intrusion detection system is to identify entities who attempt to subvert in-place security controls. At present, two fundamental problems, the quantity and quality of IDS outputs (i.e. false alarms or alerts), have not been solved well. Attack patterns change frequently, so an IDS should upgrade accordingly. The changes in patterns are mainly manifestations of attacks. Pattern-based IDS provides very few false alarms compared to heuristic/anomaly-based IDS. In the real world it is very difficult to obtain large labeled datasets for training, so a purely supervised approach cannot be used. In this work we therefore propose a semi-supervised approach for pattern-based IDS. Our approach uses a supervised algorithm as a black box and then filters the unlabelled data with predicted labels for training the system. The experimentation is performed on the KDD CUP 99 dataset and on NSL-KDD, a revised version of the KDD CUP 99 data.

12 citations
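
The semi-supervised idea in the abstract above (a supervised learner treated as a black box, with unlabelled data filtered by predicted label before being added to the training set) resembles a self-training loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the nearest-centroid learner, the margin-based confidence score, the `threshold` and `max_rounds` parameters, and all function names are hypothetical stand-ins chosen for illustration.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Stand-in 'black box' supervised learner (assumption): one centroid per label."""
    sums, counts = {}, defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Return (label, confidence) for one sample.

    Confidence is a hypothetical margin score: 1 - nearest / second-nearest distance.
    """
    dists = sorted((math.dist(features, c), label) for label, c in centroids.items())
    best_dist, best_label = dists[0]
    if len(dists) == 1:
        return best_label, 1.0
    second_dist = dists[1][0]
    confidence = 1.0 - best_dist / second_dist if second_dist > 0 else 0.0
    return best_label, confidence


def self_train(labeled, unlabeled, threshold=0.5, max_rounds=5):
    """Grow the labeled set with confidently predicted unlabelled samples."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train_centroids(labeled)
        accepted, remaining = [], []
        for features in pool:
            label, confidence = predict(model, features)
            if confidence >= threshold:
                accepted.append((features, label))  # keep the pseudo-label
            else:
                remaining.append(features)  # filtered out for this round
        if not accepted:
            break  # nothing new to learn from
        labeled.extend(accepted)
        pool = remaining
    return train_centroids(labeled), pool


if __name__ == "__main__":
    # Toy two-feature data (invented for illustration, not KDD CUP 99 records).
    seed = [((0.0, 0.0), "normal"), ((10.0, 10.0), "attack")]
    model, rejected = self_train(seed, [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)])
    print(predict(model, (2.0, 2.0))[0], rejected)
```

Any supervised learner exposing a train step and a prediction with a confidence score could replace `train_centroids` and `predict` here; the filtering threshold controls how aggressively pseudo-labelled samples are admitted.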

Proceedings ArticleDOI
01 Jan 2017
TL;DR: The research work proposes the necessity of a Context Vector Machine in the domain of IR for retrieving the appropriate document from a set of documents.
Abstract: Information Retrieval (IR) is a necessity of today's digital world. It is important for a user to get the appropriate document from a set of documents. To this end, researchers in the domain of IR are working mainly on concepts such as document clustering and classification, ontology of a document, thematics of a document, concept of a document, context of a document, etc. The research work proposes the necessity of a Context Vector Machine.

5 citations

Book ChapterDOI
01 Jan 2019
TL;DR: This paper attempts to study the effect of Latent Semantic Analysis (LSA) on SVM, a prominent technique used for classifying large datasets.
Abstract: Document classification is a key technique in Information Retrieval. Various techniques have been developed for document classification, each aiming for higher accuracy and greater speed. Performance depends on various parameters such as the algorithm and the size and type of dataset used. Support Vector Machine (SVM) is a prominent technique used for classifying large datasets. This paper attempts to study the effect of Latent Semantic Analysis (LSA) on SVM. LSA is used for dimensionality reduction. The performance of SVM is studied on the reduced dataset generated by LSA.

2 citations

Book ChapterDOI
01 Jan 2019
TL;DR: A system is proposed that produces a "context vector" for a document set using Latent Semantic Analysis, a widely used method in document analysis; the system is tested on the BBC news dataset and proves to be successful.
Abstract: Document analysis is one of the emerging areas of research in the field of Information Retrieval. Many attempts have been made to retrieve information from a document using various machine learning algorithms. The concept of a context vector is frequently used in information retrieval from documents. A context vector is a vector used for feature selection from documents, automatic classification of text documents, subject-verb agreement, etc. This paper discusses the attempts made in the field of Information Retrieval (IR) from documents using context vectors, along with the pros and cons of each attempt. It proposes a system which can give the "context vector" of a document set using Latent Semantic Analysis, which is the most trending method in document analysis. The system is tested on the BBC news dataset and proves to be successful.

2 citations

Book ChapterDOI
02 Jan 2012
TL;DR: A Pattern Based Algorithm is proposed which has a high detection rate and a low false alarm rate, and the performance of the proposed method is reported to be more effective than other semi-supervised algorithms used for intrusion detection.
Abstract: Intrusion detection aims at distinguishing the behavior of the network. Due to the rapid development of attack patterns, it is necessary to develop a system which can upgrade itself according to new attacks. The detection rate should also be high, since the attack rate on the network is very high. In response to this problem, a Pattern Based Algorithm is proposed which has a high detection rate and a low false alarm rate. The work is divided into three parts: supervised, semi-supervised and unsupervised approaches. Besides the supervised learning approach, semi-supervised learning has attracted much attention in pattern recognition and machine learning for intrusion detection. Most of the semi-supervised algorithms used for intrusion detection are binary classifiers, but our approach classifies the data into multiple classes. Our experimental results on the KDD Cup data set show that the performance of the proposed method is more effective.

2 citations


Cited by
Journal ArticleDOI
TL;DR: This survey focuses on intrusion detection systems (IDS) that leverage host-based data sources for detecting attacks on enterprise networks, presenting targeted sub-surveys of HIDS research leveraging system logs, audit data, Windows Registry, file systems, and program analysis.
Abstract: This survey focuses on intrusion detection systems (IDS) that leverage host-based data sources for detecting attacks on enterprise networks. The host-based IDS (HIDS) literature is organized by the input data source, presenting targeted sub-surveys of HIDS research leveraging system logs, audit data, Windows Registry, file systems, and program analysis. While system calls are generally included in audit data, several publicly available system call datasets have spawned a flurry of IDS research on this topic, which merits a separate section. To accommodate current researchers, a section giving descriptions of publicly available datasets is included, outlining their characteristics and shortcomings when used for IDS evaluation. Related surveys are organized and described. All sections are accompanied by tables concisely organizing the literature and datasets discussed. Finally, challenges, trends, and broader observations are presented throughout the survey and in the conclusion, along with future directions of IDS research. Overall, this survey was designed to allow easy access to the diverse types of data available on a host for sensing intrusion, the progressions of research using each, and the accessible datasets for prototyping in the area.

74 citations

Book ChapterDOI
07 Sep 2015
TL;DR: It is demonstrated that an accurate detector of malicious behavior in network traffic can be obtained from the collected security intelligence data by using a Multiple Instance Learning algorithm tailored to the Neyman-Pearson problem.
Abstract: We address the problem of learning a detector of malicious behavior in network traffic. The malicious behavior is detected based on the analysis of network proxy logs that capture malware communication between client and server computers. The conceptual problem in using standard supervised learning methods is the lack of a sufficiently representative training set containing examples of malicious and legitimate communication. Annotation of individual proxy logs is an expensive process involving security experts and does not scale with constantly evolving malware. However, weak supervision can be achieved on the level of properly defined bags of proxy logs by leveraging internet domain blacklists, security reports, and sandboxing analysis. We demonstrate that an accurate detector can be obtained from the collected security intelligence data by using a Multiple Instance Learning algorithm tailored to the Neyman-Pearson problem. We provide a thorough experimental evaluation on a large corpus of network communications collected from various company network environments.

35 citations

Posted Content
TL;DR: In this paper, a survey focusing on intrusion detection systems (IDS) that leverage host-based data sources for detecting attacks on enterprise networks is presented, along with challenges, trends, and broader observations.
Abstract: This survey focuses on intrusion detection systems (IDS) that leverage host-based data sources for detecting attacks on enterprise networks. The host-based IDS (HIDS) literature is organized by the input data source, presenting targeted sub-surveys of HIDS research leveraging system logs, audit data, Windows Registry, file systems, and program analysis. While system calls are generally included in audit data, several publicly available system call datasets have spawned a flurry of IDS research on this topic, which merits a separate section. Similarly, a section surveying algorithmic developments that are applicable to HIDS but tested on network data sets is included, as this is a large and growing area of applicable literature. To accommodate current researchers, a supplementary section giving descriptions of publicly available datasets is included, outlining their characteristics and shortcomings when used for IDS evaluation. Related surveys are organized and described. All sections are accompanied by tables concisely organizing the literature and datasets discussed. Finally, challenges, trends, and broader observations are presented throughout the survey and in the conclusion, along with future directions of IDS research.

31 citations

Journal Article
TL;DR: The architecture, protocol, type of infection, communication interval, attacks and evasion techniques of these botnets are probed and studies on mitigation and detection of various aspects of botnets and new trends in botnet communication channels are reviewed.
Abstract: Mitigating the destructive effect of botnets is a concern of security scholars. Though various mechanisms have been proposed for botnet detection, real-world botnets still survive and carry out their harmful operations. Botnets have developed new evasion techniques and covert communication channels. Knowing the characteristics of real-world botnets helps security researchers develop more robust detection methods. There are some surveys in the literature that study botnet detection methods; however, they give little attention to real-world botnets. In this paper, we study various aspects of several real-world botnets, i.e. Conficker, Kraken, Rustock, Storm, TDL4, Torpig, Waledac, Zeus and P2P Zeus. The architecture, protocol, type of infection, communication interval, attacks and evasion techniques of these botnets are probed in this paper. Moreover, studies on mitigation and detection of various aspects of botnets and new trends in botnet communication channels are reviewed.

27 citations

Proceedings ArticleDOI
01 Nov 2016
TL;DR: A narrative literature review has been chosen as the method to review the developments of SS-IDS from 2008 to 2015, showing that the accuracy of the proposed SS-IDS is low and that false alarms of the IDS tend to be high.
Abstract: The increasing number of attacks on computer networks has made network security an important issue since the first network security breaches were discovered in 1980. Currently, the patterns of network attacks are becoming more sophisticated, making the attacks difficult to detect. Failure to prevent the attacks threatens privacy, data, and other network resources. Numerous network intrusion detection systems (NIDS) have been proposed to tackle network security threats, and their detection methods can be grouped into supervised, unsupervised and semi-supervised. Several artificial intelligence-based methods have also been considered in NIDS to improve detection accuracy, such as fuzzy methods, machine learning, support vector machines (SVM) and k-means. Unfortunately, literature studies on semi-supervised intrusion detection systems (SS-IDS) are difficult to find. Indeed, most IDS literature studies focus only on supervised and unsupervised detection methods. Consequently, the latest developments and issues in SS-IDS are difficult to trace quickly. On the other hand, many semi-supervised methods and their implementations in IDS have been carried out since 2008. This research conducts a literature study on SS-IDS to tackle the issue by reviewing the developments of SS-IDS from 2008 to 2015. A narrative literature review has been chosen as the method to review the SS-IDS literature. A narrative literature review is a form of scientific publication that addresses specific topics from theoretical and contextual viewpoints. The review results show that the accuracy of the proposed SS-IDS is low. In addition, false alarms of the IDS tend to be high.

17 citations