scispace - formally typeset
Search or ask a question
Author

Constantinos Bassias

Bio: Constantinos Bassias is an academic researcher. The author has contributed to research in topics: Supervised learning & Anomaly detection. The author has an hindex of 2, co-authored 2 publications receiving 214 citations.

Papers
More filters
Proceedings ArticleDOI
09 Apr 2016
TL;DR: The system presents four key features: a big data behavioral analytics platform, an outlier detection system, a mechanism to obtain feedback from security analysts, and a supervised learning module, which is capable of learning to defend against unseen attacks.
Abstract: We present AI2, an analyst-in-the-loop security system where Analyst Intuition (AI) is put together with state-of-the-art machine learning to build a complete end-to-end Artificially Intelligent solution (AI). The system presents four key features: a big data behavioral analytics platform, an outlier detection system, a mechanism to obtain feedback from security analysts, and a supervised learning module. We validate our system with a real-world data set consisting of 3.6 billion log lines and 70.2 million entities. The results show that the system is capable of learning to defend against unseen attacks. With respect to unsupervised outlier analysis, our system improves the detection rate in 2.92× and reduces false positives by more than 5×.

233 citations

Patent
16 Dec 2016
TL;DR: In this paper, a method and system for training a big data machine to defend, retrieve log lines belonging to log line parameters of a system's data source and from incoming data traffic, compute features from the log lines, apply an adaptive rules model with identified threat labels produce a features matrix, identify statistical outliers from execution of statistical outlier detection methods, and may generate an outlier scores matrix.
Abstract: Disclosed herein are a method and system for training a big data machine to defend, retrieve log lines belonging to log line parameters of a system's data source and from incoming data traffic, compute features from the log lines, apply an adaptive rules model with identified threat labels produce a features matrix, identify statistical outliers from execution of statistical outlier detection methods, and may generate an outlier scores matrix. Embodiments may combine a top scores model and a probability model to create a single top scores vector. The single top scores vector and the adaptive rules model may be displayed on a GUI for labeling of malicious or non-malicious scores. Labeled output may be transformed into a labeled features matrix to create a supervised learning module for detecting new threats in real time and reducing the time elapsed between threat detection of the enterprise or e-commerce system.

16 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: N-BaIoT as discussed by the authors is a network-based anomaly detection method for the IoT that extracts behavior snapshots of the network and uses deep autoencoders to detect anomalous network traffic from compromised IoT devices.
Abstract: The proliferation of IoT devices that can be more easily compromised than desktop computers has led to an increase in IoT-based botnet attacks. To mitigate this threat, there is a need for new methods that detect attacks launched from compromised IoT devices and that differentiate between hours- and milliseconds-long IoT-based attacks. In this article, we propose a novel network-based anomaly detection method for the IoT called N-BaIoT that extracts behavior snapshots of the network and uses deep autoencoders to detect anomalous network traffic from compromised IoT devices. To evaluate our method, we infected nine commercial IoT devices in our lab with two widely known IoT-based botnets, Mirai and BASHLITE. The evaluation results demonstrated our proposed methods ability to accurately and instantly detect the attacks as they were being launched from the compromised IoT devices that were part of a botnet.

603 citations

Posted Content
TL;DR: A structured and comprehensive overview of research methods in deep learning-based anomaly detection, grouped state-of-the-art research techniques into different categories based on the underlying assumptions and approach adopted.
Abstract: Anomaly detection is an important problem that has been well-studied within diverse research areas and application domains. The aim of this survey is two-fold, firstly we present a structured and comprehensive overview of research methods in deep learning-based anomaly detection. Furthermore, we review the adoption of these methods for anomaly across various application domains and assess their effectiveness. We have grouped state-of-the-art research techniques into different categories based on the underlying assumptions and approach adopted. Within each category we outline the basic anomaly detection technique, along with its variants and present key assumptions, to differentiate between normal and anomalous behavior. For each category, we present we also present the advantages and limitations and discuss the computational complexity of the techniques in real application domains. Finally, we outline open issues in research and challenges faced while adopting these techniques.

522 citations

Journal ArticleDOI
TL;DR: This paper provides a large survey of published studies within the last 8 years, focusing on high-class imbalance in big data in order to assess the state-of-the-art in addressing adverse effects due to class imbalance.
Abstract: In a majority–minority classification problem, class imbalance in the dataset(s) can dramatically skew the performance of classifiers, introducing a prediction bias for the majority class. Assuming the positive (minority) class is the group of interest and the given application domain dictates that a false negative is much costlier than a false positive, a negative (majority) class prediction bias could have adverse consequences. With big data, the mitigation of class imbalance poses an even greater challenge because of the varied and complex structure of the relatively much larger datasets. This paper provides a large survey of published studies within the last 8 years, focusing on high-class imbalance (i.e., a majority-to-minority class ratio between 100:1 and 10,000:1) in big data in order to assess the state-of-the-art in addressing adverse effects due to class imbalance. In this paper, two techniques are covered which include Data-Level (e.g., data sampling) and Algorithm-Level (e.g., cost-sensitive and hybrid/ensemble) Methods. Data sampling methods are popular in addressing class imbalance, with Random Over-Sampling methods generally showing better overall results. At the Algorithm-Level, there are some outstanding performers. Yet, in the published studies, there are inconsistent and conflicting results, coupled with a limited scope in evaluated techniques, indicating the need for more comprehensive, comparative studies.

426 citations

Journal ArticleDOI
TL;DR: This paper presents a hybrid technique that combines supervised and unsupervised techniques to improve the fraud detection accuracy and shows that the combination is efficient and does indeed improve the accuracy of the detection.

196 citations

Journal ArticleDOI
TL;DR: This paper provides a survey of prediction, and forecasting methods used in cyber security, and discusses machine learning and data mining approaches, that have gained a lot of attention recently and appears promising for such a constantly changing environment, which is cyber security.
Abstract: This paper provides a survey of prediction, and forecasting methods used in cyber security. Four main tasks are discussed first, attack projection and intention recognition, in which there is a need to predict the next move or the intentions of the attacker, intrusion prediction, in which there is a need to predict upcoming cyber attacks, and network security situation forecasting, in which we project cybersecurity situation in the whole network. Methods and approaches for addressing these tasks often share the theoretical background and are often complementary. In this survey, both methods based on discrete models, such as attack graphs, Bayesian networks, and Markov models, and continuous models, such as time series and grey models, are surveyed, compared, and contrasted. We further discuss machine learning and data mining approaches, that have gained a lot of attention recently and appears promising for such a constantly changing environment, which is cyber security. The survey also focuses on the practical usability of the methods and problems related to their evaluation.

171 citations