Proceedings ArticleDOI

Multidimensional Data Mining for Anomaly Extraction

29 Aug 2013, pp. 5-8
TL;DR: By applying multidimensional association rule mining to anomaly extraction, the technique effectively finds the flows associated with anomalous events, which can reduce the work-hours needed to analyze alarms and make anomaly detection systems more effective.
Abstract: Heavy traffic makes network monitoring a difficult and cumbersome job, so the probability of network attacks increases substantially. There is therefore a need for anomaly extraction. Anomaly extraction means finding the flows associated with anomalous events in a large set of flows observed during an anomalous time interval. It is very important for root-cause analysis, network forensics, attack mitigation, and anomaly modeling. To identify suspicious flows, we use meta-data provided by several histogram-based detectors and then apply association rules with the multidimensional mining concept to find and summarize anomalous flows. Using rich traffic data from a backbone network, we show that our technique effectively finds the flows associated with the anomalous events. Applying multidimensional mining rules to anomaly extraction can thus reduce the work-hours needed to analyze alarms and make anomaly detection systems more effective.
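The pipeline described in the abstract (pre-filter flows with detector meta-data, then summarize them with frequent itemsets) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the flow fields (`src_ip`, `dst_ip`, `dst_port`), the shape of the `metadata` dictionary, and the function names are hypothetical, and the brute-force itemset counting stands in for whatever association rule miner the authors actually used.

```python
from collections import Counter
from itertools import combinations


def prefilter(flows, metadata):
    """Keep flows that match at least one suspicious feature value.

    `metadata` maps a feature name to the set of values that a
    (hypothetical) histogram-based detector flagged as anomalous.
    """
    return [
        f for f in flows
        if any(f.get(feat) in values for feat, values in metadata.items())
    ]


def frequent_itemsets(flows, min_support):
    """Count every combination of (feature, value) pairs per flow and keep
    those that occur in at least `min_support` flows.

    Brute-force enumeration; fine for a handful of features per flow.
    """
    counts = Counter()
    for f in flows:
        items = sorted(f.items())  # feature names are unique, so this sorts by name
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


def maximal(itemsets):
    """Drop itemsets that are strict subsets of another frequent itemset."""
    as_sets = [frozenset(c) for c in itemsets]
    return {
        c: n for c, n in itemsets.items()
        if not any(frozenset(c) < s for s in as_sets)
    }


# Toy data: three flows hit the same destination and port, one does not.
flows = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.9", "dst_port": 80},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.9", "dst_port": 80},
    {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.9", "dst_port": 80},
    {"src_ip": "10.0.0.4", "dst_ip": "10.0.0.5", "dst_port": 443},
]
metadata = {"dst_port": {80}}  # assumed detector output

suspicious = prefilter(flows, metadata)
summary = maximal(frequent_itemsets(suspicious, min_support=3))
print(summary)
```

In this toy input the summary collapses three suspicious flows into a single itemset (destination address plus destination port), which is the kind of compact description an analyst would review instead of the raw flows.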
Citations
Dissertation
01 Jan 2017
TL;DR: Soft Clustering Data Normalization PreProcessing Stage 1 DGDS Big Data set from the SGSC Project is compared to real-time data sets using the Hadoop 2.0 architecture.
Abstract: Soft Clustering Data Normalization PreProcessing Stage 1 DGDS Big Data set from the SGSC Project

4 citations


Cites background from "Multidimensional Data Mining for An..."

  • ...Experiments have inferred how chunks of data can be given a ‘birds-eye-view’ to obtain large scale trend information, bypassing computations involving every single reading [103, 104]....


Book ChapterDOI
24 May 2017
TL;DR: How behavior patterns and related anomalies comprehensively define a CPS is demonstrated to capture the complex knowledge encompassed in these data flows.
Abstract: Present-day cyber-physical systems (CPS) are the confluence of very large data sets, tight time constraints, and heterogeneous hardware units, ridden with latency and volume constraints, demanding newer analytic perspectives. Their system logistics can be well-defined by the data-streams’ behavioral trends across various modalities, without numerical restrictions, favoring resource-saving over methods of investigating individual component features and operations. The aim of this paper is to demonstrate how behavior patterns and related anomalies comprehensively define a CPS. Tensor decompositions are hypothesized as the solution in the context of multimodal smart-grid-originated Big Data analysis. Tensorial data representation is demonstrated to capture the complex knowledge encompassed in these data flows. The uniqueness of this approach is highlighted in the modified multiway anomaly pattern models. In addition, higher-order data preparation schemes, the design and implementation of tensorial frameworks, and experimental analysis are the final outcomes.
Proceedings ArticleDOI
01 Dec 2018
TL;DR: Investigating the applicability of an arithmetic tool Tensor Decompositions and Factorizations in this scenario proved that Abnormal patterns detected in decomposed Tensor factors encompass deep information energy content from Big Data as efficiently as other Pattern Extraction and Knowledge Discovery frameworks, while salvaging time and resources.
Abstract: The world today, as we know it, is profuse with information about humans and objects. Datasets generated by cyber-physical systems are orders of magnitude larger than their current information processing capabilities. Tapping into these big data flows to uncover much deeper perceptions into the functioning, operational logic and smartness levels attainable has been investigated for quite a while. Knowledge Discovery & Representation capabilities across multiple modalities hold much scope in this direction, with regard to their information holding potential. This paper investigates the applicability of an arithmetic tool, Tensor Decompositions and Factorizations, in this scenario. Higher order datasets are decomposed for Anomaly Pattern capture, which encases intelligence along multiple modes of data flow. Preliminary investigations based on data derived from the Smart Grid Smart City Project are compliant with our hypothesis. The results proved that abnormal patterns detected in decomposed Tensor factors encompass deep information energy content from Big Data as efficiently as other Pattern Extraction and Knowledge Discovery frameworks, while salvaging time and resources.
References
Journal ArticleDOI
TL;DR: It is believed that frequent pattern mining research has substantially broadened the scope of data analysis and will have deep impact on data mining methodologies and applications in the long run; however, there are still some challenging research issues that need to be solved before frequent pattern mining can claim a cornerstone approach in data mining applications.
Abstract: Frequent pattern mining has been a focused theme in data mining research for over a decade. Abundant literature has been dedicated to this research and tremendous progress has been made, ranging from efficient and scalable algorithms for frequent itemset mining in transaction databases to numerous research frontiers, such as sequential pattern mining, structured pattern mining, correlation mining, associative classification, and frequent pattern-based clustering, as well as their broad applications. In this article, we provide a brief overview of the current status of frequent pattern mining and discuss a few promising research directions. We believe that frequent pattern mining research has substantially broadened the scope of data analysis and will have deep impact on data mining methodologies and applications in the long run. However, there are still some challenging research issues that need to be solved before frequent pattern mining can claim a cornerstone approach in data mining applications.

1,448 citations

ReportDOI
26 Jan 1998
TL;DR: An agent-based architecture for intrusion detection systems where the learning agents continuously compute and provide the updated (detection) models to the detection agents is proposed.
Abstract: In this paper we discuss our research in developing general and systematic methods for intrusion detection. The key ideas are to use data mining techniques to discover consistent and useful patterns of system features that describe program and user behavior, and use the set of relevant system features to compute (inductively learned) classifiers that can recognize anomalies and known intrusions. Using experiments on the sendmail system call data and the network tcpdump data, we demonstrate that we can construct concise and accurate classifiers to detect anomalies. We provide an overview on two general data mining algorithms that we have implemented: the association rules algorithm and the frequent episodes algorithm. These algorithms can be used to compute the intra- and inter-audit record patterns, which are essential in describing program or user behavior. The discovered patterns can guide the audit data gathering process and facilitate feature selection. To meet the challenges of both efficient learning (mining) and real-time detection, we propose an agent-based architecture for intrusion detection systems where the learning agents continuously compute and provide the updated (detection) models to the detection agents.

1,353 citations

Proceedings ArticleDOI
06 Nov 2002
TL;DR: This paper reports results of signal analysis of four classes of network traffic anomalies: outages, flash crowds, attacks and measurement failures, and shows that wavelet filters are quite effective at exposing the details of both ambient and anomalous traffic.
Abstract: Identifying anomalies rapidly and accurately is critical to the efficient operation of large computer networks. Accurately characterizing important classes of anomalies greatly facilitates their identification; however, the subtleties and complexities of anomalous traffic can easily confound this process. In this paper we report results of signal analysis of four classes of network traffic anomalies: outages, flash crowds, attacks and measurement failures. Data for this study consists of IP flow and SNMP measurements collected over a six month period at the border router of a large university. Our results show that wavelet filters are quite effective at exposing the details of both ambient and anomalous traffic. Specifically, we show that a pseudo-spline filter tuned at specific aggregation levels will expose distinct characteristics of each class of anomaly. We show that an effective way of exposing anomalies is via the detection of a sharp increase in the local variance of the filtered data. We evaluate traffic anomaly signals at different points within a network based on topological distance from the anomaly source or destination. We show that anomalies can be exposed effectively even when aggregated with a large amount of additional traffic. We also compare the difference between the same traffic anomaly signals as seen in SNMP and IP flow data, and show that the more coarse-grained SNMP data can also be used to expose anomalies effectively.
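The idea of exposing anomalies through a sharp increase in the local variance of filtered data can be sketched roughly as follows. This is not the paper's method: a trailing moving average is swapped in as a crude stand-in for the wavelet (pseudo-spline) filter, and the function names, the window size, and the threshold factor are all assumptions chosen for illustration.

```python
from statistics import median, pvariance


def detrend(series, window):
    """Subtract a trailing moving average from each sample.

    Assumption: this simple high-pass step replaces the wavelet filter
    described in the abstract.
    """
    out = []
    for i, x in enumerate(series):
        lo = max(0, i - window + 1)
        out.append(x - sum(series[lo:i + 1]) / (i + 1 - lo))
    return out


def local_variance(series, window):
    """Population variance over a trailing window ending at each sample."""
    return [
        pvariance(series[max(0, i - window + 1):i + 1])
        for i in range(len(series))
    ]


def flag_spikes(series, window=4, factor=5.0):
    """Return indices whose local variance exceeds `factor` times the median
    local variance (and is strictly positive)."""
    var = local_variance(detrend(series, window), window)
    baseline = median(var)
    return [i for i, v in enumerate(var) if v > 0 and v > factor * baseline]


# Toy traffic volume: flat, with one burst at index 12.
traffic = [10] * 20
traffic[12] = 100
print(flag_spikes(traffic))
```

The flagged indices start at the burst and persist for as long as the burst remains inside the trailing windows, so in practice one would report the first index of each flagged run.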

919 citations


"Multidimensional Data Mining for An..." refers background in this paper

  • ...Compared to these studies, we learn that intelligently combining multidimensional heavy-hitters with anomaly detection enables us to extract anomalous flows....


Proceedings ArticleDOI
19 Oct 2005
TL;DR: In this paper, a behavior-based anomaly detection method that detects network anomalies by comparing the current network traffic against a baseline distribution is proposed, which provides a flexible and fast approach to estimate the baseline distribution.
Abstract: We develop a behavior-based anomaly detection method that detects network anomalies by comparing the current network traffic against a baseline distribution. The Maximum Entropy technique provides a flexible and fast approach to estimate the baseline distribution, which also gives the network administrator a multi-dimensional view of the network traffic. By computing a measure related to the relative entropy of the network traffic under observation with respect to the baseline distribution, we are able to distinguish anomalies that change the traffic either abruptly or slowly. In addition, our method provides information revealing the type of the anomaly detected. It requires a constant memory and a computation time proportional to the traffic rate.
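The comparison of observed traffic against a baseline distribution can be illustrated with a short relative entropy (Kullback-Leibler divergence) computation, $D(P\|Q) = \sum_c P(c)\,\log\frac{P(c)}{Q(c)}$. The sketch below is only that: it takes the baseline as given rather than estimating it with the Maximum Entropy technique, and the packet class labels, the smoothing constant `eps`, and the function names are assumptions.

```python
import math
from collections import Counter


def distribution(packets, classes, eps=1e-6):
    """Empirical distribution over `classes`, smoothed by `eps` so that
    unseen classes never get probability zero."""
    counts = Counter(packets)
    total = len(packets) + eps * len(classes)
    return {c: (counts[c] + eps) / total for c in classes}


def contributions(p, q):
    """Per-class terms p(c) * log(p(c) / q(c)) of the relative entropy."""
    return {c: p[c] * math.log(p[c] / q[c]) for c in p}


def relative_entropy(p, q):
    """Relative entropy D(P || Q) in nats."""
    return sum(contributions(p, q).values())


classes = ["tcp/80", "udp/53", "tcp/22"]  # hypothetical packet classes
baseline = distribution(["tcp/80"] * 70 + ["udp/53"] * 30, classes)
observed = distribution(
    ["tcp/80"] * 40 + ["udp/53"] * 20 + ["tcp/22"] * 40, classes
)

score = relative_entropy(observed, baseline)
terms = contributions(observed, baseline)
culprit = max(terms, key=terms.get)  # class that deviates most from baseline
print(score, culprit)
```

Inspecting the per-class terms, rather than only the total, is one simple way to see which kind of traffic drives the deviation, in the spirit of the abstract's remark that the method reveals the type of anomaly.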

379 citations