A Signal Analysis of Network Traffic Anomalies
Paul Barford, Jeffery Kline, David Plonka and Amos Ron
Abstract--Identifying anomalies rapidly and accurately is critical to the efficient operation of large computer networks. Accurately characterizing important classes of anomalies greatly facilitates their identification; however, the subtleties and complexities of anomalous traffic can easily confound this process. In this paper we report results of signal analysis of four classes of network traffic anomalies: outages, flash crowds, attacks and measurement failures. Data for this study consists of IP flow and SNMP measurements collected over a six month period at the border router of a large university. Our results show that wavelet filters are quite effective at exposing the details of both ambient and anomalous traffic. Specifically, we show that a pseudo-spline filter tuned at specific aggregation levels will expose distinct characteristics of each class of anomaly. We show that an effective way of exposing anomalies is via the detection of a sharp increase in the local variance of the filtered data. We evaluate traffic anomaly signals at different points within a network based on topological distance from the anomaly source or destination. We show that anomalies can be exposed effectively even when aggregated with a large amount of additional traffic. We also compare the difference between the same traffic anomaly signals as seen in SNMP and IP flow data, and show that the more coarse-grained SNMP data can also be used to expose anomalies effectively.
I. INTRODUCTION
Traffic anomalies such as failures and attacks are commonplace in today's computer networks. Identifying, diagnosing and treating anomalies in a timely fashion is a fundamental part of day to day network operations. Without this kind of capability, networks are not able to operate efficiently or reliably. Accurate identification and diagnosis of anomalies depends first on robust and timely data, and second on established methods for isolating anomalous signals within that data.
Network operators principally use data from two sources to isolate and identify traffic anomalies. The first is data available from Simple Network Management Protocol (SNMP) queries to network nodes. This Management Information Base (MIB) data is quite broad, and mainly consists of counts of activity (such as number of packets transmitted) on a node. The second type of data available is from IP flow monitors. This data includes protocol-level information about specific end-to-end packet flows, which makes it more specific than SNMP data. The combination of these types of data provides a reasonably solid measurement foundation for anomaly identification.
Unfortunately, current best practices for identifying and diagnosing traffic anomalies are almost all ad hoc. These consist mainly of visualizing traffic from different perspectives and identifying anomalies from prior experience. While a variety of tools have been developed to automatically generate alerts to failures, it has generally been difficult to automate the anomaly identification process. An important step in improving the capability of identifying anomalies is to isolate and characterize their important features.

P. Barford and A. Ron are members of the Computer Sciences Department at the University of Wisconsin, Madison. E-mail: pb,amos@cs.wisc.edu. J. Kline is a member of the Mathematics Department at the University of Wisconsin, Madison. E-mail: kline@math.wisc.edu. D. Plonka is a member of the Division of Information Technology at the University of Wisconsin, Madison. E-mail: plonka@doit.wisc.edu.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IMW'02, Nov. 6-8, 2002, Marseille, France
Copyright 2002 ACM ISBN 1-58113-603-X/02/0011 ...$5.00
A road map for characterizing broad aspects of network traffic was outlined in [1]. In this paper, we restrict our focus to one aspect of that work and report results of a detailed signal analysis of network traffic anomalies. Our analysis considers the time-frequency characteristics of IP flow and SNMP data collected at the border router of the University of Wisconsin-Madison over a 6 month period. Included with these data is a catalog of 109 distinct traffic anomalies identified by the campus network engineering group during the data collection period. This combination of data enabled us to focus our efforts on how to employ filtering techniques to most effectively expose local frequency details of anomalies.
To facilitate this work, we developed the Integrated Measurement Analysis Platform for Internet Traffic (IMAPIT). IMAPIT contains a data management system which supports and integrates IP flow, SNMP and anomaly identification data. IMAPIT also includes a robust signal analysis utility which enables the network traffic data to be decomposed into its frequency components using a number of wavelet and framelet systems. More details of IMAPIT are given in Sections IV and V.
Initially, we analyzed a variety of traffic signals by applying general wavelet filters to the data. Wavelets provide a powerful means for isolating characteristics of signals via a combined time-frequency representation (standard Fourier analysis, by contrast, only enables localization by frequency). We tested the wavelet analysis by applying many different wavelet systems to traffic signals to determine how to best expose the characteristics of anomalies recorded therein. We accepted the constraint that our flow and SNMP data was collected at five minute intervals, thereby precluding analysis on finer timescales. Nevertheless, we were able to select a wavelet system and develop algorithms that effectively expose the underlying features of both ambient and anomalous traffic.
Not surprisingly, our analysis shows clear daily and weekly traffic cycles. It is important to be able to expose these components so that anomalous traffic can be effectively isolated. Our analysis then focused on anomalies by separating them into two groups based on their observed duration. The first group consisted of flash crowd events, which were the only long-lived events in our data set; these typically span up to a week. Flash crowd anomalies are effectively exposed using the low frequency representation in our system. The second group of anomalies are those that were short-lived and consisted of network failures, attacks, and other events. These short-lived anomalies are more difficult to expose in data due to their similarity to normal bursty network behavior. We found that these signals could be effectively exposed by combining data from the mid and high frequency levels. Our investigation of which combinations of data best expose anomalies included comparing SNMP to IP flow data, breaking down flow data by packet, byte and flow metrics, and measuring variations in the packets' average size.
One important test that we developed for exposing short-lived events was based on computing the normalized local variance of the mid and high frequency components of the signal. The intuition for this approach is that the "local deviation" in the high frequency representation exposes the beginning and end of short-lived events, while the local variability in the mid frequency filters exposes their duration. Large values of these local variances indicate a sharp, unpredictable change in the volume of the measured quantity. Our Deviation Scoring method described in Section IV is a first step at attempting to automate anomaly detection (which is now typically done by visual inspection of time-series network traffic plots) through the use of multi-resolution techniques. Employing this method on our data over a number of weeks actually exposed a number of true anomalies (verified post-mortem by network engineers) that had not been cataloged previously.
While the majority of our work focused on identifying anomalies in aggregate traffic at the campus border router, the source and destination addresses in the IP flow data allow us to isolate anomalies at different points in the network (by pruning away traffic from various subnets). As you move closer to the source of an anomaly, the event typically becomes more pronounced in the data and thus easier to expose. However, if the event takes place at a point in the network where there is lower aggregation of traffic, then there is typically more variability in the ambient traffic and, as a result, the task of isolating the anomaly signal becomes more difficult. We show that our methods work well whether the measurement point is close to or distant from the point of the anomaly.
This paper is organized as follows. In Section III we describe the data sets we use in this work. We also describe current best practices employed by network operators for general anomaly detection. In Section IV we describe our signal analysis methods and the IMAPIT framework. In Section V we present the results of our analysis and discuss their implications. We evaluate the performance of our anomaly detection method in Section VI, and then summarize, conclude and discuss future work in Section VII.
II. RELATED WORK
General properties of network packet traffic have been studied intensely for many years; standard references include [2], [3], [4], [5]. Many different analysis techniques have been employed in these and other studies, including wavelets in [6]. The majority of these traffic analysis studies have focused on typical, packet-level and end-to-end behavior (a notable exception being [7]). Our focus is mainly at the flow level and on identifying frequency characteristics of anomalous network traffic.
There have been many prior studies of network fault detection methods. Examples include [8], [9], [10]. Feather et al. use statistical deviations from normal traffic behavior to identify faults [11], while a method of identifying faults by applying thresholds in time series models of network traffic is developed in [12]. These studies focus on accurate detection of deviations from normal behavior. Our work is focused on identifying anomalies by first removing the predictable, ambient part from the signal, and only then employing statistical methods. Wavelets are used for the former task.
Detection of black-hat activity including denial-of-service (DoS) attacks and port scan attacks has also been treated widely. Methods for detecting intrusions include clustering [13], neural networks [14] and Markov models [15]. Moore et al. show that flow data can be effective for identifying DoS attacks [16]. A number of intrusion detection tools have been developed in recent years in response to the rise in black-hat activity. An example is Bro [17], which provides an extensible environment for identifying intrusion and attack activity. Our work complements this work by providing another means for identifying a variety of anomalous behaviors including attacks.
We identify flash crowds as an important anomaly category. The events of September 11, 2001 and the inability of most online news services to deal with the offered demand is the most extreme example of this kind of behavior. While infrastructure such as content delivery networks (CDNs) has been developed to mitigate the impact of flash crowds, almost no studies of their characteristics exist. A recent study on flash crowds is by Jung et al. in [18]. That work considers flash crowds (and DoS attacks) from the perspective of Web server logs, whereas ours is focused on network traffic. Finally, cooperative pushback is proposed in [19] as a means for detection and control of events such as flash crowds.
III. DATA
A. The Measurement Data
Our analysis is based on two types of network traffic data: SNMP data and IP flow data. The source of both was a Juniper M10 router which handled all traffic that crossed the University of Wisconsin-Madison campus network's border as it was exchanged with the outside world. The campus network consists primarily of four IPv4 class B networks, or roughly 256,000 IP addresses, of which fewer than half are utilized. The campus has IP connectivity to the commodity Internet and to research networks via about 15 discrete wide-area transit and peering links, all of which terminate into the aforementioned router.
The SNMP data was gathered by MRTG [20] at a five minute sampling interval, which is commonly used by network operators. The SNMP data consists of the High Capacity interface statistics defined by RFC2863 [21], which were polled using SNMP version 2c. This analysis used the byte and packet counters for each direction of each wide-area link, specifically these 64-bit counters: ifHCInOctets, ifHCOutOctets, ifHCInUcastPkts, and ifHCOutUcastPkts.
The flow data was gathered using flow-tools [22] and was post-processed using FlowScan [23]. The Juniper M10 router was running JUNOS 5.0R1.4, and later JUNOS 5.2R1.4, and was configured to perform "cflowd" flow export with a packet sampling rate of 96. This caused 1 of 96 forwarded packets to be sampled, and subsequently assembled into flow records similar to those defined by Cisco's NetFlow [24] version 5, with a similar packet-sampling-interval and a 1 minute flow active-timeout. The packet and byte counts computed from those flow records were then multiplied by the sampling rate to approximate the actual byte and packet rates. We have not attempted to formally determine the accuracy of packet-sampling-based flow measurements as compared with the SNMP measurements. However, it is common to use such measurements in network operations.
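As a rough illustration of this scaling step, the sketch below converts sampled per-interval counts into approximate rates. It is only a sketch under the 1-in-96 sampling and five-minute bins described above; the constant and function names are our own and are not taken from flow-tools or FlowScan.

    # Minimal sketch: scale packet-sampled flow counts up to approximate actual rates.
    SAMPLING_RATE = 96        # 1 of every 96 forwarded packets is sampled
    INTERVAL_SECONDS = 300    # five-minute collection bins

    def approximate_rates(sampled_packets, sampled_bytes):
        """Return (packets/sec, bytes/sec) estimated from sampled per-interval counts."""
        pkts_per_sec = sampled_packets * SAMPLING_RATE / INTERVAL_SECONDS
        bytes_per_sec = sampled_bytes * SAMPLING_RATE / INTERVAL_SECONDS
        return pkts_per_sec, bytes_per_sec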
Both the SNMP and flow data were post-processed to produce rate values and stored using the RRDTOOL [20] time-series database. The archives were configured to retain values at five minute granularity from September 25, 2001 through April 4, 2002. Internet service providers which bill customers based upon 95th percentile peak interface usage have a similar retention policy for SNMP data. However, most other network operators retain the five-minute granularity data for only about two days (50 hours for MRTG, by default), after which that data is coalesced into averages over increasingly longer time intervals; typically 30 minute, 2 hour, and 24 hour averages. For the campus, with approximately 600 IP subnets, this set of data resulted in a database of approximately 4GB in size. The collected flow records were retained to validate the results of the analysis. They were collected at five minute intervals, resulting in about 60,000 compressed files of approximately 100GB in total combined size.
Though uncommon, our analysis also considered the average IP packet size, as computed from corresponding byte and packet rates. Because many applications have typical packet sizes, often bimodal with respect to requests and responses or data and acknowledgments, analysis of this metric occasionally exposes application usage even when only SNMP-based byte and packet interface rate statistics are available.
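The average packet size metric is simply the ratio of the two rate series; a one-line sketch, building on the hypothetical rate helper above, might look like the following.

    # Average IP packet size (bytes per packet) from corresponding byte and packet rates.
    # Illustrative only; assumes the two rate series are aligned and the packet rate is nonzero.
    def average_packet_size(bytes_per_sec, pkts_per_sec):
        return bytes_per_sec / pkts_per_sec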
In parallel with the collection of the measurement data, a journal of known anomalies and network events was maintained. The log entries in this journal noted the event's date and time, and a one-line characterization of the anomaly. Furthermore, a simple nomenclature was used to label events as one of these types:

Network: A network failure event or temporary misconfiguration resulting in a problem or outage. For instance: router software spontaneously stopped advertising one of the campus class B networks to campus BGP peers.

Attack: Typically a Denial-of-Service event, usually flood-based. For instance: an outbound flood of 40-byte TCP packets from a campus host that has had its security compromised and is being remotely controlled by a malicious party.

Flash: A flash crowd [18] event. For instance: the increase in outbound traffic from a campus ftp mirror server following a release of RedHat Linux.

Measurement: An anomaly that we determined not to be due to network infrastructure problems nor abusive network usage. For example: a campus host participating in TCP bulk data transfer with a host at another campus as part of a research project. Problems with the data collection infrastructure itself were also categorized as "Measurement" anomalies. These include loss of flow data due to router overload or unreliable UDP NetFlow transport to the collector.

TABLE I
TYPES AND COUNTS OF NETWORK ANOMALY EVENTS IN THE TRAFFIC DATABASE USED IN THIS STUDY.

    Anomaly Type    Count
    Network          41
    Attack           46
    Flash Crowd       4
    Measurement      18
    Total           109

In this way, a total of 168 events were identified and a subset researched and tagged by the engineers operating the campus network. Table I shows the distribution of types among the 109 tagged events. All flash crowd events occurring during the measurement period were selected, along with a sampling of anomalies in the other three categories based on those that had the most detailed description in the operator's journal. While the journal did not record every traffic anomaly during the measurement period, it acted as a unique road map for exploring the raw traffic measurement data, and provided a basis for determining if the anomalies could be detected or characterized automatically.
B. Best Current Practice
Experienced network operators often employ effective, but ad
hoc, methods of problem determination and anomaly detection.
These techniques rely heavily on an operator's experience and
persistent personal attention.
Modern network management systems (NMS) software provides two common tools for handling SNMP data. The first, and ostensibly most-used, is a graphing tool capable of continuously collecting and plotting values from the MIBs. It is not uncommon for network operators to fill their workstations' screens with plots of traffic as it passes through various network elements.
The second is an alarm tool, which periodically performs tests on collected values and notifies operators accordingly. Such tools are based on locally authored rules, perhaps augmented by heuristics provided by the NMS vendor. These rules are often rudimentary conditional threshold tests such as, "if (router.interface1.utilization > 50%) then notify". The resulting expert knowledge expressed in these rules is not necessarily portable to any other network environment.
Tools for handling flow data are less mature. Freely-available tools, such as those employed in our work, have achieved a certain level of popularity among operators of enterprise and large networks. These tools leverage existing SNMP experience by converting detailed flow-export records into familiar time-series data. Tabular data, compiled by either commercial or freely-available tools, is occasionally used as well.

The major deficiency of these tools is the amount of expert local knowledge and time required to set up and use them pervasively. For instance, we collected about 6,000 unique time-series metrics from just a single network element, namely our campus border router. This amount of data prohibits visual inspection of graphs containing plots of all but a small subset of those metrics.
IV. METHODS
A. Wavelet Analysis
A typical input of our analysis platform is a string of Internet traffic measurements. One of the basic principles of our methodology is the treatment of the measurement string as a generic signal, ignoring, at least to a large degree, the semantics of the signal (such as the content of the packet header), the instrumentation used (e.g. SNMP vs. FlowScan), the quantity which is being measured (packet count, byte count, incoming traffic, outgoing traffic), or the actual subnet which is targeted. We do pay careful attention to the time aggregation of the measurement (one measurement for each five minutes) in order to capture daily and weekly patterns or inconsistencies. An approach like the above is important since we would like to build a platform which is portable, and that can be automated. Any analysis tool that depends heavily on the nature of a particular local subnet will almost surely not be portable to other locations due to the heterogeneity of Internet traffic.
The basic tool we employ is wavelet analysis. The wavelet tool organizes the data into strata, a hierarchy of component "signals", each of which maintains time as its independent variable. The lower strata contain very sparse filtered information that can be thought of as sophisticated aggregations of the original data. We refer to that part of the representation as the low-frequency representation. In our algorithm, we derive one such representation, i.e. a single dataset that extracts the general slow-varying trends of the original signal. In contrast, the very high strata in the hierarchy capture fine-grained details of the data, such as spontaneous variations. These are referred to as the high-frequency strata.
Let us review in a bit more detail the so-called "wavelet processing". This processing is actually made of two complementary steps: the first is analysis/decomposition and the other is its inverse, the reconstruction/synthesis process.
Analysis: The goal of the analysis process is to extract from the original signal the aforementioned hierarchy of derived signals. This is done as an iterative process. The input for each iteration is a signal x of length N. The output is a collection of two or more derived signals, each of which is of length N/2. Each output signal is obtained by convolving x with a specially designed filter F and then decimating every other coefficient of that convolution product. We denote by F(x) the output signal so obtained. One of the special filters, denoted herein as L, has a smoothing/averaging effect, and its corresponding output L(x) is the low-frequency output. The other filters, H_1, ..., H_r (r ≥ 1), are best thought of as "discrete differentiation", and a typical output H_i(x) should capture only the "fine-grained details", i.e. the high-frequency content of the signal x. The iterations proceed with the further decomposition of L(x), creating the (shorter) signals L^2(x), H_1 L(x), ..., H_r L(x). Continuing in this manner, we obtain a family of output signals of the form H_i L^{j-1}(x). The index j counts the number of low-pass filtering iterations applied to obtain the output signal: the larger the value of j, the lower the derived signal is in our hierarchy. Indeed, we refer to H_i L^{j-1}(x) as belonging to the jth frequency level, and consider a higher value of j as corresponding to a lower frequency. If our original signal x consists of measurements taken at five minute intervals, then the derived signal H_i L^{j-1}(x) consists of data values that are 2^j × 5 minutes apart from one another. Thus, as j grows, the corresponding output signal becomes shorter and records a smoother part of the signal. The values of the derived signals H_i L^{j-1}(x) (i = 1, ..., r, j ≥ 1) are known as the wavelet coefficients.
For example, let us consider the case j = 6. At that level, the derived signal H_i L^6(x) contains aggregated data values that are 2^6 × 5 = 320 minutes apart. At that aggregation level (if done correctly) we should not anticipate seeing subtle variations that evolve over, say, a two or three hour duration; we will see, at best, a very blurry time-stamp of such variations. On the other hand, the coefficients at that level might capture well the variations between day and night traffic.
The synthesis iterations perform the inverse of the analysis: at each step the input signals for the iteration are L^j(x), H_1 L^{j-1}(x), ..., H_r L^{j-1}(x), and the output is the signal L^{j-1}(x). This is exactly the inverse of the jth iteration of the analysis algorithm. By employing that step sufficiently many times, one recaptures the original signal.
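To make the decomposition step concrete, the following sketch implements the convolve-and-decimate iteration described above. The Haar filter pair used here is only a stand-in for the paper's PS(4,1) Type II framelet (one low-pass and three high-pass analysis filters), whose coefficients are not reproduced in this excerpt; the function and variable names are our own.

    # Sketch of the analysis (decomposition) cascade: convolve with each filter,
    # then decimate every other coefficient, iterating on the low-pass output.
    # The Haar filters below are stand-ins for the paper's PS(4,1) Type II framelet.
    import numpy as np

    LOW = np.array([1.0, 1.0]) / np.sqrt(2)    # smoothing/averaging filter L
    HIGH = np.array([1.0, -1.0]) / np.sqrt(2)  # "discrete differentiation" filter H

    def analyze(x, levels):
        """Return ({j: H L^{j-1}(x)}, L^levels(x)) for a 1-D signal x."""
        coeffs, approx = {}, np.asarray(x, dtype=float)
        for j in range(1, levels + 1):
            detail = np.convolve(approx, HIGH, mode='full')[1::2]
            approx = np.convolve(approx, LOW, mode='full')[1::2]
            coeffs[j] = detail  # level-j values are 2**j * 5 minutes apart
        return coeffs, approx

    # Example: one week of five-minute measurements (7 * 288 samples).
    coeffs, low = analyze(np.random.rand(7 * 288), levels=6)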
One possible way of using the wavelet process is in "detection-only" mode. In this mode, one examines the various derived signals of the decomposition, and tries to infer from them information about the original signal.
Wavelet-based algorithms are usually more sophisticated and attempt to assemble a new signal from the various pieces in the decomposition. This is done by altering some of the values of some of the derived signals of the decomposition step and then applying reconstruction. The general idea is to suppress all the values that carry information that we would like to ignore. For example, if we wish only to view the fine-grained spontaneous changes in the data, we will apply a threshold to the entries in all the low-frequency levels, i.e. replace them by zeros.
The above description falls short of resulting in a well-defined algorithm. For example, suppose that we would like to suppress the day/night variation in the traffic. We mentioned before that such variations appear in frequency level 6 (and definitely in lower levels as well). But perhaps they are also recorded in the derived signal at level 5? It turns out that there is no simple answer here. The wavelet transform we describe here is one of many possible wavelet systems, each of which might provide a unique decomposition of the data. Unfortunately, choosing among the subtle details of each wavelet transform often requires an expert understanding of the performance of those wavelet decompositions. Their ultimate success depends on selecting a wavelet transform that suits the given application.
Time-frequency localization: approximation orders and vanishing moments. In a highly qualitative description, the selection of the wavelet transform should be based on a careful balance between its time localization characteristics and its frequency localization characteristics.
Time localization is a relatively simple notion that is primarily measured by the length of the filters that are employed in the transform. Long filters lead to excessive blurring in the time domain. For example, the use of long filters denies us the ability to easily distinguish between a very strong short-duration change in traffic volume, as opposed to a milder change of longer duration. Since, for anomaly detection, the ability to answer accurately the question "when?" is critical, we chose a wavelet system with very short filters.
Frequency localization. In our context, there are two characteristics of the wavelet system that may be regarded as belonging to this class.

One way to measure frequency localization is by measuring the number of vanishing moments that the analysis filters H_i possess. We say that the filter H_i has k vanishing moments if Ĥ(0) = Ĥ'(0) = ... = Ĥ^(k-1)(0) = 0, where Ĥ is the Fourier series of H_i. In every wavelet system, every filter H_i has at least one vanishing moment. Filters with a low number (usually one or two) of vanishing moments may lead to the appearance of large wavelet coefficients at times when no significant event is occurring, thus resulting in an increase in the number of false positive alerts. In order to create a wavelet transform with a high number of vanishing moments, one needs to select longer filters.
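Equivalently, a filter with coefficients h[n] has k vanishing moments exactly when its first k discrete moments vanish, i.e. the sum over n of n^m h[n] is zero for m = 0, ..., k-1. A small sketch of this check follows; the helper name is ours, and the two example filters are generic illustrations rather than the paper's filters.

    # Count vanishing moments of a high-pass filter h by testing whether its
    # discrete moments sum_n n**m * h[n] vanish for m = 0, 1, 2, ...
    import numpy as np

    def vanishing_moments(h, tol=1e-10, max_check=8):
        h = np.asarray(h, dtype=float)
        n = np.arange(len(h))
        k = 0
        while k < max_check and abs(np.sum(n ** k * h)) < tol:
            k += 1
        return k

    print(vanishing_moments([1, -1]))     # 1: a simple difference filter
    print(vanishing_moments([1, -2, 1]))  # 2: a discrete second-derivative filter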
Another closely related way to measure frequency localization is via the approximation order of the system. We forgo explaining the details of this notion and mention only that the decision to measure frequency localization either via vanishing moments or via approximation order depends primarily on the objective and the nature of the algorithm that is employed.
The last issue in this context is the artifact freeness of the transform. For many wavelet systems, the reconstructed (modified) signal shows "features" that have nothing to do with the original signal, and are artifacts of the filters used. Wavelet filters that are reasonably short and do not create such undesired artifacts are quite rare; thus, our need for good time localization together with our insistence on an artifact-free wavelet system narrowed the search for the "optimal" system in a substantial way.
The wavelet system we employ: We use a bi-frame version of a system known as PS(4,1) Type II (cf. [25]). This is a framelet system, i.e. a redundant wavelet system (which essentially means that r, the number of high-pass filters, is larger than 1; a simple count shows that, if r > 1, the total number of wavelet coefficients exceeds the length of the original signal). In our work, the redundancy itself is not considered a virtue. However, the redundancy provides us with added flexibility: it allows us to construct relatively short filters with very good frequency localization.

In our chosen system, there is one low-pass filter L and three high-pass filters H_1, H_2, H_3. The analysis filters are all 7-tap (i.e. each has 7 non-zero coefficients), while the synthesis filters are all 5-tap. The vanishing moments of the high-pass analysis filters are 2, 3, 4, respectively, while the approximation order of the system is 4.² The "artifact freeness" of our system is guaranteed since our low-pass filters deviate only mildly from spline filters (which perform pure multiple averages, and are the ideal artifact-free low-pass filters).

² Our initial assumption was that Internet traffic is not smooth, and there might not be enough gain in using a system of approximation order 4 (had we switched to a system with approximation order 2, we could have used shorter filters). However, comparisons between the performance of the PS(4,1) Type II and the system RS4 (whose filters are all 5-tap, but whose approximation order is only 2) yielded a significant difference in performance.
The analysis platform. We derive from a given signal x (that represents five-minute average measurements) three output signals, as follows. The description here fits a signal that has been measured for two months. Slightly different rules were employed for shorter duration signals (e.g. a signal measured for a week).
The L(ow frequency)-part of the signal, obtained by synthesizing all the low-frequency wavelet coefficients from levels 9 and up. The L-part of the signal should capture patterns and anomalies of very long duration: several days and up. The signal here is very sparse (its number of data elements is approximately 0.4% of those in the original signal), and captures weekly patterns in the data quite well. For many different types of Internet data, the L-part of the signal reveals a very high degree of regularity and consistency in the traffic, hence it can reliably capture anomalies of long duration (albeit it may blur various characteristics of the abnormal behavior of the traffic).
The M(id frequency)-part of the signal, obtained by synthesizing the wavelet coefficients from frequency levels 6, 7, 8. The signal here has zero mean, and is supposed to capture mainly the daily variations in the data. Its data elements number about 3% of those in the original signal.
The H(igh frequency)-part of the signal is obtained by thresholding the wavelet coefficients in the first 5 frequency levels, i.e. setting to zero all coefficients whose absolute value falls below a chosen threshold (and setting to zero all the coefficients in level 6 and up). The need for thresholding stems from the fact that most of the data in the H-part consists of small short-term variations, variations that we think of as "noise" and do not aid us in our anomaly detection objective.
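A rough sketch of this band-limited reconstruction is shown below. It uses PyWavelets' 'db4' wavelet as a convenient stand-in, since the PS(4,1) framelet is not available in that library, and the hard-threshold value is an illustrative assumption; only the level groupings follow the text above.

    # Sketch: build L-, M-, and H-parts by zeroing coefficient bands and reconstructing.
    # 'db4' stands in for the paper's PS(4,1) Type II framelet; the threshold is illustrative.
    import numpy as np
    import pywt

    def three_parts(x, wavelet='db4', levels=9, h_threshold=2.0):
        coeffs = pywt.wavedec(x, wavelet, level=levels)  # [cA9, cD9, cD8, ..., cD1]

        def synthesize(keep, thresh=None):
            kept = [np.zeros_like(c) for c in coeffs]
            for i in keep:
                c = coeffs[i]
                kept[i] = pywt.threshold(c, thresh, mode='hard') if thresh else c
            return pywt.waverec(kept, wavelet)[:len(x)]

        low = synthesize([0, 1])                 # level 9 and coarser: multi-day trends
        mid = synthesize([2, 3, 4])              # levels 6-8: daily variations
        high = synthesize(range(5, 10), thresh=h_threshold)  # levels 1-5, hard-thresholded
        return low, mid, high

    # Example: roughly two months of five-minute samples.
    low, mid, high = three_parts(np.random.rand(61 * 288))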
We close this section with two technical comments. While the theory of thresholding redundant representations is still in rudimentary form, it is evident to us that we should vary the thresholding level according to the number of vanishing moments in the filter (decreasing the threshold for the filters with more vanishing moments). We have not yet implemented this technique. Finally, due to its high approximation order, our system cannot accurately capture sharp discontinuities in the data.
Detection of anomalies. While it is unlikely that a single method for detecting anomalies will ever be found,³ we have taken a first step at developing an automated method for identifying irregularities in the measured data. Our algorithm, which we call a deviation score, has the following ingredients:
1. Normalize the H- and M-parts to have variance one. Compute the local variability of the (normalized) H- and M-parts by computing the variance of the data falling within a moving window of specified size. The length of this moving window should depend on the duration of the anomalies that we wish to capture. If we denote the duration of the anomaly by t_0 and the time length of the window for the local deviation by t_1, we need, in the ideal situation, to have q := t_0/t_1 ≈ 1. If the quotient q is too small, the anomaly may be blurred and lost. If the quotient is too large, we may be overwhelmed by "anomalies" that are of very little interest to the network operators. Our current experiment focuses on anomalies of duration 1-4 hours, and uses a moving 3-hour local deviation window. Shorter anomalies of

³ After all, there is not a single definition of "anomaly". Should we consider any change in the measured data to be an anomaly, or only those that correspond to an identifiable change in network state?
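As a concrete illustration of the moving-window variance in step 1, here is a minimal sketch. It assumes a 3-hour window of five-minute samples (36 points); how the H- and M-part scores are ultimately combined into a single deviation score is not spelled out in the excerpt above, so the simple sum below is our own placeholder.

    # Sketch of the "local variance" ingredient of the deviation score: normalize a
    # band-limited part to unit variance, then take the variance in a moving window.
    import numpy as np
    import pandas as pd

    WINDOW = 36  # 3 hours of five-minute samples

    def local_variance(part, window=WINDOW):
        s = pd.Series(part) / np.std(part)            # normalize to variance one
        return s.rolling(window, center=True).var()   # variance within the moving window

    def deviation_signal(h_part, m_part, window=WINDOW):
        # Placeholder combination of the H- and M-part local variances.
        return local_variance(h_part, window) + local_variance(m_part, window)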