A Signal Analysis of Network Traffic Anomalies
Paul Barford, Jeffery Kline, David Plonka and Amos Ron
Abstract--Identifying anomalies rapidly and accurately is critical to the efficient operation of large computer networks. Accurately characterizing important classes of anomalies greatly facilitates their identification; however, the subtleties and complexities of anomalous traffic can easily confound this process. In this paper we report results of signal analysis of four classes of network traffic anomalies: outages, flash crowds, attacks and measurement failures. Data for this study consists of IP flow and SNMP measurements collected over a six month period at the border router of a large university. Our results show that wavelet filters are quite effective at exposing the details of both ambient and anomalous traffic. Specifically, we show that a pseudo-spline filter tuned at specific aggregation levels will expose distinct characteristics of each class of anomaly. We show that an effective way of exposing anomalies is via the detection of a sharp increase in the local variance of the filtered data. We evaluate traffic anomaly signals at different points within a network based on topological distance from the anomaly source or destination. We show that anomalies can be exposed effectively even when aggregated with a large amount of additional traffic. We also compare the difference between the same traffic anomaly signals as seen in SNMP and IP flow data, and show that the more coarse-grained SNMP data can also be used to expose anomalies effectively.
I. INTRODUCTION
Traffic anomalies such as failures and attacks are commonplace in today's computer networks. Identifying, diagnosing and treating anomalies in a timely fashion is a fundamental part of day to day network operations. Without this kind of capability, networks are not able to operate efficiently or reliably. Accurate identification and diagnosis of anomalies depends first on robust and timely data, and second on established methods for isolating anomalous signals within that data.
Network operators principally use data from two sources to isolate and identify traffic anomalies. The first is data available from Simple Network Management Protocol (SNMP) queries to network nodes. This Management Information Base (MIB) data is quite broad, and mainly consists of counts of activity (such as number of packets transmitted) on a node. The second type of data available is from IP flow monitors. This data includes protocol-level information about specific end-to-end packet flows, which makes it more specific than SNMP data. The combination of these types of data provides a reasonably solid measurement foundation for anomaly identification.
Unfortunately, current best practices for identifying and diagnosing traffic anomalies are almost all ad hoc. These consist mainly of visualizing traffic from different perspectives and identifying anomalies from prior experience. While a variety of tools have been developed to automatically generate alerts to failures, it has generally been difficult to automate the anomaly identification process. An important step in improving the capability of identifying anomalies is to isolate and characterize their important features.

P. Barford and A. Ron are members of the Computer Sciences Department at the University of Wisconsin, Madison. E-mail: pb,amos@cs.wisc.edu. J. Kline is a member of the Mathematics Department at the University of Wisconsin, Madison. E-mail: kline@math.wisc.edu. D. Plonka is a member of the Division of Information Technology at the University of Wisconsin, Madison. E-mail: plonka@doit.wisc.edu.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IMW'02, Nov. 6-8, 2002, Marseille, France
Copyright 2002 ACM ISBN 1-58113-603-X/02/0011 ...$5.00
A road map for characterizing broad aspects of network traffic was outlined in [1]. In this paper, we restrict our focus to one aspect of that work and report results of a detailed signal analysis of network traffic anomalies. Our analysis considers the time-frequency characteristics of IP flow and SNMP data collected at the border router of the University of Wisconsin-Madison over a 6 month period. Included with these data is a catalog of 109 distinct traffic anomalies identified by the campus network engineering group during the data collection period. This combination of data enabled us to focus our efforts on how to employ filtering techniques to most effectively expose local frequency details of anomalies.
To facilitate this work, we developed the Integrated Measurement Analysis Platform for Internet Traffic (IMAPIT). IMAPIT contains a data management system which supports and integrates IP flow, SNMP and anomaly identification data. IMAPIT also includes a robust signal analysis utility which enables the network traffic data to be decomposed into its frequency components using a number of wavelet and framelet systems. More details of IMAPIT are given in Sections IV and V.
Initially, we analyzed a variety of traffic signals by applying general wavelet filters to the data. Wavelets provide a powerful means for isolating characteristics of signals via a combined time-frequency representation (standard Fourier analysis, by contrast, only enables localization by frequency). We tested the wavelet analysis by applying many different wavelet systems to traffic signals to determine how to best expose the characteristics of anomalies recorded therein. We accepted the constraint that our flow and SNMP data was collected at five minute intervals, thereby precluding analysis on finer timescales. Nevertheless, we were able to select a wavelet system and develop algorithms that effectively expose the underlying features of both ambient and anomalous traffic.
Not surprisingly, our analysis shows clear daily and weekly traffic cycles. It is important to be able to expose these components so that anomalous traffic can be effectively isolated. Our analysis then focused on anomalies by separating them into two groups based on their observed duration. The first group consisted of flash crowd events, which were the only long-lived events in our data set; these typically span up to a week. Flash crowd anomalies are effectively exposed using the low frequency representation in our system. The second group of anomalies are those that were short-lived and consisted of network failures, attacks, and other events. These short-lived anomalies are more difficult to expose in data due to their similarity to normal bursty network behavior. We found that these signals could be effectively exposed by combining data from the mid and high frequency levels. Our investigation of which combinations of data best expose anomalies included comparing SNMP to IP flow data, breaking down flow data by packet, byte and flow metrics, and measuring variations in the packets' average size.
One important test that we developed for exposing short-lived events was based on computing the normalized local variance of the mid and high frequency components of the signal. The intuition for this approach is that the "local deviation" in the high frequency representation exposes the beginning and end of short-lived events, while the local variability in the mid frequency filters exposes their duration. Large values of these local variances indicate a sharp, unpredictable change in the volume of the measured quantity. Our Deviation Scoring method described in Section IV is a first step at attempting to automate anomaly detection (which is now typically done by visual inspection of time-series network traffic plots) through the use of multi-resolution techniques. Employing this method on our data over a number of weeks actually exposed a number of true anomalies (verified post-mortem by network engineers) that had not been cataloged previously.
While the majority of our work focused on identifying anomalies in aggregate traffic at the campus border router, the source and destination addresses in the IP flow data allow us to isolate anomalies at different points in the network (by pruning away traffic from various subnets). As you move closer to the source of an anomaly, the event typically becomes more pronounced in the data and thus easier to expose. However, if the event takes place at a point in the network where there is lower aggregation of traffic, then there is typically more variability in the ambient traffic and, as a result, the task of isolating the anomaly signal becomes more difficult. We show that our methods work well whether the measurement point is close to or distant from the point of the anomaly.
This paper is organized as follows. In Section III we describe the data sets we use in this work. We also describe current best practices employed by network operators for general anomaly detection. In Section IV we describe our signal analysis methods and the IMAPIT framework. In Section V we present the results of our analysis and discuss their implications. We evaluate the performance of our anomaly detection method in Section VI, and then summarize, conclude and discuss future work in Section VII.
II. RELATED WORK
General properties of network packet traffic have been studied intensely for many years; standard references include [2], [3], [4], [5]. Many different analysis techniques have been employed in these and other studies, including wavelets in [6]. The majority of these traffic analysis studies have focused on typical, packet-level and end-to-end behavior (a notable exception being [7]). Our focus is mainly at the flow level and on identifying frequency characteristics of anomalous network traffic.
There have been many prior studies of network fault detection methods. Examples include [8], [9], [10]. Feather et al. use statistical deviations from normal traffic behavior to identify faults [11], while a method of identifying faults by applying thresholds in time series models of network traffic is developed in [12]. These studies focus on accurate detection of deviations from normal behavior. Our work is focused on identifying anomalies by first removing the predictable, ambient part from the signal, and only then employing statistical methods. Wavelets are used for the former task.
Detection of black-hat activity including denial-of-service (DoS) attacks and port scan attacks has also been treated widely. Methods for detecting intrusions include clustering [13], neural networks [14] and Markov models [15]. Moore et al. show that flow data can be effective for identifying DoS attacks [16]. A number of intrusion detection tools have been developed in recent years in response to the rise in black-hat activity. An example is Bro [17], which provides an extensible environment for identifying intrusion and attack activity. Our work complements this work by providing another means for identifying a variety of anomalous behaviors including attacks.
We identify flash crowds as an important anomaly category. The events of September 11, 2001 and the inability of most online news services to deal with the offered demand is the most extreme example of this kind of behavior. While infrastructure such as content delivery networks (CDNs) has been developed to mitigate the impact of flash crowds, almost no studies of their characteristics exist. A recent study on flash crowds is by Jung et al. in [18]. That work considers flash crowds (and DoS attacks) from the perspective of Web server logs, whereas ours is focused on network traffic. Finally, cooperative pushback is proposed in [19] as a means for detection and control of events such as flash crowds.
III. DATA
A. The Measurement Data
Our analysis is based on two types of network traffic data: SNMP data and IP flow data. The source of both was a Juniper M10 router which handled all traffic that crossed the University of Wisconsin-Madison campus network's border as it was exchanged with the outside world. The campus network consists primarily of four IPv4 class B networks, or roughly 256,000 IP addresses, of which fewer than half are utilized. The campus has IP connectivity to the commodity Internet and to research networks via about 15 discrete wide-area transit and peering links, all of which terminate into the aforementioned router.
The SNMP data was gathered by MRTG [20] at a five minute sampling interval, which is commonly used by network operators. The SNMP data consists of the High Capacity interface statistics defined by RFC2863 [21], which were polled using SNMP version 2c. This analysis used the byte and packet counters for each direction of each wide-area link, specifically these 64-bit counters: ifHCInOctets, ifHCOutOctets, ifHCInUcastPkts, and ifHCOutUcastPkts.
The flow data was gathered using flow-tools [22] and was post-processed using FlowScan [23]. The Juniper M10 router was running JUNOS 5.0R1.4, and later JUNOS 5.2R1.4, and was configured to perform "cflowd" flow export with a packet sampling rate of 96. This caused 1 of 96 forwarded packets to be sampled, and subsequently assembled into flow records similar to those defined by Cisco's NetFlow [24] version 5, with a similar packet-sampling-interval and a 1 minute flow active-timeout. The packet and byte counts computed from those flow records were then multiplied by the sampling rate to approximate the actual byte and packet rates. We have not attempted to formally determine the accuracy of packet-sampling-based flow measurements as compared with the SNMP measurements. However, it is common to use such measurements in network operations.
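As a rough illustration of this scaling step, the sketch below converts sampled per-interval counts into approximate rates. It is only a sketch under the 1-in-96 sampling and five-minute bins described above; the constant and function names are our own and are not taken from flow-tools or FlowScan.

    # Minimal sketch: scale packet-sampled flow counts up to approximate actual rates.
    SAMPLING_RATE = 96        # 1 of every 96 forwarded packets is sampled
    INTERVAL_SECONDS = 300    # five-minute collection bins

    def approximate_rates(sampled_packets, sampled_bytes):
        """Return (packets/sec, bytes/sec) estimated from sampled per-interval counts."""
        pkts_per_sec = sampled_packets * SAMPLING_RATE / INTERVAL_SECONDS
        bytes_per_sec = sampled_bytes * SAMPLING_RATE / INTERVAL_SECONDS
        return pkts_per_sec, bytes_per_sec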
Both the SNMP and flow data were post-processed to produce rate values and stored using the RRDTOOL [20] time-series database. The archives were configured to retain values at five minute granularity from September 25, 2001 through April 4, 2002. Internet service providers which bill customers based upon 95th percentile peak interface usage have a similar retention policy for SNMP data. However, most other network operators retain the five-minute granularity data for only about two days (50 hours for MRTG, by default), after which that data is coalesced into averages over increasingly longer time intervals; typically 30 minute, 2 hour, and 24 hour averages. For the campus, with approximately 600 IP subnets, this set of data resulted in a database of approximately 4GB in size. The collected flow records were retained to validate the results of the analysis. They were collected at five minute intervals, resulting in about 60,000 compressed files of approximately 100GB in total combined size.
Though uncommon, our analysis also considered the average IP packet size, as computed from corresponding byte and packet rates. Because many applications have typical packet sizes, often bimodal with respect to requests and responses or data and acknowledgments, analysis of this metric occasionally exposes application usage even when only SNMP-based byte and packet interface rate statistics are available.
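The average packet size metric is simply the ratio of the two rate series; a one-line sketch, building on the hypothetical rate helper above, might look like the following.

    # Average IP packet size (bytes per packet) from corresponding byte and packet rates.
    # Illustrative only; assumes the two rate series are aligned and the packet rate is nonzero.
    def average_packet_size(bytes_per_sec, pkts_per_sec):
        return bytes_per_sec / pkts_per_sec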
In parallel with the collection of the measurement data, a journal of known anomalies and network events was maintained. The log entries in this journal noted the event's date and time, and a one-line characterization of the anomaly. Furthermore, a simple nomenclature was used to label events as one of these types:

Network: A network failure event or temporary misconfiguration resulting in a problem or outage. For instance: router software spontaneously stopped advertising one of the campus class B networks to campus BGP peers.

Attack: Typically a Denial-of-Service event, usually flood-based. For instance: an outbound flood of 40-byte TCP packets from a campus host that has had its security compromised and is being remotely controlled by a malicious party.

Flash: A flash crowd [18] event. For instance: the increase in outbound traffic from a campus ftp mirror server following a release of RedHat Linux.

Measurement: An anomaly that we determined not to be due to network infrastructure problems nor abusive network usage. For example: a campus host participating in TCP bulk data transfer with a host at another campus as part of a research project. Problems with the data collection infrastructure itself were also categorized as "Measurement" anomalies. These include loss of flow data due to router overload or unreliable UDP NetFlow transport to the collector.

TABLE I
TYPES AND COUNTS OF NETWORK ANOMALY EVENTS IN THE TRAFFIC DATABASE USED IN THIS STUDY.

    Anomaly Type    Count
    Network          41
    Attack           46
    Flash Crowd       4
    Measurement      18
    Total           109

In this way, a total of 168 events were identified and a subset researched and tagged by the engineers operating the campus network. Table I shows the distribution of types among the 109 tagged events. All flash crowd events occurring during the measurement period were selected, along with a sampling of anomalies in the other three categories based on those that had the most detailed description in the operator's journal. While the journal did not record every traffic anomaly during the measurement period, it acted as a unique road map for exploring the raw traffic measurement data, and provided a basis for determining if the anomalies could be detected or characterized automatically.
B. Best Current Practice
Experienced network operators often employ effective, but ad
hoc, methods of problem determination and anomaly detection.
These techniques rely heavily on an operator's experience and
persistent personal attention.
Modern network management systems (NMS) software provides two common tools for handling SNMP data. The first, and ostensibly most-used, is a graphing tool capable of continuously collecting and plotting values from the MIBs. It is not uncommon for network operators to fill their workstations' screens with plots of traffic as it passes through various network elements.
The second is an alarm tool, which periodically performs tests on collected values and notifies operators accordingly. Such tools are based on locally authored rules, perhaps augmented by heuristics provided by the NMS vendor. These rules are often rudimentary conditional threshold tests such as, "if (router.interface1.utilization > 50%) then notify". The resulting expert knowledge expressed in these rules is not necessarily portable to any other network environment.
Tools for handling flow data are less mature. Freely-available tools, such as those employed in our work, have achieved a certain level of popularity among operators of enterprise and large networks. These tools leverage existing SNMP experience by converting detailed flow-export records into familiar time-series data. Tabular data, compiled by either commercial or freely-available tools, is occasionally used as well.

The major deficiency of these tools is the amount of expert local knowledge and time required to set up and use them pervasively. For instance, we collected about 6,000 unique time-series metrics from just a single network element, namely our campus border router. This amount of data prohibits visual inspection of graphs containing plots of all but a small subset of those metrics.
IV. METHODS
A. Wavelet Analysis
A typical input of our analysis platform is a string of Internet traffic measurements. One of the basic principles of our methodology is the treatment of the measurement string as a generic signal, ignoring, at least to a large degree, the semantics of the signal (such as the content of the packet header), the instrumentation used (e.g. SNMP vs. FlowScan), the quantity which is being measured (packet count, byte count, incoming traffic, outgoing traffic), or the actual subnet which is targeted. We do pay careful attention to the time aggregation of the measurement (one measurement for each five minutes) in order to capture daily and weekly patterns or inconsistencies. An approach like the above is important since we would like to build a platform which is portable, and that can be automated. Any analysis tool that depends heavily on the nature of a particular local subnet will almost surely not be portable to other locations due to the heterogeneity of Internet traffic.
The basic tool we employ is wavelet analysis. The wavelet tool organizes the data into strata, a hierarchy of component "signals", each of which maintains time as its independent variable. The lower strata contain very sparse filtered information that can be thought of as sophisticated aggregations of the original data. We refer to that part of the representation as the low-frequency representation. In our algorithm, we derive one such representation, i.e. a single dataset that extracts the general slow-varying trends of the original signal. In contrast, the very high strata in the hierarchy capture fine-grained details of the data, such as spontaneous variations. These are referred to as the high-frequency strata.
Let us review in a bit more detail the so-called "wavelet processing". This processing is actually made of two complementary steps: the first is analysis/decomposition and the other is its inverse, the reconstruction/synthesis process.
Analysis: The goal of the analysis process is to extract from the original signal the aforementioned hierarchy of derived signals. This is done as an iterative process. The input for each iteration is a signal x of length N. The output is a collection of two or more derived signals, each of which is of length N/2. Each output signal is obtained by convolving x with a specially designed filter F and then decimating every other coefficient of that convolution product. We denote by F(x) the output signal so obtained. One of the special filters, denoted herein as L, has a smoothing/averaging effect, and its corresponding output L(x) is the low-frequency output. The other filters, H_1, ..., H_r (r ≥ 1), are best thought of as "discrete differentiation", and a typical output H_i(x) should capture only the "fine-grained details", i.e. the high-frequency content of the signal x. The iterations proceed with the further decomposition of L(x), creating the (shorter) signals L^2(x), H_1 L(x), ..., H_r L(x). Continuing in this manner, we obtain a family of output signals of the form H_i L^{j-1}(x). The index j counts the number of low-pass filtering iterations applied to obtain the output signal: the larger the value of j, the lower the derived signal is in our hierarchy. Indeed, we refer to H_i L^{j-1}(x) as belonging to the jth frequency level, and consider a higher value of j as corresponding to a lower frequency. If our original signal x consists of measurements taken at five minute intervals, then the derived signal H_i L^{j-1}(x) consists of data values that are 2^j × 5 minutes apart from one another. Thus, as j grows, the corresponding output signal becomes shorter and records a smoother part of the signal. The values of the derived signals H_i L^{j-1}(x) (i = 1, ..., r, j ≥ 1) are known as the wavelet coefficients.
For example, let us consider the case j = 6. At that level, the derived signal H_i L^6(x) contains aggregated data values that are 2^6 × 5 = 320 minutes apart. At that aggregation level (if done correctly) we should not anticipate seeing subtle variations that evolve over, say, a two or three hour duration; we will see, at best, a very blurry time-stamp of such variations. On the other hand, the coefficients at that level might capture well the variations between day and night traffic.
The synthesis iterations perform the inverse of the analysis: at each step the input signals for the iteration are L^j(x), H_1 L^{j-1}(x), ..., H_r L^{j-1}(x), and the output is the signal L^{j-1}(x). This is exactly the inverse of the jth iteration of the analysis algorithm. By employing that step sufficiently many times, one recaptures the original signal.
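To make the decomposition step concrete, the following sketch implements the convolve-and-decimate iteration described above. The Haar filter pair used here is only a stand-in for the paper's PS(4,1) Type II framelet (one low-pass and three high-pass analysis filters), whose coefficients are not reproduced in this excerpt; the function and variable names are our own.

    # Sketch of the analysis (decomposition) cascade: convolve with each filter,
    # then decimate every other coefficient, iterating on the low-pass output.
    # The Haar filters below are stand-ins for the paper's PS(4,1) Type II framelet.
    import numpy as np

    LOW = np.array([1.0, 1.0]) / np.sqrt(2)    # smoothing/averaging filter L
    HIGH = np.array([1.0, -1.0]) / np.sqrt(2)  # "discrete differentiation" filter H

    def analyze(x, levels):
        """Return ({j: H L^{j-1}(x)}, L^levels(x)) for a 1-D signal x."""
        coeffs, approx = {}, np.asarray(x, dtype=float)
        for j in range(1, levels + 1):
            detail = np.convolve(approx, HIGH, mode='full')[1::2]
            approx = np.convolve(approx, LOW, mode='full')[1::2]
            coeffs[j] = detail  # level-j values are 2**j * 5 minutes apart
        return coeffs, approx

    # Example: one week of five-minute measurements (7 * 288 samples).
    coeffs, low = analyze(np.random.rand(7 * 288), levels=6)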
One possible way of using the wavelet process is in "detection-only" mode. In this mode, one examines the various derived signals of the decomposition, and tries to infer from them information about the original signal.
Wavelet-based algorithms are usually more sophisticated and attempt to assemble a new signal from the various pieces in the decomposition. This is done by altering some of the values of some of the derived signals of the decomposition step and then applying reconstruction. The general idea is to suppress all the values that carry information that we would like to ignore. For example, if we wish only to view the fine-grained spontaneous changes in the data, we will apply a threshold to the entries in all the low-frequency levels, i.e. replace them by zeros.
The above description falls short of resulting in a well-defined algorithm. For example, suppose that we would like to suppress the day/night variation in the traffic. We mentioned before that such variations appear in frequency level 6 (and definitely in lower levels as well). But perhaps they are also recorded in the derived signal at level 5? It turns out that there is no simple answer here. The wavelet transform we describe here is one of many possible wavelet systems, each of which might provide a unique decomposition of the data. Unfortunately, choosing among the subtle details of each wavelet transform often requires an expert understanding of the performance of those wavelet decompositions. Their ultimate success depends on selecting a wavelet transform that suits the given application.
Time-frequency localization: approximation orders and vanishing moments. In a highly qualitative description, the selection of the wavelet transform should be based on a careful balance between its time localization characteristics and its frequency localization characteristics.
Time localization is a relatively simple notion that is primarily measured by the length of the filters that are employed in the transform. Long filters lead to excessive blurring in the time domain. For example, the use of long filters denies us the ability to easily distinguish between a very strong short-duration change in traffic volume, as opposed to a milder change of longer duration. Since, for anomaly detection, the ability to answer accurately the question "when?" is critical, we chose a wavelet system with very short filters.
Frequency localization. In our context, there are two characteristics of the wavelet system that may be regarded as belonging to this class.

One way to measure frequency localization is by measuring the number of vanishing moments that the analysis filters H_i possess. We say that the filter H_i has k vanishing moments if Ĥ(0) = Ĥ'(0) = ... = Ĥ^(k-1)(0) = 0, where Ĥ is the Fourier series of H_i. In every wavelet system, every filter H_i has at least one vanishing moment. Filters with a low number (usually one or two) of vanishing moments may lead to the appearance of large wavelet coefficients at times when no significant event is occurring, thus resulting in an increase in the number of false positive alerts. In order to create a wavelet transform with a high number of vanishing moments, one needs to select longer filters.
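Equivalently, a filter with coefficients h[n] has k vanishing moments exactly when its first k discrete moments vanish, i.e. the sum over n of n^m h[n] is zero for m = 0, ..., k-1. A small sketch of this check follows; the helper name is ours, and the two example filters are generic illustrations rather than the paper's filters.

    # Count vanishing moments of a high-pass filter h by testing whether its
    # discrete moments sum_n n**m * h[n] vanish for m = 0, 1, 2, ...
    import numpy as np

    def vanishing_moments(h, tol=1e-10, max_check=8):
        h = np.asarray(h, dtype=float)
        n = np.arange(len(h))
        k = 0
        while k < max_check and abs(np.sum(n ** k * h)) < tol:
            k += 1
        return k

    print(vanishing_moments([1, -1]))     # 1: a simple difference filter
    print(vanishing_moments([1, -2, 1]))  # 2: a discrete second-derivative filter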
Another closely related way to measure frequency localization is via the approximation order of the system. We forgo explaining the details of this notion and mention only that the decision to measure frequency localization either via vanishing moments or via approximation order depends primarily on the objective and the nature of the algorithm that is employed.
The last issue in this context is the artifact freeness of the transform. For many wavelet systems, the reconstructed (modified) signal shows "features" that have nothing to do with the original signal, and are artifacts of the filters used. Wavelet filters that are reasonably short and do not create such undesired artifacts are quite rare; thus, our need for good time localization together with our insistence on an artifact-free wavelet system narrowed the search for the "optimal" system in a substantial way.
The wavelet system we employ: We use a bi-frame version of a system known as PS(4,1) Type II (cf. [25]). This is a framelet system, i.e. a redundant wavelet system (which essentially means that r, the number of high-pass filters, is larger than 1; a simple count shows that, if r > 1, the total number of wavelet coefficients exceeds the length of the original signal). In our work, the redundancy itself is not considered a virtue. However, the redundancy provides us with added flexibility: it allows us to construct relatively short filters with very good frequency localization.

In our chosen system, there is one low-pass filter L and three high-pass filters H_1, H_2, H_3. The analysis filters are all 7-tap (i.e. each has 7 non-zero coefficients), while the synthesis filters are all 5-tap. The vanishing moments of the high-pass analysis filters are 2, 3, 4, respectively, while the approximation order of the system is 4.² The "artifact freeness" of our system is guaranteed since our low-pass filters deviate only mildly from spline filters (which perform pure multiple averages, and are the ideal artifact-free low-pass filters).

² Our initial assumption was that Internet traffic is not smooth, and there might not be enough gain in using a system of approximation order 4 (had we switched to a system with approximation order 2, we could have used shorter filters). However, comparisons between the performance of the PS(4,1) Type II and the system RS4 (whose filters are all 5-tap, but whose approximation order is only 2) yielded a significant difference in performance.
The analysis platform. We derive from a given signal x (that represents five-minute average measurements) three output signals, as follows. The description here fits a signal that has been measured for two months. Slightly different rules were employed for shorter duration signals (e.g. a signal measured for a week).
The L(ow frequency)-part of the signal, obtained by synthesizing all the low-frequency wavelet coefficients from levels 9 and up. The L-part of the signal should capture patterns and anomalies of very long duration: several days and up. The signal here is very sparse (its number of data elements is approximately 0.4% of those in the original signal), and captures weekly patterns in the data quite well. For many different types of Internet data, the L-part of the signal reveals a very high degree of regularity and consistency in the traffic, hence it can reliably capture anomalies of long duration (albeit it may blur various characteristics of the abnormal behavior of the traffic).
The M(id frequency)-part of the signal, obtained by synthesizing the wavelet coefficients from frequency levels 6, 7, 8. The signal here has zero mean, and is supposed to capture mainly the daily variations in the data. Its data elements number about 3% of those in the original signal.
The H(igh frequency)-part of the signal is obtained by thresholding the wavelet coefficients in the first 5 frequency levels, i.e. setting to zero all coefficients whose absolute value falls below a chosen threshold (and setting to zero all the coefficients in level 6 and up). The need for thresholding stems from the fact that most of the data in the H-part consists of small short-term variations, variations that we think of as "noise" and do not aid us in our anomaly detection objective.
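A rough sketch of this band-limited reconstruction is shown below. It uses PyWavelets' 'db4' wavelet as a convenient stand-in, since the PS(4,1) framelet is not available in that library, and the hard-threshold value is an illustrative assumption; only the level groupings follow the text above.

    # Sketch: build L-, M-, and H-parts by zeroing coefficient bands and reconstructing.
    # 'db4' stands in for the paper's PS(4,1) Type II framelet; the threshold is illustrative.
    import numpy as np
    import pywt

    def three_parts(x, wavelet='db4', levels=9, h_threshold=2.0):
        coeffs = pywt.wavedec(x, wavelet, level=levels)  # [cA9, cD9, cD8, ..., cD1]

        def synthesize(keep, thresh=None):
            kept = [np.zeros_like(c) for c in coeffs]
            for i in keep:
                c = coeffs[i]
                kept[i] = pywt.threshold(c, thresh, mode='hard') if thresh else c
            return pywt.waverec(kept, wavelet)[:len(x)]

        low = synthesize([0, 1])                 # level 9 and coarser: multi-day trends
        mid = synthesize([2, 3, 4])              # levels 6-8: daily variations
        high = synthesize(range(5, 10), thresh=h_threshold)  # levels 1-5, hard-thresholded
        return low, mid, high

    # Example: roughly two months of five-minute samples.
    low, mid, high = three_parts(np.random.rand(61 * 288))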
We close this section with two technical comments. While the theory of thresholding redundant representations is still in rudimentary form, it is evident to us that we should vary the thresholding level according to the number of vanishing moments in the filter (decreasing the threshold for the filters with more vanishing moments). We have not yet implemented this technique. Finally, due to its high approximation order, our system cannot accurately capture sharp discontinuities in the data.
Detection of anomalies. While it is unlikely that a single method for detecting anomalies will ever be found,³ we have taken a first step at developing an automated method for identifying irregularities in the measured data. Our algorithm, which we call a deviation score, has the following ingredients:
1. Normalize the H- and M-parts to have variance one. Compute the local variability of the (normalized) H- and M-parts by computing the variance of the data falling within a moving window of specified size. The length of this moving window should depend on the duration of the anomalies that we wish to capture. If we denote the duration of the anomaly by t_0 and the time length of the window for the local deviation by t_1, we need, in the ideal situation, to have q := t_0/t_1 ≈ 1. If the quotient q is too small, the anomaly may be blurred and lost. If the quotient is too large, we may be overwhelmed by "anomalies" that are of very little interest to the network operators. Our current experiment focuses on anomalies of duration 1-4 hours, and uses a moving 3-hour local deviation window. Shorter anomalies of

³ After all, there is not a single definition of "anomaly". Should we consider any change in the measured data to be an anomaly, or only those that correspond to an identifiable change in network state?
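As a concrete illustration of the moving-window variance in step 1, here is a minimal sketch. It assumes a 3-hour window of five-minute samples (36 points); how the H- and M-part scores are ultimately combined into a single deviation score is not spelled out in the excerpt above, so the simple sum below is our own placeholder.

    # Sketch of the "local variance" ingredient of the deviation score: normalize a
    # band-limited part to unit variance, then take the variance in a moving window.
    import numpy as np
    import pandas as pd

    WINDOW = 36  # 3 hours of five-minute samples

    def local_variance(part, window=WINDOW):
        s = pd.Series(part) / np.std(part)            # normalize to variance one
        return s.rolling(window, center=True).var()   # variance within the moving window

    def deviation_signal(h_part, m_part, window=WINDOW):
        # Placeholder combination of the H- and M-part local variances.
        return local_variance(h_part, window) + local_variance(m_part, window)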