Statistical approaches to DDoS attack detection and response

L. Feinstein, +3 more
- Vol. 1, pp 303-314
Statistical Approaches to DDoS Attack Detection and Response*
* This research was supported by DARPA under contract N66001-01-C-8048.
Laura Feinstein, Dan Schnackenberg
The Boeing Company, Phantom Works
Laura.C.Feinstein@boeing.com
Daniel.D.Schnackenberg@boeing.com
Ravindra Balupari, Darrell Kindred
Network Associates Laboratories
Ravindra_Balupari@nai.com
Darrell_Kindred@nai.com
Abstract
The nature of the threats posed by Distributed Denial of Service (DDoS) attacks on large networks, such as the Internet, demands effective detection and response methods. These methods must be deployed not only at the edge but also at the core of the network. This paper presents methods to identify DDoS attacks by computing entropy and frequency-sorted distributions of selected packet attributes. The DDoS attacks show anomalies in the characteristics of the selected packet attributes. The detection accuracy and performance are analyzed using live traffic traces from a variety of network environments ranging from points in the core of the Internet to those inside an edge network. The results indicate that these methods can be effective against current attacks and suggest directions for improving detection of more stealthy attacks. We also describe our detection-response prototype and how the detectors can be extended to make effective response decisions.
1. Introduction
Powerful DDoS toolkits are available to potential attackers, and essential networks are ill prepared for defense. The security community has long known that DDoS attacks are possible, but only in the past three years have such attacks become popular with hackers. As ominous as the threat is today, it will only worsen as tools are built to evade defenses. Soon, DDoS floods will appear that are difficult to distinguish from legitimate traffic, and packet rates from individual flood sources will be low enough to escape notice by local administrators. To meet the increasing need for detection and response, researchers face these major issues:
- A stand-alone router on the attack path should automatically recognize that the network is under attack and adjust its traffic flow to ease the attack impact downstream.
- The detection and response techniques should be adaptable to a wide range of network environments, preferably without significant manual tuning.
- Attack detection should be as accurate as possible. False positives can lead to inappropriate responses that cause denial of service to legitimate users. False negatives result in attacks going unnoticed.
- Attack response should employ intelligent packet discard mechanisms to reduce the downstream impact of the flood while preserving and routing the non-attack packets.
- The detection method should be effective against a variety of attack tools available today and also robust against future attempts by attackers to evade detection.
These are demanding goals, but we contend that there are several reasons to believe that satisfactory detection and response methods can be designed. DDoS traffic generated by today's tools often has packet-crafting characteristics that make it possible to distinguish from normal traffic. For example, in some configurations the Stacheldraht attack tool crafts packets so that the source port is random and the destination port is sequentially increased from one packet to the next [1],[10]. Future DDoS tools may include improvements to packet crafting. However, we claim that these tools are unlikely to model legitimate traffic closely enough to produce crafted packets that do not distort statistical measurements of the composition of the traffic. Our hypothesis is that relatively simple statistical measures can be used to discriminate DDoS traffic from legitimate traffic in core routers with sufficient accuracy to mitigate the effect of the attack downstream.
Research conducted by other organizations suggests that statistical measurements and statistical processing are an effective approach to the DDoS problem. The EMERALD project at SRI International uses intrusion detection signatures with Bayesian inference to detect distributed attacks [12].
Proceedings of the DARPA Information Survivability Conference and Exposition (DISCEX’03)
0-7695-1897-4/03 $17.00 © 2003 IEEE
Researchers at Florida Institute of Technology have created an intrusion detection system (IDS) that is non-stationary and models probabilities based on time since the last event rather than on average rate [6]. This IDS operates on many of the same fields our detector monitors and has similar training requirements to set up initial thresholds and baselines. The system has two components, PHAD and ALAD. PHAD operates on the packet header while ALAD operates on an incoming server TCP connection. The PHAD component clusters observed values and then compares the size of the clusters to accepted thresholds to determine anomalies.
Mazu Networks uses a similar architecture to PHAD and our chi-square detector. The Mazu system collects network statistics through a monitoring device and similarly sorts the collected items into buckets [3]. An algorithm determines whether buckets should be divided or combined, and a threshold detects anomalies depending on the number and size of the buckets.
We have imposed some significant constraints on our DDoS defense development: no explicit coordination (e.g., pushback [7]) between defending network components, no built-in knowledge of applications or protocols, and no instrumentation at end hosts. These approaches are being actively explored in other research, and we believe that the techniques described here can complement these others in a comprehensive DDoS defense solution.
2. Detection Algorithms
Our detection algorithms measure statistical properties of specific fields in the packet headers at various points in the Internet. For instance, if a detector captures 1000 consecutive packets at a peering point and computes the frequency of occurrence of each unique source IP address in those 1000 packets, then the detector will have a model of the distribution of the source address. Further computations with this distribution allow us to measure the randomness or uniformity of the addresses as well as the goodness-of-fit of the distribution with respect to prior measurements.
2.1. Entropy
Let an information source have n independent symbols, each with probability of choice $p_i$. Then the entropy H is defined as [17]:
$$H = -\sum_{i=1}^{n} p_i \log_2 p_i$$
Hence, entropy can be computed on a sample of consecutive packets. Comparing the value for entropy of some sample of packet header fields to that of another sample of packet header fields from the same peering point provides a mechanism for detecting changes in the randomness. We have observed through experimentation that while a network is not under attack, the entropy values for various header fields each fall in a narrow range. While the network is under attack with current attack tools, these entropy values exceed these ranges in a detectable manner.
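As a minimal sketch of the entropy definition applied to a packet sample, the following Python function estimates each probability as a relative frequency. The function name `sample_entropy` and the two toy address lists are hypothetical, chosen only for illustration, and are not taken from the paper's implementation.

```python
import math
from collections import Counter


def sample_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    total = len(values)
    counts = Counter(values)
    # Each p_i is estimated as (occurrences of symbol i) / (sample size).
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical samples of source addresses (illustrative data only).
concentrated = ["10.0.0.1"] * 6 + ["10.0.0.2"] * 2   # few distinct sources
dispersed = ["10.0.0.%d" % i for i in range(8)]       # every source distinct

print(sample_entropy(concentrated))
print(sample_entropy(dispersed))
```

A sample in which every address is distinct reaches the maximum entropy for its size (3 bits for 8 symbols), which is the direction in which randomly spoofed source addresses would push the measurement.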
The algorithm to compute entropy can be optimized to perform only a few simple computations per packet. In our implementation, the entropy of a source will be calculated through a sliding window of fixed width, W. The probability value $p_i$ in this algorithm is actually the frequency of occurrence of each unique symbol divided by the total number of symbols in the sample. The process of computing entropy of W packets is as follows:
1. Compute the entropy of the first W packets with reference to a specific header parameter (e.g., source IP address).
2. Isolate the term in the summation corresponding to the probability of the first symbol in the window (label this symbol with i=1) and also the value for the corresponding probability ($p_{i-1}$).
3. Slide the window so the new first term was previously the second term and the next W-1 consecutive terms are contained in the window.
4. Isolate the term in the summation corresponding to the probability of the symbol acquired from shifting the window.
5. Subtract off the terms isolated in steps 2 and 4 from the value computed in step 1.
6. Recompute the affected probabilities for the current window of data. That is, recompute $p_{i-1}$ and the probability of the symbol that was added by sliding the window.
7. Using the values computed in step 6, add the two terms missing from the entropy summation back in and compare this new entropy value to the previous entropy computations.
8. Repeat steps 2-7 to determine subsequent entropy values.
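The steps above can be sketched in Python as an incremental update that touches only the two affected terms of the summation per packet. This is an illustrative sketch, not the authors' Snort code: the names `sliding_entropy` and `_term` are hypothetical, and the use of a `deque` plus a `Counter` to hold the window is an assumption about one reasonable data layout.

```python
import math
from collections import Counter, deque


def _term(count, width):
    """Contribution -p*log2(p) of a symbol seen `count` times in the window."""
    if count == 0:
        return 0.0
    p = count / width
    return -p * math.log2(p)


def sliding_entropy(symbols, width):
    """Yield the entropy of every full window of `width` consecutive symbols."""
    window = deque()
    counts = Counter()
    entropy = 0.0
    for s in symbols:
        if len(window) < width:
            # Still filling the first window.
            window.append(s)
            counts[s] += 1
            if len(window) == width:
                # Step 1: full computation for the first W symbols.
                entropy = sum(_term(c, width) for c in counts.values())
                yield entropy
            continue
        # Step 3: slide the window by one symbol.
        old = window.popleft()
        window.append(s)
        if old != s:
            # Steps 2, 4, 5: subtract the stale terms of the leaving and
            # entering symbols (computed from their old counts).
            entropy -= _term(counts[old], width) + _term(counts[s], width)
            # Step 6: update the two affected counts.
            counts[old] -= 1
            counts[s] += 1
            # Step 7: add the refreshed terms back in.
            entropy += _term(counts[old], width) + _term(counts[s], width)
            if counts[old] == 0:
                del counts[old]
        yield entropy
```

Because the window width is fixed, all other symbols keep the same probability when the window slides, which is why only two terms need to be revisited. In a long-running detector, a periodic full recomputation could be used to limit accumulated floating-point drift; that is a design suggestion here, not something the paper describes.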
A sophisticated attacker would likely attempt to defeat the detection algorithm by creating stealthy traffic floods that mimic the legitimate traffic the detector would expect. An attacker who knew that the entropy of various packet attributes was being monitored could build an attack tool that generates floods with tunable entropy levels. Through guesswork, penetration, or trial and error, the attacker could determine typical entropy levels seen at the detector and tune the flood to match. This may not be as easy as it sounds, particularly if there are multiple detectors deployed between the flood sources and the targets, as the typical entropy values seen by detectors in different network environments are likely to differ. Stealthy attacks are explored further in Section 3.4.
The window size, W, is a tunable parameter that controls how much smoothing of short-term fluctuations the detector will do. Increasing W will reduce the variation in entropy and may reduce the rate of false-positives resulting from brief and presumably insignificant anomalies. However, W should be kept small enough that attacks are detected quickly. We have found that a window size of 10,000 packets is a reasonable compromise in the network environments we have explored.
2.2. Chi-Square Statistic
Pearson's chi-square ($\chi^2$) test is used for distribution comparison in cases where the measurements involved are discrete values. For example, it could be used to test the distribution of TCP SYN flag values (0 or 1) or protocol numbers. The test works best when the number of possible values is small. In particular, a rule of thumb is that the expected number of packets in a sample having each possible value should be at least five. However, this can often be achieved through "binning," that is, combining a set or range of possible values and treating them as one. For example, the chi-square test can be applied to service ports by considering four values: HTTP, FTP, DNS, and "other." Similarly, packet lengths can be binned into ranges such as 0-64 bytes, 65-128 bytes, 129-255 bytes, etc.
For a sample of N packets, let B be the number of available bins. Define $N_i$ as the number of packets whose value falls in the ith bin and $n_i$ as the expected number of packets in the ith bin under the typical distribution. Then the chi-square statistic is computed as follows:
$$\chi^2 = \sum_{i=1}^{B} \frac{(N_i - n_i)^2}{n_i}$$
When the $N_i$ and $n_i$ values are large and the N measurements are independent and drawn from the expected distribution, this value follows the well-known chi-square distribution with B-1 degrees of freedom. These assumptions (in particular, independence) do not typically hold for packet field values even under normal conditions. Hence, comparison with the chi-square distribution is of limited utility. However, the chi-square statistic does provide a useful measure of the deviation of a current traffic profile from the baseline.
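To make the statistic concrete, here is a small Python sketch that evaluates it for binned service ports (HTTP, FTP, DNS, other). The function name `chi_square` and all counts below are hypothetical illustration values, not measurements from the paper.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum over bins of (N_i - n_i)^2 / n_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical expected packet counts per bin for a 1000-packet sample,
# in the order HTTP, FTP, DNS, other.
expected = [600.0, 50.0, 150.0, 200.0]

typical_sample = [580, 55, 160, 205]   # close to the expected profile
flooded_sample = [300, 25, 75, 600]    # "other" bin swollen by a flood

print(chi_square(typical_sample, expected))
print(chi_square(flooded_sample, expected))
```

In this toy example the statistic is about 1.96 for the typical sample and 1000 for the flooded one. As the text notes, the value is used here as a deviation measure rather than being compared against the chi-square distribution.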
A current-traffic profile, mapping packet attribute values to frequencies, is maintained as follows:
1. For each packet that arrives, extract the value, v, of the desired attribute (e.g., source address).
2. Apply exponential decay to the stored frequency for v based on its age (time since last update). The stored frequency is multiplied by $\exp\left(\ln(0.5) \cdot \frac{\text{age}}{\text{halflife}}\right)$.
3. Increment the frequency for v and store the current time (or packet count) as its last-update time.
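The three profile-maintenance steps can be sketched in Python as follows. The class name `DecayingProfile`, its method names, and the choice of a dictionary keyed by attribute value are assumptions made for illustration; time may be measured in seconds or in packet count, as the text allows.

```python
import math


class DecayingProfile:
    """Current-traffic profile: attribute value -> exponentially decayed frequency."""

    def __init__(self, half_life):
        self.half_life = half_life
        self.entries = {}  # value -> (frequency, last_update_time)

    def _decayed(self, value, now):
        freq, last = self.entries.get(value, (0.0, now))
        age = now - last
        # Multiply by exp(ln(0.5) * age / halflife), i.e. halve per half-life.
        return freq * math.exp(math.log(0.5) * age / self.half_life)

    def update(self, value, now):
        """Steps 2 and 3: decay the stored frequency, increment, timestamp."""
        self.entries[value] = (self._decayed(value, now) + 1.0, now)

    def snapshot(self, now):
        """Decayed frequency of every stored value as of `now`."""
        return {v: self._decayed(v, now) for v in self.entries}


profile = DecayingProfile(half_life=10)
profile.update("192.0.2.1", now=0)
profile.update("192.0.2.1", now=10)   # old count of 1 has decayed to 0.5
print(profile.snapshot(now=20))
```

Decaying lazily, only when a value is touched or a snapshot is taken, keeps the per-packet cost to a single dictionary update.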
Periodically, this current-traffic profile is compared with a baseline profile using the chi-square statistic, as follows:
1. Apply exponential decay to the stored current-traffic frequencies, as above.
2. Group the attribute values into bins based on frequency. For example, the 16 most common values might go in one bin, the next 64 in another, the next 256 in another, and the rest in another.
3. Calculate the total frequency for each bin.
4. Calculate the chi-square statistic, comparing these bin-frequency totals with the bin-frequency values in the baseline profile.
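A rough Python sketch of the frequency-sorted binning and comparison is shown below. The function names, the toy frequencies, and the bin sizes `(1, 2)` are hypothetical; the sketch also assumes the current and baseline bin totals are expressed on the same scale, which the text does not spell out.

```python
def bin_by_rank(freqs, bin_sizes):
    """Total the frequencies of the most common values, bin by bin.

    `freqs` maps attribute value -> (decayed) frequency. `bin_sizes` gives
    how many of the top-ranked values fall in each bin; a final bin
    collects all remaining values.
    """
    ranked = sorted(freqs.values(), reverse=True)
    totals, start = [], 0
    for size in bin_sizes:
        totals.append(sum(ranked[start:start + size]))
        start += size
    totals.append(sum(ranked[start:]))
    return totals


def chi_square(observed, expected):
    """Pearson chi-square statistic over matching bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical decayed frequencies per source address.
current = {"a": 50, "b": 20, "c": 10, "d": 10, "e": 5, "f": 5}
baseline_bins = [40.0, 30.0, 30.0]   # hypothetical baseline bin totals

current_bins = bin_by_rank(current, bin_sizes=(1, 2))
print(current_bins)
print(chi_square(current_bins, baseline_bins))
```

Binning by rank rather than by value means the detector tracks the shape of the distribution (how concentrated the traffic is among its most common values) and not which particular addresses are common.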
The baseline profile can be maintained as decaying averages of the current-traffic bin frequencies. Each time the current-traffic bin frequencies are computed, the average is updated as follows:
1. Exponential decay is applied to the stored bin-frequency averages, using a significantly longer half-life than is used for the current-traffic profile.
2. The new set of bin frequencies is multiplied by $1 - \exp\left(\ln(0.5) \cdot \frac{\text{age}}{\text{baseline\_halflife}}\right)$ and the result is added to the decayed average.
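The baseline update can be sketched as a weighted blend of the old average and the new bin frequencies. This sketch assumes the weight on the new frequencies is one minus the decay factor, so that the two weights sum to one; the function and parameter names are hypothetical.

```python
import math


def update_baseline(baseline, current_bins, age, baseline_half_life):
    """Blend new bin frequencies into the decaying baseline average.

    `age` is the time (or packet count) since the previous baseline update.
    Assumption: the new frequencies are weighted by (1 - decay factor).
    """
    keep = math.exp(math.log(0.5) * age / baseline_half_life)
    return [keep * b + (1.0 - keep) * c for b, c in zip(baseline, current_bins)]


# When exactly one half-life has elapsed, old and new are weighted equally.
print(update_baseline([10.0, 20.0], [20.0, 0.0], age=100, baseline_half_life=100))
```

A longer baseline half-life makes `keep` closer to one, so the baseline absorbs a sudden change in traffic slowly and the chi-square statistic stays elevated for longer during an attack.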
The user can tune the detector by modifying the following parameters: traffic profile half-life, baseline profile half-life, bin definitions, and hash function range. Values in the current-traffic profile whose frequencies decay below a certain threshold can be purged without substantially affecting the chi-square computation. This purging reduces memory consumption and processing requirements. For packet attributes such as IP addresses that have a very large range, a hash of the attribute's value may be used instead of the value itself in order to reduce memory consumption and processing requirements in the worst case (many distinct values). When the baseline frequency value for a given bin is very low, the chi-square statistic may be excessively influenced by that bin's value. Ideally, the bins will be defined such that this is unlikely, but as a fallback, low-value bins can be automatically merged with adjoining bins prior to computing the chi-square statistic.
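One possible way to merge low-value bins with adjoining bins is sketched below in Python. The merging rule (accumulate forward until the baseline total reaches a minimum, and fold any leftover into the last emitted bin) and the default threshold of 5, borrowed from the rule of thumb in Section 2.2, are assumptions; the paper does not specify the rule it uses.

```python
def merge_low_bins(observed, expected, min_expected=5.0):
    """Merge adjacent bins until every merged bin's expected total is
    at least `min_expected`. Returns (merged_observed, merged_expected)."""
    merged_obs, merged_exp = [], []
    acc_obs = acc_exp = 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    if acc_obs > 0 or acc_exp > 0:
        if merged_exp:
            # Leftover low-value bins join the last emitted bin.
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            # Everything together is still below the threshold.
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


print(merge_low_bins([10, 1, 2, 30], [12.0, 1.0, 2.0, 28.0]))
```

Without merging, a bin with a baseline value near zero would sit in the denominator of its chi-square term and let a handful of packets dominate the statistic.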
It is unlikely that an outside attacker without access to the detector itself or a large fraction of its network neighbors will know the exact characteristics of network traffic typically seen by the detector. Therefore, we hypothesize that the attack traffic will differ from typical traffic in measurable ways.
3. Detector Evaluation
In order to evaluate thoroughly the potential effectiveness of DDoS detection methods such as those described in Section 2, we must address the following questions.
How well can the method distinguish attack conditions from normal conditions? To answer this question, we must determine what kinds of DDoS attacks the method can detect, and what fraction of the monitored traffic the attacks must comprise in order to be detected. Ideally, a detector should pick up not only attacks generated by tools found "in the wild" to date, but also more stealthy attacks using more sophisticated tools wielded by attackers familiar with the detection method and the detector's network environment. Finally, we must assess the frequency and consequences of false-positives, that is, ordinary fluctuations in legitimate traffic interpreted by the detector as attacks.
To what network environments and platforms is the method best suited? Characteristics of the monitored network traffic will vary significantly depending on where detectors are deployed. The protocols used, diversity of addresses seen, typical session durations, response latency, and daily volume fluctuations will differ dramatically among LAN environments, edge routers, and core routers. A detection method effective in one of these environments may fare poorly in others. In addition, if the method is to be applied in core routers, its per-packet computational requirements and memory usage must be modest in order to make real-time processing at high bandwidths practical (see Section 3.5).
Once an attack is detected, can the detector characterize the attack traffic sufficiently to produce a targeted response that mitigates the attack's effects? Detection alone may be useful for alerting human administrators to attacks in progress or notifying upstream (closer to attack sources) devices that something should be done. However, many DDoS attacks today are only two minutes in duration [8], so the ability to generate automated responses, at least as a preliminary measure, is important. A detection method that can effectively describe the nature of the attack will make such automated response more practical.
The remainder of this section describes attempts to answer these questions for the entropy and chi-square DDoS detection methods.
3.1. Prototype Implementation
To evaluate the DDoS attack detection methods described in Section 2 under realistic conditions, we implemented prototype detector modules as plug-ins for Snort, the popular, open-source network intrusion detection system [13], [14]. In addition to real-time traffic monitoring, Snort supports off-line processing of previously captured network traffic, making it possible to conduct reproducible detection experiments with traffic data from a variety of environments.
The chi-square and entropy detectors were built as Snort preprocessors, operating on every IP datagram received by Snort prior to stream reassembly and other packet manipulation. The two detectors can be individually enabled and configured in the snort.conf configuration file, and can trigger alarms through Snort's modular alerting facility.
In addition to issuing alerts, these plug-ins record data to log files in the Snort log directory. The entropy detector logs periodically computed entropy values for each packet attribute specified in the initialization file (e.g., source and destination IP addresses and TCP/UDP ports, datagram length, and TCP window size). The chi-square detector logs the periodically computed chi-square statistics for each of the specified packet attributes, along with the current and baseline bin frequency values used to compute those statistics. This data can be useful for manual or automatic detector tuning and alert threshold setting.
3.2. Network Trace Data
A critical element of evaluating these detectors is exposing them to traffic from a variety of network environments. This allows us to determine how stable the traffic statistics monitored by the detectors are in those environments, and how effectively the detectors can identify DDoS attack traffic in different contexts.
For this purpose, we obtained several publicly available network traces as well as some traces collected specifically for our experiments. These traces are not known to contain substantial DDoS attacks, so we treat them as consisting of legitimate traffic. To test the effects of DDoS attacks, we simulate these attacks by overlaying the kind of attack traffic generated by some existing DDoS attack tools onto the traces at various concentrations [10]. Ideally, we would make use of traces containing identifiable periods during which actual DDoS attacks were in progress, but few of these are publicly available.
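The idea of overlaying attack traffic at a chosen concentration can be illustrated with a short Python sketch. This is not the authors' simulation tool: the function name `overlay_attack`, the representation of a trace as a list of source addresses, and the use of uniformly random 32-bit integers as spoofed addresses are all assumptions made for illustration.

```python
import random


def overlay_attack(sources, start, end, fraction, seed=0):
    """Interleave spoofed-source packets into a trace of source addresses.

    Within legitimate packet indices [start, end), attack packets are
    inserted so that they make up `fraction` of the packets in that span.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    rng = random.Random(seed)
    per_legit = fraction / (1.0 - fraction)  # attack packets per legit packet
    out, owed = [], 0.0
    for i, src in enumerate(sources):
        out.append(src)
        if start <= i < end:
            owed += per_legit
            while owed >= 1.0:
                out.append(rng.getrandbits(32))  # uniformly random source
                owed -= 1.0
    return out


# Hypothetical trace: 1000 packets from one source, attack at 50% concentration
# over legitimate packets 200..799.
mixed = overlay_attack([1] * 1000, start=200, end=800, fraction=0.5)
print(len(mixed))
```

Feeding the resulting address stream to an entropy or chi-square detector lets the same legitimate trace be replayed at several attack concentrations, which is the kind of experiment the paragraph above describes.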
The traces used were drawn from a variety of network environments, as described below, and most have IP addresses that have been transformed via an unknown but one-to-one function for privacy purposes. This address re-mapping is irrelevant to the currently implemented detectors, since they make no assumptions about relationships between different IP addresses.
The following traces were used:
- NZIX. This trace, from July 2000, includes five consecutive days of IP headers sent through the New Zealand Internet Exchange (NZIX), a peering point for several major New Zealand ISPs and the University of Waikato; throughput ranges roughly from 4 to 12 Mbit/s. Two six-hour periods were used for detector experimentation.
- Bell Labs. This trace contains one week of IP headers observed outside the firewall for Bell Labs, a 9 Mbit/s connection serving a staff of about 450. One full day of this traffic was used for experimentation.
- University. This trace was collected from the Stocker Engineering and Technology network at Ohio University. It contains all the packets entering and leaving the network, with throughput ranging from 8 to 16 Mbit/s. Three sets of data, each having around 30,000,000 packets, were collected at different times during a day for experimentation.
- Small Company. This trace contains one week of network traffic observed outside the firewall of a small technology company in the United States. The connection served a staff of about 200 users in the company. One 24-hour weekday trace was used for experimentation.
3.3. Detection Example
To illustrate the effects of an attack on the entropy and chi-square statistics, we examined a 1,000,000-packet excerpt from the NZIX data set with a simulated DDoS attack comprising 25% of all packets, starting at packet number 700,000 and ending at packet number 800,000. (Packets in this excerpt are numbered from 200,000 to 1,200,000.) In this attack, IP source addresses are chosen at random from a uniform distribution; we will focus on source-address-based detection.
Figure 1 shows the output (entropy values) of an entropy detector examining the IP source address packet attribute with a window size of 10,000 packets. Before the attack begins, source address entropy measurements fall entirely within the range 7.0-7.5. During the attack, the entropy increases by approximately 1.5. Any maximum-entropy threshold setting between 7.5 and 8.75 would detect this attack without generating any false-positives in this example.
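A threshold check of this kind can be sketched in a few lines of Python. The function name `entropy_alerts` and the sample entropy series are hypothetical; the threshold of 8.0 is simply one value inside the 7.5 to 8.75 range quoted above.

```python
def entropy_alerts(entropies, max_entropy, min_entropy=None):
    """Return indices of windows whose entropy falls outside the allowed range."""
    alerts = []
    for i, h in enumerate(entropies):
        too_high = h > max_entropy
        too_low = min_entropy is not None and h < min_entropy
        if too_high or too_low:
            alerts.append(i)
    return alerts


# Hypothetical per-window entropy values: normal, normal, attack, attack, normal.
series = [7.1, 7.4, 8.8, 8.9, 7.2]
print(entropy_alerts(series, max_entropy=8.0))
```

The optional lower bound is included because an attack that concentrates traffic on a few values would lower entropy rather than raise it; whether to enable it is a tuning decision for the deployment.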
Figure 1: Entropy for a brief DDoS attack
Figure 2: Bin frequencies for a brief attack
Figure 3: Chi-square values for a brief attack
In Figure 2, the bin frequency profile for a source address chi-square detector (current traffic half-life is 20000 packets; bins defined as most frequent source address, next 4 most frequent, next 16, next 256, next 4096, and the remainder) is displayed for the same example. The six colored regions represent the percent-
References
- A mathematical theory of communication (journal article)
- The Mathematical Theory of Communication (book)
- On power-law relationships of the Internet topology (proceedings article)
- Snort - Lightweight Intrusion Detection for Networks (proceedings article)