Adaptive network intrusion detection system using a hybrid approach

TL;DR: An adaptive network intrusion detection system that uses a two-stage architecture: in the first stage a probabilistic classifier detects potential anomalies in the traffic, and in the second stage HMM-based traffic models are used to narrow down the potential attack IP addresses.

Adaptive Network Intrusion Detection System using
a Hybrid Approach
R Rangadurai Karthick, Department of Computer Science and Engineering, IIT Madras, India (ranga@cse.iitm.ac.in)
Vipul P. Hattiwale, Department of Computer Science and Engineering, IIT Madras, India (vipul.hattiwale@gmail.com)
Balaraman Ravindran, Department of Computer Science and Engineering, IIT Madras, India (ravi@cse.iitm.ac.in)
Abstract—Any activity aimed at disrupting a service, making a resource unavailable, or gaining unauthorized access can be termed an intrusion. Examples include buffer overflow attacks, flooding attacks, system break-ins, etc. Intrusion detection systems (IDSs) play a key role in detecting such malicious activities and enable administrators to secure network systems. Two key criteria should be met by an IDS for it to be effective: (i) the ability to detect unknown attack types, and (ii) a very low misclassification rate.
In this paper we describe an adaptive network intrusion detection system that uses a two stage architecture. In the first stage a probabilistic classifier is used to detect potential anomalies in the traffic. In the second stage an HMM based traffic model is used to narrow down the potential attack IP addresses. Various design choices that were made to make this system practical, and difficulties faced in integrating it with existing models, are also described. We show empirically that this system achieves good performance.
I. INTRODUCTION
Any attempt made to gain unauthorized access to a computer or to disrupt the availability of a service or resource is termed an intrusion. An Intrusion Detection System (IDS) is software or a system built to detect intrusions. In general, the detection mechanisms used by IDSs can be classified into two major categories.
1) Signature based detection: models are built from well known attack types, i.e., from already known attack patterns.
2) Anomaly based detection: a model is built from normal traffic, and any deviation from this profile is considered anomalous.
Anomaly based techniques are preferred over signature based techniques owing to their ability to detect novel intrusions; signature based techniques can only detect attacks whose patterns are already known.
The key aspects that we considered in building an anomaly based IDS are:
Choice of attributes: The model is intended to be deployed at a web server or network gateway, where the inflow of traffic is huge. We chose to use information available from a packet's header as the features for building the model. This way we do not incur much overhead on the server, and the IDS does not become a bottleneck.
Handling infrequent patterns: Not all normal network traffic follows a uniform flow pattern. Any proposed model should be able to handle normal traffic that is infrequent. Our model uses boosting techniques to learn over these infrequent patterns in order to classify them correctly.
False alarm rate: The main drawback of anomaly based detection is a high false alarm rate. The boosting technique used in the proposed model addresses this problem and yields a very low false alarm rate.
We use a Hidden Markov Model (HMM), a generative model, for modeling the input data. The model profiles TCP based communication channels for intrusions. Any normal TCP connection goes through three phases during its lifetime: connection establishment, data transmission, and connection termination. This mode of communication has an inherently sequential nature, which makes it convenient to model with an HMM, since an HMM can exploit the sequential structure of TCP traffic.
A brief description of the HMM based approach is as follows. The first step is source separation of traffic, performed on both training and testing traffic in order to preserve the sequence information of TCP connections. An HMM is used to profile source separated clean traffic, and the model thus built is used to classify test traffic. This approach had a high attack detection rate but also a high false positive rate. A high false positive rate corresponds to flagging legitimate traffic as attack and cannot be accepted when designing such systems. Hence various design choices, such as port based separation and cascading of HMMs, were considered for traffic profiling. These approaches increased the accuracy of the classifier while keeping the false positive rate very low. The intrusion detection data set released by DARPA [3] is used to train and test the HMM models.
The HMM based model had a few shortcomings when we tried to implement it in real time. This led us to look for alternative methods that could be combined with HMMs to make the system work in real time. Vijayasarathy et al. [2] proposed a Naive Bayes (NB) based model for profiling traffic. This approach handles the skewness in network traffic, i.e., the fact that the amount of anomalous traffic to a server is very low compared to that of clean traffic. The NB based model runs fast, close to line speed, and computes the probability of occurrence of groups of incoming packets, called windows.
The NB model is used for online classification, and the HMM model is used for offline analysis of traffic. Traffic flagged as anomalous by the NB model is fed into an offline HMM that computes the probability of each connection present in the window. Combining the NB and HMM models we thus form a hybrid model, wherein the NB model computes the probability of occurrence of windows and the HMM computes the probability of each connection within a flagged window. The output of the HMM, a list of attacking IPs, can then be used to update the firewall on which IPs to block. The HMM can thus be used to generate an IP blacklist, which makes the hybrid model more efficient. A minimal sketch of this two-stage flow is given below.
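To make the division of labour between the two stages concrete, the following Python sketch shows how such a hybrid pipeline could be wired together. The helper names (nb_window_probability, hmm_connection_log_probability), the thresholds, the window size, and the packet layout are hypothetical placeholders, not functions from the paper or any library; the sketch only illustrates the control flow described above, assuming both models have already been trained.

```python
from collections import defaultdict

WINDOW_SIZE = 100          # packets per window (assumed value)
NB_THRESHOLD = 1e-6        # windows below this NB probability are suspicious (assumed)
HMM_THRESHOLD = -50.0      # connections below this log-probability are attacks (assumed)

def hybrid_classify(packets, nb_window_probability, hmm_connection_log_probability):
    """Two-stage hybrid classification.

    Stage 1 (online): the NB model scores fixed-size windows of packets.
    Stage 2 (offline): connections inside suspicious windows are scored by the HMM,
    and the source IPs of low-probability connections are returned as a blacklist.
    """
    blacklist = set()
    for start in range(0, len(packets), WINDOW_SIZE):
        window = packets[start:start + WINDOW_SIZE]
        if nb_window_probability(window) >= NB_THRESHOLD:
            continue  # window looks normal; no offline analysis needed

        # Source separate the flagged window into per-connection flag sequences.
        connections = defaultdict(list)
        for pkt in window:
            connections[(pkt["src_ip"], pkt["dst_ip"])].append(pkt["tcp_flags"])

        # The HMM narrows the alarm down to individual attacking IPs.
        for (src_ip, _dst_ip), flag_sequence in connections.items():
            if hmm_connection_log_probability(flag_sequence) < HMM_THRESHOLD:
                blacklist.add(src_ip)
    return blacklist
```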
The rest of the paper is organized as follows. A brief description of HMMs is presented in Section 2. Section 3 describes our proposed HMM model, and preliminary results obtained are presented in Section 4. Section 5 explains the problems we anticipated in implementing this system in real time. Section 6 describes the hybrid model, and Section 7 describes the various experiments and the results obtained. Section 8 describes related work done by the research community in this area.
II. HIDDEN MARKOV MODEL
An HMM is a generative model that can model data which is sequential in nature. It is used to model data for which the i.i.d. assumption is too restrictive, as in speech processing applications. A detailed tutorial on HMMs is available in [1].
Markov Property: Consider a system with N states in which transitions among states occur at discrete time instants t = 1, 2, 3, · · · . A process is Markovian if the conditional probability of future states, given the present state and past states, depends only upon the present state. In order to predict the future state, the process by which the current state was reached does not matter, i.e.,
Pr[q_t = S_i | q_{t-1} = S_j, q_{t-2} = S_k, · · · ] = Pr[q_t = S_i | q_{t-1} = S_j]    (1)
We have used an HMM that follows the above first order Markov property.
In an HMM, the states and their transitions are not visible. Instead an output symbol, drawn from a discrete set of symbols, is emitted during every transition. This sequence of symbols is the set of observables used to train the HMM. Figure 1 illustrates this.
Definition of an HMM:
An HMM λ is a five-tuple, i.e., λ = [N, M, A, B, π]. The parameters of the model are:
N, the number of states in the model, S = {S_1, S_2, · · · , S_N}.
M, the number of observation symbols, V = {V_1, V_2, · · · , V_M}.
Fig. 1. HMM Architecture
A, the state transition probability matrix, A = {a_ij}, where
a_ij = Pr[q_{t+1} = S_j | q_t = S_i],  1 ≤ i, j ≤ N    (2)
It is an N×N matrix.
B, the observation symbol probability matrix, B = {b_j(k)}, where
b_j(k) = Pr[v_k at t | q_t = S_j],  1 ≤ j ≤ N,  1 ≤ k ≤ M    (3)
It is an N×M matrix.
π, the initial state probability distribution, π = {π_i}, where
π_i = Pr[q_1 = S_i],  1 ≤ i ≤ N    (4)
It is a 1×N vector.
Algorithms for HMMs: The following two algorithms are used to train and apply an HMM (a minimal sketch of the forward computation follows the list).
1) The Baum-Welch algorithm is used to learn the parameters of the model, {A, B, π}, from input data.
2) The Forward-Backward algorithm is used to compute the probability of an observation sequence given the model, P[O|λ].
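As an illustration of how P[O|λ] can be computed, the following is a minimal NumPy sketch of the scaled forward pass for a discrete-symbol HMM. It is not code from the paper; the toy parameter values are assumptions chosen only so the function can be run as-is.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Compute log P[O | lambda] for a discrete observation sequence using the
    scaled forward algorithm.

    obs : sequence of symbol indices (the observables)
    pi  : (N,) initial state probabilities
    A   : (N, N) transition matrix, A[i, j] = Pr[q_{t+1} = S_j | q_t = S_i]
    B   : (N, M) emission matrix, B[j, k] = Pr[V_k | q_t = S_j]
    """
    alpha = pi * B[:, obs[0]]              # initialisation (t = 1)
    log_likelihood = np.log(alpha.sum())
    alpha /= alpha.sum()
    for symbol in obs[1:]:                 # induction (t = 2 .. T)
        alpha = (alpha @ A) * B[:, symbol]
        scale = alpha.sum()                # rescale to avoid numerical underflow
        log_likelihood += np.log(scale)
        alpha /= scale
    return log_likelihood


if __name__ == "__main__":
    # Toy 2-state, 7-symbol model (symbols 0-6 mirror the TCP flag IDs used in Section III).
    pi = np.array([0.9, 0.1])
    A = np.array([[0.8, 0.2],
                  [0.3, 0.7]])
    B = np.array([[0.30, 0.25, 0.25, 0.10, 0.05, 0.03, 0.02],
                  [0.02, 0.03, 0.05, 0.10, 0.25, 0.25, 0.30]])
    print(forward_log_likelihood([0, 1, 2, 3, 2, 4], pi, A, B))
```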
III. DESIGN CHOICES
Web servers in general use the Transmission Control Protocol (TCP) for communication between clients and the server. TCP is a state based protocol, i.e., any TCP connection progresses through a set of state transitions during its lifetime. This inherently stateful and temporal nature of TCP traffic can be captured well by an HMM based classifier, which led us to use the HMM as the basic building block of our system design. In the remainder of this section, we describe the parameters used to build our model and the other design considerations that shaped our model design.

A. Choosing Parameters
The key aspect in building an HMM is deciding the states and symbols used to build the model. Choosing the right set of attributes for a model is very important, as this step ensures effective usage of the available data. For our experiments, we use the TCP header information present in packets as features.
The states of the model are called hidden or latent variables and are used to describe the underlying distribution generating the data. In our approach, the states of the HMM do not correspond to actual TCP states; they are chosen so that the HMM best explains the traffic, and they have no direct physical significance. For example, network traffic can be assumed to consist of traffic from legitimate and malicious users, and a transition from one state to another can be viewed as a switch between traffic from malicious and legitimate users.
Next we had to decide what to use as the symbols of our HMM. We use TCP flags as the symbols, following Vijayasarathy et al. [2]. The other parameters of the HMM (π, A, and B) are estimated using the Baum-Welch algorithm; a sketch of this training step is given below.
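The paper does not name a particular implementation of Baum-Welch; as one possible realisation, the sketch below uses the hmmlearn library, where the class for discrete symbol sequences is CategoricalHMM in recent releases (MultinomialHMM in older ones). The flag sequences and the choice of 9 states here are illustrative assumptions.

```python
import numpy as np
from hmmlearn import hmm  # pip install hmmlearn

# Each training stream is a sequence of TCP-flag symbol IDs (0-6, see Section III-B).
streams = [
    [0, 1, 2, 3, 2, 4, 2],   # e.g. SYN, SYN/ACK, ACK, PUSH/ACK, ACK, FIN/ACK, ACK
    [0, 1, 2, 2, 3, 2, 4, 2],
    [0, 1, 2, 3, 3, 2, 4, 2],
]

# hmmlearn expects one concatenated column vector plus the length of each stream.
X = np.concatenate(streams).reshape(-1, 1)
lengths = [len(s) for s in streams]

# Baum-Welch estimation of pi, A and B for a 9-state model (state count chosen empirically).
model = hmm.CategoricalHMM(n_components=9, n_iter=100, random_state=0)
model.fit(X, lengths)

# Forward algorithm: log P[O | lambda] for a new source separated stream.
test_stream = np.array([[0], [1], [2], [4], [2]])
print(model.score(test_stream))
```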
B. Initial Approach
Building an anomaly based classifier involves two phases: training and testing. During the training phase, the classifier profiles clean traffic, i.e., a traffic stream devoid of any malicious traffic. During the testing phase, traffic that was not used during training is used to measure the performance of the model. The classifier flags any traffic that deviates from the clean traffic profile as suspicious. The intuition behind this approach is that clean traffic and malicious traffic are not generated from the same distribution.
The training phase of our algorithm begins by source separating the training traffic into separate streams. All packets between a unique source/destination IP pair constitute a stream. Each stream consists of the series of TCP flags used in the packets throughout the connection. A single HMM is then used to learn the characteristics of all streams to the server.
The HMM takes these TCP flags as observables, and the other parameters of the model are computed from them. Upon analyzing the traffic data, we found that only a few flags are used in most TCP communication. We associated a number with each flag, so that a connection's sequence of flags is converted into a sequence of numbers, and the HMM is trained over this sequence of numbers (a sketch of this encoding is given after the list below). The frequently used TCP flags and the unique IDs used in our modeling are as follows.
SYN - 0
SYN/ACK - 1
ACK - 2
PUSH/ACK - 3
FIN/ACK - 4
RST - 5
other TCP flags - 6
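The following Python fragment sketches this source separation and flag-to-symbol encoding. The packet representation (a dict with src_ip, dst_ip, and tcp_flags fields) is an assumption made for illustration, not the format used by the authors; with a capture library the same mapping would simply be applied to each packet's TCP flags field.

```python
from collections import defaultdict

# Symbol IDs for the frequently used TCP flag combinations (Section III-B).
FLAG_TO_SYMBOL = {
    "SYN": 0,
    "SYN/ACK": 1,
    "ACK": 2,
    "PUSH/ACK": 3,
    "FIN/ACK": 4,
    "RST": 5,
}
OTHER_SYMBOL = 6  # any other TCP flag combination

def source_separate(packets):
    """Group packets into streams keyed by (source IP, destination IP) and
    encode each stream as a sequence of flag symbol IDs."""
    streams = defaultdict(list)
    for pkt in packets:
        symbol = FLAG_TO_SYMBOL.get(pkt["tcp_flags"], OTHER_SYMBOL)
        streams[(pkt["src_ip"], pkt["dst_ip"])].append(symbol)
    return streams

# Example: a single benign-looking connection.
packets = [
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.1", "tcp_flags": "SYN"},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.1", "tcp_flags": "ACK"},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.1", "tcp_flags": "FIN/ACK"},
]
print(source_separate(packets))  # {('10.0.0.5', '10.0.0.1'): [0, 2, 4]}
```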
The same procedure is followed in the testing phase: the TCP flag sequence is converted into a sequence of numbers, and the probability of occurrence of this sequence under the model is computed. Since the states of the model do not correspond to actual TCP states, the number of states can be chosen empirically. The testing phase of this approach is depicted in Figure 2.
Fig. 2. Initial Approach: incoming traffic to the server is source separated into streams per unique IP pair; each stream is tested against the HMM model and classified as attack or legitimate traffic.
The DARPA intrusion detection data set [3] is used for training and testing our HMM model. Preliminary results obtained with this approach were not satisfactory. The model had a very high false positive rate, i.e., clean traffic streams were also being flagged as attacks. The classifier did not succeed in discriminating between good traffic and bad traffic. This low performance might be attributed to using just one HMM to learn the entire clean traffic profile: a single HMM could not capture all the characteristics of the clean traffic used for training.
C. Alternate Design
In order to overcome the above shortcoming, we performed source separation on the training/testing traffic first according to the destination ports of the server and then upon source/destination IP address. Instead of using a single HMM to learn all traffic coming to a server, we used separate HMMs for each frequently occurring server port. The reasoning behind this approach is that traffic belonging to different applications does not all behave in the same way. For instance, different traffic streams belonging to a particular application port, say port 25 (SMTP), are more similar to one another than to traffic at port 20 (FTP). This approach improved the results drastically, i.e., the model had higher accuracy and a lower false positive rate compared to the single HMM approach.
The implementation details of this model are as follows. Training traffic to the server is first separated based upon the destination port number of packets. Traffic to each frequently occurring port is then source separated and used to train a separate model for that port. Ports which carry heavy traffic, such as the ports for HTTP, telnet, FTP, etc., are modeled with separate HMMs, while traffic to the remaining infrequent ports is modeled by a single shared model. The testing phase proceeds the same way: testing traffic is first separated based upon ports and then source separated and tested by the corresponding HMM model for that port. Figure 3 depicts this approach, and a sketch of the per-port routing follows the figure caption.
Fig. 3. Layered Model
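The per-port layering described above amounts to keeping one model per frequent service port plus a fallback model for everything else, as in the sketch below. The port set, the use of hmmlearn, and the data layout are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np
from hmmlearn import hmm

FREQUENT_PORTS = {80, 23, 21, 25}   # HTTP, telnet, FTP, SMTP (assumed set)
OTHER = "other"                     # shared model for all infrequent ports

def train_per_port_models(streams_by_port, n_states=9):
    """streams_by_port maps a destination port (or OTHER) to a list of
    flag-symbol sequences; one discrete HMM is trained per key."""
    models = {}
    for port, streams in streams_by_port.items():
        X = np.concatenate(streams).reshape(-1, 1)
        lengths = [len(s) for s in streams]
        m = hmm.CategoricalHMM(n_components=n_states, n_iter=100, random_state=0)
        m.fit(X, lengths)
        models[port] = m
    return models

def route(port):
    """Pick which model a stream belongs to."""
    return port if port in FREQUENT_PORTS else OTHER

def score_stream(models, port, symbols):
    """Log-probability of a source separated stream under its port's model."""
    return models[route(port)].score(np.asarray(symbols).reshape(-1, 1))
```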
Even though the port wise separation approach gave better results than the single model approach, the false positive rate was still high: almost 10% of the training traffic was flagged as attack. Any practical system designed to detect intrusions should have a low false positive rate, i.e., the rate at which a legitimate user is wrongly classified as an attacker should be very low. The model is able to classify most of the frequently occurring positive traffic correctly, but it is not able to correctly classify positive traffic that is infrequent: infrequent traffic that was clean was also flagged as attack. We made this model our basic classifier, and it required us to explore other strategies to improve the performance of this base classifier.
D. Cascaded HMMs
The positive traffic wrongly classified by the above approach consisted of traffic streams that were not very frequent; these streams received very low probabilities in the training phase of the above approach. In order to overcome this high false positive rate, we employed a multi-stage combination of models to improve the base classifier's performance, cascading base classifiers into several layers. Figure 4 describes the cascading of models.
Implementation details of this approach: low probability legitimate streams that are flagged as suspicious by all the base classifiers are fed as input to a separate HMM, which trains on all the infrequently occurring streams and builds a model. Traffic streams that have low probabilities under this model are in turn fed into the next layer of HMMs for training.
Fig. 4. Cascaded HMM design
The above process of adding new HMMs, i.e., cascading HMMs, can be continued until the addition of a new model brings no further improvement in accuracy; a sketch of this cascade is given below.
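The following sketch illustrates one way such a cascade could be built: each layer is trained on the streams that the previous layer scored below a threshold, and at test time a stream is flagged as anomalous only if every layer assigns it a low probability. The train_hmm helper, thresholds, and stopping rule are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from hmmlearn import hmm

LOG_PROB_THRESHOLD = -25.0   # assumed per-stream log-probability cut-off
MIN_STREAMS = 5              # assumed minimum number of streams needed to train a layer
MAX_LAYERS = 3               # the paper reports results with two layers

def train_hmm(streams, n_states=9):
    X = np.concatenate(streams).reshape(-1, 1)
    model = hmm.CategoricalHMM(n_components=n_states, n_iter=100, random_state=0)
    model.fit(X, [len(s) for s in streams])
    return model

def train_cascade(streams):
    """Train successive HMM layers on the low-probability leftovers of the previous layer."""
    layers, remaining = [], streams
    while len(remaining) >= MIN_STREAMS and len(layers) < MAX_LAYERS:
        model = train_hmm(remaining)
        layers.append(model)
        remaining = [s for s in remaining
                     if model.score(np.asarray(s).reshape(-1, 1)) < LOG_PROB_THRESHOLD]
    return layers

def is_anomalous(layers, stream):
    """A stream is flagged as an attack only if every layer scores it below threshold."""
    x = np.asarray(stream).reshape(-1, 1)
    return all(layer.score(x) < LOG_PROB_THRESHOLD for layer in layers)
```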
The usage of traffic streams from different protocols in the first layer of the cascade might seem counter-intuitive, since we perform protocol based traffic separation before feeding traffic into the per-protocol HMMs. We observed that most training traffic connections to frequently occurring ports were correctly classified by their respective HMM, and the number of connections wrongly flagged as anomalous was very small. This was not the case with the HMM for infrequent port traffic: the connections that occur infrequently were the ones wrongly flagged by the initial HMMs. Since these connections were in any case infrequent within their respective protocols, combining them did not reduce the performance of the model; instead, it improved the accuracy of the cascaded HMM.
The model can be extended to have separate levels of cascading for each protocol. Since the data available for training and testing was limited, we used a single combined layer of cascading for all protocols.
IV. PRELIMINARY RESULTS
Building any classifier involves two phases: training and testing. The training phase in our approach involves learning the parameters of the model from a clean traffic trace; the HMM profiles this data and uses this information to test incoming traffic. During the testing phase, traffic that was not used for training is tested against the learnt model. To build a classifier we need labeled data for training and testing; the data sets released by DARPA [3] were used to train and test our classifier.
Experiments
The experiments that were conducted are described as
follows.

# states | Connection Separation | Separate Models for Protocols | Boosting | Accuracy (%) | False Alarm Rate (%)
5        | Just IP               | No                            | No       | 81.75        | 19.63
9        | Just IP               | No                            | No       | 85.14        | 15.05
5        | IP & Port             | Yes                           | No       | 91.49        | 9.49
9        | IP & Port             | Yes                           | No       | 92.27        | 8.49
5        | IP & Port             | Yes                           | Yes      | 96.96        | 2.89
9        | IP & Port             | Yes                           | Yes      | 97.1         | 2.71
TABLE I
RESULTS ON DARPA DATA SET
1) Single HMM model: Training traffic is separated according to source/destination IP pair and trained with a single HMM. In the testing phase, source separated connections are tested against the learnt model. The performance of this model was poor, since it had a very high false positive rate. A probable reason for the failure of the model is that a single HMM could not capture all possible traffic characteristics. The high false positive rate can be alleviated by the following approach.
2) Multiple HMM models: We performed source separation on both the IP addresses and the port information of source and destination. Separate HMMs were used to train/test connections pertaining to different protocols: protocols with a large amount of incoming traffic were each trained separately, while the remaining infrequent ports were handled by a single shared model. This approach reduced the false positive rate, and we made this type of source separation the basic step for building our HMMs.
3) Cascading of HMMs: In order to improve the performance of the above approach, we employed boosting: HMM models were cascaded into several layers to model low probability traffic. The results reported for our experiments use two layers of HMMs for cascading.
We used two days of clean traffic data from the DARPA data set for training, and the traffic from the remaining days was used for testing the learnt model; this way we do not overfit in the training process. Table I describes the performance of our model on a particular server in the DARPA data.
Number of states for the model
The number of states to be used for the HMM is determined experimentally; a sketch of this selection loop is given below. Using 9 or 10 states gave us good results on the DARPA data set. We tried using a larger number of states, and the results obtained were similar and did not improve the performance any further. Hence we report results using 9 states for the HMM model.
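The paper does not spell out the search procedure; a straightforward way to reproduce it would be the hold-out loop sketched below, which trains one model per candidate state count and keeps the count whose model scores held-out clean streams highest. The candidate list, validation split, and scoring criterion are assumptions.

```python
import numpy as np
from hmmlearn import hmm

def pick_state_count(train_streams, val_streams, candidates=(5, 7, 9, 11)):
    """Train one discrete HMM per candidate state count and keep the count whose
    model gives the highest total log-likelihood on held-out clean streams."""
    X = np.concatenate(train_streams).reshape(-1, 1)
    lengths = [len(s) for s in train_streams]
    best_n, best_score = None, -np.inf
    for n in candidates:
        model = hmm.CategoricalHMM(n_components=n, n_iter=100, random_state=0)
        model.fit(X, lengths)
        score = sum(model.score(np.asarray(s).reshape(-1, 1)) for s in val_streams)
        if score > best_score:
            best_n, best_score = n, score
    return best_n
```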
Attacks detected by the HMM
The following attacks present in the DARPA data set were detected by the HMM model.
neptune - Syn flood denial of service attack on one or
more ports.
ipsweep - Surveillance sweep performing ping on multi-
ple host addresses.
portsweep - Surveillance sweep through many ports to
determine which services are active on a single host.
satan - Network probing tool that exploits well-known weaknesses.
nmap - Network mapping using the nmap tool.
Auckland Data Set
We also tried our cascaded HMM experiments on the Auckland IV [4] data set. In the training phase, the HMM is trained with clean HTTP traffic from the DARPA data. For testing, HTTP traffic to various servers in the Auckland data set was considered. The Auckland data set is not labeled, so the testing results had to be cross checked manually. The HTTP sequences flagged as anomalous were of the following types:
Reset attacks
Short connections
Connections that were too short were flagged as anomalous by the model; the reason for a very short connection could be an abrupt end of the connection. An HMM with just 5 states was sufficient for classifying the Auckland data set.
V. MOTIVATION FOR HYBRID APPROACH
The goal of this work is to implement models that can function effectively in real time. When we attempted to implement the above model in a real-time system, it exhibited the following pitfalls [6].
Source separation of incoming traffic is the first and foremost step in our design; in this way, the model keeps track of all incoming IP addresses. However, the problem of IP address spoofing could tax our proposed model. Assume an incoming packet has a spoofed IP address. The server replies to it and allocates resources for this IP address. It is highly unlikely that a connection established by a spoofed IP address would proceed any further, so the server is made to wait until the timeout period before reclaiming the allocated resources. Attackers can repeat this scenario and exhaust the resources of the server.
The second issue to consider is the typical length of a connection. The DARPA data used for training and testing our model had information about entire connections, but in reality we have no way of telling when a connection will end. The computations performed assumed complete end to end connection data, which is impossible to obtain in practice. If this model were to be implemented in a server, the server would have to keep a separate buffer for each new incoming connection, which again would end up using all of the server's available buffer space to store packets. We cannot decide on how much buffer


References
Lawrence R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, Feb. 1989.
Conference paper, Jun. 2001: an overview of real-time data mining-based intrusion detection systems, with an architecture of sensors, detectors, a data warehouse, and model generation components.
Conference paper, Jan. 2003: an approach using hidden Markov models to detect complex multi-step Internet attacks, compared against decision trees and neural networks.
Conference paper, Mar. 2008: an evaluation supporting the usefulness of the DARPA data set for IDS evaluation, using Snort, Cisco IDS, PHAD, and ALAD.
Conference paper, Jun. 2005: a Markovian model of incoming HTTP requests for detecting attacks against web servers.
