Outside the Closed World: On Using Machine Learning for Network Intrusion Detection
Summary (4 min read)
Introduction
- In this paper the authors set out to examine the differences between the intrusion detection domain and other areas where machine learning is used with more success.
- In addition, the authors identify further characteristics of their domain that are not well aligned with the requirements of machine learning.
- By “machine learning” the authors mean algorithms that are first trained with reference input to “learn” its specifics (whether supervised or unsupervised), and then deployed on previously unseen input for the actual detection process.
II. MACHINE LEARNING IN INTRUSION DETECTION
- Anomaly detection systems find deviations from expected behavior.
- To capture normal activity, IDES (and its successor NIDES [10]) used a combination of statistical metrics and profiles.
- Since then, many more approaches have been pursued.
- Often, they borrow schemes from the machine learning community, such as information theory [11], neural networks [12], support vector machines [13], genetic algorithms [14], artificial immune systems [15], and many more.
- The authors’ discussion in this paper aims to develop a different general point: that much of the difficulty with anomaly detection systems stems from using tools borrowed from the machine learning community in inappropriate ways.
III. CHALLENGES OF USING MACHINE LEARNING
- It can be surprising at first to realize that despite extensive academic research efforts on anomaly detection, the success of such systems in operational environments has been very limited.
- The authors believe that this “success discrepancy” arises because the intrusion detection domain exhibits particular characteristics that make the effective deployment of machine learning approaches fundamentally harder than in many other contexts.
- In the following the authors identify these differences, with an aim of raising the community’s awareness of the unique challenges anomaly detection faces when operating on network traffic.
- The authors note that their examples from other domains are primarily for illustration, as there is of course a continuous spectrum for many of the properties discussed (e.g., spam detection faces an environment similarly adversarial to that of intrusion detection).
- Based on discussions with colleagues who work with machine learning on a daily basis, the authors believe these intuitive arguments match well with what a more formal analysis would yield.
A. Outlier Detection
- Fundamentally, machine-learning algorithms are much better at finding similarities than at identifying activity that does not belong: the classic machine-learning application is a classification problem, rather than discovering meaningful outliers as required by an anomaly detection system [21].
- Consider collaborative filtering in product recommendation systems: each of a user’s purchased (or positively rated) items is matched with other similar products, where similarity is determined by products that tend to be bought together.
- The idea of specifying only positive examples and adopting a standing assumption that the rest are negative is called the closed world assumption.
- Originally proposed by Graham [8], Bayesian frameworks trained with large corpora of both spam and ham have evolved into a standard tool for reliably identifying unsolicited mail.
- The observation that machine learning works much better for such true classification problems leads to the conclusion that anomaly detection is in fact likely better suited to finding variations of known attacks than to finding previously unknown malicious activity.
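To make the contrast concrete, here is a hypothetical, minimal sketch (not from the paper) of the anomaly detection setting the authors describe: the detector is trained on normal examples only and must flag anything unlike them, with no labeled examples of attacks. The data values and the 3-sigma threshold are illustrative assumptions.

```python
# Illustrative sketch (assumed, not from the paper): an outlier detector
# trained only on "normal" observations, with no labeled attack examples.
from statistics import mean, stdev

# Hypothetical training data: byte counts of normal connections.
normal = [500, 520, 480, 510, 495, 505]

mu, sigma = mean(normal), stdev(normal)

def is_outlier(x, k=3.0):
    """Flag values more than k standard deviations from the training mean."""
    return abs(x - mu) > k * sigma

# A value resembling the training data is accepted...
print(is_outlier(515))   # False
# ...while a previously unseen extreme is flagged as anomalous.
print(is_outlier(5000))  # True
```

The detector never sees an attack during training; it can only say "this does not look like what I was shown," which is exactly the closed-world assumption the section discusses.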
B. High Cost of Errors
- In intrusion detection, the relative cost of any misclassification is extremely high compared to many other machine learning applications.
- While for the seller a good recommendation has the potential to increase sales, a bad choice rarely hurts beyond a lost opportunity to have made a more enticing recommendation.
- Spelling and grammar checkers are commonly employed to clean up results, weeding out the obvious mistakes.
- Spam detection faces a highly unbalanced cost model: false positives (i.e., ham declared as spam) can prove very expensive, but false negatives (spam not identified as such) do not have a significant impact.
- Overall, an anomaly detection system faces a much more stringent limit on the number of errors that it can tolerate.
C. Semantic Gap
- Anomaly detection systems face a key challenge of transferring their results into actionable reports for the network operator.
- Unfortunately, in the intrusion detection community the authors find a tendency to limit the evaluation of anomaly detection systems to an assessment of a system’s capability to reliably identify deviations from the normal profile.
- When addressing the semantic gap, one consideration is the incorporation of local security policies.
- Returning to the P2P example, when examining only NetFlow records, it is hard to imagine how one might spot inappropriate content.
- As another example, consider exfiltration of personally identifying information (PII).
D. Diversity of Network Traffic
- Network traffic often exhibits much more diversity than people intuitively expect, which leads to misconceptions about what anomaly detection technology can realistically achieve in operational environments.
- Wright et al. [27] infer the language spoken in encrypted VoIP sessions.
- However these examples all demonstrate the power of exploiting structural knowledge informed by very careful examination of the particular domain of study—results not obtainable by simply expecting an anomaly detection system to develop inferences about “peculiar” activity.
- While highly variable over small-to-medium time intervals, traffic properties tend toward greater stability when observed over longer time periods (hours to days, sometimes weeks).
- Finally, the authors note that traffic diversity is not restricted to packet-level features, but extends to application-layer information as well, both in terms of syntactic and semantic variability.
E. Difficulties with Evaluation
- For an anomaly detection system, a thorough evaluation is particularly crucial, as experience shows that many promising approaches turn out in practice to fall short of one’s expectations.
- The two publicly available datasets that have provided something of a standardized setting in the past—the DARPA/Lincoln Labs packet traces [41], [42] and the KDD Cup dataset derived from them [43]—are now a decade old, and no longer adequate for any current study.
- It is understandable that in the face of such high risks, researchers frequently encounter insurmountable organizational and legal barriers when they attempt to provide datasets to the community.
- The authors argue that when evaluating an anomaly detection system, understanding the system’s semantic properties— the operationally relevant activity that it can detect, as well as the blind spots every system will necessarily have— is much more valuable than identifying a concrete set of parameters for which the system happens to work best for a particular input.
- Exploiting the specifics of a machine learning implementation requires significant effort, time, and expertise on the attacker’s side.
IV. RECOMMENDATIONS FOR USING MACHINE LEARNING
- The authors note that they view these guidelines as touchstones rather than as firm rules; there is certainly room for further discussion within the wider intrusion detection community.
- If the authors could give only one recommendation on how to improve the state of anomaly detection research, it would be: Understand what the system is doing.
- The nature of their domain is such that one can always find a variation that works slightly better than anything else in a particular setting.
- The point the authors wish to convey however is that they are working in an area where insight matters much more than just numerical results.
A. Understanding the Threat Model
- Before starting to develop an anomaly detector, one needs to consider the anticipated threat model, as that establishes the framework for choosing trade-offs.
- Operation in a small network faces very different challenges than for a large enterprise or backbone network; academic environments impose different requirements than commercial enterprises.
- Possible answers range from “very little” to “lethal.”
- The degree to which attackers might analyze defense techniques and seek to circumvent them determines the robustness requirements for any detector.
B. Keeping The Scope Narrow
- A common pitfall is starting with the premise of using machine learning (or, worse, a particular machine-learning approach) and then looking for a problem to solve.
- A key question is identifying the feature set the detector will work with: insight into the features’ significance (in terms of the domain) and capabilities (in terms of revealing the targeted activity) goes a long way towards reliable detection.
- Laying out the land like this sets the stage for a well-grounded study.
C. Reducing the Costs
- Per the discussion in Section III-B, one obtains enormous benefit from reducing the costs associated with using an anomaly detection system.
- As the authors have seen, an anomaly detection system does not necessarily make more mistakes than machine learning systems deployed in other domains—yet the high cost associated with each error often conflicts with effective operation.
- Likely the most important step towards fewer mistakes is reducing the system’s scope, as discussed in Section IV-B.
- The setup of the underlying machine-learning problem also has a direct impact on the number of false positives.
- As a simple flow-level example, the set of destination ports a particular internal host contacts will likely fluctuate quite a bit for typical client systems; but the authors might often find the set of ports on which it accepts incoming connections to be stable over extended periods of time.
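The flow-level example above can be sketched in a few lines. This is a hypothetical illustration (not code from the paper); the baseline port set and the new port number are assumed values, and the point is simply that alerting on a stable feature (the listening-port set) yields fewer false positives than alerting on a fluctuating one (outbound destination ports).

```python
# Illustrative sketch (assumed, not from the paper): alert on changes to a
# host's set of listening ports, a feature that tends to stay stable,
# rather than on its fluctuating set of outbound destination ports.
baseline_listening = {22, 80, 443}  # learned during a training period

def new_listening_ports(observed):
    """Return listening ports not present in the training baseline."""
    return observed - baseline_listening

# The usual services trigger no alert...
print(new_listening_ports({22, 80, 443}))         # set()
# ...but a newly opened port is worth reporting to the operator.
print(new_listening_ports({22, 80, 443, 31337}))  # {31337}
```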
D. Evaluation
- When evaluating an anomaly detection system, the primary objective should be to develop insight into the system’s capabilities.
- The authors discuss evaluation separately in terms of working with data, and interpreting results.
- No dataset is perfect: measurements often include artifacts that can impact the results (such as filtering or unintended loss), or unrelated noise that one can safely filter out if readily identified (e.g., an internal vulnerability scan run by the security department).
- Thus, machine learning can sometimes serve very effectively to “point the way” to how to develop detectors that are themselves based on different principles.
- The successful operation of an anomaly detection system typically requires significant experience with the particular system, as it needs to be tuned to the local setting—experience that can prove cumbersome to collect if the underlying objective is instead to understand the new system.
V. CONCLUSION
- The authors work examines the surprising imbalance between the extensive amount of research on machine learning-based anomaly detection pursued in the academic intrusion detection community, versus the lack of operational deployments of such systems.
- The authors argue that this discrepancy stems in large part from specifics of the problem domain that make it significantly harder to apply machine learning effectively than in many other areas of computer science where such schemes are used with greater success.
- It is crucial to acknowledge that the nature of the domain is such that one can always find schemes that yield marginally better ROC curves than anything else for a given setting.
- Such results, however, do not contribute to the progress of the field without a semantic understanding of the gain.
Frequently Asked Questions (9)
Q2. What future works have the authors mentioned in the paper "Outside the closed world: on using machine learning for network intrusion detection" ?
The authors hope for this discussion to contribute to strengthening future research on anomaly detection by pinpointing the fundamental challenges it faces.
Q3. What is the strategy for bringing the data to the experimenter?
Mediated trace access can be a viable strategy [64]: rather than bringing the data to the experimenter, bring the experiment to the data, i.e., researchers send their analysis programs to data providers, who then run them on their behalf and return the output.
Q4. What is the way to address site-specifics?
For an anomaly detection system, the natural strategy to address site-specifics is having the system “learn” them during training with normal traffic.
Q5. Why are the results of an anomaly detection system harder to predict than for a misuse detector?
Due to the opacity of the detection process, the results of an anomaly detection system are harder to predict than for a misuse detector.
Q6. What is the basic challenge with regard to the semantic gap?
The basic challenge with regard to the semantic gap is understanding how the features the anomaly detection system operates on relate to the semantics of the network environment.
Q7. What is the significant challenge an evaluation faces?
1) Difficulties of Data: Arguably the most significant challenge an evaluation faces is the lack of appropriate public datasets for assessing anomaly detection systems.
Q8. What is the convincing test of any anomaly detection system?
The most convincing real-world test of any anomaly detection system is to solicit feedback from operators who run the system in their network.
Q9. Why has the lack of public data garnered little traction to date?
Despite intensive efforts [52], [53], publishing such datasets has garnered little traction to date, mostly, one suspects, for fear that information can still leak.