A semantics-aware approach to the automated network protocol identification

doi:10.1109/TNET.2014.2381230

Journal Article


Xiaochun Yun, +3 more

01 Feb 2016

IEEE/ACM Transactions on Networking

Vol. 24, Iss. 1, pp. 583-595


TLDR

Experimental results on BitTorrent, CIFS/SMB, DNS, FTP, PPLIVE, SIP, and SMTP traces show that Securitas accurately identifies the network traces of the target application protocol, with an average recall of about 97.4% and an average precision of about 98.4%.

Abstract:

Traffic classification, the mapping of traffic to network applications, is important for a variety of networking and security tasks, such as network measurement, network monitoring, and the detection of malware activities. In this paper, we propose Securitas, a network trace-based protocol identification system that exploits the semantic information in protocol message formats. Securitas requires no prior knowledge of protocol specifications. Treating a protocol as a language between two processes, our approach is based on the new insight that the n-grams of protocol traces, just like those of natural languages, exhibit a highly skewed frequency-rank distribution that can be leveraged for protocol identification. In Securitas, we first extract the statistical protocol message formats by clustering n-grams with the same semantics, and then use the corresponding statistical formats to classify raw network traces. Our tool has the following key features: 1) it is applicable to both connection-oriented and connectionless protocols; 2) it is suitable for both text and binary protocols; 3) it does not need to assemble IP packets into TCP or UDP flows; and 4) it is effective for both long-lived and short-lived flows. We implement Securitas and conduct extensive evaluations on real-world network traces containing both textual and binary protocols. Our experimental results on BitTorrent, CIFS/SMB, DNS, FTP, PPLIVE, SIP, and SMTP traces show that Securitas accurately identifies the network traces of the target application protocol, with an average recall of about 97.4% and an average precision of about 98.4%. These results indicate that Securitas is robust and displays competitive performance in practice.
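
The frequency-rank observation in the abstract can be illustrated with a minimal sketch. This is not the Securitas implementation: the function names `byte_ngrams` and `frequency_rank`, the choice of n = 3, and the toy SMTP-like payloads are all assumptions made for illustration. The sketch only counts byte n-grams per packet payload (no flow reassembly) and ranks them by frequency; the clustering and classification stages are not shown.

```python
from collections import Counter


def byte_ngrams(payload: bytes, n: int = 3):
    """Yield every contiguous n-byte window of a packet payload."""
    for i in range(len(payload) - n + 1):
        yield payload[i:i + n]


def frequency_rank(payloads, n: int = 3):
    """Return (ngram, count) pairs sorted from most to least frequent."""
    counts = Counter()
    for payload in payloads:
        counts.update(byte_ngrams(payload, n))
    return counts.most_common()


# Hypothetical toy payloads resembling SMTP commands (not taken from the paper's traces).
sample_payloads = [
    b"HELO example.org\r\n",
    b"MAIL FROM:<a@example.org>\r\n",
    b"RCPT TO:<b@example.org>\r\n",
    b"MAIL FROM:<c@example.org>\r\n",
]

ranked = frequency_rank(sample_payloads, n=3)
for rank, (gram, count) in enumerate(ranked[:5], start=1):
    print(rank, gram, count)
```

On a real trace, plotting count against rank on log-log axes would be one way to check for the skewed distribution the abstract describes. Because the counting works on individual payloads, the same code applies to text and binary protocols alike.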


Citations

Detecting Android Malware Leveraging Text Semantics of Network Flows

Byte Segment Neural Network for Network Traffic Classification

BitCoding: Network Traffic Classification Through Encoded Bit Level Signatures

Survey of Protocol Reverse Engineering Algorithms: Decomposition of Tools for Static Traffic Analysis

Machine Learning for Wireless Link Quality Estimation: A Survey


Related Papers (5)

Discoverer: automatic protocol reverse engineering from network traces

ACAS: automated construction of application signatures

Robust network traffic classification

Polyglot: automatic extraction of protocol message format using dynamic binary analysis

Network Traffic Classification Using Correlation Information