Journal Article

A semantics-aware approach to the automated network protocol identification

TLDR
Experimental results on BitTorrent, CIFS/SMB, DNS, FTP, PPLIVE, SIP, and SMTP traces show that Securitas accurately identifies the network traces of the target application protocol, with an average recall of about 97.4% and an average precision of about 98.4%.
Abstract
Traffic classification, the mapping of traffic to network applications, is important for a variety of networking and security tasks, such as network measurement, network monitoring, and the detection of malware activities. In this paper, we propose Securitas, a network trace-based protocol identification system that exploits the semantic information in protocol message formats. Securitas requires no prior knowledge of protocol specifications. Treating a protocol as a language between two processes, our approach builds on the new insight that the n-grams of protocol traces, just like those of natural languages, exhibit a highly skewed frequency-rank distribution that can be leveraged for protocol identification. In Securitas, we first extract the statistical protocol message formats by clustering n-grams with the same semantics, and then use the corresponding statistical formats to classify raw network traces. Our tool has the following key features: 1) it is applicable to both connection-oriented and connectionless protocols; 2) it is suitable for both text and binary protocols; 3) it does not need to assemble IP packets into TCP or UDP flows; and 4) it is effective for both long-lived and short-lived flows. We implement Securitas and conduct extensive evaluations on real-world network traces containing both textual and binary protocols. Our experimental results on BitTorrent, CIFS/SMB, DNS, FTP, PPLIVE, SIP, and SMTP traces show that Securitas accurately identifies the network traces of the target application protocol, with an average recall of about 97.4% and an average precision of about 98.4%. These results show that Securitas is robust and delivers competitive performance in practice.
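
The frequency-rank observation in the abstract can be illustrated with a minimal Python sketch. This is not the Securitas implementation: the helper names `byte_ngrams` and `rank_frequency`, the choice of n = 3, and the SMTP-like sample payloads are all assumptions made for illustration. The sketch only counts overlapping byte n-grams across individual packet payloads (no flow reassembly) and lists them from most to least frequent.

```python
from collections import Counter


def byte_ngrams(payload: bytes, n: int = 3):
    """Return every overlapping n-gram (n consecutive bytes) of one payload."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def rank_frequency(payloads, n: int = 3):
    """Count n-grams over all payloads, ordered from most to least frequent."""
    counts = Counter()
    for payload in payloads:
        counts.update(byte_ngrams(payload, n))
    return counts.most_common()


# Made-up SMTP-like payloads for illustration only (not taken from the paper's traces).
packets = [
    b"HELO example.org\r\n",
    b"MAIL FROM:<a@example.org>\r\n",
    b"RCPT TO:<b@example.org>\r\n",
]

ranked = rank_frequency(packets, n=3)
for rank, (gram, count) in enumerate(ranked[:5], start=1):
    print(rank, gram, count)
```

On a real trace, plotting `count` against `rank` on log-log axes is one way to inspect how skewed the distribution is. The clustering of n-grams with the same semantics and the classification step described in the abstract are not shown here.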

