Journal Article
A semantics-aware approach to the automated network protocol identification
TL;DR
Experimental results on BitTorrent, CIFS/SMB, DNS, FTP, PPLIVE, SIP, and SMTP traces show that Securitas can accurately identify the network traces of the target application protocol, with an average recall of about 97.4% and an average precision of about 98.4%.
Abstract:
Traffic classification, the mapping of traffic to network applications, is important for a variety of networking and security tasks, such as network measurement, network monitoring, and the detection of malware activities. In this paper, we propose Securitas, a network trace-based protocol identification system that exploits the semantic information in protocol message formats. Securitas requires no prior knowledge of protocol specifications. Treating a protocol as a language between two processes, our approach rests on the new insight that the n-grams of protocol traces, like those of natural languages, exhibit a highly skewed frequency-rank distribution that can be leveraged for protocol identification. In Securitas, we first extract statistical protocol message formats by clustering n-grams with the same semantics, and then use these statistical formats to classify raw network traces. Our tool has the following key features: 1) it applies to both connection-oriented and connectionless protocols; 2) it is suitable for both text and binary protocols; 3) it does not need to assemble IP packets into TCP or UDP flows; and 4) it is effective for both long-lived and short-lived flows. We implement Securitas and conduct extensive evaluations on real-world network traces containing both textual and binary protocols. Our experimental results on BitTorrent, CIFS/SMB, DNS, FTP, PPLIVE, SIP, and SMTP traces show that Securitas can accurately identify the network traces of the target application protocol, with an average recall of about 97.4% and an average precision of about 98.4%. These results show that Securitas is robust and performs competitively in practice.
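The core intuition, that byte n-grams in protocol payloads follow a skewed frequency-rank distribution and that the frequent head of that distribution can act as a statistical signature, can be illustrated with a minimal Python sketch. This is not Securitas itself: the abstract does not give n, the semantic clustering step, or the classification rule. Everything here is a hypothetical simplification: the function names, n = 3, a document-frequency cutoff (`min_docs`) standing in for the "statistical format" (no semantic clustering is performed), and the overlap `threshold`. Packets are scored one at a time, mirroring the claim that no TCP/UDP flow reassembly is needed.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def ngrams(payload: bytes, n: int = 3) -> list[bytes]:
    """Overlapping byte n-grams of one packet payload (no flow reassembly)."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def rank_frequency(payloads: Iterable[bytes], n: int = 3) -> list[tuple[bytes, int]]:
    """Total n-gram counts across all payloads, most frequent (rank 1) first."""
    counts: Counter[bytes] = Counter()
    for p in payloads:
        counts.update(ngrams(p, n))
    return counts.most_common()


def build_format(payloads: Iterable[bytes], n: int = 3, min_docs: int = 2) -> set[bytes]:
    """Hypothetical stand-in for a 'statistical format': keep n-grams that occur
    in at least `min_docs` distinct training payloads (the head of the skewed
    distribution). Unlike Securitas, no semantic clustering is done here."""
    doc_freq: Counter[bytes] = Counter()
    for p in payloads:
        doc_freq.update(set(ngrams(p, n)))  # count each n-gram once per payload
    return {g for g, df in doc_freq.items() if df >= min_docs}


def score(payload: bytes, fmt: set[bytes], n: int = 3) -> float:
    """Fraction of the payload's n-grams that belong to the format."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0
    return sum(g in fmt for g in grams) / len(grams)


def classify(payload: bytes, fmt: set[bytes], n: int = 3, threshold: float = 0.3) -> bool:
    """Label one packet as the target protocol if its score clears a hypothetical threshold."""
    return score(payload, fmt, n) >= threshold


if __name__ == "__main__":
    # Toy SMTP client commands from two sessions (illustrative data only).
    smtp_train = [
        b"HELO a.org\r\n", b"MAIL FROM:<x@a.org>\r\n", b"RCPT TO:<y@b.net>\r\n",
        b"DATA\r\n", b"QUIT\r\n",
        b"HELO c.io\r\n", b"MAIL FROM:<u@c.io>\r\n", b"RCPT TO:<v@d.com>\r\n",
        b"DATA\r\n", b"QUIT\r\n",
    ]
    fmt = build_format(smtp_train)
    print(rank_frequency(smtp_train)[0])                # (b'>\r\n', 4)
    print(classify(b"MAIL FROM:<z@e.edu>\r\n", fmt))    # True
    print(classify(b"\x13BitTorrent protocol", fmt))    # False
```

Counting document frequency rather than raw frequency keeps protocol keywords (`MAIL FROM:<`, `\r\n` terminators) and discards per-message values such as addresses; a real system would need a principled way to separate keyword fields from variable fields, which is the role the abstract assigns to semantic clustering.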
Citations
Journal Article
Detecting Android Malware Leveraging Text Semantics of Network Flows
TL;DR: An effective, automatic malware detection method based on the text semantics of network traffic: each HTTP flow generated by a mobile app is treated as a text document and processed with natural language processing techniques to extract text-level features, which are then used to build a malware detection model.
Proceedings Article
Byte Segment Neural Network for Network Traffic Classification
TL;DR: Recurrent neural networks are applied to network traffic classification through a novel architecture, the Byte Segment Neural Network (BSNN), which outperforms both traditional machine learning-based methods and packet inspection methods.
Journal Article
BitCoding: Network Traffic Classification Through Encoded Bit Level Signatures
TL;DR: BitCoding is described, a bit-level DPI-based signature generation technique that has very good detection performance across different types of protocols (text, binary, and proprietary) making it protocol-type agnostic.
Journal Article
Survey of Protocol Reverse Engineering Algorithms: Decomposition of Tools for Static Traffic Analysis
TL;DR: This survey collects tools presented by prior research on protocol reverse engineering through static traffic trace analysis, and presents and discusses an explicit process model for static traffic trace analysis that reveals the common structure of the decomposed tools and frameworks.
Journal Article
Machine Learning for Wireless Link Quality Estimation: A Survey
TL;DR: This article provides a comprehensive survey of link quality estimators developed from empirical data, then focuses on the subset that uses ML algorithms and on how they address the quality requirements that matter most to the applications they serve.