Showing papers on "Traffic classification" published in 2016

Open Access

Proceedings Article•DOI•

Characterization of Encrypted and VPN Traffic using Time-related Features

Gerard Draper-Gil¹, Arash Habibi Lashkari¹, Mohammad Saiful Islam Mamun¹, Ali A. Ghorbani¹•Institutions (1)

19 Feb 2016

TL;DR: This paper studies the effectiveness of flow-based time-related features to detect VPN traffic and to characterize encrypted traffic into different categories, according to the type of traffic e.g., browsing, streaming, etc.

Abstract: Traffic characterization is one of the major challenges in today’s security industry. The continuous evolution and generation of new applications and services, together with the expansion of encrypted communications, make it a difficult task. Virtual Private Networks (VPNs) are an example of an encrypted communication service that is becoming popular as a method for bypassing censorship and for accessing geographically locked services. In this paper, we study the effectiveness of flow-based time-related features to detect VPN traffic and to characterize encrypted traffic into different categories, according to the type of traffic, e.g., browsing, streaming, etc. We use two well-known machine learning techniques (C4.5 and KNN) to test the accuracy of our features. Our results show high accuracy and performance, confirming that time-related features are good discriminators for encrypted traffic characterization.
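
As a rough illustration of what flow-based time-related features can look like, the sketch below derives a few timing statistics (duration, inter-arrival times, packet and byte rates) from one flow's packet timestamps and classifies a new flow with a plain k-nearest-neighbour vote. The feature names, the sample flows, and the `time_features`/`knn_predict` helpers are assumptions for illustration, not the authors' exact feature set or tooling.

```python
import math
from collections import Counter
from statistics import mean, pstdev

# Hypothetical subset of time-related flow features (not the paper's exact list).
FEATURES = ["duration", "iat_mean", "iat_std", "iat_max", "pkts_per_sec", "bytes_per_sec"]


def time_features(timestamps, sizes):
    """Compute time-related features for one flow.

    timestamps: packet arrival times in seconds, ascending.
    sizes: packet sizes in bytes, aligned with timestamps.
    """
    duration = timestamps[-1] - timestamps[0]
    # Inter-arrival times between consecutive packets.
    iats = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {
        "duration": duration,
        "iat_mean": mean(iats) if iats else 0.0,
        "iat_std": pstdev(iats) if iats else 0.0,
        "iat_max": max(iats) if iats else 0.0,
        "pkts_per_sec": len(timestamps) / duration if duration > 0 else 0.0,
        "bytes_per_sec": sum(sizes) / duration if duration > 0 else 0.0,
    }


def knn_predict(train, query, k=3):
    """Majority vote among the k training flows closest to `query`.

    train: list of (feature_dict, label) pairs.
    Distances are Euclidean over unscaled features.
    """
    def distance(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in FEATURES))

    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [
    (time_features([0.0, 0.02, 0.04, 0.06, 0.08], [1400] * 5), "streaming"),
    (time_features([0.0, 0.03, 0.06, 0.09], [1300] * 4), "streaming"),
    (time_features([0.0, 1.5, 1.6, 4.0], [600, 200, 1500, 300]), "browsing"),
    (time_features([0.0, 2.0, 2.1], [500, 900, 400]), "browsing"),
]
query = time_features([0.0, 0.025, 0.05, 0.075], [1350] * 4)
print(knn_predict(train, query))  # streaming
```

Because the features are unscaled, the byte rate dominates the distance here; a real pipeline would normalize features first. C4.5, the other classifier named in the abstract, splits on per-feature thresholds and is therefore insensitive to such scaling.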

562 citations

Journal Article•DOI•

Building an Intrusion Detection System Using a Filter-Based Feature Selection Algorithm

Mohammed A. Ambusaidi¹, Xiangjian He¹, Priyadarsi Nanda¹, Zhiyuan Tan²•Institutions (2)

University of Technology, Sydney¹, University of Twente²

01 Oct 2016-IEEE Transactions on Computers

TL;DR: The evaluation results show that the feature selection algorithm contributes more critical features for LSSVM-IDS to achieve better accuracy and lower computational cost compared with the state-of-the-art methods.

Abstract: Redundant and irrelevant features in data have long been a problem in network traffic classification. These features not only slow down the classification process but also prevent a classifier from making accurate decisions, especially when coping with big data. In this paper, we propose a mutual-information-based algorithm that analytically selects the optimal features for classification. This mutual-information-based feature selection algorithm can handle both linearly and nonlinearly dependent data features. Its effectiveness is evaluated in the context of network intrusion detection. An Intrusion Detection System (IDS), named Least Square Support Vector Machine based IDS (LSSVM-IDS), is built using the features selected by our proposed feature selection algorithm. The performance of LSSVM-IDS is evaluated using three intrusion detection evaluation datasets, namely KDD Cup 99, NSL-KDD, and Kyoto 2006+. The evaluation results show that our feature selection algorithm contributes more critical features for LSSVM-IDS to achieve better accuracy and lower computational cost compared with the state-of-the-art methods.
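
The core quantity behind mutual-information-based feature selection can be sketched in a few lines. The example below estimates I(X;Y) for discrete (already binned) features and greedily picks features that are relevant to the class while penalizing redundancy with features already chosen. This is a generic relevance-minus-redundancy scheme in the spirit of mRMR, not the paper's exact algorithm; the helper names and toy data are hypothetical, and continuous features would first need discretization or a different MI estimator.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def select_features(columns, labels, k, beta=1.0):
    """Greedy selection maximizing I(f;C) - beta * mean I(f;s) over selected s.

    columns: dict mapping feature name -> list of discrete values.
    """
    remaining, selected = set(columns), []
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(columns[name], labels)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(columns[name], columns[s]) for s in selected
            ) / len(selected)
            return relevance - beta * redundancy

        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],  # informative
    "b": [0, 0, 0, 1, 1, 1, 1, 1],  # exact copy of "a" (redundant)
    "c": [0, 0, 0, 0, 0, 1, 1, 1],  # informative in a different way
}
print(select_features(columns, labels, k=2))
```

The redundancy penalty is what keeps the duplicate column `b` out once `a` has been chosen, even though both are equally relevant on their own.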

406 citations

Journal Article•DOI•

Traffic Engineering in Software-Defined Networking: Measurement and Management

Zhaogang Shu¹, Jiafu Wan², Jiaxiang Lin¹, Shiyong Wang², Di Li², Seungmin Rho, Changcai Yang¹•Institutions (2)

Fujian Agriculture and Forestry University¹, South China University of Technology²

21 Jun 2016-IEEE Access

TL;DR: A reference framework for TE in SDN is proposed, consisting of two parts, traffic measurement and traffic management; technologies related to traffic management include traffic load balancing, QoS-guarantee scheduling, energy-saving scheduling, and traffic management for the hybrid IP/SDN.

Abstract: As the next-generation network architecture, software-defined networking (SDN) has exciting application prospects. Its core idea is to separate the forwarding layer and the control layer of the network system, so that network operators can program packet forwarding behavior and significantly improve the innovation capability of network applications. Traffic engineering (TE) is an important network application that studies the measurement and management of network traffic and designs reasonable routing mechanisms to guide network traffic, improving the utilization of network resources and better meeting network quality of service (QoS) requirements. Compared with traditional networks, SDN has many advantages for supporting TE due to its distinctive characteristics, such as the isolation of control and forwarding, global centralized control, and the programmability of network behavior. This paper focuses on traffic engineering technology based on SDN. First, we propose a reference framework for TE in SDN, which consists of two parts: traffic measurement and traffic management. Traffic measurement is responsible for monitoring and analyzing real-time network traffic, as a prerequisite for traffic management. In the proposed framework, technologies related to traffic measurement include network parameter measurement, a general measurement framework, and traffic analysis and prediction; technologies related to traffic management include traffic load balancing, QoS-guarantee scheduling, energy-saving scheduling, and traffic management for the hybrid IP/SDN. Existing technologies are discussed in detail, and our insights into the future development of TE in SDN are offered.
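
To make the measurement-then-management loop concrete, the sketch below takes measured link loads (the output of traffic measurement) and places new flows greedily on whichever candidate path would have the lowest bottleneck utilization, a simple form of traffic load balancing. The topology, link names, capacities, and the `assign_flows` helper are hypothetical, and the SDN TE schemes surveyed in the paper are considerably more sophisticated.

```python
def assign_flows(flows, paths, load, capacity):
    """Greedily place flows on the least-loaded candidate path.

    flows: list of (flow_id, demand) pairs.
    paths: candidate paths, each a list of link ids (shared by all flows here).
    load: measured load per link; capacity: capacity per link (same units).
    Returns (placement, updated_load); the input `load` is not modified.
    """
    load = dict(load)
    placement = {}
    # Place large flows first so they get the most headroom.
    for flow_id, demand in sorted(flows, key=lambda f: -f[1]):
        def bottleneck_after(path):
            return max((load[link] + demand) / capacity[link] for link in path)

        best = min(paths, key=bottleneck_after)
        for link in best:
            load[link] += demand
        placement[flow_id] = tuple(best)
    return placement, load


paths = [["s1-s2", "s2-s4"], ["s1-s3", "s3-s4"]]
capacity = {link: 10.0 for path in paths for link in path}
measured = {link: 0.0 for link in capacity}
placement, load = assign_flows([("f1", 6), ("f2", 5), ("f3", 3)], paths, measured, capacity)
print(placement)
```

In an SDN deployment, the controller would refresh `measured` from switch counters and push the chosen paths as flow rules; both steps are outside this sketch.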

149 citations

Proceedings Article•DOI•

Machine Learning in Software Defined Networks: Data collection and traffic classification

Pedro Amaral¹, Joao Dinis¹, Paulo Pinto¹, Luis Bernardo¹, João Manuel R. S. Tavares, Henrique São Mamede²•Institutions (2)

Universidade Nova de Lisboa¹, Universidade Aberta²

01 Nov 2016

TL;DR: This work describes a simple architecture deployed in an enterprise network that gathers traffic data using the OpenFlow protocol and presents the data-sets that can be obtained and shows how several ML techniques can be applied to it for traffic classification.

Abstract: Software Defined Networks (SDNs) provide a separation between the control plane and the forwarding plane of networks. The software implementation of the control plane and the built-in data collection mechanisms of the OpenFlow protocol promise to be excellent tools for implementing Machine Learning (ML) network control applications. A first step in that direction is to understand the type of data that can be collected in SDNs and how information can be learned from that data. In this work we describe a simple architecture deployed in an enterprise network that gathers traffic data using the OpenFlow protocol. We present the data-sets that can be obtained and show how several ML techniques can be applied to them for traffic classification. The results indicate that high-accuracy classification can be obtained with the data-sets using supervised learning.
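
OpenFlow switches report per-flow counters (duration, packet count, byte count) in flow-statistics replies, and a controller can turn successive polls into classification features. The sketch below assumes a hypothetical `FlowStats` record whose field names mirror those counters; it is not tied to any particular controller API, and the derived features are illustrative rather than the paper's exact data-set columns.

```python
from dataclasses import dataclass


@dataclass
class FlowStats:
    """Hypothetical record mirroring OpenFlow per-flow counters."""
    duration_sec: int
    duration_nsec: int
    packet_count: int
    byte_count: int

    @property
    def duration(self):
        return self.duration_sec + self.duration_nsec / 1e9


def features(stats):
    """Cumulative features for one flow entry."""
    d = stats.duration
    return {
        "duration": d,
        "avg_pkt_size": stats.byte_count / stats.packet_count if stats.packet_count else 0.0,
        "pkt_rate": stats.packet_count / d if d > 0 else 0.0,
        "byte_rate": stats.byte_count / d if d > 0 else 0.0,
    }


def interval_rates(prev, curr):
    """Packet and byte rates between two polls of the same flow entry."""
    dt = curr.duration - prev.duration
    if dt <= 0:
        return 0.0, 0.0
    return ((curr.packet_count - prev.packet_count) / dt,
            (curr.byte_count - prev.byte_count) / dt)


first = FlowStats(10, 0, 100, 150_000)
second = FlowStats(12, 0, 140, 210_000)
print(features(first))
print(interval_rates(first, second))  # (20.0, 30000.0)
```

Per-interval rates capture how a flow's behavior changes between polls, which cumulative counters alone would smooth out.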

136 citations

Proceedings Article•DOI•

A Framework for QoS-aware Traffic Classification Using Semi-supervised Machine Learning in SDNs

Pu Wang¹, Shih-Chun Lin², Min Luo³•Institutions (3)

Wichita State University¹, Georgia Institute of Technology², Huawei³

01 Jun 2016

TL;DR: The proposed framework jointly exploits deep packet inspection (DPI) and semi-supervised machine learning so that accurate traffic classification can be realized, while requiring minimal communications between the network controller and the SDN switches.

Abstract: In this paper, a QoS-aware traffic classification framework for software defined networks is proposed. Instead of identifying specific applications, as most previous traffic classification work does, our approach classifies network traffic into different classes according to their QoS requirements, which provides the crucial information needed to enable fine-grained, QoS-aware traffic engineering. The proposed framework is located entirely in the network controller, so that real-time, adaptive, and accurate traffic classification can be realized by exploiting the superior computation capacity, the global visibility, and the inherent programmability of the network controller. More specifically, the proposed framework jointly exploits deep packet inspection (DPI) and semi-supervised machine learning so that accurate traffic classification can be realized while requiring minimal communication between the network controller and the SDN switches. Based on a real Internet data set, the simulation results show that the proposed classification framework can provide good performance in terms of classification accuracy and communication costs.
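
One simple way to combine a few DPI-labelled flows with many unlabelled ones is seeded clustering: centroids start from the DPI-labelled examples of each QoS class, unlabelled flows are assigned to the nearest centroid, and centroids are recomputed until they stop moving. The sketch below shows that idea; it is a generic semi-supervised technique chosen for illustration, not necessarily the algorithm used in the paper, and the two-dimensional feature vectors and class names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def seeded_kmeans(labeled, unlabeled, iterations=10):
    """Assign QoS classes to unlabeled flows.

    labeled: list of (feature_vector, qos_class) pairs, e.g. from DPI.
    unlabeled: list of feature vectors.
    Returns a list of QoS classes, one per unlabeled vector.
    """
    classes = sorted({c for _, c in labeled})
    seeds = {c: [v for v, lc in labeled if lc == c] for c in classes}
    centroids = {c: centroid(seeds[c]) for c in classes}
    assignment = []
    for _ in range(iterations):
        assignment = [min(classes, key=lambda c: math.dist(v, centroids[c]))
                      for v in unlabeled]
        # Labeled seeds always stay in their class, so no cluster goes empty.
        updated = {
            c: centroid(seeds[c] + [v for v, a in zip(unlabeled, assignment) if a == c])
            for c in classes
        }
        if updated == centroids:
            break
        centroids = updated
    return assignment


labeled = [([1.0, 1.0], "interactive"), ([10.0, 10.0], "bulk")]
unlabeled = [[1.5, 0.5], [9.0, 11.0], [2.0, 2.0]]
print(seeded_kmeans(labeled, unlabeled))  # ['interactive', 'bulk', 'interactive']
```

Only the labelled seeds need DPI, which is consistent with the abstract's goal of keeping controller-switch communication low.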

116 citations

Proceedings Article•DOI•

Network Traffic Classification techniques and comparative analysis using Machine Learning algorithms

Muhammad Shafiq¹, Xiangzhan Yu¹, Asif Ali Laghari¹, Lu Yao¹, Nabin Kumar Karn¹, Foudil Abdessamia¹•Institutions (1)

Harbin Institute of Technology¹

01 Oct 2016

TL;DR: This paper discusses network traffic classification techniques step by step: a real-time Internet data set is developed using a network traffic capture tool, a feature extraction tool is then used to extract features from the captured traffic, and four machine learning classifiers (Support Vector Machine, C4.5 decision tree, Naïve Bayes, and Bayes Net) are applied.

Abstract: Network traffic classification is a central topic in computer science today. It is an essential task for internet service providers (ISPs) to know which types of network applications are flowing in a network. Network traffic classification is the first step in analyzing and identifying the different types of applications flowing in a network. Through this technique, internet service providers or network operators can manage the overall performance of a network. There are several traditional techniques for classifying internet traffic, such as port-based, payload-based, and machine-learning-based techniques. The most common technique used today is Machine Learning (ML), which has been used by many researchers and has achieved high accuracy. In this paper, we discuss network traffic classification techniques step by step: a real-time internet data set is developed using a network traffic capture tool, a feature extraction tool is then used to extract features from the captured traffic, and four machine learning classifiers (Support Vector Machine, C4.5 decision tree, Naive Bayes, and Bayes Net) are applied. Experimental analysis shows that the C4.5 classifier gives very good accuracy compared with the other classifiers.
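
Since C4.5 performed best in this comparison, it may help to see the split criterion it uses. C4.5 chooses attributes by gain ratio: the information gain of a split divided by the split's own entropy (split information), which penalizes attributes with many distinct values. The sketch below computes it for a discrete attribute; handling of continuous attributes (threshold search), missing values, and pruning is omitted, and the toy attribute values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio of splitting `labels` on a discrete attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


labels = ["web", "web", "p2p", "p2p"]
print(gain_ratio(["small", "small", "large", "large"], labels))  # 1.0
print(gain_ratio(["a", "b", "a", "b"], labels))  # 0.0
```

The first attribute separates the classes perfectly, while the second carries no information about them, so the tree would split on the first.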

108 citations

Proceedings Article•DOI•

ATLANTIC: A framework for anomaly traffic detection, classification, and mitigation in SDN

Anderson Santos da Silva¹, Juliano Araujo Wickboldt¹, Lisandro Zambenedetti Granville¹, Alberto Schaeffer-Filho¹•Institutions (1)

Universidade Federal do Rio Grande do Sul¹

25 Apr 2016

TL;DR: It is argued that Software-Defined Networking (SDN) provides a propitious environment for the design and implementation of more robust and extensible anomaly classification schemes.

Abstract: Anomaly traffic detection and classification mechanisms need to be flexible and easy to manage in order to detect the ever-growing spectrum of anomalies. Detection and classification are difficult tasks for several reasons, including the need to obtain an accurate and comprehensive view of the network, the ability to detect the occurrence of new attack types, and the need to deal with misclassification. In this paper, we argue that Software-Defined Networking (SDN) provides a propitious environment for the design and implementation of more robust and extensible anomaly classification schemes. Unlike other approaches in the literature, which individually tackle anomaly detection, classification, or mitigation, we present a management framework that performs these tasks jointly. Our proposed framework is called ATLANTIC, and it combines the use of information theory to calculate deviations in the entropy of flow tables with a range of machine learning algorithms to classify traffic flows. As a result, ATLANTIC is a flexible framework capable of categorizing traffic anomalies and using the information collected to handle each traffic profile in a specific manner, e.g., blocking malicious flows.
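
The entropy-deviation idea can be illustrated with a small sketch: compute the Shannon entropy of one flow-table field (for example, destination addresses) in each polling window, and flag windows whose entropy strays too far from the history seen so far. The window contents, the warm-up length, and the k-sigma threshold are assumptions for illustration; ATLANTIC's actual entropy computation and thresholds may differ.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def entropy_alarms(windows, k=3.0, warmup=3):
    """Return indices of windows whose entropy deviates by more than k std devs
    from the mean entropy of all earlier windows."""
    history, alarms = [], []
    for i, window in enumerate(windows):
        h = shannon_entropy(window)
        if len(history) >= warmup:
            mu, sigma = mean(history), pstdev(history)
            # Small floor on sigma so a perfectly stable history still works.
            if abs(h - mu) > k * max(sigma, 1e-9):
                alarms.append(i)
        history.append(h)
    return alarms


normal = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
attack = ["10.0.0.9"] * 8  # many flows converging on one destination
windows = [normal, normal, normal, normal, attack, normal]
print(entropy_alarms(windows))  # [4]
```

A sudden entropy drop like this one only signals that something changed; deciding what kind of anomaly it is falls to the classification stage the abstract describes.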

101 citations

Proceedings Article•DOI•

Is Anybody Home? Inferring Activity From Smart Home Network Traffic

Bogdan Copos¹, Karl Levitt¹, Matt Bishop¹, Jeff Rowe¹•Institutions (1)

University of California, Davis¹

22 May 2016

TL;DR: This work studies two of the most popular smart home devices, the Nest Thermostat and the wired Nest Protect, and shows that traffic analysis can be used to learn potentially sensitive information about the state of a smart home.

Abstract: As smart home devices are introduced into our homes, security and privacy concerns are being raised. Smart home devices collect, exchange, and transmit various data about the environment of our homes. This data can be used not only to characterize a physical property but also to infer personal information about the inhabitants. One potential attack vector for smart home devices is the use of traffic classification as a source for covert channel attacks. Specifically, we are concerned with the use of traffic classification techniques for inferring events taking place within a building. In this work, we study two of the most popular smart home devices, the Nest Thermostat and the wired Nest Protect (i.e., a smoke and carbon monoxide detector), and show that traffic analysis can be used to learn potentially sensitive information about the state of a smart home. Among other observations, we show that we can determine, with 88% and 67% accuracy respectively, when the thermostat transitions from Home to Auto Away mode and vice versa, based only on network traffic originating from the device. This information may be used, for example, by an attacker to infer whether the home is occupied.
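
A much-simplified version of this kind of inference groups a device's packets into bursts separated by idle gaps and flags unusually large bursts as candidate state changes (for example, a mode transition that triggers a sync with the cloud). The gap, the byte threshold, and the sample trace below are hypothetical, and this is not the authors' classification method, only an illustration of why traffic volume and timing alone can leak events.

```python
def bursts(packets, gap=1.0):
    """Group (timestamp, size) packets, sorted by time, into bursts.

    A new burst starts whenever the silence since the previous packet exceeds
    `gap` seconds. Returns a list of (start_time, packet_count, total_bytes).
    """
    groups = []
    for ts, size in packets:
        if groups and ts - groups[-1]["end"] <= gap:
            groups[-1]["end"] = ts
            groups[-1]["count"] += 1
            groups[-1]["bytes"] += size
        else:
            groups.append({"start": ts, "end": ts, "count": 1, "bytes": size})
    return [(g["start"], g["count"], g["bytes"]) for g in groups]


def candidate_events(packets, min_bytes=1000, gap=1.0):
    """Start times of bursts large enough to suggest a device state change."""
    return [start for start, _, total in bursts(packets, gap) if total >= min_bytes]


trace = [(0.0, 100), (0.2, 200), (5.0, 1500), (5.1, 1500), (5.3, 800), (20.0, 100)]
print(bursts(trace))            # [(0.0, 2, 300), (5.0, 3, 3800), (20.0, 1, 100)]
print(candidate_events(trace))  # [5.0]
```

Encryption hides the payload but not these sizes and timings, which is the property the paper exploits.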

99 citations

Journal Article•DOI•

A semantics-aware approach to the automated network protocol identification

Xiaochun Yun¹, Yipeng Wang¹, Yongzheng Zhang¹, Yu Zhou¹•Institutions (1)

Chinese Academy of Sciences¹

01 Feb 2016-IEEE ACM Transactions on Networking

TL;DR: Experimental results on BitTorrent, CIFS/SMB, DNS, FTP, PPLIVE, SIP, and SMTP traces show that Securitas can accurately identify the network traces of the target application protocol, with an average recall of about 97.4% and an average precision of about 98.4%.

Abstract: Traffic classification, a mapping of traffic to network applications, is important for a variety of networking and security issues, such as network measurement, network monitoring, and the detection of malware activities. In this paper, we propose Securitas, a network trace-based protocol identification system that exploits the semantic information in protocol message formats. Securitas requires no prior knowledge of protocol specifications. Treating a protocol as a language between two processes, our approach is based upon the new insight that the n-grams of protocol traces, just like those of natural languages, exhibit a highly skewed frequency-rank distribution that can be leveraged for protocol identification. In Securitas, we first extract the statistical protocol message formats by clustering n-grams with the same semantics, and then use the corresponding statistical formats to classify raw network traces. Our tool has the following key features: 1) it is applicable to both connection-oriented and connectionless protocols; 2) it is suitable for both text and binary protocols; 3) it does not need to assemble IP packets into TCP or UDP flows; and 4) it is effective for both long-lived and short-lived flows. We implement Securitas and conduct extensive evaluations on real-world network traces containing both textual and binary protocols. Our experimental results on BitTorrent, CIFS/SMB, DNS, FTP, PPLIVE, SIP, and SMTP traces show that Securitas can accurately identify the network traces of the target application protocol, with an average recall of about 97.4% and an average precision of about 98.4%. Our experimental results show that Securitas is a robust system with competitive performance in practice.
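
The observation that protocol n-grams have a skewed frequency-rank distribution is easy to reproduce on toy data. Fixed keywords and format fields recur in every message, so they dominate the top ranks, while variable fields (paths, identifiers) form a long tail. The sketch below counts byte n-grams over a few hypothetical HTTP-like messages; Securitas's subsequent clustering of n-grams by semantics is not shown.

```python
from collections import Counter


def ngram_ranks(messages, n=3):
    """Count byte n-grams across messages, most frequent first."""
    counts = Counter()
    for msg in messages:
        counts.update(msg[i:i + n] for i in range(len(msg) - n + 1))
    return counts.most_common()


messages = [b"GET /a", b"GET /b", b"GET /c"]
for rank, (gram, count) in enumerate(ngram_ranks(messages), start=1):
    print(rank, gram, count)
```

Working on raw bytes rather than parsed fields is what lets the same counting apply to binary protocols, in line with the abstract's second key feature.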

78 citations

Journal Article•DOI•

Benchmarking the Effect of Flow Exporters and Protocol Filters on Botnet Traffic Classification

Fariba Haddadi¹, A. Nur Zincir-Heywood¹•Institutions (1)

Dalhousie University¹

01 Dec 2016-IEEE Systems Journal

TL;DR: A study of the effect, if any, of the feature sets of network traffic flow exporters on the performance of botnet traffic classification indicates that the use of a flow exporter and a protocol filter indeed has an effect on the performance of botnet traffic classification.

Abstract: Botnets represent one of the most aggressive threats against cyber security. Different techniques using different feature sets have been proposed for botnet traffic analysis and classification. However, no work has been performed to study the effect of such differences. In this paper, we study the effect, if any, of the feature sets of network traffic flow exporters. To this end, we explore five different traffic flow exporters (each with a different set of flow features) using two different protocol filters [Hypertext Transfer Protocol (HTTP) and Domain Name System (DNS)] and five different classifiers. We evaluate all of these on eight different botnet traffic data sets. Our results indicate that the use of a flow exporter and a protocol filter indeed has an effect on the performance of botnet traffic classification. Experimental results show that the best performance is achieved using the Tranalyzer flow exporter and the HTTP filter with the C4.5 classifier.
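
To make the role of a protocol filter concrete, the sketch below keeps only flow records that look like HTTP or DNS before they reach a classifier. It assumes a hypothetical `FlowRecord` type and identifies protocols purely by well-known ports (TCP 80 for HTTP, UDP/TCP 53 for DNS). The filters used in the paper may work differently, and port-based matching misses traffic on non-standard ports.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRecord:
    """Hypothetical exporter-agnostic flow record."""
    proto: str          # "tcp" or "udp"
    src_port: int
    dst_port: int
    features: dict = field(default_factory=dict)  # exporter-specific features


# Well-known ports used as a crude protocol filter (an assumption).
PROTOCOL_PORTS = {
    "HTTP": {("tcp", 80)},
    "DNS": {("udp", 53), ("tcp", 53)},
}


def protocol_filter(flows, protocol):
    """Keep flows whose source or destination port matches `protocol`."""
    ports = PROTOCOL_PORTS[protocol]
    return [f for f in flows
            if (f.proto, f.dst_port) in ports or (f.proto, f.src_port) in ports]


flows = [
    FlowRecord("tcp", 51512, 80),
    FlowRecord("udp", 53, 40000),
    FlowRecord("tcp", 40001, 443),
]
print(len(protocol_filter(flows, "HTTP")), len(protocol_filter(flows, "DNS")))  # 1 1
```

Keeping exporter-specific features in a separate field lets the same filter run regardless of which flow exporter produced the records, which mirrors the study's separation of exporter and filter as independent variables.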

71 citations

Is Anybody Home? Inferring Activity From Smart Home Network Traffic - eScholarship

Bogdan Copos, Karl Levitt, Matt Bishop, Jeff Rowe

01 May 2016

TL;DR: Copos et al. investigated how device-to-device and device-to-cloud smart home network traffic can be used to infer personal information about the state of a smart home, and showed that it is possible to determine, with 88% and 67% accuracy respectively, when the thermostat transitions from Home to Auto Away mode and vice versa, based only on network traffic originating from the device.

Abstract: As smart home devices are introduced into our homes, security and privacy concerns are being raised. Smart home devices collect, exchange, and transmit various data about the environment of our homes. This data can be used not only to characterize a physical property but also to infer personal information about the inhabitants. One potential attack vector for smart home devices is the use of traffic classification as a source for covert channel attacks. Specifically, we are concerned with the use of traffic classification techniques for inferring events taking place within a building. In this work, we study two of the most popular smart home devices, the Nest Thermostat and the wired Nest Protect (i.e., a smoke and carbon monoxide detector), and show that traffic analysis can be used to learn potentially sensitive information about the state of a smart home. Among other observations, we show that we can determine, with 88% and 67% accuracy respectively, when the thermostat transitions from Home to Auto Away mode and vice versa, based only on network traffic originating from the device. This information may be used, for example, by an attacker to infer whether the home is occupied.

I. INTRODUCTION

Smart home devices are becoming increasingly popular in households around the world. Nest Labs, one of the most popular manufacturers of smart thermostats and smoke detectors, is believed to have sold 440,000 smoke detector units over the span of four months in 2014 alone. Smart home devices are designed to help homeowners automate and simplify mundane tasks around their property.

However, bringing internet connectivity to household devices has also introduced many security and privacy concerns. At the end of 2015, security researchers discovered a vulnerability in Barbie dolls that would allow attackers not only to steal personal information but also to convert a doll into a spying device capable of listening in on conversations [6]. In early 2016, security researchers from Rapid7 found vulnerabilities in Comcast’s Xfinity Home Security system that would cause the system not to report when a property’s windows and/or doors were compromised [19]. In this paper, we investigate how device-to-device and device-to-cloud smart home network traffic can be used to infer personal information. Specifically, we use traffic analysis techniques on network traffic generated by devices from Nest Labs to learn information about the presence of residents and other events occurring within the property. Traffic analysis is the process of intercepting and analyzing network packets in order to deduce information from patterns in communication. The experiments involve two smart home devices, a smart thermostat and a smart smoke and carbon monoxide detector. The rest of the paper is organized as follows:
• Section II describes relevant previous work.
• Section III gives a detailed rundown of the devices used in this study and their features and capabilities.
• Section IV describes the data collection process.
• Section V explains the methodology behind the traffic classification.
• Section VI reports the findings of our analysis.
• Section VII describes how the findings were tested for validity and presents information about the accuracy of our findings.
• Section VIII discusses limitations of our approach.
• Section IX provides some initial ideas for solutions and lists possible future work.

II. PREVIOUS WORK

Traffic analysis attacks were highlighted in “Analysis of the SSL 3.0 protocol” [16] by Wagner and Schneier, who showed that the URL of an HTTP GET request is leaked in SSL because ciphertexts fail to disguise the plaintext length. Later, Cheng and Avnur [3] showed that websites can be fingerprinted by performing traffic analysis of SSL-encrypted web browsing traffic. Since then, a number of works [2], [7], [8], [10], [13], [15] have explored traffic analysis attacks using various features, including source and destination attributes (e.g., address, port), protocol, packet and connection sizes, and even timing information (e.g., duration of connections, burstiness of transmissions). Efforts have also been put into developing countermeasures for such attacks [5], [11], [18]. Countermeasure techniques include traffic padding and traffic masking. Countermeasures also vary in where they are implemented: on the server side, the client side, or both. Recently, in “Peek-a-Boo, I Still See You: Why Efficient Traffic Analysis Countermeasures Fail” [4], Dyer, Coull et al. provided the first comprehensive analysis of some of the proposed traffic analysis countermeasures and showed why they fail to protect against attacks. The authors argue that there is no efficient solution.
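
The length leak attributed to Wagner and Schneier, and the traffic padding countermeasure, can be illustrated with simple arithmetic. In the sketch below, a length-preserving cipher adds a fixed per-record overhead, so distinct plaintext lengths remain distinct; padding plaintexts up to a multiple of a bucket size collapses nearby lengths into one. The 29-byte overhead and the 256-byte bucket are hypothetical values chosen for illustration. Per-packet padding like this does not hide coarser features such as total volume or timing, which is the kind of weakness Dyer et al. analyze.

```python
import math

OVERHEAD = 29  # hypothetical fixed per-record overhead (header + MAC), in bytes


def ciphertext_len(plaintext_len, overhead=OVERHEAD):
    """Length-preserving encryption: ciphertext length tracks plaintext length."""
    return plaintext_len + overhead


def padded_len(plaintext_len, bucket=256, overhead=OVERHEAD):
    """Pad the plaintext up to the next multiple of `bucket` before encrypting."""
    return math.ceil(plaintext_len / bucket) * bucket + overhead


request_lengths = [120, 130, 140]  # e.g. GET requests for different URLs
print(sorted({ciphertext_len(n) for n in request_lengths}))  # [149, 159, 169]
print(sorted({padded_len(n) for n in request_lengths}))      # [285]
```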

Journal Article•DOI•

On Internet Traffic Classification

Taimur Bakhshi¹, Bogdan Ghita¹•Institutions (1)

University of Plymouth¹

01 Jun 2016-Journal of Computer Networks and Communications

TL;DR: The computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification.

Abstract: Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate on their own for classification, which instead requires additional packet-level information, host behaviour analysis, and specialized hardware, limiting practical adoption. This paper aims to overcome these challenges by proposing a two-phase machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application through k-means and are then used to train a C5.0 decision tree classifier. As part of the validation, the initial unsupervised phase used flow records of fifteen popular Internet applications, which were collected and independently subjected to k-means clustering to determine the unique flow classes generated per application. The derived flow classes were then used to train and test a supervised C5.0-based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases, increasing to 96.67% with adaptive boosting. The classifier specificity factor, which accounted for differentiating content-specific from supplementary flows, ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification.
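
The first, unsupervised phase can be sketched as plain k-means run separately on each application's flow records, with each resulting cluster becoming a labelled flow class (for example `video-0`, `video-1`) on which a decision tree is later trained. The code below is a minimal k-means with a naive deterministic initialization (the first k records); the feature vectors, the `flow_classes` helper, and the naming scheme are hypothetical, and the C5.0 training phase is not shown.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means. Returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def flow_classes(app_name, flow_records, k):
    """Label each of one application's flow records with a per-app cluster class."""
    labels, _ = kmeans(flow_records, k)
    return [f"{app_name}-{lbl}" for lbl in labels]


# Hypothetical NetFlow-derived vectors: (packets, kilobytes) per flow.
records = [[2, 0.5], [400, 550.0], [3, 0.7], [420, 600.0]]
print(flow_classes("video", records, k=2))  # ['video-0', 'video-1', 'video-0', 'video-1']
```

Splitting one application into several flow classes reflects the abstract's distinction between content-specific and supplementary flows; in practice the features would be scaled before clustering.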

Journal Article•DOI•

Traffic matrix prediction and estimation based on deep learning in large-scale IP backbone networks

Laisen Nie¹, Dingde Jiang¹, Lei Guo¹, Shui Yu²•Institutions (2)

Northeastern University (China)¹, Deakin University²

01 Dec 2016-Journal of Network and Computer Applications

TL;DR: This paper uses a deep learning architecture to explore the dynamic properties of network traffic, and proposes a novel network traffic prediction approach based on a deep belief network, as well as a network traffic estimation method that uses the deep belief network with link counts and routing information.
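
Why estimating a traffic matrix from link counts is hard can be shown with the standard linear model y = A x. Here x holds origin-destination (OD) flow volumes, A is the routing matrix (A[l][f] = 1 if OD flow f crosses link l), and y holds the measured link counts. With more OD flows than links, different traffic matrices can produce identical link counts. The three-node line topology below is a hypothetical toy example, not data from the paper.

```python
def link_counts(routing, od_flows):
    """Compute y = A x: the load on each link from OD flow volumes."""
    return [sum(a * x for a, x in zip(row, od_flows)) for row in routing]


# Line topology A - B - C. Links: A-B, B-C. OD flows: A->B, B->C, A->C.
routing = [
    [1, 0, 1],  # link A-B carries A->B and A->C
    [0, 1, 1],  # link B-C carries B->C and A->C
]
x1 = [5, 5, 0]  # only single-hop traffic
x2 = [0, 0, 5]  # only end-to-end traffic
print(link_counts(routing, x1), link_counts(routing, x2))  # [5, 5] [5, 5]
```

Because both traffic matrices yield the same link counts, link counts alone cannot pin down x. Estimation methods therefore need extra information or learned structure, such as a model trained on link counts and routing information, as the paper proposes.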

Proceedings Article•DOI•

Traffic Matrix Prediction and Estimation Based on Deep Learning for Data Center Networks

Laisen Nie¹, Dingde Jiang¹, Lei Guo¹, Shui Yu², Houbing Song³•Institutions (3)

Northeastern University (China)¹, Deakin University², West Virginia University³

01 Dec 2016

TL;DR: This work uses a deep architecture to explore the time-varying property of network traffic in a data center network, and proposes a novel network traffic prediction approach based on a deep belief network and a logistic regression model.

Abstract: Network traffic analysis is a crucial technique for systematically operating a data center network. Many network management functions rely on accurate network traffic information. Although many methods for obtaining network traffic have been developed for traditional ISP networks, they cannot be applied effectively to data center networks. Motivated by this, we focus on the problem of network traffic prediction and estimation in data center networks. We apply deep learning techniques to network traffic prediction and estimation, and propose two deep architectures for network traffic prediction and estimation, respectively. We first use a deep architecture to explore the time-varying property of network traffic in a data center network, and then propose a novel network traffic prediction approach based on a deep belief network and a logistic regression model. Meanwhile, to deal with the highly ill-posed nature of network traffic estimation, we further propose a network traffic estimation method using a deep belief network trained on link counts. We validate the effectiveness of our methodologies using real traffic data.
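
Traffic prediction with a deep belief network is typically framed as supervised learning on a sliding window: the last h measurements of a traffic series form the input, and the next measurement is the target. The sketch below builds such pairs after min-max normalization (often applied to DBN inputs) and uses a moving-average forecast as a naive baseline. The window length, the sample series, and the baseline are assumptions for illustration; the paper's DBN and logistic regression model are not reproduced here.

```python
def normalize(series):
    """Min-max scale a series into [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def sliding_windows(series, history):
    """Turn a series into (last `history` values, next value) training pairs."""
    return [(series[i:i + history], series[i + history])
            for i in range(len(series) - history)]


def moving_average_forecast(window):
    """Naive baseline: predict the mean of the window."""
    return sum(window) / len(window)


series = normalize([100, 140, 180, 160, 200])  # hypothetical traffic volumes
for window, target in sliding_windows(series, history=3):
    print(window, "->", target, "baseline:", moving_average_forecast(window))
```

Any learned predictor, a DBN included, would consume the same (window, target) pairs, so a baseline like this gives a floor against which to judge it.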

Proceedings Article•DOI•

Certificate-aware encrypted traffic classification using Second-Order Markov Chain

Meng Shen¹, Mingwei Wei¹, Liehuang Zhu¹, Mingzhong Wang², Fuliang Li³•Institutions (3)

Beijing Institute of Technology¹, University of the Sunshine Coast², Northeastern University (China)³

20 Jun 2016

TL;DR: This paper develops a new model by incorporating certificate packet length clustering into Second-Order homogeneous Markov chains, and shows that the proposed method leads to a 30% improvement on average over the state-of-the-art method in terms of classification accuracy.

Abstract: With the prosperity of network applications, traffic classification plays a crucial role in network management and malicious attack detection. The widely used encrypted transmission protocols, such as the Secure Socket Layer/Transport Layer Security (SSL/TLS) protocols, lead to the failure of traditional payload-based classification methods. Existing methods for encrypted traffic classification suffer from low accuracy. In this paper, we propose a certificate-aware encrypted traffic classification method based on the Second-Order Markov Chain. We start by exploring why existing methods do not perform well, and make a novel observation that the certificate packet length in SSL/TLS sessions contributes to application discrimination. To increase the diversity of application fingerprints, we develop a new model by incorporating certificate packet length clustering into Second-Order homogeneous Markov chains. Extensive evaluation results show that the proposed method leads to a 30% improvement on average compared with the state-of-the-art method in terms of classification accuracy.
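
A second-order Markov chain fingerprint can be sketched as follows: each session is a sequence of SSL/TLS message states, the model estimates P(s_t | s_{t-2}, s_{t-1}) per application from training sessions, and a new session is assigned to the application with the highest log-likelihood. Following the paper's idea of certificate awareness, the certificate message state is tagged with a length-cluster index (written here as `Certificate#0`, `Certificate#2`). Those tags, the add-one smoothing, the start-padding symbol, and the toy sessions are assumptions, not the paper's exact model.

```python
import math
from collections import defaultdict

START = "<s>"  # padding symbol so the first two messages also have a context


def train_second_order(sequences):
    """Count transitions (s[t-2], s[t-1]) -> s[t] over training sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        padded = [START, START] + list(seq)
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            counts[(a, b)][c] += 1
    return counts, vocab


def log_likelihood(model, seq, alpha=1.0):
    """Log-probability of `seq` under add-alpha smoothed transition estimates."""
    counts, vocab = model
    v = len(vocab) + 1  # +1 reserves probability mass for unseen states
    padded = [START, START] + list(seq)
    total_ll = 0.0
    for a, b, c in zip(padded, padded[1:], padded[2:]):
        ctx = counts.get((a, b), {})
        total = sum(ctx.values())
        total_ll += math.log((ctx.get(c, 0) + alpha) / (total + alpha * v))
    return total_ll


def classify(models, seq):
    """Return the application whose model gives `seq` the highest likelihood."""
    return max(models, key=lambda app: log_likelihood(models[app], seq))


app_a = [["ClientHello", "ServerHello", "Certificate#0", "ServerHelloDone"]] * 3
app_b = [["ClientHello", "ServerHello", "Certificate#2", "ServerHelloDone"]] * 3
models = {"app_a": train_second_order(app_a), "app_b": train_second_order(app_b)}
session = ["ClientHello", "ServerHello", "Certificate#2", "ServerHelloDone"]
print(classify(models, session))  # app_b
```

Without the certificate-length tag, both toy applications would produce identical message sequences and be indistinguishable, which is the intuition behind the paper's certificate-aware fingerprints.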

Proceedings Article•DOI•

Data Traffic Model in Machine to Machine Communications over 5G Network Slicing

Mohammed Dighriri¹, Ali Saeed Dayem Alfoudi¹, Gyu Myoung Lee², Thar Baker¹•Institutions (2)

Liverpool John Moores University¹, Electronics and Telecommunications Research Institute²

01 Aug 2016

TL;DR: A novel data traffic aggregation model and algorithm along with a new 5G network slicing based on classification and measuring the data traffic to satisfy Quality of Service for smart systems in a smart city environment is proposed.

Abstract: Recent advancements in the cellular communication domain have resulted in the emergence of Machine-to-Machine (M2M) applications, supported by wide-range coverage, low costs, and high mobility. 5G network standards represent a promising technology to support future Machine-to-Machine data traffic. In recent years, Human-Type Communication traffic has seen exponential growth over cellular networks, which has driven increases in capacity and data rates. These networks are expected to face challenges such as an explosion of data traffic from future smart devices with various Quality of Service requirements. This paper proposes a novel data traffic aggregation model and algorithm, along with a new 5G network slicing scheme based on classifying and measuring the data traffic, to satisfy Quality of Service requirements for smart systems in a smart city environment. In our proposal, 5G radio resources are efficiently utilized, down to the smallest unit of a physical resource block in a relay node, by aggregating the data traffic of several Machine-to-Machine devices into separate slices based on the Quality of Service of each application. OPNET is used to assess the performance of the proposed model. The simulated 5G data traffic classes include file transfer protocol (FTP), voice over IP (VoIP), and video users.
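
The aggregation step can be pictured as grouping M2M packets by QoS class (one slice per class) and packing each class's packets into fixed-size containers standing in for a resource unit at the relay node. The sketch below uses first-fit decreasing packing; the 120-byte capacity, device IDs, and class names are hypothetical, and this illustrates the aggregation idea rather than the paper's algorithm.

```python
def aggregate(packets, capacity):
    """Aggregate M2M packets into per-QoS-class containers.

    packets: list of (device_id, qos_class, size_bytes).
    Returns {qos_class: [container, ...]}, where each container is a list of
    packets whose total size fits within `capacity` (first-fit decreasing).
    A packet larger than `capacity` gets a container of its own.
    """
    slices = {}
    for pkt in sorted(packets, key=lambda p: -p[2]):
        containers = slices.setdefault(pkt[1], [])
        for container in containers:
            if sum(p[2] for p in container) + pkt[2] <= capacity:
                container.append(pkt)
                break
        else:
            containers.append([pkt])
    return slices


packets = [("d1", "voip", 40), ("d2", "voip", 60), ("d3", "ftp", 300),
           ("d4", "voip", 50), ("d5", "ftp", 250)]
for qos, containers in aggregate(packets, capacity=120).items():
    print(qos, [[p[0] for p in c] for c in containers])
```

Packing small packets from several devices into one container is what saves radio resources; oversized packets would need fragmentation, which this sketch does not model.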

Proceedings Article•DOI•

Automatic Mobile Application Traffic Identification by Convolutional Neural Networks

Zhengyang Chen, Bowen Yu¹, Yu Zhang, Jianzhong Zhang¹, Jingdong Xu•Institutions (1)

Nankai University¹

01 Aug 2016

TL;DR: A novel approach is proposed that identifies mobile applications by automatically extracting abstract features from labeled packets using convolutional neural networks (CNNs), which can extract abstract statistical features among characters in HTTP and thus improve identification accuracy.

Abstract: Mobile network security and management are becoming important issues due to the rapid development and widespread adoption of mobile networks. Application traffic identification is a critical technology for resolving these issues. Many traffic classification methods designed for desktop applications are no longer effective in mobile networks, because the majority of mobile traffic is carried over HTTP without distinctive features. Existing approaches to identifying mobile traffic simply extract obvious features such as fixed strings or regular expressions, which cannot capture the hidden structure within HTTP headers. In this paper, we propose a novel approach that identifies mobile applications by automatically extracting abstract features from labeled packets. Our approach is mainly based on convolutional neural networks (CNNs). The CNNs can extract abstract statistical features among the characters in HTTP and thus improve identification accuracy. The approach also reduces the dependence on prior knowledge and the human effort needed to design features. To verify the effectiveness of our method, we apply it to several identification tasks. The evaluation shows that our method can accurately identify the traffic of the target mobile application.

Proceedings Article•DOI•

An improved network traffic classification algorithm based on Hadoop decision tree

Zhengwu Yuan¹, Chaozheng Wang¹•Institutions (1)

Chongqing University of Posts and Telecommunications¹

28 May 2016

TL;DR: Experiments show that the improved HAC4.5 decision tree algorithm not only improves the running speed, but also improves the accuracy of the calculation.

Abstract: In the current Internet age, network traffic has increased exponentially. Whether driven by user demand for network resources, by QoS scheduling, or by the expansion and transformation of existing networks to follow the development of network applications, the various applications in network traffic need to be classified and identified accurately, which makes network traffic classification particularly important. The C4.5 decision tree algorithm, a commonly used supervised classification algorithm, is often applied to traffic classification, but its efficiency drops as data volume increases. The Hadoop platform, an open-source cloud framework, performs well on big data and is therefore often the preferred choice for handling large datasets. Starting from the original C4.5 algorithm, we simplify it and parallelize it on the Hadoop platform; we call the result the HAC4.5 decision tree algorithm. Experiments show that the improved HAC4.5 decision tree algorithm not only improves running speed but also improves the accuracy of the calculation.
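C4.5 chooses splits by gain ratio, and the counting that feeds it parallelizes naturally: a map step emits one `(attribute, value, class)` key per record, and a reduce step sums those keys. The sketch below shows that split criterion as a map/reduce pair in plain Python. It does not use the Hadoop API, it does not reproduce the HAC4.5 simplifications (which the abstract does not describe), and it assumes categorical attributes.

```python
import math
from collections import Counter
from itertools import chain

def map_record(record, label_key):
    """Map step: emit ((attribute, value, class), 1) for every non-label attribute."""
    label = record[label_key]
    for attr, value in record.items():
        if attr != label_key:
            yield (attr, value, label), 1

def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per key."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts

def entropy(counts):
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def gain_ratios(records, label_key):
    """C4.5 gain ratio for each attribute, computed from the reduced counts."""
    counts = reduce_counts(chain.from_iterable(map_record(r, label_key) for r in records))
    n = len(records)
    base = entropy(Counter(r[label_key] for r in records).values())
    # Regroup reduced counts as attribute -> value -> class histogram.
    by_attr = {}
    for (attr, value, label), c in counts.items():
        by_attr.setdefault(attr, {}).setdefault(value, Counter())[label] += c
    result = {}
    for attr, partitions in by_attr.items():
        sizes = [sum(p.values()) for p in partitions.values()]
        conditional = sum(s / n * entropy(p.values()) for s, p in zip(sizes, partitions.values()))
        split_info = entropy(sizes)
        result[attr] = (base - conditional) / split_info if split_info else 0.0
    return result

flows = [
    {"port": "443", "size": "big", "app": "video"},
    {"port": "443", "size": "big", "app": "video"},
    {"port": "80", "size": "small", "app": "web"},
    {"port": "80", "size": "big", "app": "web"},
]
print(gain_ratios(flows, "app"))
```

Because the map output only depends on individual records, the same counts could be produced by many workers and merged, which is the property a Hadoop port exploits.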

Proceedings Article•DOI•

Towards web service classification using addresses and DNS

Martino Trevisan¹, Idilio Drago¹, Marco Mellia¹, Maurizio Matteo Munafo¹•Institutions (1)

Polytechnic University of Turin¹

01 Sep 2016

TL;DR: This paper studies to what extent basic information visible at flow-level measurements is useful for traffic classification on the web, and quantifies how often the same server IP address is used by different services, and how services use hostnames by analyzing a large dataset of flow measurements.

Abstract: The identification of the services that generate traffic is crucial for ISPs and companies to plan and monitor the network. The widespread deployment of encryption and the convergence of the web services towards HTTP/HTTPS challenge traditional classification techniques. Algorithms to classify traffic are left with little information, such as server IP addresses, flow characteristics and queries performed at the DNS. Moreover, due to the usage of Content Delivery Networks and cloud infrastructure, it is unclear whether such coarse metadata is sufficient to differentiate the traffic. This paper studies to what extent basic information visible at flow-level measurements is useful for traffic classification on the web. By analyzing a large dataset of flow measurements, we quantify how often the same server IP address is used by different services, and how services use hostnames. Our results show that a very simple classifier that relies only on server IP addresses and on lists of hostnames can distinguish up to 55% of the traffic volume. Yet, collisions of names and addresses are common among popular services, calling for more ingenuity. This paper is a preliminary step in the evaluation of classification algorithms that are suitable for the modern Internet, where only minimal metadata collection will be possible in the network.
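A classifier of the kind the abstract describes, using only the server IP address and lists of hostnames, reduces to two lookups. In the sketch below the hostname and prefix tables are hypothetical placeholders (the IP prefixes come from documentation ranges), hostnames are matched on their longest listed suffix, and the first matching IP prefix wins.

```python
import ipaddress

def build_classifier(hostname_rules, prefix_rules):
    """hostname_rules: {"domain suffix": "service"}; prefix_rules: {"CIDR": "service"}."""
    networks = [(ipaddress.ip_network(p), s) for p, s in prefix_rules.items()]

    def classify(server_ip, hostname=None):
        if hostname:
            labels = hostname.lower().rstrip(".").split(".")
            # Try the full name first, then progressively shorter suffixes.
            for i in range(len(labels)):
                suffix = ".".join(labels[i:])
                if suffix in hostname_rules:
                    return hostname_rules[suffix]
        # Fall back to the server address when no hostname rule matches.
        address = ipaddress.ip_address(server_ip)
        for network, service in networks:
            if address in network:
                return service
        return "unknown"

    return classify

classify = build_classifier(
    {"googlevideo.com": "YouTube", "example.org": "ServiceX"},  # hypothetical lists
    {"203.0.113.0/24": "ServiceY"},
)
print(classify("198.51.100.7", "r3---sn-abc.googlevideo.com"))  # YouTube
```

Because CDN and cloud addresses are often shared among services, the sketch consults the hostname before falling back to the address.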

Proceedings Article•DOI•

Real-time traffic classification with Twitter data mining

Dwi Aji Kurniawan¹, Sunu Wibirama¹, Noor Akhmad Setiawan¹•Institutions (1)

Gadjah Mada University¹

01 Oct 2016

TL;DR: The results imply that social network services may be used as an alternative source for traffic anomaly detection, providing real-time information on traffic flow conditions by harnessing social network data from Twitter.

Abstract: The growth in the number of vehicles in Yogyakarta Province, Indonesia, is not proportional to the growth of roads. This problem causes severe traffic jams on many main roads. Common traffic anomaly detection using surveillance cameras is labor-intensive and costly, while traffic anomaly detection with crowdsourcing mobile applications is mostly privately owned. This research aims to develop real-time traffic classification by harnessing the power of social network data from Twitter. In this study, Twitter data are processed through the stages of preprocessing, feature extraction, and tweet classification. This study compares the classification performance of three machine learning algorithms, namely Naive Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT). Experimental results show that the SVM algorithm produced the best performance among the algorithms, with 99.77% and 99.87% classification accuracy on balanced and imbalanced data, respectively. This research implies that social network services may be used as an alternative source for traffic anomaly detection by providing information on traffic flow conditions in real time.
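Of the three algorithms compared, Naive Bayes is the simplest to show end to end. The sketch below is a minimal multinomial Naive Bayes with Laplace smoothing over bag-of-words tokens; the tokenizer, the English training tweets, and the `traffic`/`other` labels are illustrative assumptions, not the study's preprocessing or data. Note that the study reports SVM, not Naive Bayes, as the best performer.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Minimal preprocessing (assumed): lowercase and keep alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            words = self.word_counts[label]
            total = sum(words.values())
            score = math.log(doc_count / n_docs)  # log prior
            for w in tokenize(text):
                score += math.log((words[w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = label, score
        return best

nb = NaiveBayes().fit(
    ["heavy traffic jam on the ring road", "accident causing congestion near the bridge",
     "lovely coffee this morning", "watching a movie tonight"],
    ["traffic", "traffic", "other", "other"],
)
print(nb.predict("jam and congestion on the road"))  # traffic
```

Working in log space avoids underflow when tweets contain many tokens; the smoothing term keeps unseen words from zeroing out a class.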

Journal Article•DOI•

DBStream: A holistic approach to large-scale network traffic monitoring and analysis

Arian Baer, Pedro Casas¹, Alessandro D'Alconzo¹, Pierdomenico Fiadino, Lukasz Golab², Marco Mellia³, Erich Schikuta⁴•Institutions (4)

Austrian Institute of Technology¹, University of Waterloo², Polytechnic University of Turin³, University of Vienna⁴

09 Oct 2016-Computer Networks

TL;DR: DBStream is presented, a holistic approach to large-scale network monitoring and analysis applications and its Continuous Execution Language (CEL) can be used to automate several data processing and analysis tasks typical for monitoring operational ISP networks.

Journal Article•DOI•

A streaming flow-based technique for traffic classification applied to 12 + 1 years of Internet traffic

Valentín Carela-Español, Pere Barlet-Ros, Albert Bifet¹, Kensuke Fukuda²•Institutions (2)

Huawei¹, National Institute of Informatics²

01 Oct 2016-Telecommunication Systems

TL;DR: A streaming flow-based classification solution is proposed, based on Hoeffding Adaptive Tree, a machine learning technique specifically designed for evolving data streams; it can sustain very high accuracy over the years, with significantly less cost and complexity than existing alternatives based on static learning algorithms.

Abstract: The continuous evolution of Internet traffic and its applications makes the classification of network traffic a topic far from being completely solved. An essential problem in this field is that most of the techniques proposed in the literature are based on a static view of network traffic (i.e., they build a model or a set of patterns from a static, invariable dataset). However, very little work has addressed the practical limitations that arise when facing a more realistic scenario with an infinite, continuously evolving stream of network traffic flows. In this paper, we propose a streaming flow-based classification solution based on Hoeffding Adaptive Tree, a machine learning technique specifically designed for evolving data streams. The main novelty of our proposal is that it automatically adapts to the continuous evolution of network traffic without storing any traffic data. We apply our solution to a 12 + 1 year-long dataset from a transit link in Japan, and show that it can sustain very high accuracy over the years, with significantly less cost and complexity than existing alternatives based on static learning algorithms, such as C4.5.
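Hoeffding Adaptive Tree builds on the Hoeffding tree split test: a leaf splits once the observed gap between the best and second-best attributes exceeds the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)), or once ε falls below a tie-breaking threshold. The sketch below shows only that test; HAT's ADWIN-based change detection, which replaces subtrees when the stream drifts, is not shown, and the values of `delta` and `tie_threshold` are illustrative assumptions.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with range value_range lies within epsilon of the mean of n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, n, n_classes=2, delta=1e-7, tie_threshold=0.05):
    """Hoeffding tree split test on information gain observed at a leaf."""
    # For information gain, the range R is log2(number of classes).
    eps = hoeffding_bound(math.log2(n_classes), delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold

print(round(hoeffding_bound(1.0, 1e-7, 1000), 4))  # 0.0898
print(should_split(0.30, 0.20, n=1000))  # True
print(should_split(0.30, 0.20, n=200))   # False
```

The same gain gap of 0.1 is not yet trustworthy after 200 flows but is after 1,000, which is how the tree grows from a stream without storing the flows themselves.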

Proceedings Article•DOI•

vTC: Machine Learning Based Traffic Classification as a Virtual Network Function

Lu He¹, Chen Xu¹, Yan Luo¹•Institutions (1)

University of Massachusetts Lowell¹

11 Mar 2016

TL;DR: In this paper, a design of virtual network functions is proposed that flexibly selects and applies the most suitable machine learning classifiers at run time; the experimental results show that the proposed NFV for flow classification can improve classification accuracy by up to 13%.

Abstract: Network flow classification is fundamental to network management and network security. However, it is challenging to classify network flows at very high line rates while simultaneously preserving user privacy. Machine learning based classification techniques use only the meta-information of a flow and have been shown to be effective in identifying network flows. We analyze a group of widely used machine learning classifiers, and observe that the effectiveness of different classification models depends highly upon the protocol types as well as the flow features collected from network data. We propose vTC, a design of virtual network functions to flexibly select and apply the most suitable machine learning classifiers at run time. The experimental results show that the proposed NFV for flow classification can improve classification accuracy by up to 13%.
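The core idea of vTC, choosing among several trained classifiers at run time according to the traffic at hand, can be sketched as a dispatcher keyed by protocol type. Everything here is an assumption for illustration: the selection criterion (running accuracy on labeled feedback), the protocol keys, and the stand-in classifier callables. The abstract does not describe vTC's actual NFV design or selection logic.

```python
from collections import defaultdict

class ClassifierSelector:
    """Hypothetical run-time selector: tracks per-protocol accuracy of each
    candidate classifier and dispatches flows to the current best one."""

    def __init__(self, classifiers):
        self.classifiers = classifiers  # name -> callable(features) -> label
        self.hits = defaultdict(int)    # (protocol, name) -> correct predictions
        self.seen = defaultdict(int)    # (protocol, name) -> labeled feedback count

    def record(self, protocol, name, correct):
        """Feed back whether classifier `name` was right on a labeled flow."""
        self.seen[(protocol, name)] += 1
        self.hits[(protocol, name)] += int(correct)

    def best_for(self, protocol):
        def accuracy(name):
            seen = self.seen[(protocol, name)]
            return self.hits[(protocol, name)] / seen if seen else 0.0
        return max(self.classifiers, key=accuracy)

    def classify(self, protocol, features):
        return self.classifiers[self.best_for(protocol)](features)

# Stand-in classifiers; real ones would be trained models.
selector = ClassifierSelector({"tree": lambda f: "video", "knn": lambda f: "web"})
for i in range(10):
    selector.record("tcp", "tree", i < 9)
    selector.record("tcp", "knn", i < 5)
    selector.record("udp", "tree", i < 2)
    selector.record("udp", "knn", i < 8)
print(selector.best_for("tcp"), selector.best_for("udp"))  # tree knn
```

Keeping the statistics per protocol reflects the abstract's observation that classifier effectiveness depends on protocol type; any per-flow feature could serve as the key instead.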

Journal Article•DOI•

High performance traffic classification based on message size sequence and distribution

Chun Nan Lu¹, Chun-Ying Huang¹, Ying-Dar Lin¹, Yuan-Cheng Lai²•Institutions (2)

National Chiao Tung University¹, National Taiwan University of Science and Technology²

01 Dec 2016-Journal of Network and Computer Applications

TL;DR: Two statistics-based solutions, the message size distribution classifier (MSDC) and the message size sequence classifier (MSSC), are proposed, to be selected according to the required classification accuracy and timeliness; they aim to identify network flows accurately while providing a lightweight, real-time solution.

Journal Article•DOI•

Differentiated forwarding and caching in named-data networking

Yusung Kim¹, Young-Hoon Kim¹, Jun Bi², Ikjun Yeom¹•Institutions (2)

Sungkyunkwan University¹, Tsinghua University²

01 Jan 2016-Journal of Network and Computer Applications

TL;DR: The proposed DiffServ model is designed to follow the guidelines of the DiffServ model in the current Internet, and to take advantage of unique NDN features such as interest aggregation and in-network caching.

Journal Article•DOI•

A Big Network Traffic Data Fusion Approach Based on Fisher and Deep Auto-Encoder

Tao Xiaoling, Kong Deyan¹, Wei Yi¹, Wang Yong•Institutions (1)

Guilin University of Electronic Technology¹

23 Mar 2016-Information-an International Interdisciplinary Journal

TL;DR: The experimental results show that DFA-F-DAE improves the generalization ability of three classification algorithms (J48, back propagation neural network (BPNN), and support vector machine (SVM)) through data dimensionality reduction.

Abstract: Data fusion is usually performed prior to classification in order to reduce the input space. These dimensionality reduction techniques help reduce the complexity of the classification model and thus improve classification performance. Traditional supervised methods demand labeled samples, yet most current network traffic data is unlabeled. Therefore, better learners can be built by using both labeled and unlabeled data than by using either alone. In this paper, a novel network traffic data fusion approach based on Fisher and deep auto-encoder (DFA-F-DAE) is proposed to reduce data dimensionality and computational complexity. The experimental results show that DFA-F-DAE improves the generalization ability of three classification algorithms (J48, back propagation neural network (BPNN), and support vector machine (SVM)) through data dimensionality reduction. We found that DFA-F-DAE remarkably improves the efficiency of big network traffic classification.

Proceedings Article•DOI•

WeChat Text and Picture Messages Service Flow Traffic Classification Using Machine Learning Technique

Muhammad Shafiq¹, Xiangzhan Yu², Asif Ali Laghari², Lu Yao², Nabin Kumar Karn², Foudil Abdesssamia, Salahuddin•Institutions (2)

Aalto University¹, Harbin Institute of Technology²

01 Dec 2016

TL;DR: Experimental results show that, on the HIT dataset, all the applied machine learning classifiers classify WeChat text and picture message traffic more accurately than on the Dorm13 dataset.

Abstract: Network traffic classification is of great importance for both Internet service providers (ISPs) and quality of service (QoS) management. During the last two decades, many machine learning models have been proposed and applied to different types of real-time applications to classify their traffic, achieving highly accurate results. However, no research has been done on classifying WeChat text and picture message traffic. In this paper, WeChat text and picture message traffic is classified using two different datasets and four well-known machine learning algorithms. The two datasets, Harbin Institute of Technology (HIT) and Dorm13, are collected from two different network environments. After the traffic is captured, 50 features are extracted from each dataset. Thereafter, four well-known machine learning algorithms, C4.5 decision tree, Bayes Net, Naive Bayes, and SVM, are used to classify WeChat text and picture message traffic. Experimental results show that, on the HIT dataset, all the applied machine learning classifiers classify WeChat text and picture message traffic more accurately than on the Dorm13 dataset. On the HIT dataset all ML classifiers perform very well, but C4.5 and SVM give the best accuracy, 99.91% and 99.57% respectively, compared with the other ML classifiers.

Journal Article•DOI•

A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark

Yong Wang¹, Wenlong Ke, Xiaoling Tao¹•Institutions (1)

Guilin University of Electronic Technology¹

15 Feb 2016-Information-an International Interdisciplinary Journal

TL;DR: This paper proposes an efficient feature selection method for network traffic based on a new parallel computing framework called Spark that reduces the time cost of modeling and classification, and improves the execution efficiency of feature selection significantly.

Abstract: Currently, with the rapidly increasing scale of data in network traffic classification, selecting traffic features efficiently is becoming a big challenge. Although a number of traditional feature selection methods using the Hadoop-MapReduce framework have been proposed, their execution time is still unsatisfactory because of the numerous iterative computations during processing. To address this issue, an efficient feature selection method for network traffic based on a new parallel computing framework called Spark is proposed in this paper. In our approach, the complete feature set is first preprocessed based on the Fisher score, and a sequential forward search strategy is employed for subsets. The optimal feature subset is then selected using the continuous iterations of the Spark computing framework. The implementation demonstrates that, while maintaining classification accuracy, our method reduces the time cost of modeling and classification and significantly improves the execution efficiency of feature selection.
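The two selection stages named in the abstract, a Fisher-score filter followed by sequential forward search, can be sketched on a single machine as below. This does not use Spark and is not the paper's code: the `evaluate(subset)` callback (for example, cross-validated accuracy of some classifier) and the `top_k` cut-off are hypothetical parameters, and the Fisher score uses the common between-class over within-class scatter form.

```python
from statistics import fmean, pvariance

def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter / within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between = within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        scores.append(between / within if within else float("inf"))
    return scores

def sequential_forward_search(candidates, evaluate):
    """Greedily add the feature that most improves evaluate(subset); stop when none helps."""
    selected, best_score = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        score, feature = max((evaluate(selected + [j]), j) for j in remaining)
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected

def select_features(X, y, evaluate, top_k):
    """Filter to the top_k features by Fisher score, then run forward search on them."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sequential_forward_search(ranked[:top_k], evaluate)

X = [[1.0, 5.0, 0.3], [1.2, 1.0, 0.1], [3.0, 5.2, 0.2], [3.1, 0.9, 0.4]]
y = [0, 0, 1, 1]
toy_evaluate = lambda subset: len(set(subset) & {0, 2}) - 0.1 * len(subset)  # stand-in
print(select_features(X, y, toy_evaluate, top_k=2))
```

Each forward-search step evaluates every remaining candidate independently, which is the loop a framework such as Spark would distribute across workers.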

Patent•

Network traffic classification

Enzo Fenoglio¹, Andre Surcouf¹, Joseph T. Friel¹, Hugo Latapie¹, Altan J. Stalker¹, Michael Costello¹•Institutions (1)

Cisco Systems, Inc.¹

02 Mar 2016

TL;DR: In one embodiment, a method for video traffic flow behavioral classification is implemented on a computing device and includes: receiving coarse flow data from a network router, classifying it to detect video flows, and requesting fine flow data, which includes information on a per-packet basis, for each detected video flow.

Abstract: In one embodiment, a method for video traffic flow behavioral classification is implemented on a computing device and includes: receiving coarse flow data from a network router, where the coarse flow data includes summary statistics for data flows on the router, classifying the summary statistics to detect video flows from among the data flows, requesting fine flow data from the network router for each of the detected video flows, where the fine flow data includes information on a per packet basis, receiving the fine flow data from the network router, and classifying each of the detected video flows per video service provider in accordance with the information.
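The claimed method is a two-stage pipeline: cheap summary statistics select candidate video flows, and only those flows are upgraded to per-packet data for provider classification. The sketch below shows that control flow under loudly stated assumptions. The `FlowSummary` fields, the rate and duration thresholds in the coarse rule, and the `fetch_fine_data` and `classify_provider` callbacks are all hypothetical, and the router interface is not modeled.

```python
from dataclasses import dataclass

@dataclass
class FlowSummary:
    """Hypothetical coarse per-flow summary statistics."""
    flow_id: str
    bytes_total: int
    duration_s: float

def detect_video_flows(summaries, min_rate_bps=500_000, min_duration_s=30.0):
    # Hypothetical coarse-stage rule: long-lived flows with a sustained high bit rate.
    # The duration check runs first, so zero-length flows never reach the division.
    return [s.flow_id for s in summaries
            if s.duration_s >= min_duration_s
            and 8 * s.bytes_total / s.duration_s >= min_rate_bps]

def classify_video_flows(summaries, fetch_fine_data, classify_provider):
    """Request fine (per-packet) data only for detected video flows, then classify each."""
    return {fid: classify_provider(fetch_fine_data(fid))
            for fid in detect_video_flows(summaries)}

flows = [
    FlowSummary("f1", 10_000_000, 60.0),  # about 1.3 Mbit/s for a minute
    FlowSummary("f2", 50_000, 60.0),      # low rate
    FlowSummary("f3", 10_000_000, 10.0),  # too short
]
print(classify_video_flows(flows, lambda fid: [1400] * 10, lambda packets: "provider_a"))
```

The design choice worth noting is that fine-grained collection is paid for only on the flows the coarse stage flags, which keeps the per-packet load on the router proportional to the video traffic rather than to all traffic.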

Proceedings Article•DOI•

WeChat Text Messages Service Flow Traffic Classification Using Machine Learning Technique

Muhammad Shafiq¹, Xiangzhan Yu², Asif Ali Laghari²•Institutions (2)

University of Malakand¹, Harbin Institute of Technology²

01 Sep 2016

TL;DR: This paper classifies WeChat message flow traffic using two different datasets, which are first captured with the Wireshark tool in two network environments at different locations, the Harbin Institute of Technology Lab and the Jinyuan Hotel; 50 features are then extracted from the captured traffic.

Abstract: In this era of information technology, network traffic classification is a very important and active topic from the perspective of network security and management, due to the substantial use of dynamic applications. Numerous research models have been proposed in network traffic classification to classify different types of applications and achieve significant accuracy results. However, no work has been done to classify WeChat message flow traffic. WeChat is a free instant messaging application; hence, it is very important to classify WeChat text message traffic. In this paper, we classify WeChat message flow traffic using two different datasets, which are first captured with the Wireshark tool in two network environments at different locations, the Harbin Institute of Technology Lab and the Jinyuan Hotel; 50 features are then extracted from the captured traffic. After that, four machine learning algorithms, SVM, C4.5, Bayes Net, and Naïve Bayes, are applied to classify the WeChat text message traffic. Experimental results show that all classifiers achieve very high accuracy on both datasets. On the Jinyuan dataset, the SVM and C4.5 decision tree algorithms achieve 100% accuracy, higher than the Bayes Net and Naïve Bayes algorithms, and on the Harbin Institute of Technology Lab dataset all classifiers achieve 99.7% accuracy.
