
Showing papers on "Traffic classification" published in 2016


Proceedings ArticleDOI
19 Feb 2016
TL;DR: This paper studies the effectiveness of flow-based time-related features to detect VPN traffic and to characterize encrypted traffic into different categories, according to the type of traffic e.g., browsing, streaming, etc.
Abstract: Traffic characterization is one of the major challenges in today’s security industry. The continuous evolution and generation of new applications and services, together with the expansion of encrypted communications, make it a difficult task. Virtual Private Networks (VPNs) are an example of an encrypted communication service that is becoming popular as a method for bypassing censorship as well as accessing services that are geographically locked. In this paper, we study the effectiveness of flow-based time-related features to detect VPN traffic and to characterize encrypted traffic into different categories according to the type of traffic, e.g., browsing, streaming, etc. We use two different well-known machine learning techniques (C4.5 and KNN) to test the accuracy of our features. Our results show high accuracy and performance, confirming that time-related features are good classifiers for encrypted traffic characterization.

562 citations
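
To make the idea of flow-based time-related features concrete, here is a minimal sketch in Python. It is not the authors' feature set or tooling: the four features, the toy flows, the labels, and the function names `time_features` and `knn_predict` are assumptions chosen for illustration, and the classifier is a plain k-nearest-neighbours vote standing in for the KNN technique named in the abstract.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def time_features(timestamps):
    """Return (duration, mean gap, max gap, gap std-dev) for one flow's packet times."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not gaps:  # a single-packet flow has no inter-arrival times
        return (0.0, 0.0, 0.0, 0.0)
    return (ts[-1] - ts[0], mean(gaps), max(gaps), pstdev(gaps))


def knn_predict(train, query, k=3):
    """Majority vote among the k training flows closest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy flows: packet arrival times in seconds, with made-up labels.
labelled_flows = [
    ([0.00, 0.02, 0.04, 0.06, 0.08], "streaming"),
    ([0.00, 0.03, 0.06, 0.09, 0.12], "streaming"),
    ([0.0, 0.1, 2.0, 2.1, 5.0], "browsing"),
    ([0.0, 0.2, 3.0, 3.1, 6.0], "browsing"),
]
train = [(time_features(ts), label) for ts, label in labelled_flows]
prediction = knn_predict(train, time_features([0.0, 0.025, 0.05, 0.075, 0.1]))
```

Only packet timing is used here, never payload bytes, which is why features of this kind remain available when the traffic is encrypted.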


Journal ArticleDOI
TL;DR: The evaluation results show that the feature selection algorithm contributes more critical features for LSSVM-IDS to achieve better accuracy and lower computational cost compared with the state-of-the-art methods.
Abstract: Redundant and irrelevant features in data have caused a long-term problem in network traffic classification. These features not only slow down the process of classification but also prevent a classifier from making accurate decisions, especially when coping with big data. In this paper, we propose a mutual information based algorithm that analytically selects the optimal feature for classification. This mutual information based feature selection algorithm can handle linearly and nonlinearly dependent data features. Its effectiveness is evaluated in the cases of network intrusion detection. An Intrusion Detection System (IDS), named Least Square Support Vector Machine based IDS (LSSVM-IDS), is built using the features selected by our proposed feature selection algorithm. The performance of LSSVM-IDS is evaluated using three intrusion detection evaluation datasets, namely KDD Cup 99, NSL-KDD and Kyoto 2006+ dataset. The evaluation results show that our feature selection algorithm contributes more critical features for LSSVM-IDS to achieve better accuracy and lower computational cost compared with the state-of-the-art methods.

406 citations
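
The relevance-scoring step behind mutual-information-based feature selection can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: it only ranks discrete features by their mutual information with the class label and does not model redundancy between features. The function names and the toy columns are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_features(columns, labels):
    """Sort feature names by mutual information with the label, highest first."""
    scores = {name: mutual_information(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A feature that determines the label exactly scores the label's full entropy, while a statistically independent feature scores zero, so low-ranked features are candidates for removal.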


Journal ArticleDOI
TL;DR: A reference framework for TE in the SDN is proposed, which consists of two parts, traffic measurement and traffic management; technologies related to traffic management include traffic load balancing, QoS-guarantee scheduling, energy-saving scheduling, and traffic management for the hybrid IP/SDN.
Abstract: As the next generation network architecture, software-defined networking (SDN) has exciting application prospects. Its core idea is to separate the forwarding layer and control layer of the network system, so that network operators can program packet forwarding behavior to significantly improve the innovation capability of network applications. Traffic engineering (TE) is an important network application, which studies measurement and management of network traffic, and designs reasonable routing mechanisms to guide network traffic to improve utilization of network resources and better meet the requirements of network quality of service (QoS). Compared with traditional networks, the SDN has many advantages in supporting TE due to its distinguishing characteristics, such as isolation of control and forwarding, global centralized control, and programmability of network behavior. This paper focuses on traffic engineering technology based on the SDN. First, we propose a reference framework for TE in the SDN, which consists of two parts, traffic measurement and traffic management. Traffic measurement is responsible for monitoring and analyzing real-time network traffic, as a prerequisite for traffic management. In the proposed framework, technologies related to traffic measurement include network parameters measurement, a general measurement framework, and traffic analysis and prediction; technologies related to traffic management include traffic load balancing, QoS-guarantee scheduling, energy-saving scheduling, and traffic management for the hybrid IP/SDN. Existing technologies are discussed in detail, and our insights into the future development of TE in the SDN are offered.

149 citations


Proceedings ArticleDOI
01 Nov 2016
TL;DR: This work describes a simple architecture deployed in an enterprise network that gathers traffic data using the OpenFlow protocol, presents the data-sets that can be obtained, and shows how several ML techniques can be applied to them for traffic classification.
Abstract: Software Defined Networks (SDNs) provide a separation between the control plane and the forwarding plane of networks. The software implementation of the control plane and the built-in data collection mechanisms of the OpenFlow protocol promise to be excellent tools to implement Machine Learning (ML) network control applications. A first step in that direction is to understand the type of data that can be collected in SDNs and how information can be learned from that data. In this work we describe a simple architecture deployed in an enterprise network that gathers traffic data using the OpenFlow protocol. We present the data-sets that can be obtained and show how several ML techniques can be applied to them for traffic classification. The results indicate that high accuracy classification can be obtained with the data-sets using supervised learning.

136 citations


Proceedings ArticleDOI
01 Jun 2016
TL;DR: The proposed framework jointly exploits deep packet inspection (DPI) and semi-supervised machine learning so that accurate traffic classification can be realized, while requiring minimal communications between the network controller and the SDN switches.
Abstract: In this paper, a QoS-aware traffic classification framework for software defined networks is proposed. Instead of identifying specific applications, as in most previous work on traffic classification, our approach classifies the network traffic into different classes according to the QoS requirements, which provide the crucial information to enable fine-grained and QoS-aware traffic engineering. The proposed framework is fully located in the network controller so that real-time, adaptive, and accurate traffic classification can be realized by exploiting the superior computation capacity, the global visibility, and the inherent programmability of the network controller. More specifically, the proposed framework jointly exploits deep packet inspection (DPI) and semi-supervised machine learning so that accurate traffic classification can be realized, while requiring minimal communications between the network controller and the SDN switches. Based on a real Internet data set, the simulation results show that the proposed classification framework can provide good performance in terms of classification accuracy and communication costs.

116 citations


Proceedings ArticleDOI
01 Oct 2016
TL;DR: This paper discusses network traffic classification techniques step by step: a real-time Internet data set is developed using a network traffic capture tool, a feature extraction tool is used to extract features from the captured traffic, and four machine learning classifiers (Support Vector Machine, C4.5 decision tree, Naive Bayes and Bayes Net) are applied.
Abstract: Network traffic classification is a central topic nowadays in the field of computer science. It is essential for internet service providers (ISPs) to know which types of network applications flow in a network. Network traffic classification is the first step in analyzing and identifying the different types of applications flowing in a network. Through this technique, internet service providers or network operators can manage the overall performance of a network. There are several techniques for classifying Internet traffic, such as port-based, payload-based and machine-learning-based techniques. The most common technique used these days is machine learning (ML), which has been used by many researchers and has achieved very effective accuracy results. In this paper, we discuss network traffic classification techniques step by step: a real-time Internet data set is developed using a network traffic capture tool, a feature extraction tool is then used to extract features from the captured traffic, and four machine learning classifiers (Support Vector Machine, C4.5 decision tree, Naive Bayes and Bayes Net) are applied. Experimental analysis shows that the C4.5 classifier gives very good accuracy compared to the other classifiers.

108 citations


Proceedings ArticleDOI
25 Apr 2016
TL;DR: It is argued that Software-Defined Networking (SDN) forms a propitious environment for the design and implementation of more robust and extensible anomaly classification schemes.
Abstract: Anomaly traffic detection and classification mechanisms need to be flexible and easy to manage in order to detect the ever growing spectrum of anomalies. Detection and classification are difficult tasks for several reasons, including the need to obtain an accurate and comprehensive view of the network, the ability to detect the occurrence of new attack types, and the need to deal with misclassification. In this paper, we argue that Software-Defined Networking (SDN) forms a propitious environment for the design and implementation of more robust and extensible anomaly classification schemes. Unlike other approaches from the literature, which individually tackle either anomaly detection or classification or mitigation, we present a management framework to perform these tasks jointly. Our proposed framework is called ATLANTIC and it combines the use of information theory to calculate deviations in the entropy of flow tables and a range of machine learning algorithms to classify traffic flows. As a result, ATLANTIC is a flexible framework capable of categorizing traffic anomalies and using the information collected to handle each traffic profile in a specific manner, e.g., blocking malicious flows.

101 citations
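
The entropy-deviation idea mentioned in the abstract above can be sketched in a few lines of Python. This is a hedged illustration, not ATLANTIC's implementation: which flow-table field is measured, the fixed `threshold` of 0.5 bits, and the function names are all assumptions.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(values).values())


def entropy_deviates(baseline, current, threshold=0.5):
    """True when entropy moved more than `threshold` bits away from the baseline snapshot."""
    return abs(shannon_entropy(current) - shannon_entropy(baseline)) > threshold
```

For example, if the destination addresses in a flow table suddenly collapse onto a single host, the entropy of that field drops sharply, and the snapshot could then be handed to a classifier for closer inspection.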


Proceedings ArticleDOI
22 May 2016
TL;DR: This work studies two of the most popular smart home devices, the Nest Thermostat and the wired Nest Protect, and shows that traffic analysis can be used to learn potentially sensitive information about the state of a smart home.
Abstract: As smart home devices are introduced into our homes, security and privacy concerns are being raised. Smart home devices collect, exchange, and transmit various data about the environment of our homes. This data can not only be used to characterize a physical property but also to infer personal information about the inhabitants. One potential attack vector for smart home devices is the use of traffic classification as a source for covert channel attacks. Specifically, we are concerned with the use of traffic classification techniques for inferring events taking place within a building. In this work, we study two of the most popular smart home devices, the Nest Thermostat and the wired Nest Protect (i.e. smoke and carbon dioxide detector) and show that traffic analysis can be used to learn potentially sensitive information about the state of a smart home. Among other observations, we show that we can determine, with 88% and 67% accuracy respectively, when the thermostat transitions between the Home and Auto Away mode and vice versa, based only on network traffic originating from the device. This information may be used, for example, by an attacker to infer whether the home is occupied.

99 citations


Journal ArticleDOI
TL;DR: Experimental results on BitTorrent, CIFS/SMB, DNS, FTP, PPLIVE, SIP, and SMTP traces show that Securitas has the ability to accurately identify the network traces of the target application protocol with an average recall of about 97.4% and an average precision of about 98.4%.
Abstract: Traffic classification, a mapping of traffic to network applications, is important for a variety of networking and security issues, such as network measurement, network monitoring, as well as the detection of malware activities. In this paper, we propose Securitas, a network trace-based protocol identification system, which exploits the semantic information in protocol message formats. Securitas requires no prior knowledge of protocol specifications. Deeming a protocol as a language between two processes, our approach is based upon the new insight that the n-grams of protocol traces, just like those of natural languages, exhibit a highly skewed frequency-rank distribution that can be leveraged in the context of protocol identification. In Securitas, we first extract the statistical protocol message formats by clustering n-grams with the same semantics, and then use the corresponding statistical formats to classify raw network traces. Our tool has the following key features: 1) applicable to both connection-oriented protocols and connectionless protocols; 2) suitable for both text and binary protocols; 3) no need to assemble IP packets into TCP or UDP flows; and 4) effective for both long-lived flows and short-lived flows. We implement Securitas and conduct extensive evaluations on real-world network traces containing both textual and binary protocols. Our experimental results on BitTorrent, CIFS/SMB, DNS, FTP, PPLIVE, SIP, and SMTP traces show that Securitas has the ability to accurately identify the network traces of the target application protocol with an average recall of about 97.4% and an average precision of about 98.4%. Our experimental results show that Securitas is a robust system that also displays competitive performance in practice.

78 citations
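
The observation that n-grams of protocol traces follow a skewed frequency-rank distribution can be explored with a short Python sketch. It only covers n-gram extraction and ranking; the clustering and classification stages of Securitas are not reproduced, and the function names and the choice of n = 3 are assumptions.

```python
from collections import Counter


def byte_ngrams(message, n=3):
    """All contiguous n-byte substrings of a raw protocol message."""
    return [message[i:i + n] for i in range(len(message) - n + 1)]


def frequency_rank(messages, n=3):
    """List of (n-gram, count) pairs, most frequent first."""
    counts = Counter()
    for message in messages:
        counts.update(byte_ngrams(message, n))
    return counts.most_common()
```

Because the n-grams are taken over raw bytes, the same code applies to textual and binary messages alike, and it needs no knowledge of field boundaries.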


Journal ArticleDOI
TL;DR: A study of the effect (if any) of the feature sets of network traffic flow exporters on the performance of botnet traffic classification indicates that the use of a flow exporter and a protocol filter indeed has an effect on the performance of botnet traffic classification.
Abstract: Botnets represent one of the most aggressive threats against cyber security. Different techniques using different feature sets have been proposed for botnet traffic analysis and classification. However, no work has been performed to study the effect of such differences. In this paper, we perform a study on the effect of (if any) the feature sets of network traffic flow exporters. To this end, we explore five different traffic flow exporters (each with a different set of flow features) using two different protocol filters [Hypertext Transfer Protocol (HTTP) and Domain Name System (DNS)] and five different classifiers. We evaluate all these on eight different botnet traffic data sets. Our results indicate that the use of a flow exporter and a protocol filter indeed has an effect on the performance of botnet traffic classification. Experimental results show that the best performance is achieved using Tranalyzer flow exporter and HTTP filter with the C4.5 classifier.

71 citations


01 May 2016
TL;DR: Copos et al. investigated how device-to-device and device-to-cloud smart home network traffic can be used to infer personal information about the state of a smart home, and showed that transitions of the thermostat between the Home and Auto Away modes can be determined with 88% and 67% accuracy, respectively, based only on network traffic originating from the device.
Abstract: Is Anybody Home? Inferring Activity From Smart Home Network Traffic

Bogdan Copos∗, Karl Levitt†, Matt Bishop‡, Jeff Rowe§
Department of Computer Science, University of California, Davis
Email: ∗bcopos@ucdavis.edu, †levitt@cs.ucdavis.edu, ‡mabishop@ucdavis.edu, §rowe@cs.cdavis.edu

Abstract—As smart home devices are introduced into our homes, security and privacy concerns are being raised. Smart home devices collect, exchange, and transmit various data about the environment of our homes. This data can not only be used to characterize a physical property but also to infer personal information about the inhabitants. One potential attack vector for smart home devices is the use of traffic classification as a source for covert channel attacks. Specifically, we are concerned with the use of traffic classification techniques for inferring events taking place within a building. In this work, we study two of the most popular smart home devices, the Nest Thermostat and the wired Nest Protect (i.e. smoke and carbon dioxide detector) and show that traffic analysis can be used to learn potentially sensitive information about the state of a smart home. Among other observations, we show that we can determine, with 88% and 67% accuracy respectively, when the thermostat transitions between the Home and Auto Away mode and vice versa, based only on network traffic originating from the device. This information may be used, for example, by an attacker to infer whether the home is occupied.

I. INTRODUCTION

Smart home devices are becoming increasingly popular in households around the world. Nest Labs, one of the most popular manufacturers of smart thermostats and smoke detectors, is believed to have sold 440,000 smoke detector units over the span of four months in 2014 alone. Smart home devices are designed to help homeowners automate and simplify mundane tasks around their property.

However, bringing internet connectivity to household devices has also introduced many security and privacy concerns. At the end of 2015, security researchers discovered a vulnerability in Barbie dolls which would allow attackers to not only steal personal information but also convert a doll into a spying device capable of listening into conversations [6]. In early 2016, security research from Rapid7 found vulnerabilities in Comcast’s Xfinity Home Security system that would cause the system to not report when a property’s windows and/or doors were compromised [19].

In this paper, we investigate how device-to-device and device-to-cloud smart home network traffic can be used to infer personal information. Specifically, we use traffic analysis techniques on network traffic generated by devices from Nest Labs to learn information about the presence of residents and other events occurring within the property. Traffic analysis is the process of intercepting and analyzing network packets in order to deduce information from patterns in communication. The experiments involve two smart home devices, a smart thermostat and a smart smoke and carbon dioxide detector.

The rest of the paper is organized as follows:
• Section II describes relevant previous work.
• Section III gives a detailed rundown of the devices used in this study and their features and capabilities.
• In Section IV the data collection process is described.
• In Section V, the methodology behind the traffic classification is explained.
• Section VI reports the findings of our analysis.
• Section VII describes how the findings were tested for validity and presents information about the accuracy of our findings.
• Section VIII discusses limitations of our approach.
• In Section IX we provide some initial ideas for solutions and list possible future work.

II. PREVIOUS WORK

Traffic analysis attacks were highlighted in “Attacks of the SSL 3.0 protocol” [16], by Wagner and Schneier, who showed the URL of an HTTP GET request is leaked in SSL because cipher-texts fail to disguise the plaintext length. Later, Cheng and Avnur [3] show that websites can be fingerprinted by performing traffic analysis of SSL encrypted web browsing traffic. Ever since, there have been a number of works [2], [7], [8], [10], [13], [15] exploring traffic analysis attacks using various features including source and destination attributes (e.g. address, port), protocol, packet and connection sizes, and even timing information (e.g. duration of connections, burstiness of transmissions).

Efforts have also been put into developing countermeasures for such attacks [5], [11], [18]. Countermeasure techniques include traffic padding and traffic masking. Another variation is in the implementation, whether server side, client side, or both. Recently, in “Peek-a-Boo, I Still See You: Why Efficient Traffic Analysis Countermeasures Fail” [4], Dyer, Coull et al. provide the first comprehensive analysis of some of the proposed traffic analysis countermeasures and show why they fail to protect against attacks. The authors argue that there is no efficient solution.

Journal ArticleDOI
TL;DR: The computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification.
Abstract: Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification, requiring additional packet-level information, host behaviour analysis, and specialized hardware that limit their practical adoption. This paper aims to overcome these challenges by proposing a two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application through k-means and are further used to train a C5.0 decision tree classifier. As part of validation, the initial unsupervised phase used flow records of fifteen popular Internet applications that were collected and independently subjected to k-means clustering to determine the unique flow classes generated per application. The derived flow classes were afterwards used to train and test a supervised C5.0 based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases, increasing to 96.67% with adaptive boosting. The classifier specificity factor, which accounted for differentiating content-specific from supplementary flows, ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification.
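
The first, unsupervised phase of such a two-phase scheme can be sketched as below. This is a minimal illustration rather than the paper's pipeline: the k-means here uses caller-supplied starting centroids and a fixed iteration count, the two-dimensional points are made-up stand-ins for NetFlow attributes, and the second phase (training a C5.0 tree on the derived flow classes) is not shown.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then recompute the means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids


def flow_class(point, centroids):
    """Index of the nearest centroid, usable as a training label for a supervised phase."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
```

Running the clustering separately on each application's flow records, as the abstract describes, would yield per-application flow classes whose indices can then serve as labels for the supervised classifier.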

Journal ArticleDOI
TL;DR: This paper uses a deep learning architecture to explore the dynamic properties of network traffic, and proposes a novel network traffic prediction approach based on a deep belief network and a network traffic estimation method utilizing the deep belief network via link counts and routing information.

Proceedings ArticleDOI
01 Dec 2016
TL;DR: This work uses a deep architecture to explore the time-varying property of network traffic in a data center network, and proposes a novel network traffic prediction approach based on a deep belief network and a logistic regression model.
Abstract: Network traffic analysis is a crucial technique for systematically operating a data center network. Many network management functions rely on exact network traffic information. Although a great deal of work on obtaining network traffic has been carried out in traditional ISP networks, it cannot be employed effectively in data center networks. Motivated by that, we focus on the problem of network traffic prediction and estimation in data center networks. We apply deep learning techniques to the network traffic prediction and estimation fields, and propose two deep architectures for network traffic prediction and estimation, respectively. We first use a deep architecture to explore the time-varying property of network traffic in a data center network, and then propose a novel network traffic prediction approach based on a deep belief network and a logistic regression model. Meanwhile, to deal with the highly ill-posed nature of network traffic estimation, we further propose a network traffic estimation method using the deep belief network trained by link counts. We validate the effectiveness of our methodologies on real traffic data.

Proceedings ArticleDOI
20 Jun 2016
TL;DR: This paper develops a new model by incorporating certificate packet length clustering into second-order homogeneous Markov chains, and shows that the proposed method leads to a 30% improvement on average compared with the state-of-the-art method, in terms of classification accuracy.
Abstract: With the prosperity of network applications, traffic classification plays a crucial role in network management and malicious attack detection. The widely used encrypted transmission protocols, such as the Secure Socket Layer/Transport Layer Security (SSL/TLS) protocols, lead to the failure of traditional payload-based classification methods. Existing methods for encrypted traffic classification suffer from low accuracy. In this paper, we propose a certificate-aware encrypted traffic classification method based on the second-order Markov chain. We start by exploring the reasons why existing methods do not perform well, and make the novel observation that certificate packet length in SSL/TLS sessions contributes to application discrimination. To increase the diversity of application fingerprints, we develop a new model by incorporating certificate packet length clustering into second-order homogeneous Markov chains. Extensive evaluation results show that the proposed method leads to a 30% improvement on average compared with the state-of-the-art method, in terms of classification accuracy.
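
A second-order Markov chain fingerprint can be sketched as follows, assuming each session has already been reduced to a sequence of discrete states. The integer states, the add-one smoothing (`alpha`), and the function names are assumptions for illustration; how the authors fold certificate packet length clusters into the state space is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def train(sequences):
    """Count how often state c follows the state pair (a, b) in one application's sessions."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            counts[(a, b)][c] += 1
    return counts


def log_likelihood(model, seq, vocab_size, alpha=1.0):
    """Smoothed log-probability of `seq` under a second-order model."""
    total = 0.0
    for a, b, c in zip(seq, seq[1:], seq[2:]):
        ctx = model.get((a, b), Counter())
        total += math.log((ctx[c] + alpha) / (sum(ctx.values()) + alpha * vocab_size))
    return total


def classify(models, seq, vocab_size):
    """Pick the application whose model assigns `seq` the highest likelihood."""
    return max(models, key=lambda app: log_likelihood(models[app], seq, vocab_size))
```

Conditioning on the two preceding states rather than one is what makes the chain second-order; smoothing keeps unseen transitions from driving the likelihood to zero.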

Proceedings ArticleDOI
01 Aug 2016
TL;DR: A novel data traffic aggregation model and algorithm along with a new 5G network slicing based on classification and measuring the data traffic to satisfy Quality of Service for smart systems in a smart city environment is proposed.
Abstract: The recent advancements in cellular communication domain have resulted in the emergence of Machine-to-Machine applications, in support of the wide range and coverage provision, low costs, and high mobility. 5G network standards represent a promising technology to support the future of Machine-to-Machine data traffic. In recent years, Human-Type-Communication traffic has seen exponential growth over cellular networks, which resulted in increasing the capacity and higher data rates. These networks are expected to face challenges such as explosion of the data traffic due to the future of smart devices data traffic with various Quality of Service requirements. This paper proposes a novel data traffic aggregation model and algorithm along with a new 5G network slicing based on classification and measuring the data traffic to satisfy Quality of Service for smart systems in a smart city environment. In our proposal, 5G radio resources are efficiently utilized as the smallest unit of a physical resource block in a relay node by aggregating the data traffic of several Machine-to-Machine devices as separate slices based on Quality of Service for each application. OPNET is used to assess the performance of the proposed model. The simulated 5G data traffic classes include file transfer protocol, voice over IP, and video users.

Proceedings ArticleDOI
Zhengyang Chen, Bowen Yu, Yu Zhang, Jianzhong Zhang, Jingdong Xu
01 Aug 2016
TL;DR: A novel approach is proposed that identifies mobile applications by automatically extracting abstract features from labeled packets; it is mainly based on convolutional neural networks (CNNs), which can extract the abstract statistical features between characters in HTTP and thus improve the identification accuracy.
Abstract: Mobile network security and management are becoming important issues, due to the rapid development and spread of mobile networks. Application traffic identification is a critical technology for resolving these issues. A variety of traffic classification methods for desktop applications are no longer effective in mobile networks, because the majority of mobile traffic is carried over HTTP without distinctive features. Existing approaches to identifying mobile traffic simply extract obvious features like fixed strings or regular expressions, which are not effective at capturing hidden structure within the HTTP headers. In this paper, we propose a novel approach, which can identify mobile applications by automatically extracting abstract features from labeled packets. Our approach is mainly based on convolutional neural networks (CNNs). The CNNs can extract the abstract statistical features between characters in HTTP and thus improve the identification accuracy. The approach also reduces the dependence on prior knowledge and human effort in designing features. To verify the effectiveness of our method, we apply it to several identification tasks. The evaluation shows that our method can accurately identify the traffic of the target mobile application.

Proceedings ArticleDOI
28 May 2016
TL;DR: Experiments show that the improved HAC4.5 decision tree algorithm not only improves the running speed, but also improves the accuracy of the calculation.
Abstract: In the current age of the Internet, network traffic has increased exponentially. Whether driven by user demand for network resources, by QoS scheduling, or by the expansion and transformation of existing networks in line with application trends, the various applications in network traffic need to be classified and identified accurately, so network traffic classification is particularly important. The C4.5 decision tree algorithm, a commonly used supervised classification algorithm, is often applied in traffic classification, but its efficiency drops as data volume increases. The Hadoop platform, an open source cloud framework, performs well when dealing with big data and is therefore often the preferred choice for handling large data sets. Building on the original C4.5 algorithm, the improved algorithm is simplified and parallelized on the Hadoop platform; it is called the HAC4.5 decision tree algorithm. Experiments show that the improved HAC4.5 decision tree algorithm not only improves the running speed, but also improves the accuracy of the calculation.

Proceedings ArticleDOI
01 Sep 2016
TL;DR: This paper studies to what extent basic information visible at flow-level measurements is useful for traffic classification on the web, and quantifies how often the same server IP address is used by different services, and how services use hostnames by analyzing a large dataset of flow measurements.
Abstract: The identification of the services that generate traffic is crucial for ISPs and companies to plan and monitor the network. The widespread deployment of encryption and the convergence of the web services towards HTTP/HTTPS challenge traditional classification techniques. Algorithms to classify traffic are left with little information, such as server IP addresses, flow characteristics and queries performed at the DNS. Moreover, due to the usage of Content Delivery Networks and cloud infrastructure, it is unclear whether such coarse metadata is sufficient to differentiate the traffic. This paper studies to what extent basic information visible at flow-level measurements is useful for traffic classification on the web. By analyzing a large dataset of flow measurements, we quantify how often the same server IP address is used by different services, and how services use hostnames. Our results show that a very simple classifier that relies only on server IP addresses and on lists of hostnames can distinguish up to 55% of the traffic volume. Yet, collisions of names and addresses are common among popular services, calling for more ingenuity. This paper is a preliminary step in the evaluation of classification algorithms that are suitable for the modern Internet, where only minimal metadata collection will be possible in the network.
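
A "very simple classifier" of the kind the abstract describes, relying only on hostname lists and server IP addresses, might look like the sketch below. The rule tables, service names, and the function `classify_flow` are hypothetical; the addresses and names in the usage note come from documentation-reserved ranges, not from the paper's dataset.

```python
def classify_flow(server_ip, hostname, host_rules, ip_rules):
    """Return a service name, or "unknown" when the metadata is missing or ambiguous."""
    if hostname:
        name = hostname.lower().rstrip(".")
        for suffix, service in host_rules.items():
            # Match the listed name itself or any subdomain of it.
            if name == suffix or name.endswith("." + suffix):
                return service
    services = ip_rules.get(server_ip, set())
    # An address shared by several services (e.g. behind a CDN) cannot decide alone.
    return next(iter(services)) if len(services) == 1 else "unknown"
```

With `host_rules = {"video.example": "video"}` and `ip_rules = {"192.0.2.10": {"mail"}, "192.0.2.20": {"video", "storage"}}`, a flow to `192.0.2.20` with no hostname stays unclassified, which mirrors the name and address collisions the abstract reports among popular services.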

Proceedings ArticleDOI
01 Oct 2016
TL;DR: It is implied that social network service may be used as an alternative source for traffic anomalies detection by providing information of traffic flow condition in real-time by harnessing the power of social network data, Twitter.
Abstract: The growth in the number of vehicles in Yogyakarta Province, Indonesia is not proportional to the growth of roads. This problem causes severe traffic jams on many main roads. Common traffic anomaly detection using surveillance cameras requires manpower and is costly, while the crowdsourcing mobile applications used for traffic anomaly detection are mostly privately owned. This research aims to develop real-time traffic classification by harnessing the power of social network data from Twitter. In this study, Twitter data pass through the stages of preprocessing, feature extraction, and tweet classification. This study compares the classification performance of three machine learning algorithms, namely Naive Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT). Experimental results show that the SVM algorithm produced the best performance among the three algorithms, with 99.77% and 99.87% classification accuracy on balanced and imbalanced data, respectively. This research implies that a social network service may be used as an alternative source for traffic anomaly detection by providing information on traffic flow conditions in real time.

Journal ArticleDOI
TL;DR: DBStream is presented, a holistic approach to large-scale network monitoring and analysis applications and its Continuous Execution Language (CEL) can be used to automate several data processing and analysis tasks typical for monitoring operational ISP networks.

Journal ArticleDOI
TL;DR: A streaming flow-based classification solution based on Hoeffding Adaptive Tree, a machine learning technique specifically designed for evolving data streams that can sustain a very high accuracy over the years, with significantly less cost and complexity than existing alternatives based on static learning algorithms.
Abstract: The continuous evolution of Internet traffic and its applications makes the classification of network traffic a topic far from being completely solved. An essential problem in this field is that most of proposed techniques in the literature are based on a static view of the network traffic (i.e., they build a model or a set of patterns from a static, invariable dataset). However, very little work has addressed the practical limitations that arise when facing a more realistic scenario with an infinite, continuously evolving stream of network traffic flows. In this paper, we propose a streaming flow-based classification solution based on Hoeffding Adaptive Tree, a machine learning technique specifically designed for evolving data streams. The main novelty of our proposal is that it is able to automatically adapt to the continuous evolution of the network traffic without storing any traffic data. We apply our solution to a 12 + 1 year-long dataset from a transit link in Japan, and show that it can sustain a very high accuracy over the years, with significantly less cost and complexity than existing alternatives based on static learning algorithms, such as C4.5.
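
Hoeffding trees decide when to split a node using the Hoeffding bound, which limits how far an observed mean can be from the true mean after n samples. The sketch below shows only that split test; the function names are illustrative, and the Hoeffding Adaptive Tree's change detection and the paper's own configuration are not reproduced here.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean
    lies within epsilon of the mean observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split once the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

Because epsilon shrinks as n grows, a node that cannot separate two close attributes after a few flows may still split after seeing more of the stream, without any flow being stored.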

Proceedings ArticleDOI
11 Mar 2016
TL;DR: In this paper, a design of virtual network functions is proposed to flexibly select and apply the most suitable machine learning classifiers at run time. The experimental results show that the proposed NFV for flow classification can improve the accuracy of classification by up to 13%.
Abstract: Network flow classification is fundamental to network management and network security. However, it is challenging to classify network flows at very high line rates while simultaneously preserving user privacy. Machine learning based classification techniques utilize only meta-information of a flow and have been shown to be effective in identifying network flows. We analyze a group of widely used machine learning classifiers, and observe that the effectiveness of different classification models depends highly upon the protocol types as well as the flow features collected from network data. We propose vTC, a design of virtual network functions to flexibly select and apply the most suitable machine learning classifiers at run time. The experimental results show that the proposed NFV for flow classification can improve the accuracy of classification by up to 13%.

Journal ArticleDOI
TL;DR: Two statistics-based solutions are proposed, the message size distribution classifier (MSDC) and the message size sequence classifier (MSSC), depending on classification accuracy and real-time requirements, which aim to identify network flows accurately and provide a lightweight, real-time solution.

Journal ArticleDOI
TL;DR: The proposed DiffServ model is designed to follow the guidelines of the DiffServ model in the current Internet, and to take advantage of NDN's unique features such as interest aggregation and in-network caching.

Journal ArticleDOI
TL;DR: The experimental results show that the DFA-F-DAE improves the generalization ability of the three classification algorithms (J48, back propagation neural network (BPNN), and support vector machine (SVM)) by data dimensionality reduction.
Abstract: Data fusion is usually performed prior to classification in order to reduce the input space. These dimensionality reduction techniques help to reduce the complexity of the classification model and thus improve the classification performance. Traditional supervised methods demand labeled samples, but most current network traffic data is not labeled. Better learners can therefore be built by using both labeled and unlabeled data than by using either one alone. In this paper, a novel network traffic data fusion approach based on Fisher and deep auto-encoder (DFA-F-DAE) is proposed to reduce the data dimensions and the complexity of computation. The experimental results show that the DFA-F-DAE improves the generalization ability of the three classification algorithms (J48, back propagation neural network (BPNN), and support vector machine (SVM)) by data dimensionality reduction. We found that the DFA-F-DAE remarkably improves the efficiency of big network traffic classification.

Proceedings ArticleDOI
01 Dec 2016
TL;DR: Experimental results show that all the applied machine learning classifiers classify WeChat text and picture message traffic more accurately on the HIT dataset than on the Dorm13 dataset.
Abstract: Network traffic classification is of great importance for both internet service providers (ISPs) and quality of service (QoS) management. During the last two decades, many machine learning models have been proposed and applied to different types of real-time applications to classify their traffic, with very good accuracy. However, no research has been done on classifying WeChat text and picture message traffic. In this paper, WeChat text and picture message traffic is classified using two different datasets and four well-known machine learning algorithms. The two datasets, Harbin Institute of Technology (HIT) and Dorm13, are collected from two different network environments. After the traffic is captured, 50 features are extracted from each dataset. Thereafter, four well-known machine learning algorithms (C4.5 decision tree, Bayes Net, Naive Bayes, and SVM) are used to classify WeChat text and picture message traffic. Experimental results show that all the applied classifiers classify WeChat text and picture message traffic more accurately on the HIT dataset than on the Dorm13 dataset. On the HIT dataset, all classifiers perform very well, but C4.5 and SVM give the highest accuracy, 99.91% and 99.57% respectively.

Journal ArticleDOI
TL;DR: This paper proposes an efficient feature selection method for network traffic based on a new parallel computing framework called Spark that reduces the time cost of modeling and classification, and improves the execution efficiency of feature selection significantly.
Abstract: With the rapid growth of data scale in network traffic classification, selecting traffic features efficiently is becoming a big challenge. Although a number of traditional feature selection methods using the Hadoop-MapReduce framework have been proposed, their execution time is still unsatisfactory because of the numerous iterative computations involved in the processing. To address this issue, an efficient feature selection method for network traffic based on a new parallel computing framework called Spark is proposed in this paper. In our approach, the complete feature set is first preprocessed based on Fisher score, and a sequential forward search strategy is employed to build subsets. The optimal feature subset is then selected using the continuous iterations of the Spark computing framework. The implementation demonstrates that, while keeping the classification accuracy, our method reduces the time cost of modeling and classification, and improves the execution efficiency of feature selection significantly.
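
The Fisher score preprocessing step can be sketched as below. This is a single-machine illustration in plain Python under the usual definition of the Fisher score (between-class scatter divided by within-class scatter, per feature); it is not the paper's Spark implementation, the sequential forward search is omitted, and the function names are illustrative.

```python
from statistics import fmean, pvariance


def fisher_scores(rows, labels):
    """Return one Fisher score per feature column of `rows`."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(rows[0])):
        overall_mean = fmean(r[j] for r in rows)
        between = 0.0  # scatter of class means around the overall mean
        within = 0.0   # scatter of samples around their own class mean
        for c in classes:
            vals = [r[j] for r, y in zip(rows, labels) if y == c]
            between += len(vals) * (fmean(vals) - overall_mean) ** 2
            within += len(vals) * pvariance(vals)
        if within > 0:
            scores.append(between / within)
        else:
            # Constant within every class: perfectly separating or useless.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def rank_features(rows, labels):
    """Feature indices ordered from most to least discriminative."""
    scores = fisher_scores(rows, labels)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

A forward search would then start from the top-ranked features and keep adding candidates while classification accuracy holds.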

Patent
02 Mar 2016
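TL;DR: A method for video traffic flow behavioral classification is implemented on a computing device and includes: receiving coarse flow data from a network router, where the coarse flow data includes summary statistics for data flows on the router, and requesting fine flow data for each detected video flow, where the fine flow data includes information on a per packet basis.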
Abstract: In one embodiment, a method for video traffic flow behavioral classification is implemented on a computing device and includes: receiving coarse flow data from a network router, where the coarse flow data includes summary statistics for data flows on the router, classifying the summary statistics to detect video flows from among the data flows, requesting fine flow data from the network router for each of the detected video flows, where the fine flow data includes information on a per packet basis, receiving the fine flow data from the network router, and classifying each of the detected video flows per video service provider in accordance with the information.

Proceedings ArticleDOI
01 Sep 2016
TL;DR: This paper classifies WeChat message flow traffic using two different datasets, which are first captured with the Wireshark tool in two different network environments, the Harbin Institute of Technology Lab and the Jinyuan Hotel; 50 features are then extracted from the captured traffic.
Abstract: In this era of information technology, network traffic classification is a very important and active topic from the perspective of network security and management, due to the substantial use of dynamic applications. Numerous research models have been proposed in network traffic classification to classify different types of applications, with significant accuracy. However, no work has been done to classify WeChat message flow traffic. WeChat is a free instant messaging application, so it is important to classify WeChat text message traffic. In this paper, we classify WeChat message flow traffic using two different datasets, which are first captured with the Wireshark tool in two different network environments, the Harbin Institute of Technology Lab and the Jinyuan Hotel; 50 features are then extracted from the captured traffic. After that, four machine learning algorithms (SVM, C4.5, Bayes Net, and Naïve Bayes) are applied to classify the WeChat text message traffic. Experimental results show that all classifiers give very high accuracy on both datasets. On the Jinyuan dataset, SVM and the C4.5 decision tree reach 100% accuracy, outperforming Bayes Net and Naïve Bayes; on the Harbin Institute of Technology Lab dataset, all classifiers reach 99.7% accuracy.