SciSpace (formerly Typeset)

Showing papers on "Traffic classification published in 2022"


Proceedings ArticleDOI
13 Feb 2022
TL;DR: This paper proposes a new traffic representation model called Encrypted Traffic Bidirectional Encoder Representations from Transformer (ET-BERT), which pre-trains deep contextualized datagram-level representation from large-scale unlabeled data and achieves state-of-the-art performance across five encrypted traffic classification tasks.
Abstract: Encrypted traffic classification requires discriminative and robust traffic representations captured from content-invisible and imbalanced traffic data for accurate classification, which is challenging but indispensable for network security and network management. The major limitation of existing solutions is that they rely heavily on deep features, which are overly dependent on data size and hard to generalize to unseen data. How to leverage open-domain unlabeled traffic data to learn representations with strong generalization ability remains a key challenge. In this paper, we propose a new traffic representation model called Encrypted Traffic Bidirectional Encoder Representations from Transformer (ET-BERT), which pre-trains deep contextualized datagram-level representations from large-scale unlabeled data. The pre-trained model can be fine-tuned on a small amount of task-specific labeled data and achieves state-of-the-art performance across five encrypted traffic classification tasks, remarkably pushing the F1 score on ISCX-VPN-Service to 98.9% (5.2%↑), Cross-Platform (Android) to 92.5% (5.4%↑), and CSTNET-TLS 1.3 to 97.4% (10.0%↑). Notably, we provide an explanation of the empirically powerful pre-training model by analyzing the randomness of ciphers. This gives insight into the boundary of classification ability over encrypted traffic. The code is available at: https://github.com/linwhitehat/ET-BERT.

32 citations
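
Datagram-level pre-training starts by turning raw bytes into tokens. The sketch below is a hypothetical tokenizer, not ET-BERT's released implementation: it assumes each datagram's bytes are rendered as overlapping two-byte hex tokens (so the vocabulary has at most 65,536 entries) and truncated to a fixed length. The function name `bigram_tokens` and the `max_tokens` limit are illustrative; the repository linked above should be checked for the actual preprocessing.

```python
def bigram_tokens(datagram: bytes, max_tokens: int = 128) -> list[str]:
    """Turn raw bytes into overlapping two-byte hex tokens (hypothetical scheme).

    Token i covers bytes i and i+1, so consecutive tokens share one byte.
    The token list is truncated to max_tokens entries.
    """
    hex_bytes = [f"{b:02x}" for b in datagram]
    tokens = [hex_bytes[i] + hex_bytes[i + 1] for i in range(len(hex_bytes) - 1)]
    return tokens[:max_tokens]


# Example: the first four bytes of an IPv4 header.
print(bigram_tokens(bytes([0x45, 0x00, 0x00, 0x34])))  # ['4500', '0000', '0034']
```

Overlapping tokens keep byte-order context across token boundaries; a BERT-style masked-token objective could then be applied to such sequences.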


Journal ArticleDOI
TL;DR: Wang et al. modeled time-series network traffic with a recurrent neural network (RNN) and introduced an attention mechanism to assist network traffic classification in two models: an attention-aided long short-term memory (LSTM) network and a hierarchical attention network (HAN).
Abstract: Network traffic classification has become an important part of network management, benefiting intelligent network operation and maintenance, network quality of service (QoS), and network security. Given the rapid development of applications and protocols, more and more encrypted traffic has emerged in networks. Traditional traffic classification methods perform unsatisfactorily because encrypted traffic is no longer plain text. In this work, we modeled time-series network traffic with a recurrent neural network (RNN). Moreover, an attention mechanism was introduced to assist network traffic classification in two models: an attention-aided long short-term memory (LSTM) network and a hierarchical attention network (HAN). Finally, extensive experiments on the ISCX VPN-NonVPN dataset showed that the proposed methods achieved 91.2 percent accuracy, whereas the highest accuracy of other methods on the same dataset was 89.8 percent.

27 citations
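
The attention step in such models typically scores each recurrent hidden state, normalizes the scores with a softmax, and takes the weighted sum as a summary of the sequence. The sketch below shows that pooling step in isolation with plain Python lists; the hidden states and the scoring vector `w` stand in for the outputs and parameters of a trained LSTM, and dot-product scoring is an assumption, since the abstract does not specify the scoring function.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, w):
    """Return (context, weights) for a list of hidden-state vectors.

    Each state h_t gets the score <h_t, w>; the softmax turns the scores into
    weights, and the context vector is the weighted sum of the states.
    """
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights
```

The context vector would then feed a softmax classifier over traffic classes; a hierarchical attention network applies this kind of pooling at two levels (for example, over packets within a segment and then over segments within a flow).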


Journal ArticleDOI
TL;DR: Li et al. proposed an attention-based long short-term memory (LSTM) model for traffic classification that can effectively identify global dependencies between the input and the output using a limited number of important raw features.

24 citations


Journal ArticleDOI
TL;DR: Wang et al. proposed a self-attentive deep learning method for darknet traffic classification and application identification. It uses a cascaded model of a 1D CNN and a bidirectional long short-term memory (Bi-LSTM) network to capture local spatial-temporal features from packet payload content, and integrates a self-attention mechanism into this feature extraction network to mine the intrinsic relationships and hidden connections among the extracted content features.

13 citations


Journal ArticleDOI
TL;DR: Wang et al. proposed a cost-sensitive deep learning approach to make deep learning classifiers more robust to the class imbalance problem in NTC: the dataset is divided into partitions, and a cost matrix is created for each partition based on its data distribution.
Abstract: Network traffic classification (NTC) plays an important role in cyber security and network performance, for example in intrusion detection and in providing a higher quality of service. However, because traffic datasets are imbalanced, NTC can be extremely challenging, and poor handling of this imbalance can degrade classification performance. While existing NTC methods seek to rebalance the data distribution through resampling strategies, such approaches are known to suffer from information loss, overfitting, and increased model complexity. To address these challenges, we propose a new cost-sensitive deep learning approach to increase the robustness of deep learning classifiers against the imbalanced class problem in NTC. First, the dataset is divided into different partitions, and a cost matrix is created for each partition by considering the data distribution. Then, the costs are applied in the cost function layer to penalize classification errors. In our approach, costs differ for each type of misclassification because the cost matrix is generated specifically for each partition. To determine its utility, we implement the proposed cost-sensitive learning method in two deep learning classifiers, namely a stacked autoencoder and a convolutional neural network. Our experiments on the ISCX VPN-nonVPN dataset show that the proposed approach obtains higher classification performance on low-frequency classes than three other NTC methods.

13 citations
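
A simple way to make a loss cost-sensitive is to scale each sample's cross-entropy by a cost tied to its true class, with rarer classes costing more. The sketch below assumes inverse-frequency costs computed over one partition; it simplifies the paper's per-partition cost matrix to a per-class cost vector (a cost matrix whose entries depend only on the true class), and the function names are illustrative.

```python
import math


def class_costs(labels):
    """Inverse-frequency cost per class for one partition (illustrative choice).

    cost[c] = total / (n_classes * count[c]), so rare classes get larger costs
    and the per-sample costs average to 1 over the partition.
    """
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n_classes = len(counts)
    total = len(labels)
    return {c: total / (n_classes * k) for c, k in counts.items()}


def cost_sensitive_cross_entropy(probs, targets, costs):
    """Mean cross-entropy where each sample's term is multiplied by the cost of its true class.

    probs[i][c] is the predicted probability of class c for sample i.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, targets):
        total += -costs[y] * math.log(max(p[y], eps))
    return total / len(targets)
```

Because the costs average to 1 over the samples of a partition, the overall loss scale stays comparable to unweighted cross-entropy while errors on rare classes are penalized more heavily.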


Journal ArticleDOI
TL;DR: A network traffic classification approach based on deep learning and data fusion techniques is presented; it can identify encrypted traffic and distinguish between VPN and non-VPN network traffic.

12 citations


Journal ArticleDOI
TL;DR: In this article, the authors perform an in-depth comparative analysis of popular machine learning algorithms using different effective features extracted from IoT network traffic, and offer suggestions for selecting a machine learning algorithm for different use cases based on the results.
Abstract: Internet of Things (IoT) refers to a wide variety of embedded devices connected to the Internet, enabling them to transmit and share information with each other in smart environments. Regular monitoring of the network traffic generated by IoT devices is important for their proper functioning and for detecting malicious activities. One such crucial activity is the classification of IoT devices from network traffic. It enables the administrator to monitor the activities of IoT devices, which can be useful for properly implementing Quality of Service, detecting malicious IoT devices, etc. Various methods have been proposed in the literature for IoT traffic classification using different machine learning algorithms. However, the accuracy of these algorithms depends on the data generated by the IoT devices, the features extracted from network traffic, the site at which the IoT is deployed, etc. Moreover, selecting features and machine learning algorithms is a manual operation that is prone to error. Therefore, it is important to study network traffic characteristics as well as suitable machine learning algorithms for accurate and optimized IoT traffic classification. In this article, we perform an in-depth comparative analysis of popular machine learning algorithms using different effective features extracted from IoT network traffic. We use a public dataset containing 20 days of network traces generated by 20 popular IoT devices. The network traces are first processed to extract the significant features. We then select state-of-the-art machine learning algorithms for IoT traffic classification based on recent survey papers, and compare their performance in terms of classification accuracy, speed, training time, etc. Finally, we provide a few suggestions for selecting a machine learning algorithm for different use cases based on the obtained results.

12 citations



Journal ArticleDOI
TL;DR: In this article, a combination of a convolutional neural network (CNN), the ant-lion meta-heuristic algorithm (ALO), and a self-organizing map (SOM) is used to identify encrypted traffic and distinguish between VPN and non-VPN traffic.

11 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose an extended feature set including flow-, packet-, and device-level features to characterize IoT devices in a smart environment, and present insights into traffic characteristics using feature selection and correlation mechanisms.
Abstract: As the number of Internet of Things (IoT) devices and applications increases, the capacity of IoT access networks is considerably stressed. This can create significant performance bottlenecks in various layers of an end-to-end communication path, including spectrum scheduling, the resource requirements for processing IoT data at the Edge and/or Cloud, and the attainable delay for critical emergency scenarios. Thus, a proper classification or prediction of the time-varying traffic characteristics of IoT devices is required. However, this classification remains largely an open challenge. Most existing solutions are based on machine learning techniques, which have high computational cost and do not consider the fine-grained flow characteristics of the traffic. To this end, this paper introduces four contributions. First, we provide an extended feature set including flow-, packet-, and device-level features to characterize IoT devices in the context of a smart environment. Second, we propose a custom weighting-based preprocessing algorithm to determine the importance of the data values. Third, we present insights into traffic characteristics using feature selection and correlation mechanisms. Finally, we develop a two-stage learning algorithm and demonstrate its ability to accurately categorize IoT devices in two different datasets. The evaluation results show that the proposed learning framework achieves 99.9% accuracy on the first dataset and 99.8% on the second. Additionally, on the first dataset we achieve a precision and recall of 99.6% and 99.5%, while on the second dataset the precision and recall are 99.6% and 99.7%, respectively. These results show that our approach clearly outperforms other well-known machine learning methods.
Hence, this work provides a useful model deployed in a realistic IoT scenario, where IoT traffic and devices’ profiles are predicted and classified, while facilitating the data processing in the upper layers of an end-to-end communication model.

9 citations


Journal ArticleDOI
TL;DR: In this article, the applicability of an active form of ML, called Active Learning (AL), to NTC is investigated; AL reduces the need for a large number of labeled examples by actively choosing the instances that should be labeled.
Abstract: Network Traffic Classification (NTC) has become an important feature in various network management operations, e.g., Quality of Service (QoS) provisioning and security services. Machine Learning (ML) algorithms, a popular approach for NTC, can promise reasonable classification accuracy and can deal with encrypted traffic. However, ML-based NTC techniques suffer from a shortage of labeled traffic data, which is the case in many real-world applications. This study investigates the applicability of an active form of ML, called Active Learning (AL), to NTC. AL reduces the need for a large number of labeled examples by actively choosing the instances that should be labeled. The study first provides an overview of NTC and its fundamental challenges, along with a survey of the literature on ML-based NTC methods. Then, it introduces the concepts of AL, discusses AL in the context of NTC, and reviews the literature in this field. Further, challenges and open issues in AL-based classification of network traffic are discussed. Moreover, as a technical survey, it presents experiments showing the broad applicability of AL in NTC. The simulation results show that AL can achieve high accuracy with a small amount of data.
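
Pool-based active learning repeatedly trains a model, scores the unlabeled pool, and asks an annotator to label the most informative instances. The sketch below shows two common uncertainty-based query strategies, least confidence and smallest margin, operating on predicted class probabilities; the surveyed work may use other strategies, and the function names are hypothetical.

```python
def least_confident(pool_probs, k):
    """Indices of the k pool samples whose highest class probability is lowest."""
    return sorted(range(len(pool_probs)), key=lambda i: max(pool_probs[i]))[:k]


def smallest_margin(pool_probs, k):
    """Indices of the k pool samples with the smallest gap between the top two probabilities."""
    def margin(p):
        top1, top2 = sorted(p, reverse=True)[:2]
        return top1 - top2

    return sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))[:k]
```

In a full loop, the selected indices would be labeled, moved from the pool to the training set, and the classifier retrained before the next query. The two strategies can disagree: a sample torn between two of three classes has a small margin even if its top probability is not the lowest in the pool.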


Journal ArticleDOI
TL;DR: The experimental results demonstrate that ETC-PS is superior to state-of-the-art methods in terms of accuracy, F1 score, time complexity, and stability.
Abstract: Although many network traffic protection methods have been developed to protect user privacy, encrypted traffic can still reveal sensitive user information under sophisticated analysis. In this paper, we propose ETC-PS, a novel encrypted traffic classification method based on the path signature. We first construct the traffic path from a session's packet length sequence to represent the interactions between the client and the server. Then, path transformations are applied to expose its structure and obtain different information. Finally, a multiscale path signature is computed as a distinctive feature to train a traditional machine learning classifier, which achieves highly robust accuracy with low training overhead. Six publicly available datasets covering HTTPS/1, HTTPS/2, QUIC, VPN, non-VPN, Tor, and non-Tor traffic are used in closed-world and open-world evaluations to verify the effectiveness of ETC-PS. The experimental results demonstrate that ETC-PS is superior to state-of-the-art methods in terms of accuracy, F1 score, time complexity, and stability.
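
A path signature summarizes the ordered increments of a path through iterated integrals. The sketch below computes the signature truncated at level 2 for a piecewise-linear path, using Chen's identity to combine straight segments. The path construction (`packet_length_path`, pairing packet index with cumulative signed packet length, with negative lengths for server-to-client packets) is an assumption for illustration; ETC-PS additionally applies path transformations and multiple scales, which are omitted here.

```python
def packet_length_path(signed_lengths):
    """Hypothetical 2-D path: (packet index, cumulative signed length), starting at the origin."""
    path = [(0, 0)]
    total = 0
    for i, length in enumerate(signed_lengths, start=1):
        total += length
        path.append((i, total))
    return path


def signature_level2(path):
    """Level-1 and level-2 signature terms of a piecewise-linear path.

    For each straight segment with increment delta, Chen's identity gives
    S2[i][j] += S1[i] * delta[j] + delta[i] * delta[j] / 2, after which S1 += delta.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, cur)]
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1, s2


def signature_features(signed_lengths):
    """Flatten the truncated signature into a feature vector for a classical classifier."""
    s1, s2 = signature_level2(packet_length_path(signed_lengths))
    return s1 + [x for row in s2 for x in row]
```

The antisymmetric part of the level-2 term (the Lévy area) is sensitive to the order in which packets arrive, information that a plain histogram of packet lengths discards.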

Journal ArticleDOI
TL;DR: Festic, a few-shot learning based approach to IoT traffic classification, is proposed and the experimental results show that Festic has excellent classification accuracy and outperforms the state-of-the-art traffic classification methods.
Abstract: IoT traffic classification is an important step in network management. Efficient and accurate IoT traffic classification helps Internet Service Providers provide high-quality services to network users. At present, popular IoT traffic classification methods use traditional machine learning or deep learning algorithms, which rely on a large amount of labeled traffic to construct traffic-level fingerprints. However, some classes of IoT devices generate only limited labeled traffic when they are working, and this limited labeled traffic is insufficient for the aforementioned classification methods. In this letter, we propose Festic, a few-shot learning based approach to IoT traffic classification. Festic can accurately classify IoT traffic when labeled traffic is insufficient. We evaluate Festic on two publicly available datasets, and the experimental results show that Festic has excellent classification accuracy and outperforms state-of-the-art traffic classification methods.

Proceedings ArticleDOI
06 Jun 2022
TL;DR: A new framework and system is developed that enables a joint evaluation of both the conventional notions of machine learning performance and the systems-level costs of different representations of network traffic, and makes it possible to explore different representations for learning.
Abstract: Network management often relies on machine learning to make predictions about performance and security from network traffic. Often, the representation of the traffic is as important as the choice of the model. The features that the model relies on, and the representation of those features, ultimately determine model accuracy, as well as where and whether the model can be deployed in practice. Thus, the design and evaluation of these models ultimately requires understanding not only model accuracy but also the systems costs associated with deploying the model in an operational network. Towards this goal, this paper develops a new framework and system that enables a joint evaluation of both the conventional notions of machine learning performance (model accuracy) and the systems-level costs of different representations of network traffic. We highlight these two dimensions for two practical network management tasks, video streaming quality inference and malware detection, to demonstrate the importance of exploring different representations to find the appropriate operating point. We demonstrate the benefit of exploring a range of representations of network traffic and present Traffic Refinery, a proof-of-concept implementation that both monitors network traffic at 10 Gbps and transforms traffic in real time to produce a variety of feature representations for machine learning. Traffic Refinery both highlights this design space and makes it possible to explore different representations for learning, balancing systems costs related to feature extraction and model training against model accuracy.

Journal ArticleDOI
TL;DR: Wang et al. proposed an online multimedia traffic classification framework based on a Convolutional Neural Network (CNN) that is capable of fast and early classification as well as class-incremental learning.
Abstract: In next-generation communication systems, data traffic is expected to increase dramatically and continuously, with multimedia traffic accounting for a dominant share of the increase. Therefore, there is an urgent need for an effective and accurate scheme for online and automatic traffic management. To this end, this paper proposes an online multimedia traffic classification framework based on a Convolutional Neural Network (CNN), capable of fast and early classification as well as class-incremental learning. First, a sliding window technique is applied to capture flow slices for further feature extraction. Then, a 3-dimensional flow representation is extracted based on the probability distribution function. After that, a CNN structure adapted to the specific structure of these features is devised to better learn from the representation. In addition, to better support the addition of new services, a class-incremental learning model is developed using knowledge distillation and bias correction, achieving continual learning without retraining from scratch. Our experimental results reveal that the proposed method achieves faster and more accurate traffic classification than the state of the art. Additionally, the deployed scheme using incremental learning reduces both time and memory consumption by about 50% compared with existing methods, while maintaining accurate classification after new classes are added.
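
The sketch below illustrates the first two steps in simplified form: slicing a flow's packet-size sequence with a sliding window, then summarizing each slice as an empirical probability distribution (a normalized histogram) of packet sizes. The window length, step, and bin edges are hypothetical parameters, and the paper's 3-dimensional representation is not reproduced; this yields one distribution per slice.

```python
from bisect import bisect_right


def sliding_slices(sizes, window, step):
    """Consecutive slices of `window` packets, advancing by `step` packets."""
    return [sizes[i:i + window] for i in range(0, len(sizes) - window + 1, step)]


def size_pdf(sizes, bin_edges):
    """Empirical distribution of packet sizes over [bin_edges[k], bin_edges[k+1]) bins.

    Sizes outside [bin_edges[0], bin_edges[-1]) are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for s in sizes:
        k = bisect_right(bin_edges, s) - 1
        if 0 <= k < len(counts):
            counts[k] += 1
    n = sum(counts)
    return [c / n if n else 0.0 for c in counts]


slices = sliding_slices([60, 1500, 1500, 60, 800, 1500], window=4, step=2)
features = [size_pdf(s, [0, 500, 1000, 1600]) for s in slices]
```

Because each slice is available as soon as its window fills, a classifier can act before the flow ends, which is what makes early classification possible.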

Journal ArticleDOI
TL;DR: A surge-period-based feature extraction method is proposed that helps remove the negative influence of background traffic in network sessions and acquire as many traffic flow features as possible; the authors present the work as significant for maintaining the security of the network environment.
Abstract: The identification of Internet protocols provides a significant basis for maintaining Internet security and improving Internet Quality of Service (QoS). However, the rapid development and updating of Internet technologies and protocols have led to large volumes of unknown Internet traffic, which seriously threaten the safety of the network environment. Since most unknown Internet traffic has no labels, it is difficult to apply deep learning directly. Additionally, feature accuracy and the identification model also strongly affect identification accuracy. In this paper, we propose a surge-period-based feature extraction method that helps remove the negative influence of background traffic in network sessions and acquire as many traffic flow features as possible. In addition, we establish an identification model of unknown Internet traffic based on JigClu, a self-supervised learning approach, to train on unlabeled datasets. This model is then combined with a clustering method to further identify unknown Internet traffic. The model achieves an accuracy of no less than 74% in identifying unknown Internet traffic on the public dataset ISCXVPN2016 under different scenarios. The work provides a novel solution for unknown Internet traffic identification, which is the most difficult task in identifying Internet traffic. We believe it is a great leap in Internet traffic identification and is of great significance to maintaining the security of the network environment.

Journal ArticleDOI
TL;DR: Experimental results in real scenarios demonstrate that malicious traffic can be efficiently detected when only few-shot samples are learned, and the proposed scheme outperforms the state-of-the-art methods in detection accuracy.
Abstract: The number of malware attempts to bypass existing Network Intrusion Detection Systems (NIDS) is increasing. To detect illegal access to servers, deep analysis of server-side network traffic has become increasingly important. However, existing approaches have serious performance limitations in terms of real-time and accurate traffic detection. These limitations arise mainly from i) the rigid feature extraction and rule matching techniques of NIDS, which are insensitive to incremental network traffic, and ii) the strong correlation and coupling of malicious traffic with large volumes of normal traffic. To address these limitations, we propose a Few-shot Latent Dirichlet Generative Learning (FLAG) scheme for semantic-aware traffic detection. In FLAG, a Latent Dirichlet Allocation (LDA)-based pseudo-sample generation algorithm is designed to augment the few-shot training data, which is essential for improving traffic classification accuracy. Furthermore, we propose a Fuzziness Recycle Method (FRM) to further improve the robustness of the long short-term memory (LSTM)-based classifier. Experimental results in real scenarios demonstrate that malicious traffic can be efficiently detected when only few-shot samples are learned. The results also reveal that the proposed scheme outperforms state-of-the-art methods in detection accuracy.

Journal ArticleDOI
TL;DR: This paper presents two approaches for dynamically classifying the traffic types of individual flows and transmitting them through a specific slice with an associated 5G quality-of-service identifier (5QI).
Abstract: Network slicing is a promising technique used in the smart delivery of traffic and can satisfy the requirements of specific applications or systems based on the features of the 5G network. To this end, an appropriate slice needs to be selected for each data flow to efficiently transmit data for different applications and heterogeneous requirements. To apply the slicing paradigm at the radio segment of a cellular network, this paper presents two approaches for dynamically classifying the traffic types of individual flows and transmitting them through a specific slice with an associated 5G quality-of-service identifier (5QI). Finally, using a 5G standalone (SA) experimental network solution, we apply the radio resource sharing configuration to prioritize traffic that is dispatched through the most suitable slice. The results demonstrate that the use of network slicing allows for higher efficiency and reliability for the most critical data in terms of packet loss or jitter.
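
Once a flow has been classified, the slicing step reduces to mapping the predicted traffic type to a 5QI. The mapping below is purely illustrative: the class names are hypothetical, and the abstract does not say which 5QI values the authors used. The numeric values follow example services listed for standardized 5QIs in 3GPP TS 23.501 (Table 5.7.4-1), which should be consulted for the full QoS characteristics.

```python
# Hypothetical traffic classes mapped to standardized 5QI values
# (example services as listed in 3GPP TS 23.501, Table 5.7.4-1).
FIVE_QI_BY_CLASS = {
    "voice": 1,            # conversational voice (GBR)
    "video_call": 2,       # conversational video / live streaming (GBR)
    "video_streaming": 6,  # buffered video streaming, TCP-based (non-GBR)
}
DEFAULT_FIVE_QI = 9        # non-GBR, TCP-based services such as web and e-mail


def select_5qi(traffic_class: str) -> int:
    """Return the 5QI for a predicted class, falling back to the default for unknown classes."""
    return FIVE_QI_BY_CLASS.get(traffic_class, DEFAULT_FIVE_QI)
```

Keeping the mapping as data rather than code lets an operator change slice assignments without retraining or redeploying the classifier.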

Journal ArticleDOI
TL;DR: A time window-based approach is used to split the activity’s encrypted traffic flow into segments, so that in-app activities can be identified just by observing only a part of the activity-related encrypted traffic.
Abstract: In this study, a simple yet effective framework is proposed to characterize fine-grained in-app user activities performed on mobile applications using a convolutional neural network (CNN). The proposed framework uses a time window-based approach to split the activity’s encrypted traffic flow into segments, so that in-app activities can be identified by observing only part of the activity-related encrypted traffic. Matrices were constructed for each encrypted traffic flow segment and used as input to the CNN model, allowing it to learn to distinguish previously trained (known) from previously untrained (unknown) in-app activities, as well as to identify the type of a known in-app activity. The proposed method extracts and selects salient features for encrypted traffic classification. This is the first known approach to filter unknown traffic, with an average accuracy of 88%. Once the unknown traffic is filtered, the classification accuracy of our model is 92%.

Book ChapterDOI
TL;DR: Zhang et al. treated few-shot DroidMal detection as DroidMal encrypted network traffic classification and proposed an image-based method with meta-learning, namely AMDetector, to address these issues.
Abstract: In the severe COVID-19 environment, encrypted mobile malware increasingly threatens personal privacy, especially malware targeting the Android platform. Existing methods mainly focus on extracting features from Android malware (DroidMal) by reverse-engineering binary samples, which is sensitive to a reduction in the available samples. Thus, they fail to cope with the insufficient samples of novel DroidMal. Therefore, it is necessary to investigate an effective solution to classify large-scale DroidMal as well as to detect novel DroidMal. We treat few-shot DroidMal detection as DroidMal encrypted network traffic classification and propose an image-based method with meta-learning, namely AMDetector, to address these issues. By capturing the network traffic produced by DroidMal, samples are augmented to suit the learning algorithms. First, DroidMal encrypted traffic is converted to session images. Then, the session images are embedded into a high-dimensional metric space, in which traffic samples can be linearly separated by computing their distance to the corresponding prototype. Large-scale and novel DroidMal traffic is classified by applying different meta-learning strategies. Experimental results on public datasets demonstrate the capability of our method to classify large-scale known DroidMal traffic as well as to detect novel DroidMal. Encouragingly, our model achieves superior performance on known and novel DroidMal traffic classification compared with state-of-the-art methods. Moreover, AMDetector is able to classify unseen cross-platform malware.
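
Metric-based meta-learning of this kind classifies a query by its distance to class prototypes, each prototype being the mean embedding of that class's few labeled examples. The sketch below assumes the embeddings have already been produced (in AMDetector, by a network applied to session images) and uses plain Euclidean distance; the class names and vectors are made up for illustration.

```python
import math


def class_prototypes(support):
    """Map each class to the mean of its support embeddings.

    support: dict mapping class label -> list of equal-length embedding vectors.
    """
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return protos


def nearest_prototype(query, protos):
    """Return the class whose prototype is closest (Euclidean) to the query embedding."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))
```

A novel malware family can be handled by computing its prototype from a handful of labeled sessions and adding it to the dictionary, without retraining the embedding network, which is what makes this approach attractive for few-shot detection.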

Journal ArticleDOI
TL;DR: In this paper, the authors review existing network traffic classification techniques, such as port-based identification and approaches based on deep packet inspection, statistical features combined with machine learning, and deep learning algorithms.

Proceedings ArticleDOI
08 Jan 2022
TL;DR: In this paper, the authors examine the feasibility of using deep learning architectures for malware detection and classification, to gain insight into how well these architectures perform on malware traffic.
Abstract: The world of malware is shifting towards encrypted traffic. While encryption improves user privacy, it brings challenges in QoS, QoE, and cybersecurity. Recent state-of-the-art deep learning architectures for encrypted traffic classification have demonstrated superb results in traffic categorization over encrypted traffic. In this paper, we examine the feasibility of using such architectures for malware detection and classification, to gain insight into how well they perform in the domain of malware traffic. Specifically, we present a deep learning model for malware traffic detection and classification (MalDIST), which outperforms both classical ML and DL malware traffic classification models in both detection and classification.

Journal ArticleDOI
TL;DR: Wang et al. proposed a multitask-learning-based algorithm that predicts network traffic from its spatial and temporal features in industrial Internet of Things (IIoT) networks.
Abstract: With the rapid advance of the industrial Internet of Things (IIoT), software-defined networks (SDNs) have been used to construct current IIoT networks in order to provide flexible access for various infrastructures and applications. To improve the quality of service of industrial applications, network traffic prediction has become an important research direction, which benefits network management and security. Unfortunately, the traffic flows of SDN-enabled IIoT networks contain many irregular fluctuations, which make network traffic prediction difficult. In this article, we propose an algorithm based on multitask learning to predict network traffic according to its spatial and temporal features. Evaluations on real networks show that the proposed approach can effectively obtain network traffic predictors.

Journal ArticleDOI
TL;DR: By integrating random forest and extra trees in the CaForest framework, an end-to-end, high-precision detector for small-scale and imbalanced SSL/TLS-encrypted malicious traffic was realized; experimental results showed that the detection rate of DF-IDS was 6.87% to 29.5% higher than that of other methods on a small-scale and imbalanced dataset.
Abstract: The SSL/TLS protocol is widely used for encrypted data transmission. To detect SSL/TLS-encrypted malicious traffic with small-scale and imbalanced training data, this paper proposes a deep-forest-based detection method called DF-IDS. According to the characteristics of the SSL/TLS protocol, the network traffic was split into sessions based on 5-tuple information. Each session was then transformed into a two-dimensional traffic image as the input of a deep-learning classifier. To avoid information loss and improve detection efficiency, the multi-grained cascade forest (gcForest) framework was simplified to its cascade structure alone, named cascade forest (CaForest). By integrating random forest and extra trees in the CaForest framework, an end-to-end, high-precision detector for small-scale and imbalanced SSL/TLS-encrypted malicious traffic was realized. Compared with other deep-learning-based methods, DF-IDS achieved a detection rate 6.87% to 29.5% higher on a small-scale and imbalanced dataset. The advantage of DF-IDS was more pronounced in the multi-class case.
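
The preprocessing described above, splitting traffic into bidirectional sessions by 5-tuple and turning each session into a fixed-size grayscale image, can be sketched as follows. The packet dictionary fields, the 28 × 28 image size (784 bytes), and zero-padding of short sessions are assumptions for illustration; the abstract does not give DF-IDS's exact image dimensions or which bytes are kept.

```python
from collections import defaultdict


def session_key(pkt):
    """Direction-independent 5-tuple: both directions of a session share one key."""
    a = (pkt["src_ip"], pkt["src_port"])
    b = (pkt["dst_ip"], pkt["dst_port"])
    return (min(a, b), max(a, b), pkt["proto"])


def split_sessions(packets):
    """Group packet bytes by session, preserving arrival order within each session."""
    sessions = defaultdict(list)
    for pkt in packets:
        sessions[session_key(pkt)].append(pkt["data"])
    return dict(sessions)


def session_image(chunks, side=28):
    """Concatenate a session's bytes, truncate or zero-pad to side*side, and reshape into rows."""
    size = side * side
    data = b"".join(chunks)[:size]
    data += bytes(size - len(data))
    return [list(data[r * side:(r + 1) * side]) for r in range(side)]
```

Each resulting image is a grid of byte values in 0–255, usually scaled to [0, 1] before being passed to the downstream classifier.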

Journal ArticleDOI
01 Jul 2022-Sensors
TL;DR: The proposed work introduces a new method using an enhanced deep reinforcement learning (EDRL) algorithm to improve network traffic analysis and prediction, contributing to intelligence-based network traffic prediction and to solving network management issues.
Abstract: Network data traffic is increasing as networks expand to serve applications involving text, image, audio, and video. Identifying network traffic patterns and analyzing the data content of traffic are essential for different needs and scenarios. Many approaches have been followed, both before and after the introduction of machine and deep learning algorithms as intelligent computation. Network traffic analysis is the process of capturing the traffic of a network and observing it closely to infer what is happening in the network. To enhance the quality of service (QoS) of a network, it is important to estimate network traffic and to analyze the accuracy, precision, and false positive and false negative rates of suitable algorithms. This work proposes a new method using an enhanced deep reinforcement learning (EDRL) algorithm to improve network traffic analysis and prediction. The importance of this work lies in contributing towards intelligence-based network traffic prediction and solving network management issues. An experiment was carried out to measure the accuracy, precision, and false positive and false negative rates of EDRL. Convolutional neural network (CNN) models and other deep learning algorithms were also used to predict the different types of network traffic, labeled as text-based, video-based, unencrypted, and encrypted data traffic. The EDRL algorithm outperformed the CNN algorithm, with a mean accuracy of 97.20%, mean precision of 97.343%, mean false positive rate of 2.657%, and mean false negative rate of 2.527%.

Proceedings ArticleDOI
16 May 2022
TL;DR: Three models containing sets of additional input features are proposed to improve the prediction quality of different ML algorithms, and experiments show that ML regression algorithms aided by these features achieve high prediction quality.
Abstract: Knowledge of future traffic volumes benefits network operators in many areas. Short-term forecasting of multiple traffic types helps with efficient resource utilization by enabling near real-time adjustment. An important issue is the choice of a suitable prediction model to obtain the most accurate traffic forecasts. A machine learning (ML) algorithm picked for this task can be further tuned by appropriate feature selection. In this paper, we propose three models containing sets of additional input features to improve the prediction quality of different ML algorithms. We evaluate our models on multiple datasets containing diverse types of network traffic. In extensive numerical experiments, we demonstrate the high prediction quality of ML regression algorithms aided by our proposed additional features. Depending on the predicted traffic type, the obtained mean absolute percentage errors (MAPE) are as low as 1–10%.
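MAPE, the error metric reported in this abstract, averages the absolute forecast error relative to the true value: MAPE = (100/n) Σ |y_t − ŷ_t| / |y_t|. A minimal sketch follows; the traffic volumes and forecasts are hypothetical, and the paper's additional input features are not reproduced.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical traffic volumes per time bin and the corresponding forecasts.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
error = mape(actual, forecast)  # (10% + 5% + 0%) / 3, about 5%
```

Because each error is divided by the true value, MAPE weights low-volume periods heavily and cannot be computed when any observed volume is zero, which matters for bursty traffic types.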

Journal ArticleDOI
TL;DR: In this paper, an ensemble learning technique based on existing data pre-processing, machine learning, and deep learning techniques was proposed for the classification of non-VPN encrypted network traffic data.
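The TL;DR does not say how the ensemble combines its base learners. One common choice is hard majority voting, sketched below with hypothetical threshold rules standing in for trained models; the features, thresholds, and class labels are illustrative only and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: List[Classifier], x: Sequence[float]) -> str:
    """Hard voting: return the label predicted by the most base classifiers.

    Ties go to the label voted for first, because Counter.most_common keeps
    first-encountered order among equal counts.
    """
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical base learners over features [mean packet size, packets/s, duration s].
def by_size(x: Sequence[float]) -> str:
    return "video" if x[0] > 1000 else "chat"


def by_rate(x: Sequence[float]) -> str:
    return "video" if x[1] > 50 else "chat"


def by_duration(x: Sequence[float]) -> str:
    return "video" if x[2] > 60 else "chat"


ensemble = [by_size, by_rate, by_duration]
label = majority_vote(ensemble, [1200.0, 80.0, 30.0])  # two of three vote "video"
```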

Proceedings ArticleDOI
14 Jan 2022
TL;DR: Wang et al. proposed a network structure that combines CNN and Bi-GRU to learn the temporal and spatial features of encrypted traffic data, evaluated it on the public dataset ISCX VPN-non VPN, and used accuracy, recall, and F1 score as criteria.
Abstract: With the rapid development of the Internet, the amount and variety of network traffic have increased dramatically. Precise network traffic classification has therefore become an essential aspect of network management. Furthermore, as requirements for user privacy and data encryption have grown, so has the volume of encrypted traffic. Because of the wide variety of encryption techniques and methodologies, network traffic classification has become extremely challenging, and traditional traffic classification methods have been unable to meet the demand for classification accuracy. In this study, we propose a network structure that combines CNN and Bi-GRU to learn the temporal and spatial features of encrypted traffic data. We use the public dataset ISCX VPN-non VPN to evaluate our model, with accuracy, recall, and F1 score as criteria. Our model achieved a classification accuracy of 93.1%, a recall of 93.7%, and an F1 score of 93.6%. We also discuss the differences between Bi-GRU and LSTM in terms of model parameter scale and time efficiency. Experiments show that Bi-GRU gives the best classification results and efficiency.
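The parameter-scale difference between Bi-GRU and LSTM follows from gate count: a GRU layer has three weight blocks (update gate, reset gate, candidate state) where an LSTM layer has four (input, forget, and output gates plus the candidate cell), so at equal sizes and directionality a GRU layer needs 3/4 of the LSTM's parameters. The sketch below counts recurrent-layer parameters under a one-bias-per-block convention; frameworks such as PyTorch keep separate input and recurrent bias vectors, adding one more bias vector per block. The input and hidden sizes are hypothetical, not the paper's.

```python
def rnn_params(input_size: int, hidden_size: int, gate_blocks: int,
               bidirectional: bool = False) -> int:
    """Trainable parameters of one recurrent layer, assuming one bias vector per block.

    Each block has input weights (hidden x input), recurrent weights
    (hidden x hidden), and a bias (hidden).
    """
    per_direction = gate_blocks * (hidden_size * input_size
                                   + hidden_size * hidden_size
                                   + hidden_size)
    return per_direction * (2 if bidirectional else 1)


GRU_BLOCKS = 3   # update gate, reset gate, candidate state
LSTM_BLOCKS = 4  # input, forget, output gates and candidate cell

# Hypothetical sizes: 128 input features per time step, 64 hidden units.
bi_gru = rnn_params(128, 64, GRU_BLOCKS, bidirectional=True)
bi_lstm = rnn_params(128, 64, LSTM_BLOCKS, bidirectional=True)
print(bi_gru, bi_lstm)  # 74112 98816
```

Fewer weight blocks also mean fewer matrix multiplications per time step, which is consistent with the time-efficiency advantage the abstract reports for Bi-GRU.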

Journal ArticleDOI
TL;DR: In this paper, the authors present a comprehensive survey of traffic classification techniques that carefully reviews existing methods from a new perspective, and they comprehensively discuss the procedures and datasets for traffic classification.
Abstract: Traffic classification is considered an important research area due to growing demand from network users. It not only improves network service identification and addresses security issues in the network, but also provides robust accuracy and efficiency across different Internet application behaviors and patterns. Several traffic classification techniques have been proposed and applied successfully in recent years. However, the existing literature lacks a comprehensive survey that provides an overview and analysis of recent developments in network traffic classification. To this end, this survey presents a comprehensive investigation of traffic classification techniques, carefully reviewing existing methods from a new perspective. We comprehensively discuss the procedures and datasets for traffic classification. Additionally, we propose traffic criteria that could help assess the effectiveness of a newly developed classification algorithm. Next, the traffic classification techniques are discussed in detail, followed by a thorough discussion of machine learning (ML) methods for traffic classification. For researchers' convenience, we present traffic obfuscation techniques, which could be helpful for designing better classifiers. Finally, key findings and open research challenges for network traffic classification are identified, along with recommendations for future research directions. In sum, this survey fills a gap in existing surveys and summarizes the latest research developments in traffic classification.