
Showing papers on "Traffic classification published in 2012"


Journal ArticleDOI
TL;DR: The persistently unsolved challenges in the field over the last decade are outlined, and several strategies for tackling these challenges are suggested to promote progress in the science of Internet traffic classification.
Abstract: Traffic classification technology has increased in relevance this decade, as it is now used in the definition and implementation of mechanisms for service differentiation, network design and engineering, security, accounting, advertising, and research. Over the past 10 years the research community and the networking industry have investigated, proposed and developed several classification approaches. While traffic classification techniques are improving in accuracy and efficiency, the continued proliferation of different Internet application behaviors, in addition to growing incentives to disguise some applications to avoid filtering or blocking, are among the reasons that traffic classification remains one of many open problems in Internet research. In this article we review recent achievements and discuss future directions in traffic classification, along with their trade-offs in applicability, reliability, and privacy. We outline the persistently unsolved challenges in the field over the last decade, and suggest several strategies for tackling these challenges to promote progress in the science of Internet traffic classification.

546 citations


Proceedings ArticleDOI
11 Jun 2012
TL;DR: These and other findings suggest that better protocol design, more careful spectrum allocation, and modified pricing schemes may be needed to accommodate the rise of M2M devices.
Abstract: Cellular network based Machine-to-Machine (M2M) communication is fast becoming a market-changing force for a wide spectrum of businesses and applications such as telematics, smart metering, point-of-sale terminals, and home security and automation systems. In this paper, we aim to answer the following important question: Does traffic generated by M2M devices impose new requirements and challenges for cellular network design and management? To answer this question, we take a first look at the characteristics of M2M traffic and compare it with traditional smartphone traffic. We have conducted our measurement analysis using a week-long traffic trace collected from a tier-1 cellular network in the United States. We characterize M2M traffic from a wide range of perspectives, including temporal dynamics, device mobility, application usage, and network performance. Our experimental results show that M2M traffic exhibits significantly different patterns than smartphone traffic in multiple aspects. For instance, M2M devices have a much larger ratio of uplink to downlink traffic volume, their traffic typically exhibits different diurnal patterns, they are more likely to generate synchronized traffic resulting in bursty aggregate traffic volumes, and are less mobile compared to smartphones. On the other hand, we also find that M2M devices are generally competing with smartphones for network resources in co-located geographical regions. These and other findings suggest that better protocol design, more careful spectrum allocation, and modified pricing schemes may be needed to accommodate the rise of M2M devices.

274 citations


Journal ArticleDOI
TL;DR: This work proposes timely and continuous traffic classification using statistics derived from sub-flows (a small number of the most recent packets taken at any point in a flow's lifetime) and augments training datasets so that classification accuracy is maintained even when a classifier mixes up client-to-server and server-to-client directions for applications exhibiting asymmetric traffic characteristics.
Abstract: Machine Learning (ML) for classifying IP traffic has relied on the analysis of statistics of full flows or their first few packets only. However, automated QoS management for interactive traffic flows requires quick and timely classification well before the flows finish. Also, interactive flows are often long-lived and should be continuously monitored during their lifetime. We propose to achieve this by using statistics derived from sub-flows--a small number of most recent packets taken at any point in a flow's lifetime. Then, the ML classifier must be trained on a set of sub-flows, and we investigate different sub-flow selection strategies. We also propose to augment training datasets so that classification accuracy is maintained even when a classifier mixes up client-to-server and server-to-client directions for applications exhibiting asymmetric traffic characteristics. We demonstrate the effectiveness of our approach with the Naive Bayes and C4.5 Decision Tree ML algorithms, for the identification of first-person-shooter online game and VoIP traffic. Our results show that we can classify both applications with up to 99% Precision and 95% Recall within less than 1 s. Stable results are achieved regardless of where within a flow the classifier captures the packets and the traffic direction.
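The sub-flow idea above lends itself to a compact sketch: maintain a sliding window of the most recent packets and compute statistics over it at any point in the flow's lifetime. The window size, feature set, and synthetic packets below are illustrative assumptions, not the paper's exact parameters.

```python
from collections import deque

SUBFLOW_SIZE = 25   # illustrative; the paper evaluates several window sizes

def subflow_features(window):
    """Simple statistics over a sub-flow of (timestamp, payload_len) pairs."""
    pkts = list(window)
    sizes = [length for _, length in pkts]
    gaps = [t2 - t1 for (t1, _), (t2, _) in zip(pkts, pkts[1:])]
    n = len(sizes)
    mean_size = sum(sizes) / n
    var_size = sum((s - mean_size) ** 2 for s in sizes) / n
    mean_iat = sum(gaps) / len(gaps) if gaps else 0.0
    return {"mean_size": mean_size, "var_size": var_size, "mean_iat": mean_iat}

# Feed packets as they arrive; features can be computed at ANY point in
# the flow's lifetime, not just at its start, which is the key idea above.
window = deque(maxlen=SUBFLOW_SIZE)
for i in range(100):
    window.append((0.02 * i, 60 + (i % 3) * 20))   # synthetic packets
features = subflow_features(window)
```

The `deque` with `maxlen` keeps only the most recent packets, so memory per flow stays constant no matter how long-lived the flow is.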

141 citations


Proceedings ArticleDOI
12 Mar 2012
TL;DR: A boosted classifier was constructed and shown to distinguish between 7 different applications in test sets of 76,632-1,622,710 unknown cases with average accuracy of up to 99.9%.
Abstract: Monitoring of network performance in high-speed Internet infrastructure is a challenging task, as the requirements for a given quality level are service-dependent. Backbone QoS monitoring and analysis in multi-hop networks therefore requires knowledge about the types of applications forming current network traffic. To overcome the drawbacks of existing methods for traffic classification, the use of the C5.0 Machine Learning Algorithm (MLA) was proposed. On the basis of statistical traffic information received from volunteers and the C5.0 algorithm, we constructed a boosted classifier, which was shown to have the ability to distinguish between 7 different applications in test sets of 76,632–1,622,710 unknown cases with average accuracy of 99.3–99.9%. This high accuracy was achieved by using high quality training data collected by our system, a unique set of parameters used for both training and classification, an algorithm for recognizing flow direction, and the C5.0 itself. Classified applications include Skype, FTP, torrent, web browser traffic, web radio, interactive gaming, and SSH. We performed successive trials using different sets of parameters and both training and classification options. This paper shows how we collected accurate traffic data, presents the arguments used in the classification process, introduces the C5.0 classifier and its options, and finally evaluates and compares the obtained results.
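C5.0's boosting combines many trees by weighted vote. As a hedged illustration of that voting step only, the sketch below aggregates a few hand-invented threshold rules ("stumps") on flow features; the features, thresholds, weights, and labels are placeholders, not trained values.

```python
# Each stump is a one-feature threshold rule, as a stand-in for a full tree.
def stump(feature, threshold, label_if_above, label_if_below):
    def rule(flow):
        return label_if_above if flow[feature] > threshold else label_if_below
    return rule

# (classifier, vote weight) pairs, as boosting rounds would produce them.
ensemble = [
    (stump("mean_pkt_size", 900, "ftp", "skype"), 0.9),
    (stump("mean_iat", 0.5, "ssh", "skype"), 0.6),
    (stump("pkts_per_sec", 40, "skype", "ssh"), 0.7),
]

def classify(flow):
    # Weighted vote across ensemble members; highest total wins.
    votes = {}
    for rule, weight in ensemble:
        label = rule(flow)
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)

flow = {"mean_pkt_size": 120, "mean_iat": 0.02, "pkts_per_sec": 50}
label = classify(flow)
```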

128 citations


Journal ArticleDOI
TL;DR: A novel feature selection metric named Weighted Symmetrical Uncertainty (WSU) is proposed, which prefilters most features with the WSU metric and further uses a wrapper method to select features for a specific classifier with the Area Under ROC Curve (AUC) metric.

103 citations


Journal ArticleDOI
TL;DR: A novel two-step model is proposed that seamlessly integrates these collective traffic statistics into the existing traffic classification system; it easily scales to classify traffic on 10 Gbps links, improves performance on all traffic classes, and reduces the overall error rate.
Abstract: The ability to accurately and scalably classify network traffic is of critical importance to a wide range of management tasks of large networks, such as tier-1 ISP networks and global enterprise networks. Guided by the practical constraints and requirements of traffic classification in large networks, in this article, we explore the design of an accurate and scalable machine learning based flow-level traffic classification system, which is trained on a dataset of flow-level data that has been annotated with application protocol labels by a packet-level classifier. Our system employs a lightweight modular architecture, which combines a series of simple linear binary classifiers, each of which can be efficiently implemented and trained on vast amounts of flow data in parallel, and embraces three key innovative mechanisms, weighted threshold sampling, logistic calibration, and intelligent data partitioning, to achieve scalability while attaining high accuracy. Evaluations using real traffic data from multiple locations in a large ISP show that our system accurately reproduces the labels of the packet-level classifier when run on (unlabeled) flow records, while meeting the scalability and stability requirements of large ISP networks. Using training and test datasets that are two months apart and collected from two different locations, the flow error rates are only 3% for TCP flows and 0.4% for UDP flows. We further show that such error rates can be reduced by combining the information of spatial distributions of flows, or collective traffic statistics, during classification. We propose a novel two-step model, which seamlessly integrates these collective traffic statistics into the existing traffic classification system. Experimental results display performance improvement on all traffic classes and an overall error rate reduction of 15%. In addition to high accuracy, at runtime our implementation easily scales to classify traffic on 10 Gbps links.
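The modular design described above, one linear binary classifier per application class followed by logistic calibration so scores are comparable, can be sketched roughly as follows. All weights, biases, and feature choices here are invented for illustration; the real system trains them in parallel on labeled flow records.

```python
import math

def calibrated_score(weights, bias, features):
    # Linear score followed by a logistic (sigmoid) calibration step,
    # so per-class outputs can be compared as probabilities.
    raw = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-raw))

# Per-class (weights, bias), e.g. over [log_bytes, log_pkts, dst_port/65535].
classifiers = {
    "web": ([0.8, 0.3, -2.0], 0.5),
    "dns": ([-1.0, -0.5, 3.0], -1.0),
    "p2p": ([0.2, 0.9, 1.0], -2.0),
}

def classify(features):
    probs = {app: calibrated_score(w, b, features)
             for app, (w, b) in classifiers.items()}
    return max(probs, key=probs.get), probs

flow = [5.0, 2.0, 80 / 65535]   # a hypothetical flow record
app, probs = classify(flow)
```

Because each binary classifier is independent, training parallelizes trivially across classes and data shards, which is what makes the architecture scale.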

90 citations


Book ChapterDOI
12 Mar 2012
TL;DR: The application classification reveals a trend back to HTTP traffic, underlines the immense usage of flash videos, and unveils a participant of a Botnet in an access network connecting 600 users with the Internet.
Abstract: The fast changing application types and their behavior require consecutive measurements of access networks. In this paper, we present the results of a 14-day measurement in an access network connecting 600 users with the Internet. Our application classification reveals a trend back to HTTP traffic, underlines the immense usage of flash videos, and unveils a participant of a Botnet. In addition, flow and user statistics are presented, whose resulting traffic models can be used for simulation and emulation of access networks.

85 citations


Proceedings ArticleDOI
16 Oct 2012
TL;DR: DiffTor, a machine-learning-based approach that classifies Tor's encrypted circuits by application in real time and subsequently assigns distinct classes of service to each application, is proposed and evaluated, and is shown to considerably improve the experience of Tor clients.
Abstract: Tor is a low-latency anonymity-preserving network that enables its users to protect their privacy online. It consists of volunteer-operated routers from all around the world that serve hundreds of thousands of users every day. Due to congestion and a low relay-to-client ratio, Tor suffers from performance issues that can potentially discourage its wider adoption, and result in an overall weaker anonymity to all users. We seek to improve the performance of Tor by defining different classes of service for its traffic. We recognize that although the majority of Tor traffic is interactive web browsing, a relatively small amount of bulk downloading consumes an unfair amount of Tor's scarce bandwidth. Furthermore, these traffic classes have different time and bandwidth constraints; therefore, they should not be given the same Quality of Service (QoS), which Tor offers them today. We propose and evaluate DiffTor, a machine-learning-based approach that classifies Tor's encrypted circuits by application in real time and subsequently assigns distinct classes of service to each application. Our experiments confirm that we are able to classify circuits we generated on the live Tor network with an extremely high accuracy that exceeds 95%. We show that our real-time classification in combination with QoS can considerably improve the experience of Tor clients, as our simple techniques result in a 75% improvement in responsiveness and an 86% reduction in download times at the median for interactive users.

77 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel approach that takes as input a labeled training data set and produces a set of signatures for matching the application classes present in the data; results indicate that the signatures are of high quality and exhibit low false negatives and false positives.

71 citations


Proceedings ArticleDOI
14 Nov 2012
TL;DR: The utility of one-way traffic from the particularly interesting class of unreachable services for monitoring network and service outages is demonstrated by analyzing the impact of events the authors detected in their university's network.
Abstract: Internet background radiation (IBR) is a very interesting piece of Internet traffic as it is the result of attacks and misconfigurations. Previous work has primarily analyzed IBR traffic to large unused IP address blocks called network telescopes. In this work, we build new techniques for monitoring one-way traffic in live networks with the main goals of 1) expanding our understanding of this interesting type of traffic towards live networks as well as 2) making it useful for detecting and analyzing the impact of outages. Our first contribution is a classification scheme for dissecting one-way traffic into useful classes, including one-way traffic due to unreachable services, scanning, peer-to-peer applications, and backscatter. Our classification scheme is helpful for monitoring IBR traffic in live networks solely based on flow level data. After thoroughly validating our classifier, we use it to analyze a massive data-set that covers 7.41 petabytes of traffic from a large backbone network to shed light on the composition of one-way traffic. We find that the main sources of one-way traffic are malicious scanning, peer-to-peer applications, and outages. In addition, we report a number of interesting observations, including that one-way traffic makes up a very large fraction, i.e., between 34% and 67%, of the total number of flows to the monitored network, although it accounts for only 3.4% of the number of packets, which suggests a new conceptual model for Internet traffic in which IBR is dominant in terms of flows. Finally, we demonstrate the utility of one-way traffic of the particularly interesting class of unreachable services for monitoring network and service outages by analyzing the impact of interesting events we detected in the network of our university.
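One building block of such a scheme, identifying flows that never receive a reverse-direction counterpart using flow-level data alone, can be sketched as below. The flow records are invented examples; a real monitor would also bound the wait for the reverse flow by a time window.

```python
# Flow table keyed by the 5-tuple; values are packet counts.
flows = {
    ("10.0.0.1", 52100, "192.0.2.7", 80, "tcp"): 12,
    ("192.0.2.7", 80, "10.0.0.1", 52100, "tcp"): 10,    # reverse flow exists
    ("198.51.100.9", 40000, "10.0.0.2", 23, "tcp"): 3,  # scanner, no reply
}

def one_way_flows(flows):
    """Return flows with no matching reverse-direction flow in the table."""
    result = []
    for (src, sport, dst, dport, proto) in flows:
        reverse = (dst, dport, src, sport, proto)
        if reverse not in flows:
            result.append((src, sport, dst, dport, proto))
    return result

unanswered = one_way_flows(flows)
```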

58 citations


Patent
27 Feb 2012
TL;DR: In this article, a method, a device, and a storage medium provide for storing traffic policies pertaining to egress traffic to a network; receiving a traffic flow; computing a route for the traffic flow; and identifying at least one of one or more labels associated with the traffic flow or a network address associated with a remote provider edge device associated with the traffic flow.
Abstract: A method, a device, and a storage medium provide for storing traffic policies pertaining to egress traffic to a network; receiving a traffic flow; computing a route for the traffic flow; identifying at least one of one or more labels associated with the traffic flow or a network address associated with a remote provider edge device associated with the traffic flow; selecting one or more traffic policies in response to at least one of an identification of the one or more labels or an identification of the network address; and transmitting along the route in the network according to the one or more traffic policies.

Patent
Seth Keith1
27 Jun 2012
TL;DR: In this paper, a network shaping engine is used to optimize network traffic by employing means to prioritize data packets assigned to a network traffic class over other network traffic, by determining whether received data packets comprise a traffic class mark or indicia that indicates the data packets are part of a minimum latency traffic class.
Abstract: A network shaping engine can be used to optimize network traffic by employing means to prioritize data packets assigned to a network traffic class over other network traffic. The network shaping engine accomplishes network traffic optimization by determining whether received data packets comprise a traffic class mark or indicia that indicates the data packets are part of a minimum latency traffic class. After analyzing the packets, the network optimization engine sorts the data packets according to the identified traffic classes and transmits the packets. Data packets comprising a traffic class marking are transmitted according to a first transmission scheme while data packets that do not comprise a traffic class marking are transmitted according to a second transmission scheme that differs from the first transmission scheme.
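A minimal sketch of the two-scheme behavior described in the claim: marked (minimum-latency) packets go to a priority queue that is always drained first, unmarked packets to a best-effort queue. The mark value is an assumption (a DSCP-EF-like constant), not taken from the patent.

```python
from collections import deque

LOW_LATENCY_MARK = 0x2E   # assumed DSCP-like marking for the latency class

priority_q, best_effort_q = deque(), deque()

def enqueue(packet):
    # Sort packets by the presence of the traffic-class mark.
    if packet.get("mark") == LOW_LATENCY_MARK:
        priority_q.append(packet)      # first transmission scheme
    else:
        best_effort_q.append(packet)   # second transmission scheme

def transmit_next():
    # Strict priority: marked packets always go out first.
    if priority_q:
        return priority_q.popleft()
    if best_effort_q:
        return best_effort_q.popleft()
    return None

for pkt in [{"id": 1}, {"id": 2, "mark": LOW_LATENCY_MARK}, {"id": 3}]:
    enqueue(pkt)
order = [transmit_next()["id"] for _ in range(3)]
```

Strict priority is the simplest reading of "differing transmission schemes"; a production shaper would typically add rate limits to keep the best-effort queue from starving.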

Proceedings ArticleDOI
25 Mar 2012
TL;DR: This paper proposes an approach that is easy to bootstrap and deploy, as well as robust to changes in the traffic, such as the emergence of new applications, and exhibits very high accuracy in classifying each application on five traces from different ISPs captured between 2005 and 2011.
Abstract: Many research efforts propose the use of flow-level features (e.g., packet sizes and inter-arrival times) and machine learning algorithms to solve the traffic classification problem. However, these statistical methods have not made the anticipated impact in the real world. We attribute this to two main reasons: (a) training the classifiers and bootstrapping the system is cumbersome, (b) the resulting classifiers have limited ability to adapt gracefully as the traffic behavior changes. In this paper, we propose an approach that is easy to bootstrap and deploy, as well as robust to changes in the traffic, such as the emergence of new applications. The key novelty of our classifier is that it learns to identify the traffic of each application in isolation, instead of trying to distinguish one application from another. This is a very challenging task that hides many caveats and subtleties. To make this possible, we adapt and use subspace clustering, a powerful technique that has not been used before in this context. Subspace clustering allows the profiling of applications to be more precise by automatically eliminating irrelevant features. We show that our approach exhibits very high accuracy in classifying each application on five traces from different ISPs captured between 2005 and 2011. This new way of looking at application classification could generate powerful and practical solutions in the space of traffic monitoring and network management.

Proceedings ArticleDOI
14 Nov 2012
TL;DR: A software-based traffic classification engine running on commodity multi-core hardware is presented, able to process in real time aggregates of up to 14.2 Mpps over a single 10 Gbps interface, a significant advance over the current state of the art in terms of achieved classification rates.
Abstract: In this paper we present a software-based traffic classification engine running on commodity multi-core hardware, able to process in real time aggregates of up to 14.2 Mpps over a single 10 Gbps interface, i.e., the maximum possible packet rate over a 10 Gbps Ethernet link given the minimum frame size of 64 Bytes. This significant advance with respect to the current state of the art in terms of achieved classification rates is made possible by: (i) the use of an improved network driver, PacketShader, to efficiently move batches of packets from the NIC to the main CPU; (ii) the use of lightweight statistical classification techniques exploiting the size of the first few packets of every observed flow; and (iii) a careful tuning of critical parameters of the hardware environment and the software application itself.

Journal ArticleDOI
TL;DR: In recent years, Skype has gained more and more popularity, since it is seen as the best VoIP software with good quality of sound, ease of use and one that works everywhere and with every OS.
Abstract: In recent years, Skype has gained more and more popularity, since it is seen as the best VoIP software with good quality of sound, ease of use and one that works everywhere and with every OS. Because of its great diffusion, both the operators and the users are, for different reasons, interested in detecting Skype traffic. In this paper we propose a real-time algorithm (named Skype-Hunter) to detect and classify Skype traffic. In more detail, this novel method, by means of both signature-based and statistical procedures, is able to correctly reveal and classify the signaling traffic as well as the data traffic (calls and file transfers). To assess the effectiveness of the algorithm, experimental tests have been performed with several traffic data sets, collected in different network scenarios. Our system outperforms the ‘classical’ statistical traffic classifiers as well as the state-of-the-art ad hoc Skype classifier. Copyright © 2011 John Wiley & Sons, Ltd.
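As a rough illustration of combining a signature-based test with a statistical one (here, byte entropy, since encrypted payloads look close to random), consider the sketch below. The signature prefixes and the entropy threshold are invented placeholders, not actual Skype-Hunter signatures.

```python
import math

SIGNATURE_PREFIXES = [b"\x02\x01", b"\x17\x03"]   # hypothetical byte patterns

def byte_entropy(payload):
    """Shannon entropy of the payload's byte distribution, in bits/byte."""
    counts = {}
    for b in payload:
        counts[b] = counts.get(b, 0) + 1
    n = len(payload)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def looks_like_target(payload, entropy_threshold=7.0):
    # Signature-based procedure: known prefix at the start of the payload.
    if any(payload.startswith(p) for p in SIGNATURE_PREFIXES):
        return True
    # Statistical procedure: encrypted traffic has near-random bytes.
    return byte_entropy(payload) > entropy_threshold

high_entropy = bytes(range(256))                          # looks encrypted
plain_text = b"GET / HTTP/1.1\r\nHost: example.org\r\n" * 8
is_target = looks_like_target(high_entropy)   # caught by the entropy test
is_plain = looks_like_target(plain_text)      # no signature, low entropy
```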

Proceedings ArticleDOI
27 Sep 2012
TL;DR: Results obtained show that the proposed Hierarchical classifier outperforms off-the-shelf non-hierarchical classification algorithms by exhibiting average accuracy higher than 90%, with precision and recall higher than 95% for the most popular classes of traffic.
Abstract: Traffic classification is still today a challenging problem, given the ever evolving nature of the Internet in which new protocols and applications arise at a constant pace. In the past, so-called behavioral approaches have been successfully proposed as valid alternatives to traditional DPI based tools to properly classify traffic into a few coarse classes. In this paper we push forward the adoption of behavioral classifiers by engineering a Hierarchical classifier that allows proper classification of traffic into more than twenty fine grained classes. Thorough engineering has been carried out, considering both proper feature selection and the testing of seven different classification algorithms. Results obtained over large real data sets show that the proposed Hierarchical classifier outperforms off-the-shelf non-hierarchical classification algorithms by exhibiting average accuracy higher than 90%, with precision and recall higher than 95% for the most popular classes of traffic.

Journal ArticleDOI
TL;DR: This work processes an extremely heterogeneous dataset composed of four packet-level traces with a traffic monitor able to apply different sampling policies and rates to the traffic and extract several features in both aggregated and per-flow fashion, providing empirical evidence of the impact of packet sampling on both traffic measurement and traffic classification.
Abstract: The use of packet sampling for traffic measurement has become mandatory for network operators to cope with the huge amount of data transmitted in today's networks, powered by increasingly faster transmission technologies. Therefore, many networking tasks must already deal with such reduced data, more available but less rich in information. In this work we assess the impact of packet sampling on various network monitoring activities, with a particular focus on traffic characterization and classification. We process an extremely heterogeneous dataset composed of four packet-level traces (representative of different access technologies and operational environments) with a traffic monitor able to apply different sampling policies and rates to the traffic and extract several features in both aggregated and per-flow fashion, providing empirical evidence of the impact of packet sampling on both traffic measurement and traffic classification. First, we analyze feature distortion, quantified by means of two statistical metrics: most features appear already deteriorated under a low sampling step, no matter the sampling policy, while only a few remain consistent under harsh sampling conditions, which may even cause some artifacts, undermining the correctness of measurements. Second, we evaluate the performance of traffic classification under sampling. The information content of features, even though deteriorated, still allows a good classification accuracy, provided that the classifier is trained with data obtained at the same sampling rate as the target data. The accuracy is also due to a thoughtful choice of a smart sampling policy which biases the sampling towards packets carrying the most useful information. Copyright © 2012 John Wiley & Sons, Ltd.
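A toy version of the experiment makes the distortion concrete: systematic 1-in-N sampling of a periodic packet-size sequence can badly bias even a simple feature like mean packet size, one of the artifacts the paper warns about. The trace below is synthetic.

```python
def systematic_sample(packets, step):
    """Keep every step-th packet (systematic 1-in-N sampling)."""
    return packets[::step]

# Synthetic trace: small ACK-like packets alternating with full-size ones.
trace = [40 if i % 2 == 0 else 1500 for i in range(1000)]

mean_full = sum(trace) / len(trace)
sampled = systematic_sample(trace, 10)           # sampling step of 10
mean_sampled = sum(sampled) / len(sampled)
relative_distortion = abs(mean_sampled - mean_full) / mean_full
```

Because the even sampling step aligns with the trace's period of 2, the sample contains only the 40-byte packets and the feature collapses, which is exactly why sampling policy (systematic vs. random vs. smart) matters as much as the rate.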

Posted Content
TL;DR: In this article, the authors study the network traffic patterns of BitTorrent network traffic and investigate its behavior by using the time series ARMA model, which can be used by Internet Service Providers to manage their network bandwidth and also detect any abnormality in their network.
Abstract: In recent years, there have been some major changes in the way content is distributed over the network. Content distribution techniques have recently started to embrace peer-to-peer (P2P) systems as an alternative to the traditional client-server architecture. P2P systems that are based on the BitTorrent protocol use end-users' resources to provide a cost effective distribution of bandwidth intensive content to thousands of users. The BitTorrent protocol system offers a scalable mechanism for distributing a large volume of data to a set of peers over the Internet. With the growing demand for file sharing and content distribution, BitTorrent has become one of the most popular Internet applications and contributes to a significant fraction of the Internet traffic. With the wide usage of the BitTorrent protocol system, it has essentially solved one of the major problems: quickly transferring data to a group of interested parties. The strength of the BitTorrent protocol lies in efficient bandwidth utilization for the downloading and uploading processes. However, the usage of the BitTorrent protocol also causes latency for other applications in terms of network bandwidth, which in turn has caused concerns for Internet Service Providers, who strive for quality of service for all their customers. In this paper, we study the patterns of BitTorrent network traffic and investigate its behavior by using the time series ARMA model. Our experimental results show that BitTorrent network traffic can be modeled and forecasted by using ARMA models. We compared and evaluated the forecasted network traffic with the real traffic patterns. This modeling can be utilized by Internet Service Providers to manage their network bandwidth and also detect any abnormality in their network.
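The paper fits full ARMA models; as a minimal stand-in, the sketch below fits an AR(1) model by least squares to a synthetic, noise-free per-interval traffic-volume series and makes a one-step forecast. A real study would use an ARMA fitting routine from a statistics package and noisy data.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]."""
    x, y = series[:-1], series[1:]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    phi = cov / var
    c = my - phi * mx
    return c, phi

def forecast_next(series, c, phi):
    return c + phi * series[-1]

# Synthetic volumes following x[t] = 10 + 0.8 * x[t-1] exactly (no noise),
# so the fit should recover the generating coefficients.
series = [100.0]
for _ in range(20):
    series.append(10 + 0.8 * series[-1])

c, phi = fit_ar1(series)
prediction = forecast_next(series, c, phi)
```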

Proceedings ArticleDOI
01 Dec 2012
TL;DR: A new method is proposed in this paper which builds a reliable identification model for flash crowd and DDoS attacks and achieves highest classification accuracy for DDoS flooding attacks with less than 1% of false positive rate.
Abstract: This paper surveys the emerging research on various methods to identify legitimate and illegitimate traffic on the network. The focus is on effective early detection schemes for distinguishing Distributed Denial of Service (DDoS) attack traffic from normal flash crowd traffic. The basic characteristics used to distinguish DDoS attacks from flash crowds are access intents, client request rates, cluster overlap, distribution of source IP addresses, distribution of clients, and speed of traffic. Various techniques related to these metrics are clearly illustrated and their corresponding limitations are listed with justification. A new method is proposed in this paper which builds a reliable identification model for flash crowd and DDoS attacks. The proposed Probabilistic Neural Network based traffic pattern classification method is used for effective classification of attack traffic from legitimate traffic. The proposed technique uses a normal traffic profile for the classification process, consisting of single and joint distributions of various packet attributes. The normal profile contains uniqueness in traffic distribution and is also hard for attackers to mimic as a legitimate flow. The proposed method achieves high classification accuracy for DDoS flooding attacks with less than a 1% false positive rate.
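One of the attributes listed above, the distribution of source IP addresses, can be turned into a simple feature such as normalized entropy. The heuristic direction assumed here (a flash crowd draws many distinct real clients, while a flooding tool reusing a few sources is more concentrated) and all data below are illustrative, not the paper's model.

```python
import math

def normalized_entropy(addresses):
    """Shannon entropy of the address distribution, scaled to [0, 1]."""
    counts = {}
    for a in addresses:
        counts[a] = counts.get(a, 0) + 1
    n = len(addresses)
    h = -sum(c / n * math.log2(c / n) for c in counts.values())
    max_h = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return h / max_h

flash_crowd = [f"203.0.113.{i}" for i in range(100)]   # 100 distinct clients
ddos = ["198.51.100.7"] * 80 + ["198.51.100.8"] * 20   # 2 repeated sources

flash_entropy = normalized_entropy(flash_crowd)
attack_entropy = normalized_entropy(ddos)
```

In practice such a feature would be one input among several (request rates, cluster overlap, etc.) to the classifier rather than a standalone test, since spoofed floods can also show high address entropy.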

Proceedings ArticleDOI
11 Jun 2012
TL;DR: It is found that one-way traffic makes up a very large fraction of all traffic in terms of flows, can be primarily attributed to malicious causes, and has declined since 2004 because of a relative decrease in scan traffic.
Abstract: In this work we analyze a massive data-set that captures 5.23 petabytes of traffic to shed light on the composition of one-way traffic towards a large network, based on a novel one-way traffic classifier. We find that one-way traffic makes up a very large fraction of all traffic in terms of flows, that it can be primarily attributed to malicious causes, and that it has declined since 2004 because of a relative decrease in scan traffic. In addition, we show how our classifier is useful for detecting network outages.

Journal ArticleDOI
TL;DR: Experimental results indicate that the combination of time series characteristics and the statistical properties not only make the established model more precise, but also improve the accuracy of network traffic classification.

Proceedings ArticleDOI
31 Aug 2012
TL;DR: CUTE is an automatic traffic classification method which relies on sets of weighted terms as protocol signatures, and its accuracy is as good as or better than existing complex classification schemes, i.e. precision and recall rates of more than 90%.
Abstract: Among different traffic classification approaches, Deep Packet Inspection (DPI) methods are considered the most accurate. These methods, however, have two drawbacks: (i) they are not efficient since they use complex regular expressions as protocol signatures, and (ii) they require manual intervention to generate and maintain signatures, partly due to the signature complexity. In this paper, we present CUTE, an automatic traffic classification method which relies on sets of weighted terms as protocol signatures. The key idea behind CUTE is the observation that, given appropriate weights, the occurrence of a specific term is more important than the relative location of terms in a flow. This observation is based on experimental evaluations as well as theoretical analysis, and leads to several key advantages over previous classification techniques: (i) CUTE is much faster than other classification schemes, since matching flows against weighted terms is significantly faster than matching regular expressions; (ii) CUTE can classify network traffic using only the first few bytes of the flows in most cases; and (iii) unlike most existing classification techniques, CUTE can be used to classify partial (or even slightly modified) flows. Even though CUTE replaces complex regular expressions with a set of simple terms, using theoretical analysis and experimental evaluations (based on two large packet traces from tier-one ISPs), we show that its accuracy is as good as or better than existing complex classification schemes, i.e., CUTE achieves precision and recall rates of more than 90%. Additionally, CUTE can successfully classify more than half of the flows that other DPI methods fail to classify.
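The core idea, scoring a flow by the summed weights of terms that occur anywhere in its first bytes rather than by positional regular expressions, can be sketched as follows. The terms, weights, and threshold are invented placeholders, not real CUTE signatures.

```python
# Per-protocol signature: a set of weighted terms (placeholders).
signatures = {
    "http": {b"GET ": 0.6, b"HTTP/1.": 0.5, b"Host:": 0.4},
    "smtp": {b"EHLO": 0.7, b"MAIL FROM": 0.8},
}

def classify(first_bytes, threshold=0.8):
    """Sum the weights of terms present anywhere in the first bytes;
    pick the best-scoring protocol if it clears the threshold."""
    best_proto, best_score = None, 0.0
    for proto, terms in signatures.items():
        score = sum(w for term, w in terms.items() if term in first_bytes)
        if score > best_score:
            best_proto, best_score = proto, score
    return best_proto if best_score >= threshold else None

payload = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n"
proto = classify(payload)
```

Substring membership (`term in first_bytes`) is position-independent, which is why this style of matching tolerates partial or slightly modified flows that anchored regular expressions miss.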

Proceedings ArticleDOI
01 Jul 2012
TL;DR: Two approaches to distinguish various HTTP content are suggested and evaluated: distributed among volunteers' machines and centralized running in the core of the network.
Abstract: Our previous work demonstrated the possibility of distinguishing several kinds of applications with accuracy of over 99%. Today, most of the traffic is generated by web browsers, which provide different kinds of services based on the HTTP protocol: web browsing, file downloads, audio and voice streaming through third-party plugins, etc. This paper suggests and evaluates two approaches to distinguishing various HTTP content: one distributed among volunteers' machines and one centralized, running in the core of the network. We also assess the accuracy of the global classifier for both HTTP and non-HTTP traffic. We achieved an accuracy of 94%, which is expected to be even higher in real-life usage. Finally, we provide graphical characteristics of different kinds of HTTP traffic.

Proceedings ArticleDOI
27 Sep 2012
TL;DR: This paper considers the design of a real-time SVM classifier operating at many Gbps to allow online detection of application categories, and proposes a hardware-accelerated SVM classifier on an FPGA board.
Abstract: Understanding the composition of Internet traffic has many applications nowadays, mainly tracking bandwidth-consuming applications, QoS-based traffic engineering, and lawful interception of illegal traffic. Although many classification methods such as Support Vector Machines (SVM) have demonstrated their accuracy, not enough attention has been paid to the practical implementation of lightweight classifiers. In this paper, we consider the design of a real-time SVM classifier operating at many Gbps to allow online detection of application categories. Our solution is based on the design of a hardware-accelerated SVM classifier on an FPGA board.
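The per-flow operation such a hardware pipeline ultimately evaluates is the SVM decision function. A minimal sketch of the linear case is below; the two flow features, their weights, and the bias are illustrative assumptions, not the paper's trained model.

```python
def svm_decide(features, weights, bias):
    """Linear SVM decision: the sign of <w, x> + b picks the class.
    A hardware implementation pipelines this dot product per flow."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score >= 0 else -1

# Illustrative model over two flow features:
# [mean packet size in bytes, mean inter-arrival time in ms]
weights = [0.01, -0.5]
bias = -2.0
```

A fixed set of multiplications and additions per flow is what makes the classifier amenable to FPGA acceleration, in contrast to the variable-length matching of DPI.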

Book ChapterDOI
28 May 2012
TL;DR: This work proposes a novel approach that uses the harmonic mean as the distance metric, evaluates it in terms of three metrics on real-world encrypted traffic, and shows that the resulting classification outperforms the previous approach.
Abstract: Classifying network traffic based on port numbers or payload is becoming increasingly difficult, for purposes ranging from security to quality-of-service measurement, because applications use dynamic port numbers, masquerading, and various cryptographic techniques to avoid detection. Research therefore tends to analyze statistical flow features with machine learning techniques. Clustering approaches do not require a complex training procedure or a large memory cost. However, clustering algorithms such as k-Means still have their own performance disadvantages. We propose a novel approach that uses the harmonic mean as the distance metric, and we evaluate it in terms of three metrics on real-world encrypted traffic. The results show that the classification performs better than the previous approach.
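The abstract does not spell out the exact formulation, but one plausible reading is to replace the Euclidean distance in the k-Means assignment step with the harmonic mean of per-dimension differences. A sketch under that assumption (the epsilon guard and the specific metric form are ours, not the paper's):

```python
def harmonic_distance(x, c, eps=1e-9):
    """Harmonic mean of per-dimension absolute differences; eps avoids
    division by zero on identical coordinates. One plausible reading of
    the paper's metric, not its confirmed definition."""
    diffs = [abs(a - b) + eps for a, b in zip(x, c)]
    return len(diffs) / sum(1.0 / d for d in diffs)

def kmeans(points, centroids, iters=10):
    """Lloyd-style k-Means, with harmonic_distance replacing Euclidean
    distance in the assignment step."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: harmonic_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its assigned points.
        centroids = [[sum(dim) / len(cl) for dim in zip(*cl)]
                     if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters
```

The harmonic mean is dominated by the smallest per-dimension difference, so a flow that closely matches a cluster on even one statistical feature is pulled toward it.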

Proceedings ArticleDOI
01 Oct 2012
TL;DR: A framework that preprocesses and analyzes server log files to detect intrusions and could be used as a real-time anomaly detection system in any network where sufficient data is available.
Abstract: Information security has become a very important topic especially during the last years. Web services are becoming more complex and dynamic. This offers new possibilities for attackers to exploit vulnerabilities by inputting malicious queries or code. However, these attack attempts are often recorded in server logs. Analyzing these logs could be a way to detect intrusions either periodically or in real time. We propose a framework that preprocesses and analyzes these log files. HTTP queries are transformed to numerical matrices using n-gram analysis. The dimensionality of these matrices is reduced using principal component analysis and diffusion map methodology. Abnormal log lines can then be analyzed in more detail. We expand our previous work by elaborating the cluster analysis after obtaining the low-dimensional representation. The framework was tested with actual server log data collected from a large web service. Several previously unknown intrusions were found. Proposed methods could be customized to analyze any kind of log data. The system could be used as a real-time anomaly detection system in any network where sufficient data is available.
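The first two stages of the pipeline can be sketched compactly. The queries below are made-up examples; the n-gram counting follows the abstract, while the power-iteration projection stands in for full PCA/diffusion-map dimensionality reduction as a simplification.

```python
def ngram_matrix(queries, n=2):
    """Count character n-grams per query over the corpus vocabulary
    (a simplified version of the paper's featurization step)."""
    vocab = sorted({q[i:i + n] for q in queries for i in range(len(q) - n + 1)})
    index = {g: j for j, g in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in queries]
    for row, q in enumerate(queries):
        for i in range(len(q) - n + 1):
            matrix[row][index[q[i:i + n]]] += 1
    return matrix, vocab

def first_component_scores(matrix, iters=200):
    """Project rows onto the leading principal component, found by power
    iteration on the centered data (standing in for full PCA)."""
    d = len(matrix[0])
    means = [sum(col) / len(matrix) for col in zip(*matrix)]
    X = [[x - m for x, m in zip(row, means)] for row in matrix]
    v = [1.0] * d
    for _ in range(iters):
        Xv = [sum(r[j] * v[j] for j in range(d)) for r in X]
        w = [sum(X[i][j] * Xv[i] for i in range(len(X))) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5 or 1.0
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(d)) for r in X]

queries = ["GET /index?id=1", "GET /index?id=2", "GET /x' OR '1'='1"]
M, vocab = ngram_matrix(queries)
scores = first_component_scores(M)
```

In this toy corpus the injection-like query lands far from the two normal ones in the low-dimensional representation, which is the property the anomaly detector exploits.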

Proceedings ArticleDOI
25 Mar 2012
TL;DR: This paper proposes a new online traffic classification method that combines the statistical and host-based approaches to construct a robust and precise method for early Internet traffic identification, and shows that leveraging the traffic pattern of the host improves the performance of statistical methods.
Abstract: The identification of Internet traffic applications is very important for ISPs and network administrators to protect their resources from unwanted traffic and prioritize major applications. Statistical methods are preferred over port-based ones, since they do not rely on the port number, which can change dynamically, and over deep packet inspection, since they also work for encrypted traffic. These methods combine the statistical analysis of application packet-flow parameters, such as packet size and inter-packet time, with machine learning techniques. Other successful approaches rely on the way hosts communicate and on their traffic patterns to identify applications. In this paper, we propose a new online method for traffic classification that combines the statistical and host-based approaches in order to construct a robust and precise method for early Internet traffic identification. Without loss of generality, we use the packet size as the main feature for the classification, and we benefit from the traffic profile of the host (i.e., which application and how much) to refine the classification and decide between candidate applications. The host profile is then updated online based on the result of the classification of previous flows originated by or addressed to the same host. We evaluate our method on real traces using several applications. The results show that leveraging the traffic pattern of the host improves the performance of statistical methods. They also demonstrate the capacity of our solution to derive profiles for the traffic of Internet hosts and to identify the services they provide.
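A minimal sketch of the combination step: blend a per-flow statistical score with the host's historical application mix, then feed the decision back into the profile. The blending weight and the linear mixing rule are illustrative assumptions, not the paper's exact model.

```python
from collections import defaultdict

# host -> application -> count of flows previously classified for it
host_profile = defaultdict(lambda: defaultdict(int))

def classify_flow(host, stat_scores, alpha=0.7):
    """Blend statistical scores with the host's historical app mix,
    then record the decision to refine future classifications.
    stat_scores: dict mapping application name to a statistical score."""
    total = sum(host_profile[host].values())
    best_app, best = None, -1.0
    for app, s in stat_scores.items():
        prior = (host_profile[host][app] / total) if total else 1.0 / len(stat_scores)
        score = alpha * s + (1 - alpha) * prior
        if score > best:
            best_app, best = app, score
    host_profile[host][best_app] += 1  # online profile update
    return best_app
```

When the statistical scores are ambiguous, the host prior tips the decision toward the application the host is already known to run, which is the intuition behind the reported accuracy gain.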

Proceedings ArticleDOI
28 Jun 2012
TL;DR: The contributions are to confirm the discrimination power of early classification as revealed by previous studies, and to explore its accuracy vulnerability to forged packets - experiments on both simulated and real SSH tunnel traces show that accuracy declines when forged packets are injected.
Abstract: The widespread employment of traffic encryption, tunneling, and other protection/obfuscation mechanisms in modern network applications has prompted the emergence of classification approaches based on traffic behavior (i.e., packet direction pattern, size, and inter-arrival time). Some proposals even demonstrate their potential for online early traffic classification - using the first 4-6 data packets at the beginning of a TCP connection to identify the corresponding application. Nevertheless, the accuracy of early classification remains unclear when forged packets exist. The performance of such mechanisms in a malicious environment, where sophisticated forged-packet injection techniques are present, has not been addressed. This work aims to address these issues, especially when forged packets are inserted before the actual application transaction starts. Our contributions are twofold: (1) we confirm the discrimination power of early classification as revealed by previous studies; (2) we explore its accuracy vulnerability to forged packets - experiments on both simulated and real SSH tunnel traces show that accuracy declines when forged packets are injected. Our findings show that early classification methods still deserve further investigation before actual deployment.
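The vulnerability is easy to see once the feature vector is written down: early classification looks only at the first N packets, so forged packets injected before the real transaction push the genuine packets out of the observation window. A toy illustration (the packet sizes are invented):

```python
N = 4  # early classifiers typically inspect the first 4-6 data packets

def early_features(packet_sizes):
    """Feature vector: the sizes of the first N data packets of a flow."""
    return tuple(packet_sizes[:N])

real_flow = [52, 310, 1448, 1448, 640]  # sizes the classifier was trained on
forged = [40, 40, 40]                   # attacker-injected packets
observed = forged + real_flow           # what the classifier actually sees
```

The observed feature vector no longer resembles any trained signature, which matches the accuracy decline the paper measures on SSH tunnel traces.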

Proceedings ArticleDOI
27 Sep 2012
TL;DR: The end-to-end QoS performance of the scheduler is analyzed in several simulation scenarios and shows that the proposed scheduler guarantees provision of QoS to users.
Abstract: Long Term Evolution (LTE) uses Single Carrier Frequency Division Multiple Access (SC-FDMA) as the uplink transmission scheme. Quality of Service (QoS) provision is one of the primary objectives of wireless network operators. In this paper, the end-to-end QoS performance of the Bandwidth and QoS Aware (BQA) scheduler for the LTE uplink is evaluated in a heterogeneous traffic environment. The BQA scheduler is designed to provide efficient allocation of radio resources to users according to the QoS requirements of various traffic classes and the instantaneous channel conditions. User QoS provision is ensured by using dynamic QoS weights. Additionally, delay-sensitive traffic is facilitated by employing delay thresholds. The BQA scheduler algorithm supports multi-bearer users. The end-to-end QoS performance of the scheduler is analyzed in several simulation scenarios. The results show that the proposed scheduler guarantees the provision of QoS to users.
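A sketch of how QoS weights and delay thresholds can interact in such a scheduler. The multiplicative metric and the boost factor below are our illustrative assumptions, not BQA's published formula:

```python
def scheduling_metric(channel_quality, qos_weight, hol_delay,
                      delay_threshold, boost=10.0):
    """Rank users by channel quality scaled by a per-class QoS weight;
    delay-sensitive bearers past their threshold get a priority boost."""
    metric = channel_quality * qos_weight
    if hol_delay > delay_threshold:  # head-of-line delay violated
        metric *= boost
    return metric

def pick_user(users):
    """users: list of (name, channel_quality, qos_weight,
    hol_delay_ms, delay_threshold_ms); returns the scheduled user."""
    return max(users, key=lambda u: scheduling_metric(*u[1:]))[0]
```

Under this rule a VoIP bearer with a violated delay budget preempts a better-channel best-effort user, while within-budget traffic is scheduled mostly on channel conditions.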

Book ChapterDOI
Changhyun Lee1, DK Lee1, Sue Moon1
12 Mar 2012
TL;DR: It is shown that the UDP traffic has grown significantly in recent years on the authors' campus network; there has been a 46-fold increase in volume in the past four years.
Abstract: The Transmission Control Protocol (TCP) has been the dominant protocol for Internet traffic for the past decades. Most network research based on traffic analysis (e.g., router buffer sizing and traffic classification) has been conducted assuming the dominance of TCP over other protocols. However, recent traffic statistics show signs of significant UDP traffic growth at various Internet links [21]. In this paper we show that UDP traffic has grown significantly in recent years on our campus network; we have observed a 46-fold increase in volume (from 0.47% to 22.0% of total bytes) in the past four years. The trace collected in 2011 shows that the increased volume does not come from a small number of UDP hosts or port numbers. In addition, most recent UDP flows are not sent at a constant bit rate (CBR), and the aggregate traffic shows burstiness close to that of TCP traffic.
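The byte-share figures quoted above come from simple per-protocol accounting over flow records. A sketch of that computation, on made-up records rather than the paper's trace:

```python
def protocol_shares(flows):
    """flows: iterable of (protocol, byte_count) pairs.
    Returns each protocol's fraction of total bytes."""
    totals = {}
    for proto, nbytes in flows:
        totals[proto] = totals.get(proto, 0) + nbytes
    grand = sum(totals.values())
    return {p: b / grand for p, b in totals.items()}

# Illustrative records, not the campus trace from the paper.
flows = [("tcp", 780), ("udp", 220), ("tcp", 560), ("udp", 440)]
shares = protocol_shares(flows)
```

Running the same accounting over traces from different years is what exposes a shift such as UDP's growth from 0.47% to 22.0% of total bytes.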