Author

Dongmei Wang

Bio: Dongmei Wang is an academic researcher from AT&T Labs. The author has contributed to research in topics: Network packet & Payload (computing). The author has an h-index of 3, co-authored 3 publications receiving 1239 citations.

Papers
Proceedings ArticleDOI
17 May 2004
TL;DR: In this article, the authors identify the application level signatures by examining some available documentations, and packet-level traces, and then utilize the identified signatures to develop online filters that can efficiently and accurately track the P2P traffic even on high-speed network links.
Abstract: The ability to accurately identify the network traffic associated with different P2P applications is important to a broad range of network operations including application-specific traffic engineering, capacity planning, provisioning, service differentiation, etc. However, traditional traffic to higher-level application mapping techniques such as default server TCP or UDP network-port based disambiguation is highly inaccurate for some P2P applications. In this paper, we provide an efficient approach for identifying the P2P application traffic through application level signatures. We first identify the application level signatures by examining some available documentations, and packet-level traces. We then utilize the identified signatures to develop online filters that can efficiently and accurately track the P2P traffic even on high-speed network links. We examine the performance of our application-level identification approach using five popular P2P protocols. Our measurements show that our technique achieves less than 5% false positive and false negative ratios in most cases. We also show that our approach only requires the examination of the very first few packets (less than 10 packets) to identify a P2P connection, which makes our approach highly scalable. Our technique can significantly improve the P2P traffic volume estimates over what pure network port based approaches provide. For instance, we were able to identify 3 times as much traffic for the popular Kazaa P2P protocol, compared to the traditional port-based approach.

856 citations

Proceedings ArticleDOI
22 Aug 2005
TL;DR: This paper applies three statistical machine learning algorithms to automatically identify signatures for a range of applications and finds that this approach is highly accurate and scales to allow online application identification on high speed links.
Abstract: An accurate mapping of traffic to applications is important for a broad range of network management and measurement tasks. Internet applications have traditionally been identified using well-known default server network-port numbers in the TCP or UDP headers. However this approach has become increasingly inaccurate. An alternate, more accurate technique is to use specific application-level features in the protocol exchange to guide the identification. Unfortunately deriving the signatures manually is very time consuming and difficult. In this paper, we explore automatically extracting application signatures from IP traffic payload content. In particular we apply three statistical machine learning algorithms to automatically identify signatures for a range of applications. The results indicate that this approach is highly accurate and scales to allow online application identification on high speed links. We also discovered that content signatures still work in the presence of encryption. In these cases we were able to derive content signature for unencrypted handshakes negotiating the encryption parameters of a particular connection.

420 citations

01 Jan 2004
TL;DR: This paper first identifies the application level signatures by examining some available documentations, and packet-level traces, and utilizes the identified signatures to develop online filters that can efficiently and accurately track the P2P traffic even on high-speed network links.
Abstract: The ability to accurately identify the network traffic associated with different P2P applications is important to a broad range of network operations including application-specific traffic engineering, capacity planning, provisioning, service differentiation, etc. However, traditional traffic to higher-level application mapping techniques such as default server TCP or UDP network-port based disambiguation is highly inaccurate for some P2P applications. In this paper, we provide an efficient approach for identifying the P2P application traffic through application level signatures. We first identify the application level signatures by examining some available documentations, and packet-level traces. We then utilize the identified signatures to develop online filters that can efficiently and accurately track the P2P traffic even on high-speed network links. We examine the performance of our application-level identification approach using five popular P2P protocols. Our measurements show that our technique achieves less than 5% false positive and false negative ratios in most cases. We also show that our approach only requires the examination of the very first few packets (less than 10 packets) to identify a P2P connection, which makes our approach highly scalable. Our technique can significantly improve the P2P traffic volume estimates over what pure network port based approaches provide. For instance, we were able to identify 3 times as much traffic for the popular Kazaa P2P protocol, compared to the traditional port-based approach.

23 citations


Cited by
Journal ArticleDOI
TL;DR: This survey paper looks at emerging research into the application of Machine Learning techniques to IP traffic classification - an inter-disciplinary blend of IP networking and data mining techniques.
Abstract: The research community has begun looking for IP traffic classification techniques that do not rely on 'well known' TCP or UDP port numbers, or interpreting the contents of packet payloads. New work is emerging on the use of statistical traffic characteristics to assist in the identification and classification process. This survey paper looks at emerging research into the application of Machine Learning (ML) techniques to IP traffic classification - an inter-disciplinary blend of IP networking and data mining techniques. We provide context and motivation for the application of ML techniques to IP traffic classification, and review 18 significant works that cover the dominant period from 2004 to early 2007. These works are categorized and reviewed according to their choice of ML strategies and primary contributions to the literature. We also discuss a number of key requirements for the employment of ML-based traffic classifiers in operational IP networks, and qualitatively critique the extent to which the reviewed works meet these requirements. Open issues and challenges in the field are also discussed.

1,519 citations

Proceedings ArticleDOI
22 Aug 2005
TL;DR: This work presents a fundamentally different approach to classifying traffic flows according to the applications that generate them, based on observing and identifying patterns of host behavior at the transport layer and demonstrates the effectiveness of this approach on three real traces.
Abstract: We present a fundamentally different approach to classifying traffic flows according to the applications that generate them. In contrast to previous methods, our approach is based on observing and identifying patterns of host behavior at the transport layer. We analyze these patterns at three levels of increasing detail (i) the social, (ii) the functional and (iii) the application level. This multilevel approach of looking at traffic flow is probably the most important contribution of this paper. Furthermore, our approach has two important features. First, it operates in the dark, having (a) no access to packet payload, (b) no knowledge of port numbers and (c) no additional information other than what current flow collectors provide. These restrictions respect privacy, technological and practical constraints. Second, it can be tuned to balance the accuracy of the classification versus the number of successfully classified traffic flows. We demonstrate the effectiveness of our approach on three real traces. Our results show that we are able to classify 80%-90% of the traffic with more than 95% accuracy.

1,216 citations

Proceedings ArticleDOI
25 Oct 2004
TL;DR: In this article, the authors developed a systematic methodology to identify P2P flows at the transport layer, i.e., based on connection patterns of peer-to-peer networks, without relying on packet payload.
Abstract: Since the emergence of peer-to-peer (P2P) networking in the late '90s, P2P applications have multiplied, evolved and established themselves as the leading `growth app' of Internet traffic workload. In contrast to first-generation P2P networks which used well-defined port numbers, current P2P applications have the ability to disguise their existence through the use of arbitrary ports. As a result, reliable estimates of P2P traffic require examination of packet payload, a methodological landmine from legal, privacy, technical, logistic, and fiscal perspectives. Indeed, access to user payload is often rendered impossible by one of these factors, inhibiting trustworthy estimation of P2P traffic growth and dynamics. In this paper, we develop a systematic methodology to identify P2P flows at the transport layer, i.e., based on connection patterns of P2P networks, and without relying on packet payload. We believe our approach is the first method for characterizing P2P traffic using only knowledge of network dynamics rather than any user payload. To evaluate our methodology, we also develop a payload technique for P2P traffic identification, by reverse engineering and analyzing the nine most popular P2P protocols, and demonstrate its efficacy with the discovery of P2P protocols in our traces that were previously unknown to us. Finally, our results indicate that P2P traffic continues to grow unabatedly, contrary to reports in the popular media.

774 citations

Journal ArticleDOI
17 Aug 2008
TL;DR: The experiments demonstrated that P4P either improves or maintains the same level of application performance of native P2P applications, while, at the same time, it substantially reduces network provider cost compared with either native or latency-based localized P2P applications.
Abstract: As peer-to-peer (P2P) emerges as a major paradigm for scalable network application design, it also exposes significant new challenges in achieving efficient and fair utilization of Internet network resources. Being largely network-oblivious, many P2P applications may lead to inefficient network resource usage and/or low application performance. In this paper, we propose a simple architecture called P4P to allow for more effective cooperative traffic control between applications and network providers. We conducted extensive simulations and real-life experiments on the Internet to demonstrate the feasibility and effectiveness of P4P. Our experiments demonstrated that P4P either improves or maintains the same level of application performance of native P2P applications, while, at the same time, it substantially reduces network provider cost compared with either native or latency-based localized P2P applications.

769 citations

Proceedings ArticleDOI
11 Sep 2006
TL;DR: This work considers two unsupervised clustering algorithms, namely K-Means and DBSCAN, that have previously not been used for network traffic classification and evaluates these two algorithms and compares them to the previously used AutoClass algorithm, using empirical Internet traces.
Abstract: Classification of network traffic using port-based or payload-based analysis is becoming increasingly difficult with many peer-to-peer (P2P) applications using dynamic port numbers, masquerading techniques, and encryption to avoid detection. An alternative approach is to classify traffic by exploiting the distinctive characteristics of applications when they communicate on a network. We pursue this latter approach and demonstrate how cluster analysis can be used to effectively identify groups of traffic that are similar using only transport layer statistics. Our work considers two unsupervised clustering algorithms, namely K-Means and DBSCAN, that have previously not been used for network traffic classification. We evaluate these two algorithms and compare them to the previously used AutoClass algorithm, using empirical Internet traces. The experimental results show that both K-Means and DBSCAN work very well and much more quickly than AutoClass. Our results indicate that although DBSCAN has lower accuracy compared to K-Means and AutoClass, DBSCAN produces better clusters.

724 citations