Author

Dongmei Wang

Bio: Dongmei Wang is an academic researcher from AT&T Labs. The author has contributed to research in the topics Network packet and Payload (computing). The author has an h-index of 3 and has co-authored 3 publications receiving 1239 citations.

Papers

Proceedings Article

Accurate, scalable in-network identification of p2p traffic using application signatures

Subhabrata Sen¹, Oliver Spatscheck¹, Dongmei Wang¹

AT&T Labs¹

17 May 2004

TL;DR: In this article, the authors identify application-level signatures by examining available documentation and packet-level traces, and then use the identified signatures to develop online filters that can efficiently and accurately track P2P traffic even on high-speed network links.

Abstract: The ability to accurately identify the network traffic associated with different P2P applications is important to a broad range of network operations, including application-specific traffic engineering, capacity planning, provisioning, service differentiation, etc. However, traditional techniques for mapping traffic to higher-level applications, such as disambiguation based on default server TCP or UDP network ports, are highly inaccurate for some P2P applications. In this paper, we provide an efficient approach for identifying P2P application traffic through application-level signatures. We first identify the application-level signatures by examining available documentation and packet-level traces. We then use the identified signatures to develop online filters that can efficiently and accurately track P2P traffic even on high-speed network links. We examine the performance of our application-level identification approach using five popular P2P protocols. Our measurements show that our technique achieves false positive and false negative ratios below 5% in most cases. We also show that our approach requires examining only the first few packets (fewer than 10 packets) to identify a P2P connection, which makes it highly scalable. Our technique can significantly improve P2P traffic volume estimates over what pure network-port-based approaches provide. For instance, we were able to identify 3 times as much traffic for the popular Kazaa P2P protocol as the traditional port-based approach.
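
The filtering idea in this abstract, matching application-level signatures against only the first few packets of a connection, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' filter: the signature table is built from two publicly documented handshakes (BitTorrent's, which begins with the byte 19 followed by "BitTorrent protocol", and the Gnutella 0.6 connection handshake), and `SIGNATURES`, `MAX_PACKETS` and `classify_connection` are hypothetical names.

```python
from typing import Iterable, Optional

# Illustrative signature table (assumption): byte prefixes from publicly documented
# handshakes, NOT the signature set derived in the paper.
SIGNATURES: dict[str, tuple[bytes, ...]] = {
    "bittorrent": (b"\x13BitTorrent protocol",),      # length byte 19 + protocol string
    "gnutella": (b"GNUTELLA CONNECT/", b"GNUTELLA/"),  # Gnutella 0.6 request / response
}

MAX_PACKETS = 10  # the abstract reports that fewer than 10 packets suffice


def classify_connection(payloads: Iterable[bytes],
                        max_packets: int = MAX_PACKETS) -> Optional[str]:
    """Return the application whose signature prefixes one of the first
    `max_packets` non-empty payloads of a connection, or None if none matches."""
    inspected = 0
    for payload in payloads:
        if not payload:
            continue  # e.g. pure TCP ACKs carry no application data
        for app, prefixes in SIGNATURES.items():
            if payload.startswith(prefixes):  # bytes.startswith accepts a tuple
                return app
        inspected += 1
        if inspected >= max_packets:
            break  # bounded per-connection work: stop after the first few packets
    return None
```

Because the loop gives up after `max_packets` non-empty payloads, the work per connection stays bounded regardless of connection length, which is the property the abstract credits for the approach's scalability.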

856 citations

Proceedings Article

ACAS: automated construction of application signatures

Patrick Haffner¹, Subhabrata Sen¹, Oliver Spatscheck¹, Dongmei Wang¹

AT&T Labs¹

22 Aug 2005

TL;DR: This paper applies three statistical machine learning algorithms to automatically identify signatures for a range of applications and finds that this approach is highly accurate and scales to allow online application identification on high speed links.

Abstract: An accurate mapping of traffic to applications is important for a broad range of network management and measurement tasks. Internet applications have traditionally been identified using well-known default server network-port numbers in the TCP or UDP headers. However, this approach has become increasingly inaccurate. An alternative, more accurate technique is to use specific application-level features in the protocol exchange to guide the identification. Unfortunately, deriving the signatures manually is very time-consuming and difficult. In this paper, we explore automatically extracting application signatures from IP traffic payload content. In particular, we apply three statistical machine learning algorithms to automatically identify signatures for a range of applications. The results indicate that this approach is highly accurate and scales to allow online application identification on high-speed links. We also discovered that content signatures still work in the presence of encryption. In these cases, we were able to derive content signatures for the unencrypted handshakes that negotiate the encryption parameters of a particular connection.
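
To make "learning signatures from payload content" concrete, the sketch below trains a Bernoulli naive Bayes classifier on (byte offset, byte value) features taken from the start of each payload. It is an illustration under assumptions, not the paper's method: the abstract does not name its three algorithms here, naive Bayes is used only as a simple, well-known learner, and the 64-byte prefix, the feature encoding, and the names `features` and `BernoulliNB` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Iterable

PREFIX_LEN = 64  # assumption: only the first bytes of each payload are examined


def features(payload: bytes, prefix_len: int = PREFIX_LEN) -> set[tuple[int, int]]:
    """Encode a payload prefix as a set of (byte offset, byte value) features."""
    return {(i, b) for i, b in enumerate(payload[:prefix_len])}


class BernoulliNB:
    """Minimal Bernoulli naive Bayes over sparse (offset, byte) features."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing keeps probabilities strictly in (0, 1)

    def fit(self, payloads: Iterable[bytes], labels: Iterable[str]) -> "BernoulliNB":
        self.class_counts = defaultdict(int)
        self.feat_counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()
        for payload, label in zip(payloads, labels):
            self.class_counts[label] += 1
            for f in features(payload):
                self.feat_counts[label][f] += 1
                self.vocab.add(f)
        self.total = sum(self.class_counts.values())
        return self

    def predict(self, payload: bytes) -> str:
        present = features(payload)  # features never seen in training are ignored
        best_label, best_score = "", -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / self.total)  # class prior
            for f in self.vocab:
                # Smoothed P(feature present | class); absent features use 1 - p.
                p = (self.feat_counts[label][f] + self.alpha) / (n + 2 * self.alpha)
                score += math.log(p if f in present else 1.0 - p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Position-specific byte features let the model pick up fixed handshake strings at fixed offsets, which is the kind of content a hand-written signature would encode.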

420 citations

Accurate, Scalable In-Network Identification of P2P Traffic Using Application Signatures

Subhabrata Sen, Oliver Spatscheck, Dongmei Wang

01 Jan 2004

TL;DR: This paper first identifies application-level signatures by examining available documentation and packet-level traces, and then uses the identified signatures to develop online filters that can efficiently and accurately track P2P traffic even on high-speed network links.

Abstract: The ability to accurately identify the network traffic associated with different P2P applications is important to a broad range of network operations, including application-specific traffic engineering, capacity planning, provisioning, service differentiation, etc. However, traditional techniques for mapping traffic to higher-level applications, such as disambiguation based on default server TCP or UDP network ports, are highly inaccurate for some P2P applications. In this paper, we provide an efficient approach for identifying P2P application traffic through application-level signatures. We first identify the application-level signatures by examining available documentation and packet-level traces. We then use the identified signatures to develop online filters that can efficiently and accurately track P2P traffic even on high-speed network links. We examine the performance of our application-level identification approach using five popular P2P protocols. Our measurements show that our technique achieves false positive and false negative ratios below 5% in most cases. We also show that our approach requires examining only the first few packets (fewer than 10 packets) to identify a P2P connection, which makes it highly scalable. Our technique can significantly improve P2P traffic volume estimates over what pure network-port-based approaches provide. For instance, we were able to identify 3 times as much traffic for the popular Kazaa P2P protocol as the traditional port-based approach.

23 citations

Cited by

Journal Article

A survey of techniques for internet traffic classification using machine learning

Thuy T. T. Nguyen¹, Grenville Armitage¹

Swinburne University of Technology¹

01 Oct 2008 - IEEE Communications Surveys and Tutorials

TL;DR: This survey paper looks at emerging research into the application of Machine Learning techniques to IP traffic classification - an inter-disciplinary blend of IP networking and data mining techniques.

Abstract: The research community has begun looking for IP traffic classification techniques that do not rely on 'well-known' TCP or UDP port numbers, or on interpreting the contents of packet payloads. New work is emerging on the use of statistical traffic characteristics to assist in the identification and classification process. This survey paper looks at emerging research into the application of Machine Learning (ML) techniques to IP traffic classification - an inter-disciplinary blend of IP networking and data mining techniques. We provide context and motivation for the application of ML techniques to IP traffic classification, and review 18 significant works that cover the dominant period from 2004 to early 2007. These works are categorized and reviewed according to their choice of ML strategies and primary contributions to the literature. We also discuss a number of key requirements for the employment of ML-based traffic classifiers in operational IP networks, and qualitatively critique the extent to which the reviewed works meet these requirements. Open issues and challenges in the field are also discussed.
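
Statistical traffic characteristics of the kind this survey covers are usually computed per flow before any ML model sees them. The sketch below groups packets by a unidirectional 5-tuple and derives a few commonly used features (packet count, mean packet size, mean inter-arrival time, duration). The `Packet` record, the flow definition and the feature set are assumptions for illustration; the reviewed works use many different feature sets.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record: header fields plus size and timestamp, no payload."""
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    size: int   # bytes
    ts: float   # seconds


FlowKey = tuple[str, str, int, int, str]


def flow_features(packets: list[Packet]) -> dict[FlowKey, dict[str, float]]:
    """Group packets into unidirectional 5-tuple flows and summarize each flow."""
    flows = defaultdict(list)
    for p in packets:
        flows[(p.src, p.dst, p.sport, p.dport, p.proto)].append(p)
    result = {}
    for key, pkts in flows.items():
        pkts.sort(key=lambda p: p.ts)
        gaps = [b.ts - a.ts for a, b in zip(pkts, pkts[1:])]
        result[key] = {
            "packets": float(len(pkts)),
            "mean_size": mean(p.size for p in pkts),
            "mean_iat": mean(gaps) if gaps else 0.0,  # single-packet flows have no gaps
            "duration": pkts[-1].ts - pkts[0].ts,
        }
    return result
```

None of these features look at port numbers or payload bytes, which matches the survey's focus on classifiers that avoid both.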

1,519 citations

Proceedings Article

BLINC: multilevel traffic classification in the dark

Thomas Karagiannis¹, Konstantina Papagiannaki², Michalis Faloutsos¹

University of California, Riverside¹, Intel²

22 Aug 2005

TL;DR: This work presents a fundamentally different approach to classifying traffic flows according to the applications that generate them, based on observing and identifying patterns of host behavior at the transport layer and demonstrates the effectiveness of this approach on three real traces.

Abstract: We present a fundamentally different approach to classifying traffic flows according to the applications that generate them. In contrast to previous methods, our approach is based on observing and identifying patterns of host behavior at the transport layer. We analyze these patterns at three levels of increasing detail: (i) the social, (ii) the functional, and (iii) the application level. This multilevel approach of looking at traffic flow is probably the most important contribution of this paper. Furthermore, our approach has two important features. First, it operates in the dark, having (a) no access to packet payload, (b) no knowledge of port numbers and (c) no additional information other than what current flow collectors provide. These restrictions respect privacy, technological and practical constraints. Second, it can be tuned to balance the accuracy of the classification versus the number of successfully classified traffic flows. We demonstrate the effectiveness of our approach on three real traces. Our results show that we are able to classify 80%-90% of the traffic with more than 95% accuracy.
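
The "social" and "functional" levels describe a host by whom it talks to and how it uses its own ports, using flow records alone. The sketch below computes two such per-host summaries from (source IP, source port, destination IP, destination port) records: fan-out (distinct peers) and the number of distinct local ports. It is a simplified illustration under stated assumptions, not BLINC's classifier; `Flow`, `host_profiles`, `looks_like_server` and its threshold are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical flow record with no payload and no application label."""
    src: str
    sport: int
    dst: str
    dport: int


def host_profiles(flows: list[Flow]) -> dict[str, dict[str, int]]:
    """Per-host behaviour summary built from flow records alone."""
    peers = defaultdict(set)        # social level: which hosts does this host talk to?
    local_ports = defaultdict(set)  # functional level: which of its own ports does it use?
    for f in flows:
        peers[f.src].add(f.dst)
        local_ports[f.src].add(f.sport)
        peers[f.dst].add(f.src)
        local_ports[f.dst].add(f.dport)
    return {
        h: {"fan_out": len(peers[h]), "local_ports": len(local_ports[h])}
        for h in peers
    }


def looks_like_server(profile: dict[str, int], min_peers: int = 3) -> bool:
    """Hypothetical role heuristic: many peers reached through a single local port."""
    return profile["fan_out"] >= min_peers and profile["local_ports"] == 1
```

The port numbers are used only to count how many distinct ports a host uses, never to look up which application a port belongs to, in line with the "no knowledge of port numbers" constraint.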

1,216 citations

Proceedings Article

Transport layer identification of P2P traffic

Thomas Karagiannis¹, Andre Broido, Michalis Faloutsos¹, kc claffy

University of California, Riverside¹

25 Oct 2004

TL;DR: In this article, the authors developed a systematic methodology to identify P2P flows at the transport layer, i.e., based on connection patterns of peer-to-peer networks, without relying on packet payload.

Abstract: Since the emergence of peer-to-peer (P2P) networking in the late '90s, P2P applications have multiplied, evolved and established themselves as the leading 'growth app' of Internet traffic workload. In contrast to first-generation P2P networks which used well-defined port numbers, current P2P applications have the ability to disguise their existence through the use of arbitrary ports. As a result, reliable estimates of P2P traffic require examination of packet payload, a methodological landmine from legal, privacy, technical, logistic, and fiscal perspectives. Indeed, access to user payload is often rendered impossible by one of these factors, inhibiting trustworthy estimation of P2P traffic growth and dynamics. In this paper, we develop a systematic methodology to identify P2P flows at the transport layer, i.e., based on connection patterns of P2P networks, and without relying on packet payload. We believe our approach is the first method for characterizing P2P traffic using only knowledge of network dynamics rather than any user payload. To evaluate our methodology, we also develop a payload technique for P2P traffic identification, by reverse engineering and analyzing the nine most popular P2P protocols, and demonstrate its efficacy with the discovery of P2P protocols in our traces that were previously unknown to us. Finally, our results indicate that P2P traffic continues to grow unabatedly, contrary to reports in the popular media.
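
Identifying P2P flows from connection patterns can be illustrated with two simple patterns of that kind: (1) a pair of IP addresses that exchanges both TCP and UDP traffic, and (2) an (IP, port) endpoint whose number of distinct peer IPs equals its number of distinct peer ports, as when each peer connects once from its own port. This is a simplified sketch under assumptions, not the paper's exact methodology: the `min_peers` threshold, the `FlowRecord` type and the function names are hypothetical, and the exclusions a real classifier needs for services such as DNS or mail are omitted.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """Hypothetical transport-layer flow record (no payload)."""
    src: str
    sport: int
    dst: str
    dport: int
    proto: str  # "tcp" or "udp"


def tcp_udp_pairs(flows: list[FlowRecord]) -> set[frozenset[str]]:
    """IP pairs that talk to each other over both TCP and UDP."""
    protos = defaultdict(set)
    for f in flows:
        protos[frozenset((f.src, f.dst))].add(f.proto)
    return {pair for pair, ps in protos.items() if {"tcp", "udp"} <= ps}


def ip_port_candidates(flows: list[FlowRecord],
                       min_peers: int = 3) -> set[tuple[str, int]]:
    """Endpoints whose distinct peer IPs and distinct peer ports are equal in number."""
    peer_ips = defaultdict(set)
    peer_ports = defaultdict(set)
    for f in flows:
        endpoint = (f.dst, f.dport)
        peer_ips[endpoint].add(f.src)
        peer_ports[endpoint].add(f.sport)
    return {
        ep for ep in peer_ips
        if len(peer_ips[ep]) >= min_peers and len(peer_ips[ep]) == len(peer_ports[ep])
    }
```

A web server typically sees several source ports per client IP, because browsers open parallel connections, so its port count exceeds its IP count and the second pattern leaves it out.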

774 citations

Journal Article

P4P: provider portal for applications

Haiyong Xie¹, Y. Richard Yang¹, Arvind Krishnamurthy², Yanbin Grace Liu³, A. Silberschatz¹

Yale University¹, University of Washington², IBM³

17 Aug 2008

TL;DR: The experiments demonstrated that P4P either improves or maintains the same level of application performance of native P2P applications, while at the same time substantially reducing network provider cost compared with either native or latency-based localized P2P applications.

Abstract: As peer-to-peer (P2P) emerges as a major paradigm for scalable network application design, it also exposes significant new challenges in achieving efficient and fair utilization of Internet network resources. Being largely network-oblivious, many P2P applications may lead to inefficient network resource usage and/or low application performance. In this paper, we propose a simple architecture called P4P to allow for more effective cooperative traffic control between applications and network providers. We conducted extensive simulations and real-life experiments on the Internet to demonstrate the feasibility and effectiveness of P4P. Our experiments demonstrated that P4P either improves or maintains the same level of application performance of native P2P applications, while, at the same time, it substantially reduces network provider cost compared with either native or latency-based localized P2P applications.
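
Cooperative traffic control of this kind means the provider shares information about its network and the application takes it into account when choosing peers. The sketch below assumes the provider publishes a cost for each pair of network locations and that a client prefers the lowest-cost candidates. The cost table, the location labels and `select_peers` are hypothetical; this is not the P4P interface itself.

```python
from typing import Mapping


def select_peers(
    my_location: str,
    candidates: Mapping[str, str],                    # peer id -> peer's network location
    provider_cost: Mapping[tuple[str, str], float],   # hypothetical provider-published costs
    k: int,
    default_cost: float = float("inf"),               # unknown pairs are least preferred
) -> list[str]:
    """Pick the k candidate peers with the lowest provider-reported cost."""
    def cost(peer: str) -> float:
        return provider_cost.get((my_location, candidates[peer]), default_cost)

    # Sort by (cost, peer id) so ties are broken deterministically.
    return sorted(candidates, key=lambda peer: (cost(peer), peer))[:k]
```

Ranking purely by provider cost is the simplest possible policy; the abstract's claim that application performance is maintained or improved would depend on how a real system balances locality against other peer-selection goals.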

769 citations

Proceedings Article

Traffic classification using clustering algorithms

Jeffrey Erman¹, Martin Arlitt¹, Anirban Mahanti¹

University of Calgary¹

11 Sep 2006

TL;DR: This work considers two unsupervised clustering algorithms, namely K-Means and DBSCAN, that have previously not been used for network traffic classification and evaluates these two algorithms and compares them to the previously used AutoClass algorithm, using empirical Internet traces.

Abstract: Classification of network traffic using port-based or payload-based analysis is becoming increasingly difficult with many peer-to-peer (P2P) applications using dynamic port numbers, masquerading techniques, and encryption to avoid detection. An alternative approach is to classify traffic by exploiting the distinctive characteristics of applications when they communicate on a network. We pursue this latter approach and demonstrate how cluster analysis can be used to effectively identify groups of traffic that are similar using only transport layer statistics. Our work considers two unsupervised clustering algorithms, namely K-Means and DBSCAN, that have previously not been used for network traffic classification. We evaluate these two algorithms and compare them to the previously used AutoClass algorithm, using empirical Internet traces. The experimental results show that both K-Means and DBSCAN work very well and much more quickly than AutoClass. Our results indicate that although DBSCAN has lower accuracy compared to K-Means and AutoClass, DBSCAN produces better clusters.
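
Clustering flows on transport-layer statistics can be sketched with plain K-Means (Lloyd's algorithm) over small feature vectors such as (mean packet size, mean inter-arrival time). The feature choice, the deterministic initialization from the first k points and the toy data are assumptions for illustration; in practice features are usually scaled before clustering, and this sketch does not reproduce the paper's DBSCAN or AutoClass comparison.

```python
import math

Point = tuple[float, ...]


def kmeans(points: list[Point], k: int, iters: int = 100) -> tuple[list[Point], list[int]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [points[i] for i in range(k)]  # simple deterministic initialization
    labels: list[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids no longer move
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels
```

Each cluster would then be mapped to an application, for example by labeling a few flows per cluster, since K-Means itself only groups similar flows.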

724 citations
