Papers published on a yearly basis
Papers
More filters
••
22 Aug 2005TL;DR: This paper applies three statistical machine learning algorithms to automatically identify signatures for a range of applications and finds that this approach is highly accurate and scales to allow online application identification on high speed links.
Abstract: An accurate mapping of traffic to applications is important for a broad range of network management and measurement tasks. Internet applications have traditionally been identified using well-known default server network-port numbers in the TCP or UDP headers. However this approach has become increasingly inaccurate. An alternate, more accurate technique is to use specific application-level features in the protocol exchange to guide the identification. Unfortunately deriving the signatures manually is very time consuming and difficult.In this paper, we explore automatically extracting application signatures from IP traffic payload content. In particular we apply three statistical machine learning algorithms to automatically identify signatures for a range of applications. The results indicate that this approach is highly accurate and scales to allow online application identification on high speed links. We also discovered that content signatures still work in the presence of encryption. In these cases we were able to derive content signature for unencrypted handshakes negotiating the encryption parameters of a particular connection.
420 citations
••
12 Jun 2011TL;DR: This paper proposes new techniques based on clustering and regression for analyzing anonymized cellular network data to identify generally important locations, and to discern semantically meaningful locations such as home and work.
Abstract: People spend most of their time at a few key locations, such as home and work. Being able to identify how the movements of people cluster around these "important places" is crucial for a range of technology and policy decisions in areas such as telecommunications and transportation infrastructure deployment. In this paper, we propose new techniques based on clustering and regression for analyzing anonymized cellular network data to identify generally important locations, and to discern semantically meaningful locations such as home and work. Starting with temporally sparse and spatially coarse location information, we propose a new algorithm to identify important locations. We test this algorithm on arbitrary cellphone users, including those with low call rates, and find that we are within 3 miles of ground truth for 88% of volunteer users. Further, after locating home and work, we achieve commute distance estimates that are within 1 mile of equivalent estimates derived from government census data. Finally, we perform carbon footprint analyses on hundreds of thousands of anonymous users as an example of how our data and algorithms can form an accurate and efficient underpinning for policy and infrastructure studies.
416 citations
••
16 Nov 2002TL;DR: It is found that the primary use of workplace IM was for complex work discussions, and evidence of two distinct styles of use is found: heavy IM users and frequent IM partners mainly used it to work together and light users and infrequent pairs mainly used IM to coordinate.
Abstract: Current perceptions of Instant Messaging (IM) use are based primarily on self-report studies. We logged thousands of (mostly) workplace IM conversations and evaluated their conversational characteristics and functions. Contrary to prior research, we found that the primary use of workplace IM was for complex work discussions. Only 28% of conversations were simple, single-purpose interactions and only 31% were about scheduling or coordination. Moreover, people rarely switched from IM to another medium when the conversation got complex. We found evidence of two distinct styles of use. Heavy IM users and frequent IM partners mainly used it to work together: to discuss a broad range of topics via many fast-paced interactions per day, each with many short turns and much threading and multitasking. Light users and infrequent pairs mainly used IM to coordinate: for scheduling, via fewer conversations per day that were shorter, slower-paced with less threading and multitasking.
414 citations
••
TL;DR: This article proves sanity-check bounds for the error of the leave-one-out cross-validation estimate of the generalization error: that is, bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate.
Abstract: In this article we prove sanity-check bounds for the error of the leave-one-out cross-validation estimate of the generalization error: that is, bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate. The name sanity check refers to the fact that although we often expect the leave-one-out estimate to perform considerably better than the training error estimate, we are here only seeking assurance that its performance will not be considerably worse. Perhaps surprisingly, such assurance has been given only for limited cases in the prior literature on cross-validation. Any nontrivial bound on the error of leave-one-out must rely on some notion of algorithmic stability. Previous bounds relied on the rather strong notion of hypothesis stability, whose application was primarily limited to nearest-neighbor and other local algorithms. Here we introduce the new and weaker notion of error stability and apply it to obtain sanity-check bounds for leave-one-out for other classes of learning algorithms, including training error minimization procedures and Bayesian algorithms. We also provide lower bounds demonstrating the necessity of some form of error stability for proving bounds on the error of the leave-one-out estimate, and the fact that for training error minimization algorithms, in the worst case such bounds must still depend on the Vapnik-Chervonenkis dimension of the hypothesis class.
411 citations
••
17 May 2002TL;DR: This paper focuses on the region-based memory management of Cyclone and its static typing discipline, and combines default annotations, local type inference, and a novel treatment of region effects to reduce this burden.
Abstract: Cyclone is a type-safe programming language derived from C. The primary design goal of Cyclone is to let programmers control data representation and memory management without sacrificing type-safety. In this paper, we focus on the region-based memory management of Cyclone and its static typing discipline. The design incorporates several advancements, including support for region subtyping and a coherent integration with stack allocation and a garbage collector. To support separate compilation, Cyclone requires programmers to write some explicit region annotations, but a combination of default annotations, local type inference, and a novel treatment of region effects reduces this burden. As a result, we integrate C idioms in a region-based framework. In our experience, porting legacy C to Cyclone has required altering about 8% of the code; of the changes, only 6% (of the 8%) were region annotations.
407 citations
Authors
Showing all 1881 results
Name | H-index | Papers | Citations |
---|---|---|---|
Yoshua Bengio | 202 | 1033 | 420313 |
Scott Shenker | 150 | 454 | 118017 |
Paul Shala Henry | 137 | 318 | 35971 |
Peter Stone | 130 | 1229 | 79713 |
Yann LeCun | 121 | 369 | 171211 |
Louis E. Brus | 113 | 347 | 63052 |
Jennifer Rexford | 102 | 394 | 45277 |
Andreas F. Molisch | 96 | 777 | 47530 |
Vern Paxson | 93 | 267 | 48382 |
Lorrie Faith Cranor | 92 | 326 | 28728 |
Ward Whitt | 89 | 424 | 29938 |
Lawrence R. Rabiner | 88 | 378 | 70445 |
Thomas E. Graedel | 86 | 348 | 27860 |
William W. Cohen | 85 | 384 | 31495 |
Michael K. Reiter | 84 | 380 | 30267 |