Papers published on a yearly basis
Papers
More filters
•
09 Dec 2003TL;DR: The results show that the average AUC is monotonically increasing as a function of the classification accuracy, but that the standard deviation for uneven distributions and higher error rates is noticeable, so algorithms designed to minimize the error rate may not lead to the best possible AUC values.
Abstract: The area under an ROC curve (AUC) is a criterion used in many applications to measure the quality of a classification algorithm. However, the objective function optimized in most of these algorithms is the error rate and not the AUC value. We give a detailed statistical analysis of the relationship between the AUC and the error rate, including the first exact expression of the expected value and the variance of the AUC for a fixed error rate. Our results show that the average AUC is monotonically increasing as a function of the classification accuracy, but that the standard deviation for uneven distributions and higher error rates is noticeable. Thus, algorithms designed to minimize the error rate may not lead to the best possible AUC values. We show that, under certain conditions, the global function optimized by the RankBoost algorithm is exactly the AUC. We report the results of our experiments with RankBoost in several datasets demonstrating the benefits of an algorithm specifically designed to globally optimize the AUC over other existing algorithms optimizing an approximation of the AUC or only locally optimizing the AUC.
621 citations
••
TL;DR: A sensor-driven, or sentient, platform for context-aware computing that enables applications to follow mobile users as they move around a building and presents it in a form suitable for application programmers is described.
Abstract: We describe a sensor-driven, or sentient, platform for context-aware computing that enables applications to follow mobile users as they move around a building. The platform is particularly suitable for richly equipped, networked environments. The only item a user is required to carry is a small sensor tag, which identifies them to the system and locates them accurately in three dimensions. The platform builds a dynamic model of the environment using these location sensors and resource information gathered by telemetry software, and presents it in a form suitable for application programmers. Use of the platform is illustrated through a practical example, which allows a user's current working desktop to follow them as they move around the environment.
620 citations
••
TL;DR: The most impressive feature of the data structure is its constant query time, hence the name "oracle", and it provides faster constructions of sparse spanners of weighted graphs, and improved tree covers and distance labelings of weighted or unweighted graphs.
Abstract: Let G = (V,E) be an undirected weighted graph with vVv = n and vEv = m. Let k ≥ 1 be an integer. We show that G = (V,E) can be preprocessed in O(kmn1/k) expected time, constructing a data structure of size O(kn1p1/k), such that any subsequent distance query can be answered, approximately, in O(k) time. The approximate distance returned is of stretch at most 2k−1, that is, the quotient obtained by dividing the estimated distance by the actual distance lies between 1 and 2k−1. A 1963 girth conjecture of Erdos, implies that Ω(n1p1/k) space is needed in the worst case for any real stretch strictly smaller than 2kp1. The space requirement of our algorithm is, therefore, essentially optimal. The most impressive feature of our data structure is its constant query time, hence the name "oracle". Previously, data structures that used only O(n1p1/k) space had a query time of Ω(n1/k).Our algorithms are extremely simple and easy to implement efficiently. They also provide faster constructions of sparse spanners of weighted graphs, and improved tree covers and distance labelings of weighted or unweighted graphs.
618 citations
••
06 Jul 2002TL;DR: New learning algorithms for natural language processing based on the perceptron algorithm are introduced, showing how the algorithms can be efficiently applied to exponential sized representations of parse trees, such as the "all subtrees" (DOP) representation described by (Bod 1998).
Abstract: This paper introduces new learning algorithms for natural language processing based on the perceptron algorithm. We show how the algorithms can be efficiently applied to exponential sized representations of parse trees, such as the "all subtrees" (DOP) representation described by (Bod 1998), or a representation tracking all sub-fragments of a tagged sentence. We give experimental results showing significant improvements on two tasks: parsing Wall Street Journal text, and named-entity extraction from web data.
618 citations
••
30 Aug 1999TL;DR: A new service interface is proposed, termed a hose, to provide the appropriate performance abstraction to manage network resources in the face of increased uncertainty, and the statistical multiplexing and resizing techniques deal effectively with uncertainties about the traffic.
Abstract: As IP technologies providing both tremendous capacity and the ability to establish dynamic secure associations between endpoints emerge, Virtual Private Networks (VPNs) are going through dramatic growth. The number of endpoints per VPN is growing and the communication pattern between endpoints is becoming increasingly hard to forecast. Consequently, users are demanding dependable, dynamic connectivity between endpoints, with the network expected to accommodate any traffic matrix, as long as the traffic to the endpoints does not overwhelm the rates of the respective ingress and egress links. We propose a new service interface, termed a hose, to provide the appropriate performance abstraction. A hose is characterized by the aggregate traffic to and from one endpoint in the VPN to the set of other endpoints in the VPN, and by an associated performance guarantee.Hoses provide important advantages to a VPN customer: (i) flexibility to send traffic to a set of endpoints without having to specify the detailed traffic matrix, and (ii) reduction in the size of access links through multiplexing gains obtained from the natural aggregation of the flows between endpoints. As compared with the conventional point to point (or customer-pipe) model for managing QoS, hoses provide reduction in the state information a customer must maintain. On the other hand, hoses would appear to increase the complexity of the already difficult problem of resource management to support QoS. To manage network resources in the face of this increased uncertainty, we consider both conventional statistical multiplexing techniques, and a new resizing technique based on online measurements.To study these performance issues, we run trace driven simulations, using traffic derived from AT&T's voice network, and from a large corporate data network. From the customer's perspective, we find that aggregation of traffic at the hose level provides significant multiplexing gains. From the provider's perspective, we find that the statistical multiplexing and resizing techniques deal effectively with uncertainties about the traffic, providing significant gains over the conventional alternative of a mesh of statically sized customer-pipes between endpoints.
615 citations
Authors
Showing all 1881 results
Name | H-index | Papers | Citations |
---|---|---|---|
Yoshua Bengio | 202 | 1033 | 420313 |
Scott Shenker | 150 | 454 | 118017 |
Paul Shala Henry | 137 | 318 | 35971 |
Peter Stone | 130 | 1229 | 79713 |
Yann LeCun | 121 | 369 | 171211 |
Louis E. Brus | 113 | 347 | 63052 |
Jennifer Rexford | 102 | 394 | 45277 |
Andreas F. Molisch | 96 | 777 | 47530 |
Vern Paxson | 93 | 267 | 48382 |
Lorrie Faith Cranor | 92 | 326 | 28728 |
Ward Whitt | 89 | 424 | 29938 |
Lawrence R. Rabiner | 88 | 378 | 70445 |
Thomas E. Graedel | 86 | 348 | 27860 |
William W. Cohen | 85 | 384 | 31495 |
Michael K. Reiter | 84 | 380 | 30267 |