Institution

AT&T Labs

Company
About: AT&T Labs is a company known for research contributions in the topics: Network packet & The Internet. The organization has 1879 authors who have published 5595 publications receiving 483151 citations.


Papers
Journal Article
TL;DR: A distributed architecture for inter-domain aggregated resource reservation for unicast traffic is described, together with an associated protocol, the Border Gateway Reservation Protocol (BGRP), which scales well in terms of message processing load, state storage, and bandwidth.
Abstract: Resource reservation must operate in an efficient and scalable fashion, to accommodate the rapid growth of the Internet. In this paper, we describe a distributed architecture for inter-domain aggregated resource reservation for unicast traffic. We also present an associated protocol, called the Border Gateway Reservation Protocol (BGRP), that scales well, in terms of message processing load, state storage and bandwidth. Each stub or transit domain may use its own intra-domain resource reservation protocol. BGRP builds a sink tree for each of the stub domains. Each sink tree aggregates bandwidth reservations from all data sources in the network. Since backbone routers maintain only the sink tree information, the total number of reservations at each router scales linearly with the number of Internet domains N. (Even aggregated versions of the current protocol RSVP have a reservation count that can grow like O(N²).) BGRP maintains these aggregated reservations using “soft state.” To further reduce the protocol message traffic, routers may reserve bandwidth beyond the current load, so that some sources can join or leave the tree without sending messages all the way to the tree root. BGRP relies on Differentiated Services for data forwarding, hence the number of packet classifier entries is extremely small.
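The scaling argument is easiest to see in miniature. The sketch below (illustrative only, not the BGRP specification) models a sink tree as parent pointers between domains; each node keeps an aggregate reservation plus some over-reserved headroom, so a new source triggers signaling toward the root only when the headroom is exhausted. The domain names, headroom policy, and API are assumptions made for the example.

```python
# Minimal sketch (not the BGRP spec): sink-tree reservation aggregation with
# over-reservation headroom, to illustrate the scaling idea described above.
# Domain names, the headroom policy, and the method names are illustrative.

class Domain:
    def __init__(self, name, parent=None, headroom=10.0):
        self.name = name
        self.parent = parent          # next hop toward the sink-tree root
        self.reserved = 0.0           # bandwidth actually reserved upstream
        self.in_use = 0.0             # bandwidth currently promised to sources
        self.headroom = headroom      # extra bandwidth reserved beyond current load

    def add_flow(self, bw):
        """Admit a flow of `bw` units; propagate upstream only if needed."""
        self.in_use += bw
        if self.in_use <= self.reserved:
            return 0                  # absorbed locally: no message toward the root
        # Reserve beyond current load so later joins are absorbed silently.
        new_reservation = self.in_use + self.headroom
        delta = new_reservation - self.reserved
        self.reserved = new_reservation
        msgs = 1
        if self.parent is not None:
            msgs += self.parent.add_flow(delta)   # aggregate travels toward the root
        return msgs

# Sink tree rooted at the destination's stub domain D:
D = Domain("D")
T1 = Domain("T1", parent=D)           # transit domains
T2 = Domain("T2", parent=D)
S1 = Domain("S1", parent=T1)          # stub domains with data sources
S2 = Domain("S2", parent=T2)

print(S1.add_flow(5.0), "messages")   # first join signals all the way to the root
print(S1.add_flow(3.0), "messages")   # absorbed by the headroom: 0 messages
```

In the toy run, the first join propagates to the root while the second is absorbed locally; this is the mechanism that keeps per-router state proportional to the number of sink trees (domains) rather than the number of individual flows.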

101 citations

Journal Article
01 Dec 2015
TL;DR: A novel framework is proposed within which quantitative and logical data cleaning approaches can be used synergistically to combine their respective strengths, and it is proved that every instance generated by the repair algorithm is set-minimal.
Abstract: Quantitative data cleaning relies on the use of statistical methods to identify and repair data quality problems while logical data cleaning tackles the same problems using various forms of logical reasoning over declarative dependencies. Each of these approaches has its strengths: the logical approach is able to capture subtle data quality problems using sophisticated dependencies, while the quantitative approach excels at ensuring that the repaired data has desired statistical properties. We propose a novel framework within which these two approaches can be used synergistically to combine their respective strengths. We instantiate our framework using (i) metric functional dependencies, a type of dependency that generalizes functional dependencies (FDs) to identify inconsistencies in domains where only large differences in metric data are considered to be a data quality problem, and (ii) repairs that modify the inconsistent data so as to minimize statistical distortion, measured using the Earth Mover's Distance. We show that the problem of computing a statistical distortion minimal repair is NP-hard. Given this complexity, we present an efficient algorithm for finding a minimal repair that has a small statistical distortion using EMD computation over semantically related attributes. To identify semantically related attributes, we present a sound and complete axiomatization and an efficient algorithm for testing implication of metric FDs. While the complexity of inference for some other FD extensions is co-NP complete, we show that the inference problem for metric FDs remains linear, as in traditional FDs. We prove that every instance that can be generated by our repair algorithm is set-minimal (with no unnecessary changes). Our experimental evaluation demonstrates that our techniques obtain a considerably lower statistical distortion than existing repair techniques, while achieving similar levels of efficiency.
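As a concrete illustration of the dependency language used here, the sketch below checks a metric functional dependency: tuples that agree on the left-hand side must have right-hand-side values within a distance threshold delta. The schema, data, metric, and threshold are illustrative assumptions; the paper's repair step (minimizing Earth Mover's Distance) is not shown.

```python
# Minimal sketch of a metric functional dependency check: X -> Y holds if tuples
# that agree on X have Y values within `delta` under the metric `dist`.
# Column names, rows, and delta are made-up examples, not the paper's data.
from collections import defaultdict
from itertools import combinations

def metric_fd_violations(rows, lhs, rhs, delta, dist=lambda a, b: abs(a - b)):
    """Return pairs of row indices that agree on `lhs` but whose `rhs` values
    differ by more than `delta`."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in lhs)].append(i)
    violations = []
    for idxs in groups.values():
        for i, j in combinations(idxs, 2):
            if dist(rows[i][rhs], rows[j][rhs]) > delta:
                violations.append((i, j))
    return violations

# Example: the same (movie, theater) should imply running times within 5 minutes.
rows = [
    {"movie": "Heat", "theater": "T1", "duration": 170},
    {"movie": "Heat", "theater": "T1", "duration": 171},   # ok: small difference
    {"movie": "Heat", "theater": "T1", "duration": 150},   # violation: off by 20
]
print(metric_fd_violations(rows, lhs=("movie", "theater"), rhs="duration", delta=5))
```

Small metric differences (170 vs. 171) are tolerated, while the 20-minute outlier is flagged; a repair would then modify the flagged values while keeping the overall distribution as close as possible to the original.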

101 citations

Proceedings Article
01 Apr 2003
TL;DR: Active learning methods for reducing the labeling effort in a statistical call classification system used for AT&T customer care are described; the results indicate that human labeling effort can be reduced by at least a factor of two.
Abstract: We describe active learning methods for reducing the labeling effort in a statistical call classification system. Active learning aims to minimize the number of labeled utterances by automatically selecting for labeling the utterances that are likely to be most informative. The first method, inspired by certainty-based active learning, selects the examples that the classifier is least confident about. The second method, inspired by committee-based active learning, selects the examples that multiple classifiers do not agree on. We have evaluated these active learning methods using a call classification system used for AT&T customer care. Our results indicate that it is possible to reduce human labeling effort at least by a factor of two.
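The two selection strategies are simple to state in code. The sketch below is a toy illustration with NumPy, not the paper's exact setup: it scores a pool of unlabeled examples by classifier confidence and by committee disagreement and returns the indices to send for human labeling. The model, committee size, and batch size would be deployment choices.

```python
# Minimal sketch of certainty-based and committee-based example selection.
import numpy as np

def least_confident(probs, k):
    """Certainty-based selection: return indices of the k pool examples whose
    top class probability is lowest (the classifier is least sure about them)."""
    confidence = probs.max(axis=1)
    return np.argsort(confidence)[:k]

def committee_disagreement(votes, k):
    """Committee-based selection: votes[m, i] is the label member m predicts
    for example i; return the k examples with the weakest majority vote."""
    majority = np.array([np.bincount(col).max() for col in votes.T])
    return np.argsort(majority)[:k]

# Toy usage with made-up numbers (2-class problem, 3 pool examples):
probs = np.array([[0.90, 0.10],
                  [0.55, 0.45],
                  [0.70, 0.30]])
print(least_confident(probs, k=1))         # -> [1]

votes = np.array([[0, 1, 0],               # 3 committee members x 3 examples
                  [0, 1, 1],
                  [0, 0, 1]])
print(committee_disagreement(votes, k=1))  # -> [1]
```

Examples selected this way are labeled by humans, added to the training set, and the classifier (or committee) is retrained; the paper's result is that this loop reaches a target accuracy with roughly half the labels of random selection.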

100 citations

Book Chapter
31 Aug 2004
TL;DR: This work presents a lossless compression strategy to store and access large boolean matrices efficiently on disk, and adapts classical TSP heuristics by means of instance-partitioning and sampling.
Abstract: Large boolean matrices are a basic representational unit in a variety of applications, with some notable examples being interactive visualization systems, mining large graph structures, and association rule mining. Designing space and time efficient scalable storage and query mechanisms for such large matrices is a challenging problem. We present a lossless compression strategy to store and access such large matrices efficiently on disk. Our approach is based on viewing the columns of the matrix as points in a very high dimensional Hamming space, and then formulating an appropriate optimization problem that reduces to solving an instance of the Traveling Salesman Problem on this space. Finding good solutions to large TSPs in high dimensional Hamming spaces is itself a challenging and little-explored problem -- we cannot readily exploit geometry to avoid the need to examine all N² inter-city distances and instances can be too large for standard TSP codes to run in main memory. Our multi-faceted approach adapts classical TSP heuristics by means of instance-partitioning and sampling, and may be of independent interest. For instances derived from interactive visualization and telephone call data we obtain significant improvement in access time over standard techniques, and for the visualization application we also make significant improvements in compression.
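The core reordering idea can be sketched compactly. The code below treats columns as points in Hamming space, builds a greedy nearest-neighbor tour (a simple stand-in for the paper's partitioned and sampled TSP heuristics), and then run-length encodes the reordered rows. The matrix, the heuristic, and the encoding are illustrative assumptions rather than the paper's on-disk format.

```python
# Minimal sketch: reorder columns so Hamming-similar columns are adjacent
# (a greedy nearest-neighbor tour), making each row cheaper to run-length encode.

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def greedy_column_order(matrix):
    """Nearest-neighbor tour over columns viewed as points in Hamming space."""
    cols = list(zip(*matrix))                 # column tuples ("cities")
    unvisited = set(range(len(cols)))
    order = [unvisited.pop()]                 # arbitrary start city
    while unvisited:
        last = cols[order[-1]]
        nxt = min(unvisited, key=lambda j: hamming(last, cols[j]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order

def run_length_encode(bits):
    runs, count, current = [], 0, bits[0]
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = b, 1
    runs.append((current, count))
    return runs

matrix = [
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
]
order = greedy_column_order(matrix)
reordered = [[row[j] for j in order] for row in matrix]
print(order)
print([run_length_encode(row) for row in reordered])   # long runs after reordering
```

On the toy matrix the alternating rows become two long runs each after reordering, which is exactly the effect the TSP objective is trying to maximize at scale.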

100 citations

Journal Article
TL;DR: An aggregate view based on certain organized sets of large-valued regions (“heavy hitters”) corresponding to hierarchically discounted frequency counts is developed, and online algorithms that find approximate HHHs in one pass are presented, with provable accuracy guarantees.
Abstract: Data items that arrive online as streams typically have attributes which take values from one or more hierarchies (time and geographic location, source and destination IP addresses, etc.). Providing an aggregate view of such data is important for summarization, visualization, and analysis. We develop an aggregate view based on certain organized sets of large-valued regions (“heavy hitters”) corresponding to hierarchically discounted frequency counts. We formally define the notion of hierarchical heavy hitters (HHHs). We first consider computing (approximate) HHHs over a data stream drawn from a single hierarchical attribute. We formalize the problem and give deterministic algorithms to find them in a single pass over the input. In order to analyze a wider range of realistic data streams (e.g., from IP traffic-monitoring applications), we generalize this problem to multiple dimensions. Here, the semantics of HHHs are more complex, since a “child” node can have multiple “parent” nodes. We present online algorithms that find approximate HHHs in one pass, with provable accuracy guarantees. The product of hierarchical dimensions forms a mathematical lattice structure. Our algorithms exploit this structure, and so are able to track approximate HHHs using only a small, fixed number of statistics per stored item, regardless of the number of dimensions. We show experimentally, using real data, that our proposed algorithms yield outputs which are very similar (virtually identical, in many cases) to offline computations of the exact solutions, whereas straightforward heavy-hitters-based approaches give significantly inferior answer quality. Furthermore, the proposed algorithms result in an order of magnitude savings in data structure size while performing competitively.
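The "hierarchically discounted" semantics can be made concrete with a small offline example. The sketch below computes HHHs exactly, in memory, for a single hierarchical attribute (IPv4 prefixes): a prefix is reported when its count, after discounting traffic already attributed to descendant HHHs, reaches the threshold phi*N. The paper's contribution is doing this approximately in one pass with bounded space; the addresses and threshold here are made up.

```python
# Minimal sketch of hierarchical-heavy-hitter semantics over IPv4 prefixes,
# computed exactly and offline (not the paper's one-pass streaming algorithm).
from collections import Counter

def prefixes(ip):
    """Hierarchy of an IPv4 address: full address, /24, /16, /8, and root."""
    octets = ip.split(".")
    return [".".join(octets[:k]) + ("*" if k < 4 else "") for k in (4, 3, 2, 1)] + ["*"]

def hierarchical_heavy_hitters(items, phi):
    threshold = phi * len(items)
    counts = Counter(items)
    hhh, discounted = {}, Counter()
    # Walk the hierarchy bottom-up; a node's count is discounted by the
    # traffic already attributed to its descendant HHHs.
    for level in range(5):                      # 0 = leaves ... 4 = root "*"
        level_counts = Counter()
        for ip, c in counts.items():
            node = prefixes(ip)[level]
            level_counts[node] += c - discounted[ip]
        for node, c in level_counts.items():
            if c >= threshold:
                hhh[node] = c
                # Attribute these counts to this HHH so ancestors only see the rest.
                for ip in counts:
                    if prefixes(ip)[level] == node:
                        discounted[ip] = counts[ip]
    return hhh

stream = ["10.0.0.1"] * 30 + ["10.0.0.2"] * 5 + ["10.0.1.7"] * 6 + ["192.168.1.9"] * 9
print(hierarchical_heavy_hitters(stream, phi=0.2))   # e.g. the single address 10.0.0.1
                                                     # and the discounted prefix 10.0*
```

Note that 10.0* is reported only because its residual traffic (after removing the already-reported 10.0.0.1) still crosses the threshold; this discounting is what keeps the output from being cluttered with every ancestor of a heavy leaf.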

100 citations


Authors


Name                   H-index   Papers   Citations
Yoshua Bengio          202       1033     420313
Scott Shenker          150       454      118017
Paul Shala Henry       137       318      35971
Peter Stone            130       1229     79713
Yann LeCun             121       369      171211
Louis E. Brus          113       347      63052
Jennifer Rexford       102       394      45277
Andreas F. Molisch     96        777      47530
Vern Paxson            93        267      48382
Lorrie Faith Cranor    92        326      28728
Ward Whitt             89        424      29938
Lawrence R. Rabiner    88        378      70445
Thomas E. Graedel      86        348      27860
William W. Cohen       85        384      31495
Michael K. Reiter      84        380      30267
Network Information
Related Institutions (5)
Microsoft: 86.9K papers, 4.1M citations, 94% related
Google: 39.8K papers, 2.1M citations, 91% related
Hewlett-Packard: 59.8K papers, 1.4M citations, 89% related
Bell Labs: 59.8K papers, 3.1M citations, 88% related
Performance Metrics
No. of papers from the institution in previous years:

Year   Papers
2022   5
2021   33
2020   69
2019   71
2018   100
2017   91