Institution

AT&T Labs

Company
About: AT&T Labs is a company. It is known for research contributions in the topics of Network packet and The Internet. The organization has 1879 authors who have published 5595 publications receiving 483151 citations.


Papers
Proceedings ArticleDOI
07 Apr 2008
TL;DR: This work focuses on weighted similarity functions like TF/IDF and introduces variants well suited for set similarity selections in a relational database context; these variants have special semantic properties that can be exploited to design efficient index structures and algorithms for answering queries.
Abstract: Data collections often have inconsistencies that arise due to a variety of reasons, and it is desirable to be able to identify and resolve them efficiently. Set similarity queries are commonly used in data cleaning for matching similar data. In this work we concentrate on set similarity selection queries: Given a query set, retrieve all sets in a collection with similarity greater than some threshold. Various set similarity measures have been proposed in the past for data cleaning purposes. In this work we concentrate on weighted similarity functions like TF/IDF, and introduce variants that are well suited for set similarity selections in a relational database context. These variants have special semantic properties that can be exploited to design very efficient index structures and algorithms for answering queries efficiently. We present modifications of existing technologies to work for set similarity selection queries. We also introduce three novel algorithms based on the Threshold Algorithm that exploit the semantic properties of the new similarity measures to achieve the best performance in theory and practice.

112 citations
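The paper above relies on specialized index structures and Threshold-Algorithm variants; as a rough illustration only, the sketch below computes a TF/IDF-style weighted set similarity and answers a threshold selection by brute force over a toy collection. The collection, the exact weighting, and the threshold are assumptions for illustration, not the paper's definitions.

```python
# Minimal sketch (not the paper's algorithms): TF/IDF-weighted set similarity
# and a brute-force threshold selection over a small toy collection.
import math
from collections import defaultdict

collection = {
    1: {"jane", "doe", "main", "st"},
    2: {"john", "doe", "main", "street"},
    3: {"acme", "corp", "main", "st"},
}

# IDF weight per token: rarer tokens contribute more to the similarity.
df = defaultdict(int)
for tokens in collection.values():
    for t in tokens:
        df[t] += 1
N = len(collection)
idf = {t: math.log(1 + N / c) for t, c in df.items()}

def weight(tokens):
    # L2-normalized TF/IDF vector for a set (TF is 1 for set members);
    # unseen tokens get a default IDF as an illustrative assumption.
    default = math.log(1 + N)
    norm = math.sqrt(sum(idf.get(t, default) ** 2 for t in tokens))
    return {t: idf.get(t, default) / norm for t in tokens}

def similarity(a, b):
    wa, wb = weight(a), weight(b)
    return sum(wa[t] * wb[t] for t in a & b)

def selection(query, threshold):
    # Retrieve all sets in the collection with similarity above the threshold.
    return [sid for sid, s in collection.items() if similarity(query, s) >= threshold]

print(selection({"jane", "doe", "main", "street"}, 0.5))
```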

Book ChapterDOI
Mikkel Thorup
08 Jul 2004
TL;DR: A solution to the fully-dynamic all pairs shortest path problem for a directed graph with arbitrary weights allowing negative cycles, supporting each vertex update in \(O(n^2(\log n + \log^2(\overline{m}/n)))\) amortized time.
Abstract: We present a solution to the fully-dynamic all pairs shortest path problem for a directed graph with arbitrary weights allowing negative cycles. We support each vertex update in \(O(n^2(\log n + \log^2(\overline{m}/n)))\) amortized time. Here, n is the number of vertices, m the number of edges, and \(\overline{m} = n + m\). A vertex update inserts or deletes a vertex with all incident edges, and we update a complete distance matrix accordingly. The algorithm runs on a comparison-addition based pointer machine.

112 citations
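For contrast with the amortized bound above, here is a naive baseline, not Thorup's algorithm: rebuild the full distance matrix with Floyd-Warshall after every vertex update, which costs O(n^3) per update instead of the paper's \(O(n^2(\log n + \log^2(\overline{m}/n)))\) amortized. The class and method names are illustrative assumptions.

```python
# Naive dynamic APSP baseline for contrast (not the paper's method):
# every vertex insertion/deletion triggers a full O(n^3) Floyd-Warshall rebuild.
import math

class NaiveDynamicAPSP:
    def __init__(self):
        self.adj = {}  # vertex -> {neighbor: weight}, directed, arbitrary weights

    def insert_vertex(self, v, out_edges=(), in_edges=()):
        # out_edges: (neighbor, weight) pairs; in_edges: (source, weight) pairs.
        self.adj[v] = dict(out_edges)
        for u, w in in_edges:
            if u in self.adj:
                self.adj[u][v] = w
        return self._rebuild()

    def delete_vertex(self, v):
        self.adj.pop(v, None)
        for nbrs in self.adj.values():
            nbrs.pop(v, None)
        return self._rebuild()

    def _rebuild(self):
        # Plain Floyd-Warshall; a negative diagonal entry flags a vertex
        # that lies on a negative cycle.
        vs = list(self.adj)
        dist = {u: {v: (0 if u == v else self.adj[u].get(v, math.inf)) for v in vs}
                for u in vs}
        for k in vs:
            for i in vs:
                for j in vs:
                    if dist[i][k] + dist[k][j] < dist[i][j]:
                        dist[i][j] = dist[i][k] + dist[k][j]
        return dist

apsp = NaiveDynamicAPSP()
apsp.insert_vertex("a")
apsp.insert_vertex("b", out_edges=[("a", 2)], in_edges=[("a", -1)])
print(apsp.insert_vertex("c", out_edges=[("b", 3)], in_edges=[("a", 4)]))
```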

Proceedings ArticleDOI
18 Apr 2016
TL;DR: This paper proposes a novel distributed reconstruction technique, called Partial Parallel Repair (PPR), which divides the reconstruction operation into small partial operations and schedules them on multiple nodes already involved in the data reconstruction, reducing repair time and degraded read time significantly.
Abstract: With the explosion of data in applications all around us, erasure-coded storage has emerged as an attractive alternative to replication because, even with significantly lower storage overhead, it provides better reliability against data loss. Reed-Solomon code is the most widely used erasure code because it provides maximum reliability for a given storage overhead and is flexible in the choice of coding parameters that determine the achievable reliability. However, reconstruction time for unavailable data becomes prohibitively long, mainly because of network bottlenecks. Some proposed solutions either use additional storage or limit the coding parameters that can be used. In this paper, we propose a novel distributed reconstruction technique, called Partial Parallel Repair (PPR), which divides the reconstruction operation into small partial operations and schedules them on multiple nodes already involved in the data reconstruction. A distributed protocol then progressively combines these partial results to reconstruct the unavailable data blocks, which reduces the network pressure. Theoretically, our technique can complete the network transfer in ⌈log2(k + 1)⌉ time, compared to k time needed for a (k, m) Reed-Solomon code. Our experiments show that PPR reduces repair time and degraded read time significantly. Moreover, our technique is compatible with existing erasure codes and does not require any additional storage overhead. We demonstrate this by overlaying PPR on top of two prior schemes, Local Reconstruction Code and Rotated Reed-Solomon code, to gain additional savings in reconstruction time.

112 citations
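A small sketch of the round-counting idea behind PPR, not the actual repair protocol: combining partial results pairwise in a tree takes about ⌈log2(k + 1)⌉ sequential transfer rounds instead of the k serialized transfers of a centralized repair. XOR stands in for the finite-field linear combinations used in real Reed-Solomon decoding, and all names and values below are assumptions.

```python
# Sketch of the round-counting argument (not the paper's protocol):
# tree-structured combination of partial repair results cuts the number of
# sequential network transfers from k to roughly ceil(log2(k + 1)).
import math

def centralized_rounds(k):
    # Conventional repair: one node pulls all k surviving blocks, and its
    # downlink serializes them, so roughly k block-transfer times.
    return k

def ppr_rounds(k):
    # Tree aggregation over the k partial results plus the requesting node.
    return math.ceil(math.log2(k + 1))

def tree_combine(partials, combine=lambda a, b: a ^ b):
    # Pairwise combination per round; XOR is a stand-in for the actual
    # finite-field linear combination used by Reed-Solomon decoding.
    rounds = 0
    while len(partials) > 1:
        nxt = [combine(*partials[i:i + 2]) if i + 1 < len(partials) else partials[i]
               for i in range(0, len(partials), 2)]
        partials, rounds = nxt, rounds + 1
    return partials[0], rounds

for k in (6, 12):
    print(f"k={k}: centralized {centralized_rounds(k)} rounds, PPR ~{ppr_rounds(k)} rounds")

value, rounds = tree_combine([0b1010, 0b0110, 0b0011, 0b1111, 0b0001])
print(bin(value), rounds)
```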

Proceedings ArticleDOI
27 May 2003
TL;DR: This paper investigates adapting a lexicalized probabilistic context-free grammar (PCFG) to a novel domain, using maximum a posteriori (MAP) estimation, and shows F-measure parsing accuracy gains of as much as 2.5% for high-accuracy lexicalized parsing through the use of out-of-domain treebanks.
Abstract: This paper investigates adapting a lexicalized probabilistic context-free grammar (PCFG) to a novel domain, using maximum a posteriori (MAP) estimation. The MAP framework is general enough to include some previous model adaptation approaches, such as corpus mixing in Gildea (2001), for example. Other approaches falling within this framework are more effective. In contrast to the results in Gildea (2001), we show F-measure parsing accuracy gains of as much as 2.5% for high accuracy lexicalized parsing through the use of out-of-domain treebanks, with the largest gains when the amount of in-domain data is small. MAP adaptation can also be based on either supervised or unsupervised adaptation data. Even when no in-domain treebank is available, unsupervised techniques provide a substantial accuracy gain over unadapted grammars, as much as nearly 5% F-measure improvement.

112 citations
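A simplified view of the MAP estimation discussed above, treating the Dirichlet prior as pseudo-counts taken from an out-of-domain model and mixed with in-domain rule counts. The paper's lexicalized PCFG and smoothing are considerably more involved; the toy grammar counts and the prior weight alpha below are illustrative assumptions.

```python
# Simplified sketch of MAP adaptation for PCFG rule probabilities:
# a Dirichlet prior viewed as alpha pseudo-counts distributed according to
# the out-of-domain model and added to in-domain counts.
from collections import defaultdict

def relative_freq(rule_counts):
    # Maximum-likelihood rule probabilities P(rhs | lhs).
    totals = defaultdict(float)
    for (lhs, _), c in rule_counts.items():
        totals[lhs] += c
    return {(lhs, rhs): c / totals[lhs] for (lhs, rhs), c in rule_counts.items()}

def map_adapt(out_counts, in_counts, alpha=10.0):
    # MAP estimate: in-domain counts plus alpha pseudo-counts from the
    # out-of-domain model, renormalized per left-hand side.
    p_out = relative_freq(out_counts)
    rules = set(out_counts) | set(in_counts)
    totals = defaultdict(float)
    for lhs, rhs in rules:
        totals[lhs] += in_counts.get((lhs, rhs), 0.0) + alpha * p_out.get((lhs, rhs), 0.0)
    return {(lhs, rhs): (in_counts.get((lhs, rhs), 0.0)
                         + alpha * p_out.get((lhs, rhs), 0.0)) / totals[lhs]
            for lhs, rhs in rules}

# Toy counts (assumptions): large out-of-domain treebank, small in-domain one.
out_domain = {("NP", "DT NN"): 800.0, ("NP", "NNP"): 200.0}
in_domain = {("NP", "DT NN"): 30.0, ("NP", "NN NN"): 20.0}
for rule, p in sorted(map_adapt(out_domain, in_domain).items()):
    print(rule, round(p, 3))
```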

Proceedings ArticleDOI
01 Aug 2000
TL;DR: This work formally models a soft database as a noisy version of some unknown hard database, formulates hardening as an optimization problem, and gives a nontrivial nearly linear time algorithm for finding a local optimum.
Abstract: The web contains a large quantity of unstructured information. In many cases, it is possible to heuristically extract structured information, but the resulting databases are "soft": they contain inconsistencies and duplication, and lack unique, consistently-used object identifiers. Examples include large bibliographic databases harvested from raw scientific papers or databases constructed by merging heterogeneous "hard" databases. Here we formally model a soft database as a noisy version of some unknown hard database. We then consider the hardening problem, i.e., the problem of inferring the most likely underlying hard database given a particular soft database. A key feature of our approach is that hardening is global: many sources of evidence for a given hard fact are taken into account. We formulate hardening as an optimization problem and give a nontrivial nearly linear time algorithm for finding a local optimum.

112 citations
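As a loose illustration of the hardening idea, not the paper's cost model or its nearly linear time algorithm, the sketch below runs a greedy local search that maps soft records to hard identifiers, trading the number of posited hard facts against how well they explain the soft records. The costs and the string-similarity measure are assumptions.

```python
# Minimal local-search sketch for "hardening" a soft database (illustrative
# only): choose an assignment of soft records to hard facts that balances the
# cost of positing hard facts against the cost of poor explanations.
from difflib import SequenceMatcher

soft_records = ["A. Smith", "Alice Smith", "Alice Smyth", "Bob Jones", "B. Jones"]
HARD_FACT_COST = 1.0   # assumed price of positing one underlying hard record
MISMATCH_COST = 3.0    # assumed price scale for dissimilar explanations

def mismatch(soft, hard):
    return MISMATCH_COST * (1.0 - SequenceMatcher(None, soft, hard).ratio())

def total_cost(assignment):
    # assignment: soft record -> chosen hard fact (a representative string)
    hard_facts = set(assignment.values())
    return HARD_FACT_COST * len(hard_facts) + sum(mismatch(s, h) for s, h in assignment.items())

# Trivial starting hypothesis: every soft record is its own hard fact.
assignment = {s: s for s in soft_records}

# Greedy local moves: reassign one soft record to another hard fact whenever
# that lowers the total cost; stop at a local optimum.
improved = True
while improved:
    improved = False
    for s in soft_records:
        for h in set(assignment.values()) - {assignment[s]}:
            candidate = {**assignment, s: h}
            if total_cost(candidate) < total_cost(assignment):
                assignment, improved = candidate, True

print(assignment, round(total_cost(assignment), 2))
```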


Authors

Showing all 1881 results

Name                    H-index    Papers    Citations
Yoshua Bengio           202        1033      420313
Scott Shenker           150        454       118017
Paul Shala Henry        137        318       35971
Peter Stone             130        1229      79713
Yann LeCun              121        369       171211
Louis E. Brus           113        347       63052
Jennifer Rexford        102        394       45277
Andreas F. Molisch      96         777       47530
Vern Paxson             93         267       48382
Lorrie Faith Cranor     92         326       28728
Ward Whitt              89         424       29938
Lawrence R. Rabiner     88         378       70445
Thomas E. Graedel       86         348       27860
William W. Cohen        85         384       31495
Michael K. Reiter       84         380       30267
Network Information
Related Institutions (5)
Microsoft
86.9K papers, 4.1M citations

94% related

Google
39.8K papers, 2.1M citations

91% related

Hewlett-Packard
59.8K papers, 1.4M citations

89% related

Bell Labs
59.8K papers, 3.1M citations

88% related

Performance Metrics
No. of papers from the Institution in previous years
Year    Papers
2022    5
2021    33
2020    69
2019    71
2018    100
2017    91