Institution

AT&T Labs

Company
About: AT&T Labs is a corporate research laboratory. It is known for research contributions in the topics: Network packet & The Internet. The organization has 1879 authors who have published 5595 publications receiving 483151 citations.


Papers
Journal Article (DOI)
01 Aug 2009
TL;DR: This paper applies Bayesian analysis to decide dependence between sources, designs an algorithm that iteratively detects dependence and discovers truth from conflicting information, and extends the model by considering the accuracy of data sources and the similarity between values.
Abstract: Many data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, require integrating data from multiple sources. Each of these sources provides a set of values and different sources can often provide conflicting values. To present quality data to users, it is critical that data integration systems can resolve conflicts and discover true values. Typically, we expect a true value to be provided by more sources than any particular false one, so we can take the value provided by the majority of the sources as the truth. Unfortunately, a false value can be spread through copying and that makes truth discovery extremely tricky. In this paper, we consider how to find true values from conflicting information when there are a large number of sources, among which some may copy from others. We present a novel approach that considers dependence between data sources in truth discovery. Intuitively, if two data sources provide a large number of common values and many of these values are rarely provided by other sources (e.g., particular false values), it is very likely that one copies from the other. We apply Bayesian analysis to decide dependence between sources and design an algorithm that iteratively detects dependence and discovers truth from conflicting information. We also extend our model by considering accuracy of data sources and similarity between values. Our experiments on synthetic data as well as real-world data show that our algorithm can significantly improve accuracy of truth discovery and is scalable when there are a large number of data sources.
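A minimal sketch of the iterative idea described above, leaving out the Bayesian copy-detection step: alternate between weighted voting for each object's value and re-estimating each source's accuracy from its agreement with the current answers. The data layout (source, object, value triples), the initial accuracy of 0.8, and the smoothing are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal illustration of iterative truth discovery (no copy detection):
# alternately (1) pick the highest-weighted value per object and
# (2) re-estimate each source's accuracy from how often it agrees.
from collections import defaultdict

def discover_truth(claims, iterations=10):
    """claims: list of (source, object, value) triples -- an assumed layout."""
    sources = {s for s, _, _ in claims}
    accuracy = {s: 0.8 for s in sources}          # illustrative initial guess
    truth = {}
    for _ in range(iterations):
        # Step 1: vote for each object, weighting each source by its accuracy.
        votes = defaultdict(lambda: defaultdict(float))
        for s, o, v in claims:
            votes[o][v] += accuracy[s]
        truth = {o: max(vals, key=vals.get) for o, vals in votes.items()}
        # Step 2: re-estimate accuracy as the fraction of a source's claims
        # matching the current truth (smoothed to avoid zero weights).
        hits, total = defaultdict(float), defaultdict(float)
        for s, o, v in claims:
            total[s] += 1
            hits[s] += (truth[o] == v)
        accuracy = {s: (hits[s] + 1) / (total[s] + 2) for s in sources}
    return truth, accuracy

claims = [("A", "capital:FR", "Paris"), ("B", "capital:FR", "Paris"),
          ("C", "capital:FR", "Lyon")]
print(discover_truth(claims))
```

In the paper's full model, the vote weights would additionally discount sources that the Bayesian analysis judges to be copying from others.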

439 citations

Proceedings Article (DOI)
27 Jun 2006
TL;DR: This work defines a variety of essential and practical cost metrics associated with ODB systems and examines solutions that can handle dynamic scenarios, in which owners periodically update the data residing at the servers; the proposed approaches improve performance over existing ones in both static and dynamic environments.
Abstract: In outsourced database (ODB) systems the database owner publishes its data through a number of remote servers, with the goal of enabling clients at the edge of the network to access and query the data more efficiently. As servers might be untrusted or can be compromised, query authentication becomes an essential component of ODB systems. Existing solutions for this problem concentrate mostly on static scenarios and are based on idealistic properties for certain cryptographic primitives. In this work, first we define a variety of essential and practical cost metrics associated with ODB systems. Then, we analytically evaluate a number of different approaches, in search for a solution that best leverages all metrics. Most importantly, we look at solutions that can handle dynamic scenarios, where owners periodically update the data residing at the servers. Finally, we discuss query freshness, a new dimension in data authentication that has not been explored before. A comprehensive experimental evaluation of the proposed and existing approaches is used to validate the analytical models and verify our claims. Our findings exhibit that the proposed solutions improve performance substantially over existing approaches, both for static and dynamic environments.
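For context, a common primitive behind query authentication in outsourced databases is the Merkle hash tree: the owner signs the root digest, the server returns query results together with sibling hashes, and the client recomputes the root. The toy sketch below illustrates that primitive only; it is not the specific structure or cost model evaluated in this paper, and the function names are invented for illustration.

```python
# Toy Merkle hash tree: the owner signs the root digest, the server returns a
# record plus sibling hashes (a proof), and the client recomputes the root.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(records):
    level = [h(r.encode()) for r in records]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                         # duplicate last node on odd levels
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append((level[index ^ 1], index % 2))  # (sibling hash, am I the right child?)
        index //= 2
    return proof

def verify(record, proof, root):
    digest = h(record.encode())
    for sibling, is_right in proof:
        digest = h(sibling + digest) if is_right else h(digest + sibling)
    return digest == root

records = ["r1", "r2", "r3", "r4", "r5"]
levels = build_levels(records)
root = levels[-1][0]                               # the owner would sign this digest
print(verify("r3", prove(levels, 2), root))        # True
```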

434 citations

Journal Article (DOI)
Anja Feldmann, Ward Whitt
TL;DR: An algorithm for approximating a long-tail distribution by a hyperexponential distribution (a finite mixture of exponentials) is developed, proving that, in principle, it is possible to approximate distributions from a large class, including the Pareto and Weibull distributions, arbitrarily closely by hyperexponential distributions.
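The hyperexponential target of the approximation is simply a probability-weighted mixture of exponentials, so its complementary CDF is P(X > x) = p1*exp(-lambda1*x) + ... + pk*exp(-lambdak*x) with the weights summing to one. The sketch below only evaluates that form next to a heavy-tailed Weibull tail; the mixture parameters are made-up placeholders, not output of the fitting algorithm from the paper.

```python
# Evaluate a hyperexponential (mixture-of-exponentials) complementary CDF next
# to a heavy-tailed Weibull tail. Parameters are illustrative only.
import math

def hyperexp_ccdf(x, probs, rates):
    assert abs(sum(probs) - 1.0) < 1e-9
    return sum(p * math.exp(-lam * x) for p, lam in zip(probs, rates))

def weibull_ccdf(x, shape, scale):
    return math.exp(-((x / scale) ** shape))

# Three exponential phases spanning several time scales (assumed values).
probs = [0.6, 0.3, 0.1]
rates = [5.0, 0.5, 0.05]

for x in [0.1, 1.0, 10.0, 50.0]:
    print(x, weibull_ccdf(x, shape=0.6, scale=1.0), hyperexp_ccdf(x, probs, rates))
```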

434 citations

Journal Article
TL;DR: Novel methods for robust parsing are proposed that integrate the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques.
Abstract: In this paper, we have proposed novel methods for robust parsing that integrate the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques. Our thesis is that the computation of linguistic structure can be localized if lexical items are associated with rich descriptions (supertags) that impose complex constraints in a local context. The supertags are designed such that only those elements on which the lexical item imposes constraints appear within a given supertag. Further, each lexical item is associated with as many supertags as the number of different syntactic contexts in which the lexical item can appear. This makes the number of different descriptions for each lexical item much larger than when the descriptions are less complex, thus increasing the local ambiguity for a parser. But this local ambiguity can be resolved by using statistical distributions of supertag co-occurrences collected from a corpus of parses. We have explored these ideas in the context of the Lexicalized Tree-Adjoining Grammar (LTAG) framework. The supertags in LTAG combine both phrase structure information and dependency information in a single representation. Supertag disambiguation results in a representation that is effectively a parse (an almost parse), and the parser need "only" combine the individual supertags. This method of parsing can also be used to parse sentence fragments such as in spoken utterances where the disambiguated supertag sequence may not combine into a single structure.
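A toy illustration of supertag disambiguation as described above: each word carries several candidate supertags, and a statistical model over adjacent supertags (here a simple bigram Viterbi search) picks the most likely sequence. The lexicon, tag names, and probabilities below are invented for illustration and are far simpler than LTAG supertags.

```python
# Toy supertag disambiguation: each word has several candidate supertags, and
# a bigram model over supertags picks the most likely sequence (Viterbi).
import math

lexicon = {                               # invented candidate supertags per word
    "the":   ["Det"],
    "price": ["NP_head", "V_trans"],
    "rose":  ["V_intrans", "NP_head"],
}
bigram = {                                # invented P(tag_i | tag_{i-1}) estimates
    ("<s>", "Det"): 0.9, ("Det", "NP_head"): 0.8,
    ("NP_head", "V_intrans"): 0.7, ("NP_head", "NP_head"): 0.1,
    ("Det", "V_trans"): 0.05, ("V_trans", "V_intrans"): 0.05,
}

def viterbi(words):
    paths = {"<s>": (0.0, [])}            # tag -> (log-prob, best path so far)
    for w in words:
        new_paths = {}
        for tag in lexicon[w]:
            best = max(
                (lp + math.log(bigram.get((prev, tag), 1e-4)), path)
                for prev, (lp, path) in paths.items()
            )
            new_paths[tag] = (best[0], best[1] + [tag])
        paths = new_paths
    return max(paths.values())

print(viterbi(["the", "price", "rose"]))  # best supertag sequence and its log-prob
```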

434 citations

Proceedings Article (DOI)
18 Jun 2014
TL;DR: This paper presents PrivBayes, a differentially private method for releasing high-dimensional data that circumvents the curse of dimensionality, and introduces a novel approach that uses a surrogate function for mutual information to build the Bayesian network model more accurately.
Abstract: Privacy-preserving data publishing is an important problem that has been the focus of extensive study. The state-of-the-art solution for this problem is differential privacy, which offers a strong degree of privacy protection without making restrictive assumptions about the adversary. Existing techniques using differential privacy, however, cannot effectively handle the publication of high-dimensional data. In particular, when the input dataset contains a large number of attributes, existing methods require injecting a prohibitive amount of noise compared to the signal in the data, which renders the published data next to useless. To address the deficiency of the existing methods, this paper presents PrivBayes, a differentially private method for releasing high-dimensional data. Given a dataset D, PrivBayes first constructs a Bayesian network N, which (i) provides a succinct model of the correlations among the attributes in D and (ii) allows us to approximate the distribution of data in D using a set P of low-dimensional marginals of D. After that, PrivBayes injects noise into each marginal in P to ensure differential privacy and then uses the noisy marginals and the Bayesian network to construct an approximation of the data distribution in D. Finally, PrivBayes samples tuples from the approximate distribution to construct a synthetic dataset, and then releases the synthetic data. Intuitively, PrivBayes circumvents the curse of dimensionality, as it injects noise into the low-dimensional marginals in P instead of the high-dimensional dataset D. Private construction of Bayesian networks turns out to be significantly challenging, and we introduce a novel approach that uses a surrogate function for mutual information to build the model more accurately. We experimentally evaluate PrivBayes on real data and demonstrate that it significantly outperforms existing solutions in terms of accuracy.
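The sketch below shows only the flavor of the release step: Laplace noise is added to a low-dimensional marginal (here a single two-attribute histogram), the noisy counts are clipped, and synthetic records are sampled from them. The private Bayesian-network construction and the surrogate for mutual information, which are the paper's main technical contributions, are not reproduced; the epsilon, the data, and the unit-sensitivity assumption are all illustrative.

```python
# Flavor of a differentially private marginal release: perturb a 2-attribute
# histogram with Laplace noise, clip negatives, and sample synthetic records.
import math
import random
from collections import Counter

def laplace(scale):
    """Sample Laplace(0, scale) via the inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def synthesize(records, epsilon, n_samples):
    """records: list of (A, B) value pairs -- an assumed two-attribute marginal."""
    counts = Counter(records)
    # Laplace scale 1/epsilon, assuming unit sensitivity for this toy histogram.
    noisy = {k: max(0.0, c + laplace(1.0 / epsilon)) for k, c in counts.items()}
    keys, weights = zip(*noisy.items())
    return random.choices(keys, weights=weights, k=n_samples)

data = [("smoker", "yes"), ("smoker", "no"), ("nonsmoker", "no")] * 100
print(Counter(synthesize(data, epsilon=0.5, n_samples=10)))
```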

433 citations


Authors

Showing all 1881 results

Name | H-index | Papers | Citations
Yoshua Bengio | 202 | 1033 | 420313
Scott Shenker | 150 | 454 | 118017
Paul Shala Henry | 137 | 318 | 35971
Peter Stone | 130 | 1229 | 79713
Yann LeCun | 121 | 369 | 171211
Louis E. Brus | 113 | 347 | 63052
Jennifer Rexford | 102 | 394 | 45277
Andreas F. Molisch | 96 | 777 | 47530
Vern Paxson | 93 | 267 | 48382
Lorrie Faith Cranor | 92 | 326 | 28728
Ward Whitt | 89 | 424 | 29938
Lawrence R. Rabiner | 88 | 378 | 70445
Thomas E. Graedel | 86 | 348 | 27860
William W. Cohen | 85 | 384 | 31495
Michael K. Reiter | 84 | 380 | 30267
Network Information
Related Institutions (5)
Microsoft | 86.9K papers, 4.1M citations | 94% related
Google | 39.8K papers, 2.1M citations | 91% related
Hewlett-Packard | 59.8K papers, 1.4M citations | 89% related
Bell Labs | 59.8K papers, 3.1M citations | 88% related

Performance Metrics
No. of papers from the Institution in previous years

Year | Papers
2022 | 5
2021 | 33
2020 | 69
2019 | 71
2018 | 100
2017 | 91