Author

Anupam Gupta

Bio: Anupam Gupta is an academic researcher from Carnegie Mellon University. The author has contributed to research in topics: Approximation algorithm & Steiner tree problem. The author has an h-index of 55, co-authored 314 publications receiving 11295 citations. Previous affiliations of Anupam Gupta include Cincinnati Children's Hospital Medical Center & University of California, Berkeley.


Papers
Journal ArticleDOI
TL;DR: A result of Johnson and Lindenstrauss shows that a set of n points in high dimensional Euclidean space can be mapped into an O(log n/ε²)-dimensional Euclidean space such that the distance between any two points changes by only a factor of (1 ± ε).
Abstract: A result of Johnson and Lindenstrauss [13] shows that a set of n points in high dimensional Euclidean space can be mapped into an O(log n/ε²)-dimensional Euclidean space such that the distance between any two points changes by only a factor of (1 ± ε). In this note, we prove this theorem using elementary probabilistic techniques.

1,036 citations
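
The Johnson-Lindenstrauss statement in the abstract above can be tried out numerically. The following is a minimal sketch, not the construction from the note itself: it assumes a scaled Gaussian random matrix (a commonly used variant) rather than projection onto a random subspace, and the constant in `jl_dimension` (k ≥ 4 ln n / (ε²/2 − ε³/3)) is an assumption recalled from the elementary proof, not something stated in the abstract. All function names are hypothetical.

```python
import math
import random


def jl_dimension(n, eps):
    """Assumed target dimension: k >= 4 ln n / (eps^2/2 - eps^3/3)."""
    return math.ceil(4 * math.log(n) / (eps ** 2 / 2 - eps ** 3 / 3))


def random_projection(points, k, rng):
    """Map each d-dimensional point to k dimensions with a Gaussian matrix."""
    d = len(points[0])
    # k x d matrix of independent N(0, 1) entries (assumed variant).
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)  # keeps squared lengths unchanged in expectation
    return [
        [scale * sum(row[j] * p[j] for j in range(d)) for row in matrix]
        for p in points
    ]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def distortion_range(points, projected):
    """Smallest and largest ratio of projected to original squared distance."""
    ratios = [
        sq_dist(projected[i], projected[j]) / sq_dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]
    return min(ratios), max(ratios)


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, eps = 20, 200, 0.5
    pts = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    k = jl_dimension(n, eps)
    lo, hi = distortion_range(pts, random_projection(pts, k, rng))
    print(f"k = {k}, squared-distance ratios lie in [{lo:.3f}, {hi:.3f}]")
```

The ratios are random, so a single run only illustrates the guarantee; the theorem is a statement about the probability that all pairwise ratios fall in the (1 ± ε) window.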

Proceedings ArticleDOI
11 Oct 2003
TL;DR: This work considers both general doubling metrics, as well as more restricted families such as those arising from trees, from graphs excluding a fixed minor, and from snowflaked metrics, which contains many families of metrics that occur in applied settings.
Abstract: The doubling constant of a metric space (X, d) is the smallest value λ such that every ball in X can be covered by λ balls of half the radius. The doubling dimension of X is then defined as dim(X) = log₂ λ. A metric (or sequence of metrics) is called doubling precisely when its doubling dimension is bounded. This is a robust class of metric spaces which contains many families of metrics that occur in applied settings. We give tight bounds for embedding doubling metrics into (low-dimensional) normed spaces. We consider both general doubling metrics, as well as more restricted families such as those arising from trees, from graphs excluding a fixed minor, and from snowflaked metrics. Our techniques include decomposition theorems for doubling metrics, and an analysis of a fractal in the plane according to T. J. Laakso (2002). Finally, we discuss some applications and point out a central open question regarding dimensionality reduction in L₂.

511 citations

Proceedings ArticleDOI
19 Apr 2006
TL;DR: A data-driven approach to measuring the predictive quality of a set of sensor locations, predicting the communication cost involved with these placements, and designing an algorithm with provable quality guarantees that optimizes the NP-hard tradeoff is presented.
Abstract: When monitoring spatial phenomena with wireless sensor networks, selecting the best sensor placements is a fundamental task. Not only should the sensors be informative, but they should also be able to communicate efficiently. In this paper, we present a data-driven approach that addresses the three central aspects of this problem: measuring the predictive quality of a set of sensor locations (regardless of whether sensors were ever placed at these locations), predicting the communication cost involved with these placements, and designing an algorithm with provable quality guarantees that optimizes the NP-hard tradeoff. Specifically, we use data from a pilot deployment to build non-parametric probabilistic models called Gaussian Processes (GPs) both for the spatial phenomena of interest and for the spatial variability of link qualities, which allows us to estimate predictive power and communication cost of un-sensed locations. Surprisingly, uncertainty in the representation of link qualities plays an important role in estimating communication costs. Using these models, we present a novel, polynomial-time, data-driven algorithm, pSPIEL, which selects Sensor Placements at Informative and cost-Effective Locations. Our approach exploits two important properties of this problem: submodularity, formalizing the intuition that adding a node to a small deployment can help more than adding a node to a large deployment; and locality, under which nodes that are far from each other provide almost independent information. Exploiting these properties, we prove strong approximation guarantees for our pSPIEL approach. We also provide extensive experimental validation of this practical approach on several real-world placement problems, and built a complete system implementation on 46 Tmote Sky motes, demonstrating significant advantages over existing methods.

495 citations
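
The submodularity property described in the abstract above (a new node helps a small deployment at least as much as a large one) can be illustrated with a toy example. This sketch swaps in a simple set-coverage objective instead of the Gaussian Process based quality measure used in the paper; the sensor names, the `reach` table, and both helper functions are hypothetical.

```python
def coverage(sensors, reach):
    """Number of distinct locations observed by a set of sensors."""
    covered = set()
    for s in sensors:
        covered |= reach[s]
    return len(covered)


def marginal_gain(new, placed, reach):
    """Extra locations covered when `new` is added to the `placed` set."""
    return coverage(placed | {new}, reach) - coverage(placed, reach)


# Hypothetical table: which locations each candidate sensor can observe.
reach = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {2, 3, 4, 5},
}

small = {"a"}       # small deployment
large = {"a", "b"}  # larger deployment containing the small one

gain_small = marginal_gain("c", small, reach)  # c adds locations 4 and 5
gain_large = marginal_gain("c", large, reach)  # c adds only location 5

# Diminishing returns: the gain never grows when the deployment grows.
assert gain_small >= gain_large
```

Coverage functions are a standard example of submodular functions, which is why the inequality holds here for any choice of nested deployments.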

Proceedings ArticleDOI
06 Jul 2001
TL;DR: This work establishes a relation between this collection of network design problems and a variant of the facility location problem introduced by Karger and Minkoff, and provides optimal and approximate algorithms for several variants of this problem, depending on whether the traffic matrix is required to be symmetric.
Abstract: Consider a setting in which a group of nodes, situated in a large underlying network, wishes to reserve bandwidth on which to support communication. Virtual private networks (VPNs) are services that support such a construct; rather than building a new physical network on the group of nodes that must be connected, bandwidth in the underlying network is reserved for communication within the group, forming a virtual “sub-network.” Provisioning a virtual private network over a set of terminals gives rise to the following general network design problem. We have bounds on the cumulative amount of traffic each terminal can send and receive; we must choose a path for each pair of terminals, and a bandwidth allocation for each edge of the network, so that any traffic matrix consistent with the given upper bounds can be feasibly routed. Thus, we are seeking to design a network that can support a continuum of possible traffic scenarios. We provide optimal and approximate algorithms for several variants of this problem, depending on whether the traffic matrix is required to be symmetric, and on whether the designed network is required to be a tree (a natural constraint in a number of basic applications). We also establish a relation between this collection of network design problems and a variant of the facility location problem introduced by Karger and Minkoff; we extend their results by providing a stronger approximation algorithm for this latter problem.

318 citations

Journal Article
TL;DR: This paper presents the Submodular Saturation algorithm, a simple and efficient algorithm with strong theoretical approximation guarantees for cases where the possible objective functions exhibit submodularity, an intuitive diminishing returns property, and proves that better approximation algorithms do not exist unless NP-complete problems admit efficient algorithms.
Abstract: In many applications, one has to actively select among a set of expensive observations before making an informed decision. For example, in environmental monitoring, we want to select locations to measure in order to most effectively predict spatial phenomena. Often, we want to select observations which are robust against a number of possible objective functions. Examples include minimizing the maximum posterior variance in Gaussian Process regression, robust experimental design, and sensor placement for outbreak detection. In this paper, we present the Submodular Saturation algorithm, a simple and efficient algorithm with strong theoretical approximation guarantees for cases where the possible objective functions exhibit submodularity, an intuitive diminishing returns property. Moreover, we prove that better approximation algorithms do not exist unless NP-complete problems admit efficient algorithms. We show how our algorithm can be extended to handle complex cost functions (incorporating non-unit observation cost or communication and path costs). We also show how the algorithm can be used to near-optimally trade off expected-case (e.g., the Mean Square Prediction Error in Gaussian Process regression) and worst-case (e.g., maximum predictive variance) performance. We show that many important machine learning problems fit our robust submodular observation selection formalism, and provide extensive empirical evaluation on several real-world problems. For Gaussian Process regression, our algorithm compares favorably with state-of-the-art heuristics described in the geostatistics literature, while being simpler, faster and providing theoretical guarantees. For robust experimental design, our algorithm performs favorably compared to SDP-based algorithms. © 2008 Andreas Krause, H. Brendan McMahan, Carlos Guestrin and Anupam Gupta.

307 citations


Cited by
Journal ArticleDOI

7,335 citations

Journal ArticleDOI

6,278 citations

Book
11 Aug 2014
TL;DR: The preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy, and application of these techniques in creative combinations, using the query-release problem as an ongoing example.
Abstract: The problem of privacy-preserving data analysis has a long history spanning multiple disciplines. As electronic data about individuals becomes increasingly detailed, and as technology enables ever more powerful collection and curation of these data, the need increases for a robust, meaningful, and mathematically rigorous definition of privacy, together with a computationally rich class of algorithms that satisfy this definition. Differential Privacy is such a definition. After motivating and discussing the meaning of differential privacy, the preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy, and application of these techniques in creative combinations, using the query-release problem as an ongoing example. A key point is that, by rethinking the computational goal, one can often obtain far better results than would be achieved by methodically replacing each step of a non-private computation with a differentially private implementation. Despite some astonishingly powerful computational results, there are still fundamental limitations, not just on what can be achieved with differential privacy but on what can be achieved with any method that protects against a complete breakdown in privacy. Virtually all the algorithms discussed herein maintain differential privacy against adversaries of arbitrary computational power. Certain algorithms are computationally intensive, others are efficient. Computational complexity for the adversary and the algorithm are both discussed. We then turn from fundamentals to applications other than query-release, discussing differentially private methods for mechanism design and machine learning. The vast majority of the literature on differentially private algorithms considers a single, static database that is subject to many analyses. Differential privacy in other models, including distributed databases and computations on data streams, is discussed. Finally, we note that this work is meant as a thorough introduction to the problems and techniques of differential privacy, but is not intended to be an exhaustive survey; there is by now a vast amount of work in differential privacy, and we can cover only a small portion of it.

5,190 citations
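
As one concrete instance of the "fundamental techniques" the abstract above refers to, here is a minimal sketch of the Laplace mechanism applied to a counting query. The abstract does not name this mechanism, so its selection here is an assumption made for illustration; it relies on the standard fact that a counting query changes by at most 1 when one record changes, so noise with scale 1/ε is added. The function names and the example data are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng):
    """Noisy count of records matching `predicate`.

    A counting query has sensitivity 1, so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(1)
    records = list(range(100))  # hypothetical database
    answer = private_count(records, lambda r: r % 2 == 0, 0.5, rng)
    print(f"noisy count of even records: {answer:.2f}")
```

Smaller values of ε give stronger privacy and noisier answers; each released answer consumes part of the overall privacy budget.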

Journal ArticleDOI
TL;DR: Monocle is described, an unsupervised algorithm that increases the temporal resolution of transcriptome dynamics using single-cell RNA-Seq data collected at multiple time points that revealed switch-like changes in expression of key regulatory factors, sequential waves of gene regulation, and expression of regulators that were not known to act in differentiation.
Abstract: Defining the transcriptional dynamics of a temporal process such as cell differentiation is challenging owing to the high variability in gene expression between individual cells. Time-series gene expression analyses of bulk cells have difficulty distinguishing early and late phases of a transcriptional cascade or identifying rare subpopulations of cells, and single-cell proteomic methods rely on a priori knowledge of key distinguishing markers. Here we describe Monocle, an unsupervised algorithm that increases the temporal resolution of transcriptome dynamics using single-cell RNA-Seq data collected at multiple time points. Applied to the differentiation of primary human myoblasts, Monocle revealed switch-like changes in expression of key regulatory factors, sequential waves of gene regulation, and expression of regulators that were not known to act in differentiation. We validated some of these predicted regulators in a loss-of-function screen. Monocle can in principle be used to recover single-cell gene expression kinetics from a wide array of cellular processes, including differentiation, proliferation and oncogenic transformation.

4,119 citations