
Showing papers by "Vahab Mirrokni published in 2014"


Proceedings ArticleDOI
18 Jun 2014
TL;DR: The composable core-sets the authors construct are small and accurate: their approximation factor almost matches that of the best "off-line" algorithms for the relevant optimization problems (up to a constant factor).
Abstract: In this paper we consider efficient construction of "composable core-sets" for basic diversity and coverage maximization problems. A core-set for a point-set in a metric space is a subset of the point-set with the property that an approximate solution to the whole point-set can be obtained given the core-set alone. A composable core-set has the property that for a collection of sets, the approximate solution to the union of the sets in the collection can be obtained given the union of the composable core-sets for the point sets in the collection. Using composable core-sets one can obtain efficient solutions to a wide variety of massive data processing applications, including nearest neighbor search, streaming algorithms and map-reduce computation. Our main results are algorithms for constructing composable core-sets for several notions of "diversity objective functions", a topic that attracted a significant amount of research over the last few years. The composable core-sets we construct are small and accurate: their approximation factor almost matches that of the best "off-line" algorithms for the relevant optimization problems (up to a constant factor). Moreover, we also show applications of our results to diverse nearest neighbor search, streaming algorithms and map-reduce computation. Finally, we show that for an alternative notion of diversity maximization based on the maximum coverage problem small composable core-sets do not exist.
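
For intuition, here is a minimal sketch of greedy farthest-point selection run independently on each part of a partitioned point set, with the per-part selections unioned into one small set. This is a standard building block for diversity core-sets, not the paper's exact construction; the Euclidean data, partitioning, and parameters are made up.

```python
# Minimal sketch: build a small "core-set" per partition with greedy
# farthest-point selection, then union the per-partition core-sets.
# Illustrative stand-in only, not the paper's exact construction.
import math
import random

def farthest_point_coreset(points, k):
    """Pick k points greedily, each maximizing its distance to those chosen so far."""
    chosen = [points[0]]
    while len(chosen) < min(k, len(points)):
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in chosen))
        chosen.append(nxt)
    return chosen

def composable_coreset(partitions, k):
    """Union of per-partition core-sets; a diversity objective can then be
    approximately maximized over this much smaller set."""
    union = []
    for part in partitions:
        union.extend(farthest_point_coreset(part, k))
    return union

if __name__ == "__main__":
    random.seed(0)
    data = [(random.random(), random.random()) for _ in range(3000)]
    parts = [data[i::3] for i in range(3)]          # toy 3-way partition
    core = composable_coreset(parts, k=10)
    print(len(core), "core-set points from", len(data), "inputs")
```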

138 citations


Journal ArticleDOI
TL;DR: This paper formalizes this combined optimization problem as a multiobjective stochastic control problem and derives an efficient policy for online ad allocation in settings with general joint distribution over placement quality and exchange prices and proves the asymptotic optimality of this policy.
Abstract: It is clear from the growing role of ad exchanges in the real-time sale of advertising slots that Web publishers are considering a new alternative to their more traditional reservation-based ad contracts. To make this choice, the publisher must trade off, in real-time, the short-term revenue from ad exchange with the long-term benefits of delivering good spots to the reservation ads. In this paper we formalize this combined optimization problem as a multiobjective stochastic control problem and derive an efficient policy for online ad allocation in settings with general joint distribution over placement quality and exchange prices. We prove the asymptotic optimality of this policy in terms of any arbitrary trade-off between the quality of delivered reservation ads and revenue from the exchange, and we show that our policy approximates any Pareto-optimal point on the quality-versus-revenue curve. Experimental results on data derived from real publisher inventory confirm that there are significant benefits ...
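
As a toy illustration only (not the paper's asymptotically optimal policy), the sketch below routes each impression to the exchange when its bid clears a fixed threshold and otherwise serves the reservation ad with the best placement quality; sweeping the threshold traces a crude revenue-versus-quality trade-off. All names and numbers are invented.

```python
# Toy stand-in (not the paper's derived policy): send an impression to the ad
# exchange when its bid clears a threshold, otherwise serve it to the
# reservation ad that values its placement quality most.
import random

def allocate(impressions, reservation_ads, price_threshold):
    revenue, quality = 0.0, 0.0
    for imp in impressions:
        if imp["exchange_bid"] >= price_threshold:
            revenue += imp["exchange_bid"]
        else:
            # imp["quality"][a] = placement quality of this impression for reservation ad a
            best = max(reservation_ads, key=lambda a: imp["quality"][a])
            quality += imp["quality"][best]
    return revenue, quality

if __name__ == "__main__":
    random.seed(1)
    ads = ["res_ad_1", "res_ad_2"]
    imps = [{"exchange_bid": random.random(),
             "quality": {a: random.random() for a in ads}} for _ in range(1000)]
    for thr in (0.3, 0.6, 0.9):   # sweeping the threshold traces a revenue/quality trade-off
        print(thr, allocate(imps, ads, thr))
```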

132 citations


Proceedings ArticleDOI
03 Nov 2014
TL;DR: This paper designs improved algorithms based on traditional MapReduce architecture for large scale data analysis that have provable theoretical guarantees, and easily outperform previously studied algorithms.
Abstract: Computing connected components of a graph lies at the core of many data mining algorithms, and is a fundamental subroutine in graph clustering. This problem is well studied, yet many of the algorithms with good theoretical guarantees perform poorly in practice, especially when faced with graphs with hundreds of billions of edges. In this paper, we design improved algorithms based on traditional MapReduce architecture for large scale data analysis. We also explore the effect of augmenting MapReduce with a distributed hash table (DHT) service. We show that these algorithms have provable theoretical guarantees, and easily outperform previously studied algorithms, sometimes by more than an order of magnitude. In particular, our iterative MapReduce algorithms run 3 to 15 times faster than the best previously studied algorithms, and the MapReduce implementation using a DHT is 10 to 30 times faster than the best previously studied algorithms. These are the fastest algorithms that easily scale to graphs with hundreds of billions of edges.
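
For background, the sketch below shows the min-label propagation idea that many distributed connected-components algorithms build on, written as a plain sequential loop; the paper's improved MapReduce algorithms and the DHT-based variant are not shown, and the toy graph is made up.

```python
# Sketch of min-label propagation for connected components, written as a plain
# sequential loop; distributed implementations parallelize passes like these.
def connected_components(edges, nodes):
    label = {v: v for v in nodes}       # start: every node labels its own component
    changed = True
    while changed:                      # repeat passes until no label changes
        changed = False
        for u, v in edges:
            lo = min(label[u], label[v])
            if label[u] != lo or label[v] != lo:
                label[u] = label[v] = lo
                changed = True
    return label

if __name__ == "__main__":
    nodes = range(8)
    edges = [(0, 1), (1, 2), (3, 4), (5, 6), (6, 7)]
    print(connected_components(edges, nodes))   # {0: 0, 1: 0, 2: 0, 3: 3, 4: 3, 5: 5, 6: 5, 7: 5}
```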

80 citations


Proceedings Article
08 Dec 2014
TL;DR: A new framework based on "mapping coresets" is developed to tackle the "balanced clustering" problem, yielding the first distributed approximation algorithms for balanced clustering under a wide range of clustering objective functions such as k-center, k-median, and k-means.
Abstract: Large-scale clustering of data points in metric spaces is an important problem in mining big data sets. For many applications, we face explicit or implicit size constraints for each cluster, which leads to the problem of clustering under capacity constraints or the "balanced clustering" problem. Although the balanced clustering problem has been widely studied, developing a theoretically sound distributed algorithm remains an open problem. In this paper we develop a new framework based on "mapping coresets" to tackle this issue. Our technique results in the first distributed approximation algorithms for balanced clustering problems for a wide range of clustering objective functions such as k-center, k-median, and k-means.
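
A rough sketch in the spirit of a mapping coreset, under simplifying assumptions: each machine summarizes its partition with a few representatives via greedy k-center, maps every local point to its nearest representative (which accumulates that point's weight), and one machine clusters the small weighted union. Capacity handling and the paper's approximation guarantees are omitted here.

```python
# Rough mapping-coreset-style sketch: summarize each partition by a few weighted
# representatives, then cluster the small union. Balance/capacity constraints
# and approximation constants are simplified away.
import math
import random

def greedy_k_center(points, k):
    centers = [points[0]]
    while len(centers) < min(k, len(points)):
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def mapping_coreset(partition, k):
    reps = greedy_k_center(partition, k)
    weight = {r: 0 for r in reps}
    for p in partition:                      # map each point to its nearest representative
        weight[min(reps, key=lambda r: math.dist(p, r))] += 1
    return weight                            # representative point -> accumulated weight

if __name__ == "__main__":
    random.seed(2)
    data = [(random.gauss(cx, 0.1), random.gauss(cy, 0.1))
            for cx, cy in [(0, 0), (3, 0), (0, 3)] for _ in range(500)]
    parts = [data[i::4] for i in range(4)]                   # 4 "machines"
    union = {}
    for part in parts:
        for rep, w in mapping_coreset(part, k=6).items():
            union[rep] = union.get(rep, 0) + w
    # final pass: plain k-center on the union (weights ignored for simplicity)
    print("final centers:", greedy_k_center(list(union), k=3))
```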

73 citations


Proceedings ArticleDOI
07 Apr 2014
TL;DR: The goal in this paper is to study optimal mechanism design in settings plagued by competition and two-sided asymmetric information, and identify conditions under which the current practice of employing constant cuts is indeed optimal.
Abstract: E-commerce websites such as eBay, as well as advertising exchanges (AdX) such as DoubleClick's, RightMedia, or AdECN, work as intermediaries who sell items (e.g. page-views) on behalf of a seller (e.g. a publisher) to buyers on the opposite side of the market (e.g., advertisers). These platforms often use fixed-percentage sharing schemes, according to which (i) the platform runs an auction amongst buyers, and (ii) gives the seller a constant fraction (e.g., 80%) of the auction proceeds. In these settings, the platform faces asymmetric information regarding both the valuations of buyers for the item (as in a standard auction environment) as well as about the seller's opportunity cost of selling the item. Moreover, platforms often face intense competition from similar marketplaces, and such competition is likely to favor auction rules that secure high payoffs to sellers. In such an environment, what selling mechanism should platforms employ? Our goal in this paper is to study optimal mechanism design in settings plagued by competition and two-sided asymmetric information, and to identify conditions under which the current practice of employing constant cuts is indeed optimal. In particular, we first show that for a large class of competition games, platforms behave in equilibrium as if they maximize a convex combination of the seller's payoffs and the platform's revenue, with weight α on the seller's payoffs (a proxy for the intensity of competition in the market). We generalize the analysis of Myerson and Satterthwaite (1983), and derive the optimal direct-revelation mechanism for each α. As expected, the optimal mechanism applies a reserve price which is decreasing in α. Next, we present an indirect implementation based on "sharing schemes". We show that constant cuts are optimal if and only if the opportunity cost of the seller has a power-form distribution, and derive a simple formula for computing the optimal constant cut as a function of the sellers' distribution of opportunity costs and the market competition proxy α. Finally, for completeness, we study the case of a seller's optimal auction with a fixed profit for the platform, and derive the optimal direct and indirect implementations in this setting.
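
The snippet below only simulates the fixed-percentage ("constant cut") sharing scheme the abstract describes as current practice, not the optimal mechanism derived in the paper. The 80% cut, the no-trade rule, and the distributions are illustrative assumptions.

```python
# Minimal simulation of a fixed-percentage sharing scheme: the platform runs a
# second-price auction among buyers and forwards a constant cut of the proceeds
# to the seller. Illustrative assumptions only, not the paper's optimal design.
import random

def run_platform(buyer_values, seller_cost, cut=0.80):
    """Second-price auction among buyers; the seller gets `cut` of the proceeds.
    Here (an assumption) the item is sold only if the seller's share covers her cost."""
    bids = sorted(buyer_values, reverse=True)
    price = bids[1] if len(bids) > 1 else 0.0      # second-price payment
    seller_share = cut * price
    if seller_share < seller_cost:
        return 0.0, 0.0                            # no trade
    return seller_share, price - seller_share      # (seller payoff, platform revenue)

if __name__ == "__main__":
    random.seed(3)
    seller_total = platform_total = 0.0
    for _ in range(10000):
        values = [random.random() for _ in range(4)]          # 4 buyers
        cost = random.random() ** 2                           # toy opportunity-cost draw
        s, p = run_platform(values, cost)
        seller_total += s
        platform_total += p
    print("avg seller payoff:", seller_total / 10000,
          "avg platform revenue:", platform_total / 10000)
```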

31 citations


Journal ArticleDOI
TL;DR: It is proved that while the MinPTS problem is NP-complete for a restricted family of graphs, it admits a constant-factor approximation algorithm for power-law graphs. The convergence properties of these algorithms are also studied, and it is shown that the non-progressive model converges in at most O(|E(G)|) steps.

18 citations


Proceedings ArticleDOI
01 Jun 2014
TL;DR: The study of the multiplicative bidding language adopted by major Internet search companies is initiated, and a foundational optimization problem is established that captures the core difficulty of bidding under this language.
Abstract: In this paper, we initiate the study of the multiplicative bidding language adopted by major Internet search companies. In multiplicative bidding, the effective bid on a particular search auction is the product of a base bid and bid adjustments that are dependent on features of the search (for example, the geographic location of the user, or the platform on which the search is conducted). We consider the task faced by the advertiser when setting these bid adjustments, and establish a foundational optimization problem that captures the core difficulty of bidding under this language. We give matching algorithmic and approximation hardness results for this problem; these results are against an information-theoretic bound, and thus have implications on the power of the multiplicative bidding language itself. Inspired by empirical studies of search engine price data, we then codify the relevant restrictions of the problem, and give further algorithmic and hardness results. Our main technical contribution is an O(log n)-approximation for the case of multiplicative prices and monotone values. We also provide empirical validations of our problem restrictions, and test our algorithms on real data against natural benchmarks. Our experiments show that they perform favorably compared with the baseline.
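
A small illustration of the bidding language itself: the effective bid is the base bid times the product of the adjustments whose features are present on the query. The feature names and multipliers below are made up.

```python
# Illustration of the multiplicative bidding language: effective bid = base bid
# times the product of the adjustments matching the query's features.
def effective_bid(base_bid, adjustments, query_features):
    bid = base_bid
    for feature in query_features:
        bid *= adjustments.get(feature, 1.0)   # unmatched features leave the bid unchanged
    return bid

if __name__ == "__main__":
    adjustments = {"mobile": 0.8, "geo:NYC": 1.3, "weekend": 1.1}
    print(effective_bid(2.00, adjustments, {"mobile", "geo:NYC"}))   # 2.00 * 0.8 * 1.3 = 2.08
    print(effective_bid(2.00, adjustments, {"desktop"}))             # 2.00
```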

16 citations


Journal ArticleDOI
TL;DR: It is proved that overlapping clustering may be significantly better than non-overlapping clustering with respect to conductance, even in a theoretical setting, and it is found that allowing overlap does not help for the problem of minimizing the sum of conductances.
Abstract: Graph clustering is an important problem with applications to bioinformatics, community discovery in social networks, distributed computing, and more. While most of the research in this area has focused on clustering using disjoint clusters, many real datasets have inherently overlapping clusters. We compare overlapping and non-overlapping clusterings in graphs in the context of minimizing their conductance. It is known that allowing clusters to overlap gives better results in practice. We prove that overlapping clustering may be significantly better than non-overlapping clustering with respect to conductance, even in a theoretical setting. For minimizing the maximum conductance over the clusters, we give examples demonstrating that allowing overlaps can yield significantly better clusterings, namely, ones with a much smaller optimum. In addition, for the min-max variant, the overlapping version admits a simple approximation algorithm, while our algorithm for the non-overlapping version is complex and yields a worse approximation ratio due to the presence of the additional constraint. Somewhat surprisingly, for the problem of minimizing the sum of conductances, we find that allowing overlap does not help. We show how to apply a general technique to transform any overlapping clustering into a non-overlapping one with only a modest increase in the sum of conductances. This uncrossing technique is of independent interest and may find further applications in the future. We consider this work a step toward a rigorous comparison of overlapping and non-overlapping clusterings and hope that it stimulates further research in this area.
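
For reference, the snippet below computes the conductance objective being compared, under its standard definition (edges cut by a cluster divided by the smaller side's volume); the example graph is arbitrary.

```python
# Conductance of a cluster S under the standard definition:
# (# edges leaving S) / min(vol(S), vol(V \ S)), where vol sums degrees.
from collections import defaultdict

def conductance(edges, S):
    S = set(S)
    deg = defaultdict(int)
    cut = 0
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        if (u in S) != (v in S):     # edge crosses the cluster boundary
            cut += 1
    vol_S = sum(deg[u] for u in S)
    vol_rest = sum(d for u, d in deg.items() if u not in S)
    denom = min(vol_S, vol_rest)
    return cut / denom if denom else float("inf")

if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
    print(conductance(edges, {1, 2, 3}))   # one cut edge (3,4); vol({1,2,3}) = 7 -> 1/7
```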

15 citations


Book ChapterDOI
08 Jul 2014
TL;DR: This paper focuses on tree networks and designs a coordination mechanism with polylogarithmic price of anarchy for unweighted jobs; for weighted jobs, it shows that the price of anarchy is a function of the depth of the tree.
Abstract: While selfish routing has been studied extensively, the problem of designing better coordination mechanisms for routing over time in general graphs has remained an open problem. In this paper, we focus on tree networks (single source, multiple destinations) with the goal of minimizing the (weighted) average sojourn time of jobs, and provide the first coordination mechanisms with provable price of anarchy for this problem. Interestingly, we achieve our price of anarchy results using simple and strongly local policies such as Shortest Job First and Smith's Rule (also called HDF). In particular, for the case of unweighted jobs, we design a coordination mechanism with polylogarithmic price of anarchy. For weighted jobs, on the other hand, we show that the price of anarchy is a function of the depth of the tree and accompany this result with a lower bound on the price of anarchy for Smith's Rule and other common strongly local scheduling policies.
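
The sketch below shows Smith's Rule (HDF) in its single-machine form, ordering jobs by non-increasing weight-to-processing-time ratio to minimize the weighted sum of completion times; the routing-over-time setting of the paper is not modeled, and the job data is made up.

```python
# Smith's Rule (HDF) on a single machine: process jobs in non-increasing
# weight / processing-time order, minimizing the weighted sum of completion times.
def smiths_rule(jobs):
    """jobs: list of (weight, processing_time). Returns (order, weighted completion time)."""
    order = sorted(jobs, key=lambda wp: wp[0] / wp[1], reverse=True)
    t, objective = 0.0, 0.0
    for w, p in order:
        t += p                     # completion time of this job
        objective += w * t
    return order, objective

if __name__ == "__main__":
    jobs = [(1, 4), (3, 2), (2, 1)]          # (weight, processing time)
    print(smiths_rule(jobs))
    # ratios: 0.25, 1.5, 2.0 -> order (2,1), (3,2), (1,4); objective = 2*1 + 3*3 + 1*7 = 18
```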

14 citations


Proceedings ArticleDOI
01 Jun 2014
TL;DR: In this paper, the authors investigate the design of Pareto optimal and incentive compatible auctions for agents with constrained quasi-linear utilities, which captures more realistic models of liquidity constraints that the agents may have.
Abstract: Constraints on agents' ability to pay play a major role in auction design for any setting where the magnitude of financial transactions is sufficiently large. Those constraints have been traditionally modeled in mechanism design as hard budgets, i.e., the mechanism is not allowed to charge agents more than a certain amount. Yet, real auction systems (such as Google AdWords) allow more sophisticated constraints on agents' ability to pay, such as average budgets. In this work, we investigate the design of Pareto optimal and incentive compatible auctions for agents with constrained quasi-linear utilities, which captures more realistic models of liquidity constraints that the agents may have. Our result applies to a very general class of allocation constraints known as polymatroidal environments, encompassing many settings of interest such as multi-unit auctions, matching markets, video-on-demand and advertisement systems. Our design is based on Ausubel's clinching framework. Incentive compatibility and feasibility with respect to ability-to-pay constraints are direct consequences of the clinching framework. Pareto-optimality, on the other hand, is considerably more challenging, since the no-trade condition that characterizes it depends not only on whether agents have their budgets exhausted or not, but also on the prices at which the goods are allocated. In order to get a handle on those prices, we introduce the novel concepts of dropping prices and saturation. These concepts lead to our main structural result, which is a characterization of the tight sets in the clinching auction outcome and its relation to dropping prices.
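
As a side note on the constraint model only (not the clinching auction itself), the toy snippet below contrasts a hard budget with one common reading of an average-budget constraint, namely a cap on the average price paid per allocated unit; that reading is an assumption made here for illustration, not a definition taken from the paper.

```python
# Toy contrast of payment constraints behind "constrained quasi-linear utilities":
# utility is still v*x - p, but which payments are feasible differs.
# The average-budget reading (payment <= cap * allocation) is an assumption.
def feasible_hard_budget(payment, budget):
    return payment <= budget                       # classic hard budget

def feasible_average_budget(payment, allocation, per_unit_cap):
    return payment <= per_unit_cap * allocation    # cap on the *average* price paid

def utility(value_per_unit, allocation, payment):
    return value_per_unit * allocation - payment   # quasi-linear within the feasible set

if __name__ == "__main__":
    x, p = 10, 25.0
    print(feasible_hard_budget(p, budget=20.0))              # False
    print(feasible_average_budget(p, x, per_unit_cap=3.0))   # True: 25 <= 30
    print(utility(4.0, x, p))                                # 15.0
```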

14 citations


Proceedings ArticleDOI
07 Apr 2014
TL;DR: This work presents a novel algorithmic framework that addresses both issues for the computation of several graph-theoretical similarity measures, including the number of common neighbors and Personalized PageRank, and shows experimentally the accuracy of the approach on real-world data.
Abstract: We study the problem of computing similarity rankings in large-scale multi-categorical bipartite graphs, where the two sides of the graph represent actors and items, and the items are partitioned into an arbitrary set of categories. The problem has several real-world applications, including identifying competing advertisers and suggesting related queries in an online advertising system, or finding users with similar interests and suggesting content to them. In these settings, we are interested in computing on-the-fly rankings of similar actors, given an actor and an arbitrary subset of categories of interest. Two main challenges arise: First, the bipartite graphs are huge and often lopsided (e.g. the system might receive billions of queries while presenting only millions of advertisers). Second, the sheer number of possible combinations of categories prevents the pre-computation of the results for all of them. We present a novel algorithmic framework that addresses both issues for the computation of several graph-theoretical similarity measures, including the number of common neighbors and Personalized PageRank. We show how to tackle the imbalance in the graphs to speed up the computation and provide efficient real-time algorithms for computing rankings for an arbitrary subset of categories. Finally, we show experimentally the accuracy of our approach with real-world data, using both public graphs and a very large dataset from Google AdWords.
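
A small sketch of the on-the-fly ranking by common neighbors restricted to a chosen category subset; Personalized PageRank and the paper's optimizations for lopsided graphs are not shown, and the toy advertiser/query data is invented.

```python
# Sketch of similarity by common neighbors in an actor-item bipartite graph,
# restricted to an arbitrary subset of item categories. Toy data only.
from collections import Counter

def rank_similar_actors(actor, actor_items, item_category, categories, top_k=5):
    """Rank other actors by # common neighbors with `actor`, counting only items
    whose category is in `categories`."""
    focus = {it for it in actor_items[actor] if item_category[it] in categories}
    scores = Counter()
    for other, items in actor_items.items():
        if other == actor:
            continue
        scores[other] = len(focus & {it for it in items if item_category[it] in categories})
    return scores.most_common(top_k)

if __name__ == "__main__":
    actor_items = {"adv_A": {"q1", "q2", "q3"},
                   "adv_B": {"q2", "q3", "q4"},
                   "adv_C": {"q4", "q5"}}
    item_category = {"q1": "shoes", "q2": "shoes", "q3": "travel", "q4": "travel", "q5": "shoes"}
    print(rank_similar_actors("adv_A", actor_items, item_category, categories={"shoes"}))
```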

Posted Content
TL;DR: This work investigates the design of Pareto optimal and incentive compatible auctions for agents with constrained quasi-linear utilities, which captures more realistic models of liquidity constraints that the agents may have.
Abstract: Constraints on agents' ability to pay play a major role in auction design for any setting where the magnitude of financial transactions is sufficiently large. Those constraints have been traditionally modeled in mechanism design as hard budgets, i.e., the mechanism is not allowed to charge agents more than a certain amount. Yet, real auction systems (such as Google AdWords) allow more sophisticated constraints on agents' ability to pay, such as average budgets. In this work, we investigate the design of Pareto optimal and incentive compatible auctions for agents with constrained quasi-linear utilities, which captures more realistic models of liquidity constraints that the agents may have. Our result applies to a very general class of allocation constraints known as polymatroidal environments, encompassing many settings of interest such as multi-unit auctions, matching markets, video-on-demand and advertisement systems. Our design is based on Ausubel's clinching framework. Incentive compatibility and feasibility with respect to ability-to-pay constraints are direct consequences of the clinching framework. Pareto-optimality, on the other hand, is considerably more challenging, since the no-trade condition that characterizes it depends not only on whether agents have their budgets exhausted or not, but also on the prices at which the goods are allocated. In order to get a handle on those prices, we introduce the novel concepts of dropping prices and saturation. These concepts lead to our main structural result, which is a characterization of the tight sets in the clinching auction outcome and its relation to dropping prices.

Proceedings ArticleDOI
Anand Bhalgat1, Nitish Korula2, Hannadiy Leontyev2, Max Lin2, Vahab Mirrokni2 
24 Feb 2014
TL;DR: This work formulates this problem as a hierarchical online matching problem in which each incoming impression has a level indicating its importance, designs practical solutions to this problem, and studies their performance on real data sets.
Abstract: Display ads on the Internet are often sold by publishers to advertisers in bundles of thousands or millions of impressions over a particular time period. The ad delivery systems assign ads to pages on behalf of publishers to satisfy these contracts, and at the same time, try to maximize the overall quality of assignment. This is usually modeled in the literature as an online allocation problem, where contracts are represented by overall delivery constraints. However, an important aspect of these contracts is missed by the classical formulation: a majority of these contracts are not between advertisers and publishers; a set of publishers is typically represented by a middleman, and advertisers buy inventory from the middleman. As publishers vary in quality and importance, advertisers prefer these publishers differently. Similarly, as the inventory of ads is limited, the ad-delivery engine needs to prefer a high-quality publisher over a low-quality publisher when supplying ads. We formulate this problem as a hierarchical online matching problem where each incoming impression has a level indicating its importance, and study its theoretical properties. We also design practical solutions to this problem and study their performance on real data sets.
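
A simple greedy baseline (not the paper's algorithm) for the level-aware allocation described above: each arriving impression carries a level and is assigned to the contract with remaining demand that values that level most. Contract names, demands, and level values are illustrative.

```python
# Simple greedy baseline for level-aware online allocation (not the paper's
# algorithm): assign each arriving impression to the contract with remaining
# demand that values the impression's level most.
def greedy_allocate(impressions, contracts):
    """contracts: {name: {"demand": int, "value": {level: float}}}"""
    remaining = {c: spec["demand"] for c, spec in contracts.items()}
    assignment, total_value = [], 0.0
    for level in impressions:                     # online arrival order
        eligible = [c for c in contracts if remaining[c] > 0]
        if not eligible:
            assignment.append((level, None))      # impression goes unassigned
            continue
        best = max(eligible, key=lambda c: contracts[c]["value"].get(level, 0.0))
        remaining[best] -= 1
        total_value += contracts[best]["value"].get(level, 0.0)
        assignment.append((level, best))
    return assignment, total_value

if __name__ == "__main__":
    contracts = {"contract_hi": {"demand": 2, "value": {"premium": 3.0, "standard": 1.0}},
                 "contract_lo": {"demand": 2, "value": {"premium": 1.5, "standard": 1.0}}}
    stream = ["standard", "premium", "premium", "standard", "premium"]
    print(greedy_allocate(stream, contracts))
```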

Book ChapterDOI
14 Dec 2014
TL;DR: Concise bidding strategies help advertisers deal with this challenge by introducing fewer variables to act on in the presence of multidimensional budget constraints.
Abstract: A major challenge faced by marketers attempting to optimize their advertising campaigns is dealing with budget constraints. The problem is even harder in the face of multidimensional budget constraints, particularly given the many decision variables involved and the interplay among them through such constraints. Concise bidding strategies help advertisers deal with this challenge by introducing fewer variables to act on.

01 Mar 2014
TL;DR: Although it is proved that finding the best target decomposition is NP-hard, a greedy algorithm is introduced that proposes a decomposition through iterative unification of the strongly connected components of the target.
Abstract: A (build) target specifies the information that is needed to automatically build a software artifact. This paper focuses on underutilized targets—an important dependency problem that we identified at Google. An underutilized target is one with files not needed by some of its dependents. Underutilized targets result in less modular code, overly large artifacts, slow builds, and unnecessary build and test triggers. To mitigate these problems, programmers decompose underutilized targets into smaller targets. However, manually decomposing a target is tedious and error-prone. Although we prove that finding the best target decomposition is NP-hard, we introduce a greedy algorithm that proposes a decomposition through iterative unification of the strongly connected components of the target. Our tool found that 19,994 of 40,000 Java library targets at Google can be decomposed to at least two targets. The results show that our tool is (1) efficient because it analyzes a target in two minutes on average and (2) effective because for each of 1,010 targets, it would save at least 50% of the total execution time of the tests triggered by the target.
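
A sketch of the natural starting point the abstract describes: computing the strongly connected components of a target's file-level dependency graph, since files on a dependency cycle must stay in the same target. The greedy unification step that merges these components is not shown, and the tiny dependency graph is made up.

```python
# Strongly connected components of a file-level dependency graph (iterative
# Kosaraju). Files on a cycle must stay in the same target; the paper's greedy
# unification of these components is not shown.
from collections import defaultdict

def strongly_connected_components(graph):
    """graph: {node: [nodes it depends on]}. Returns a list of components."""
    nodes = set(graph) | {v for deps in graph.values() for v in deps}
    order, visited = [], set()
    for start in nodes:                       # 1st pass: record DFS finish order
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, it = stack[-1]
            advanced = False
            for nxt in it:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    advanced = True
                    break
            if not advanced:
                order.append(node)
                stack.pop()
    reverse = defaultdict(list)               # 2nd pass: DFS on the reversed graph
    for u, deps in graph.items():
        for v in deps:
            reverse[v].append(u)
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        comp, stack = [], [start]
        assigned.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nxt in reverse[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    stack.append(nxt)
        components.append(comp)
    return components

if __name__ == "__main__":
    deps = {"A.java": ["B.java"], "B.java": ["A.java", "C.java"],
            "C.java": [], "D.java": ["C.java"]}
    print(strongly_connected_components(deps))   # e.g. [['A.java', 'B.java'], ['D.java'], ['C.java']]
```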

Posted Content
TL;DR: In this article, the authors study the multiplicative bidding language adopted by major Internet search companies and give matching algorithmic and approximation hardness results for this problem; these results are against an information-theoretic bound.
Abstract: In this paper, we initiate the study of the multiplicative bidding language adopted by major Internet search companies. In multiplicative bidding, the effective bid on a particular search auction is the product of a base bid and bid adjustments that are dependent on features of the search (for example, the geographic location of the user, or the platform on which the search is conducted). We consider the task faced by the advertiser when setting these bid adjustments, and establish a foundational optimization problem that captures the core difficulty of bidding under this language. We give matching algorithmic and approximation hardness results for this problem; these results are against an information-theoretic bound, and thus have implications on the power of the multiplicative bidding language itself. Inspired by empirical studies of search engine price data, we then codify the relevant restrictions of the problem, and give further algorithmic and hardness results. Our main technical contribution is an $O(\log n)$-approximation for the case of multiplicative prices and monotone values. We also provide empirical validations of our problem restrictions, and test our algorithms on real data against natural benchmarks. Our experiments show that they perform favorably compared with the baseline.