
Showing papers by "Vahab Mirrokni published in 2023"


Proceedings ArticleDOI
13 Jan 2023
TL;DR: In this paper, the authors study the frequency moment estimation problem under the continual release setting, where the union of outputs of the algorithm at every timestamp must be differentially private, and give an ε-DP algorithm that achieves a (1 + η)-relative approximation (for all η ∈ (0, 1)) with polylog(Tn) additive error and uses polylog(Tn) · max(1, n^{1-2/p}) space.
Abstract: The streaming model of computation is a popular approach for working with large-scale data. In this setting, there is a stream of items and the goal is to compute the desired quantities (usually data statistics) while making a single pass through the stream and using as little space as possible. Motivated by the importance of data privacy, we develop differentially private streaming algorithms under the continual release setting, where the union of outputs of the algorithm at every timestamp must be differentially private. Specifically, we study the fundamental ℓ_p (p ∈ [0, +∞)) frequency moment estimation problem under this setting, and give an ε-DP algorithm that achieves a (1 + η)-relative approximation (for all η ∈ (0, 1)) with polylog(Tn) additive error and uses polylog(Tn) · max(1, n^{1-2/p}) space, where T is the length of the stream and n is the size of the universe of elements. Our space is near optimal up to poly-logarithmic factors even in the non-private setting. To obtain our results, we first reduce several primitives under the differentially private continual release model, such as counting distinct elements, heavy hitters and counting low-frequency elements, to simpler counting/summing problems in the same setting. Based on these primitives, we develop a differentially private continual release level set estimation approach to address the ℓ_p frequency moment estimation problem. We also provide a simple extension of our results to the harder sliding window model, where the statistics must be maintained over the past W data items.
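The reductions above bottom out in private counting under continual release. Below is a minimal sketch of one standard such primitive, the binary-tree counting mechanism, which releases a noisy prefix count after every stream item with polylog(T) additive error. It is offered only as an illustration of the kind of building block the paper reduces to, not as the paper's construction; the function name and parameters are illustrative.

```python
import numpy as np

def private_continual_counts(stream_bits, epsilon, rng=None):
    """Binary-tree mechanism for counting under continual release: a standard
    primitive of the kind the paper reduces to, not the paper's construction.
    Outputs a noisy prefix count after every item; the whole output sequence is
    epsilon-DP, with polylog(T) additive error."""
    rng = np.random.default_rng() if rng is None else rng
    bits = list(stream_bits)
    T = max(len(bits), 2)
    levels = int(np.ceil(np.log2(T))) + 1          # dyadic levels 0..levels-1
    eps_level = epsilon / levels                   # split the privacy budget per level
    true_sum = {}                                  # (level, block) -> exact partial sum
    noisy_sum = {}                                 # noised once the block completes
    outputs = []
    for t, x in enumerate(bits, start=1):
        for l in range(levels):
            block = (t - 1) >> l                   # dyadic block of length 2^l containing t
            true_sum[(l, block)] = true_sum.get((l, block), 0) + x
            if t % (1 << l) == 0:                  # block just completed: noise it once
                noisy_sum[(l, block)] = true_sum[(l, block)] + rng.laplace(0, 1.0 / eps_level)
        # the prefix [1, t] decomposes into O(log T) completed dyadic blocks
        est, rem = 0.0, t
        while rem > 0:
            l = (rem & -rem).bit_length() - 1      # largest block ending exactly at rem
            est += noisy_sum[(l, (rem >> l) - 1)]
            rem -= 1 << l
        outputs.append(est)
    return outputs
```

Each item touches one block per level, and each block is noised once with Laplace(1/eps_level), so the levels compose to ε-DP over the entire output sequence.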

11 citations


Journal ArticleDOI
TL;DR: In this article, the authors study how an advertiser maximizes total conversion (e.g., ad clicks) while satisfying aggregate return-on-investment (ROI) and budget constraints across all channels.
Abstract: In digital online advertising, advertisers procure ad impressions simultaneously on multiple platforms, or so-called channels, such as Google Ads, Meta Ads Manager, etc., each of which consists of numerous ad auctions. We study how an advertiser maximizes total conversion (e.g., ad clicks) while satisfying aggregate return-on-investment (ROI) and budget constraints across all channels. In practice, an advertiser does not have control over, and thus cannot globally optimize, which individual ad auctions she participates in for each channel, and instead authorizes a channel to procure impressions on her behalf: the advertiser can only utilize two levers on each channel, namely setting a per-channel budget and per-channel target ROI. In this work, we first analyze the effectiveness of each of these levers for solving the advertiser's global multi-channel problem. We show that when an advertiser only optimizes over per-channel ROIs, her total conversion can be arbitrarily worse than what she could have obtained in the global problem. Further, we show that the advertiser can achieve the global optimal conversion when she only optimizes over per-channel budgets. In light of this finding, under a bandit feedback setting that mimics real-world scenarios where advertisers have limited information on ad auctions in each channel and how channels procure ads, we present an efficient learning algorithm that produces per-channel budgets whose resulting conversion approximates that of the global optimal problem. Finally, we argue that all our results hold for both single-item and multi-item auctions from which channels procure impressions on advertisers' behalf.
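The paper's positive result is that the per-channel budget lever alone can recover the globally optimal conversion, learned under bandit feedback. The sketch below illustrates that lever with a toy UCB search over discretized budget splits; it is not the paper's algorithm, and `conversion_fn` (the channel feedback oracle), the grid size, and the round count are hypothetical stand-ins.

```python
import itertools, math

def ucb_budget_split(total_budget, num_channels, conversion_fn, rounds=2000, grid=5):
    """Toy bandit over discretized per-channel budget splits: an illustration of
    the 'budget lever' under bandit feedback, not the paper's learning algorithm.

    conversion_fn(budgets) -> noisy total conversion observed for that split."""
    step = total_budget / grid
    # all ways to split the budget into `grid` equal chunks across the channels
    arms = [tuple(c * step for c in comp)
            for comp in itertools.product(range(grid + 1), repeat=num_channels)
            if sum(comp) == grid]
    counts, means = [0] * len(arms), [0.0] * len(arms)
    for t in range(1, rounds + 1):
        if t <= len(arms):                 # try every split once
            i = t - 1
        else:                              # then trade off mean vs. uncertainty (UCB1)
            i = max(range(len(arms)),
                    key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = conversion_fn(arms[i])
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]
    return arms[max(range(len(arms)), key=lambda a: means[a])]
```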

5 citations


Journal ArticleDOI
TL;DR: In this paper, the authors study the combinatorial optimization problem of finding an optimal core tensor shape, also called multilinear rank, for a size-constrained Tucker decomposition, and give an algorithm with provable approximation guarantees for its reconstruction error via connections to higher-order singular values.
Abstract: This work studies the combinatorial optimization problem of finding an optimal core tensor shape, also called multilinear rank, for a size-constrained Tucker decomposition. We give an algorithm with provable approximation guarantees for its reconstruction error via connections to higher-order singular values. Specifically, we introduce a novel Tucker packing problem, which we prove is NP-hard, and give a polynomial-time approximation scheme based on a reduction to the 2-dimensional knapsack problem with a matroid constraint. We also generalize our techniques to tree tensor network decompositions. We implement our algorithm using an integer programming solver, and show that its solution quality is competitive with (and sometimes better than) the greedy algorithm that uses the true Tucker decomposition loss at each step, while also running up to 1000x faster.
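As a rough illustration of the connection to higher-order singular values, the sketch below greedily grows the core shape along the mode whose next mode-n singular value captures the most energy, subject to a core-size budget. This is only a greedy heuristic in the spirit of the problem, not the paper's approximation scheme, and the function name and interface are illustrative.

```python
import numpy as np

def greedy_core_shape(tensor, size_budget):
    """Pick a Tucker core shape (multilinear rank) under a total-size budget,
    guided by the mode-n singular values (HOSVD spectra); a greedy baseline,
    not the paper's knapsack-based approximation scheme."""
    dims = tensor.shape
    spectra = []
    for n in range(tensor.ndim):
        unfolding = np.moveaxis(tensor, n, 0).reshape(dims[n], -1)
        spectra.append(np.linalg.svd(unfolding, compute_uv=False))
    ranks = [1] * tensor.ndim
    while True:
        # candidate: grow the mode whose next singular value captures the most energy
        best_mode, best_gain = None, 0.0
        for n in range(tensor.ndim):
            if ranks[n] >= len(spectra[n]):
                continue
            new_size = np.prod(ranks) // ranks[n] * (ranks[n] + 1)
            if new_size > size_budget:
                continue
            gain = spectra[n][ranks[n]] ** 2   # squared singular value = captured energy
            if gain > best_gain:
                best_mode, best_gain = n, gain
        if best_mode is None:
            return tuple(ranks)
        ranks[best_mode] += 1
```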

3 citations


Journal ArticleDOI
TL;DR: The authors design replicable algorithms for statistical clustering under the notion of replicability recently introduced by Impagliazzo et al. [2022]: a clustering algorithm is replicable if, with high probability, its output induces the exact same partition of the sample space after two executions on different inputs drawn from the same distribution, when its internal randomness is shared across the executions.
Abstract: We design replicable algorithms in the context of statistical clustering under the recently introduced notion of replicability from Impagliazzo et al. [2022]. According to this definition, a clustering algorithm is replicable if, with high probability, its output induces the exact same partition of the sample space after two executions on different inputs drawn from the same distribution, when its internal randomness is shared across the executions. We propose such algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their combinatorial counterparts in a black-box manner. In particular, we demonstrate a replicable $O(1)$-approximation algorithm for statistical Euclidean $k$-medians ($k$-means) with $\operatorname{poly}(d)$ sample complexity. We also describe an $O(1)$-approximation algorithm with an additional $O(1)$-additive error for statistical Euclidean $k$-centers, albeit with $\exp(d)$ sample complexity. In addition, we provide experiments on synthetic distributions in 2D using the $k$-means++ implementation from sklearn as a black-box that validate our theoretical results.
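The replicability criterion can be probed empirically in the style of the paper's experiments: run the same black-box clustering routine with shared internal randomness (a fixed seed) on two independent samples and compare the partitions they induce on a fixed set of reference points. The sketch below does this with sklearn's k-means++; vanilla k-means++ is generally not replicable, so this only illustrates the definition, not the paper's algorithms, and the toy mixture and grid are invented for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

def induced_labels(sample, reference_points, k, seed):
    """Cluster a sample with shared internal randomness (fixed seed) and return
    the partition it induces on a fixed set of reference points."""
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=seed).fit(sample)
    return km.predict(reference_points)

# Two executions on *different* samples from the same 2D mixture, same seed.
rng1, rng2 = np.random.default_rng(1), np.random.default_rng(2)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
sample1 = np.vstack([c + rng1.normal(size=(200, 2)) for c in centers])
sample2 = np.vstack([c + rng2.normal(size=(200, 2)) for c in centers])
grid = np.random.default_rng(0).uniform(-3, 8, size=(1000, 2))   # proxy for the sample space

labels1 = induced_labels(sample1, grid, k=3, seed=42)
labels2 = induced_labels(sample2, grid, k=3, seed=42)
# Replicability asks these induced partitions to coincide; raw agreement is a
# simplification, since identical partitions only need to match up to relabeling.
print(f"raw label agreement on reference grid: {np.mean(labels1 == labels2):.2%}")
```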

3 citations


Journal ArticleDOI
12 Apr 2023
TL;DR: In this article, the authors present a new theoretical framework to measure re-identification risk in user representations; based on hypothesis testing, it formally bounds the probability that an attacker is able to obtain the identity of a user from their representation.
Abstract: Compact user representations (such as embeddings) form the backbone of personalization services. In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker may be able to obtain the identity of a user from their representation. As an application, we show how our framework is general enough to model important real-world applications such as Chrome's Topics API for interest-based advertising. We complement our theoretical bounds by showing provably good attack algorithms for re-identification that we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk and a framework to measure it that can be used to inform real-world applications.
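To make the notion of re-identification risk concrete, the sketch below runs a generic nearest-neighbor matching attack on two releases of per-user representations and reports the fraction of users correctly linked, which empirically lower-bounds the risk. It is an illustration only; the paper's contribution is a formal hypothesis-testing bound together with provably good attacks, and the toy data here is synthetic.

```python
import numpy as np

def reidentification_rate(reps_day1, reps_day2):
    """Generic nearest-neighbor matching attack: given two releases of per-user
    representations (rows aligned to the same users), link each day-2 row to its
    closest day-1 row. The fraction of correct links empirically lower-bounds
    re-identification risk. Illustration only; the paper gives formal bounds."""
    d2 = ((reps_day2[:, None, :] - reps_day1[None, :, :]) ** 2).sum(-1)
    guesses = d2.argmin(axis=1)
    return np.mean(guesses == np.arange(len(reps_day2)))

# Toy example: stable user embeddings observed with fresh noise on each day.
rng = np.random.default_rng(0)
true_embeddings = rng.normal(size=(500, 16))
day1 = true_embeddings + 0.3 * rng.normal(size=true_embeddings.shape)
day2 = true_embeddings + 0.3 * rng.normal(size=true_embeddings.shape)
print(f"re-identification rate: {reidentification_rate(day1, day2):.2%}")
```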

2 citations


Book ChapterDOI
21 Mar 2023
TL;DR: The authors present an algorithm for fully dynamic k-center in an arbitrary metric space that maintains an optimal (2 + ε)-approximation in O(k · polylog(n, Δ)) amortized update time, where Δ is the aspect ratio of the metric space, improving the previous O(k² · polylog(n, Δ)) bound of Chan, Gourqin, and Sozio (2018).
Abstract: In fully dynamic clustering problems, a clustering of a given data set in a metric space must be maintained while it is modified through insertions and deletions of individual points. In this paper, we resolve the complexity of fully dynamic $k$-center clustering against both adaptive and oblivious adversaries. Against oblivious adversaries, we present the first algorithm for fully dynamic $k$-center in an arbitrary metric space that maintains an optimal $(2+\epsilon)$-approximation in $O(k \cdot \mathrm{polylog}(n,\Delta))$ amortized update time. Here, $n$ is an upper bound on the number of active points at any time, and $\Delta$ is the aspect ratio of the metric space. Previously, the best known amortized update time was $O(k^2\cdot \mathrm{polylog}(n,\Delta))$, and is due to Chan, Gourqin, and Sozio (2018). Moreover, we demonstrate that our runtime is optimal up to $\mathrm{polylog}(n,\Delta)$ factors. In fact, we prove that even offline algorithms for $k$-clustering tasks in arbitrary metric spaces, including $k$-medians, $k$-means, and $k$-center, must make at least $\Omega(n k)$ distance queries to achieve any non-trivial approximation factor. This implies a lower bound of $\Omega(k)$ which holds even for the insertions-only setting. We also show deterministic lower and upper bounds for adaptive adversaries, demonstrate that an update time sublinear in $k$ is possible against oblivious adversaries for metric spaces which admit locality-sensitive hash functions (LSH) and give the first fully dynamic $O(1)$-approximation algorithms for the closely related $k$-sum-of-radii and $k$-sum-of-diameter problems.
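For reference, the static quality target the dynamic algorithm maintains is the classic 2-approximation for k-center given by farthest-first traversal (Gonzalez). The sketch below shows only that static baseline; maintaining a (2 + ε)-approximation under insertions and deletions in O(k · polylog(n, Δ)) amortized time is the paper's contribution and is substantially more involved.

```python
import numpy as np

def gonzalez_k_center(points, k, rng=None):
    """Classic farthest-first traversal: a static 2-approximation for k-center.
    Shown only as the quality benchmark; the paper's contribution is maintaining
    a (2+eps)-approximation under insertions and deletions."""
    rng = np.random.default_rng() if rng is None else rng
    centers = [points[rng.integers(len(points))]]
    dists = np.linalg.norm(points - centers[0], axis=1)
    for _ in range(k - 1):
        far = int(np.argmax(dists))            # farthest point becomes a new center
        centers.append(points[far])
        dists = np.minimum(dists, np.linalg.norm(points - points[far], axis=1))
    return np.array(centers), float(dists.max())   # centers and clustering radius
```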

2 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider an advertiser who repeatedly participates in T second-price auctions, where the tuple of her value and the highest competing bid is drawn from an unknown time-varying distribution, and show that a single sample per distribution suffices to learn an expenditure plan that achieves near-optimal Õ(√T) regret.
Abstract: Major Internet advertising platforms offer budget pacing tools as a standard service for advertisers to manage their ad campaigns. Given the inherent non-stationarity in an advertiser's value and also competing advertisers' values over time, a commonly used approach is to learn a target expenditure plan that specifies a target spend as a function of time, and then run a controller that tracks this plan. This raises the question: how many historical samples are required to learn a good expenditure plan? We study this question by considering an advertiser repeatedly participating in $T$ second-price auctions, where the tuple of her value and the highest competing bid is drawn from an unknown time-varying distribution. The advertiser seeks to maximize her total utility subject to her budget constraint. Prior work has shown the sufficiency of $T\log T$ samples per distribution to achieve the optimal $O(\sqrt{T})$-regret. We dramatically improve this state-of-the-art and show that just one sample per distribution is enough to achieve the near-optimal $\tilde O(\sqrt{T})$-regret, while still being robust to noise in the sampling distributions.
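A hedged sketch of the overall pipeline the abstract describes: build a target expenditure plan from one (value, highest competing bid) sample per round, then run a controller that tracks the plan during the live auctions. The plan construction via a single pacing multiplier and the multiplicative controller below are natural choices made for illustration, not the paper's exact procedures.

```python
import numpy as np

def expenditure_plan_from_samples(sample_values, sample_bids, budget):
    """Build a target spend-per-round plan from ONE (value, highest competing
    bid) sample per round: choose a single pacing multiplier mu so that the
    simulated spend exhausts the budget. A sketch, not the paper's construction."""
    def spend(mu):
        wins = sample_values / (1.0 + mu) >= sample_bids   # shaded bid wins?
        return np.where(wins, sample_bids, 0.0)            # second-price payment
    lo, hi = 0.0, 1e6                                       # assumes mu* lies in this range
    for _ in range(60):                                     # binary search on mu
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if spend(mid).sum() > budget else (lo, mid)
    return spend(hi)

def pace(values, comp_bids, plan, budget, eta=0.05):
    """Simple controller tracking the plan: raise the multiplier when ahead of
    the cumulative plan, lower it when behind (illustrative only)."""
    mu, spent, utility = 0.0, 0.0, 0.0
    for t, (v, d) in enumerate(zip(values, comp_bids)):
        bid = v / (1.0 + mu)
        if bid >= d and spent + d <= budget:
            spent += d
            utility += v - d
        mu = max(0.0, mu + eta * (spent - plan[: t + 1].sum()))  # dual-style update
    return utility, spent
```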

1 citation


Proceedings Article
TL;DR: The authors propose a learning algorithm for the seller that uses an episodic binary-search procedure to identify a revenue-optimal selling price; it enjoys low seller regret when, within each episode, the budget- and ROI-constrained buyer approximately best responds to the posted price.
Abstract: Internet advertisers (buyers) repeatedly procure ad impressions from ad platforms (sellers) with the aim to maximize total conversion (i.e. ad value) while respecting both budget and return-on-investment (ROI) constraints for efficient utilization of limited monetary resources. Facing such a constrained buyer who aims to learn her optimal strategy to acquire impressions, we study from a seller’s perspective how to learn and price ad impressions through repeated posted price mechanisms to maximize revenue. For this two-sided learning setup, we propose a learning algorithm for the seller that utilizes an episodic binary-search procedure to identify a revenue-optimal selling price. We show that such a simple learning algorithm enjoys low seller regret when, within each episode, the budget- and ROI-constrained buyer approximately best responds to the posted price. We present simple yet natural buyer’s bidding algorithms under which the buyer approximately best responds while satisfying budget and ROI constraints, leading to a low regret for our proposed seller pricing algorithm. The design of our seller algorithm is motivated by the fact that the seller’s revenue function admits a bell-shaped structure when the buyer best responds to prices under budget and ROI constraints, enabling our seller algorithm to identify revenue-optimal selling prices efficiently.
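The seller-side idea can be sketched as a search over posted prices that exploits the bell-shaped (unimodal) revenue curve: fix a price for an episode, let the constrained buyer respond, observe revenue, and shrink the price range. The code below uses a ternary-search step for simplicity; the paper's episodic binary-search procedure and its regret analysis are more refined, and `run_episode` is a hypothetical feedback oracle.

```python
def episodic_price_search(run_episode, low, high, iters=20):
    """Episodic search for a revenue-optimal posted price, assuming the revenue
    curve is bell-shaped (unimodal) in the price, as the paper establishes for a
    best-responding constrained buyer. A sketch, not the paper's exact algorithm.

    run_episode(price) -> revenue observed over one episode at the posted price."""
    lo, hi = low, high
    for _ in range(iters):                  # two episodes per iteration
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        # with a unimodal revenue curve, one third of the remaining price range
        # can be discarded after each pair of episodes
        if run_episode(m1) < run_episode(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2
```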

1 citation


Journal ArticleDOI
TL;DR: In this article, the authors design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution, and fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations.
Abstract: We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift, and we give upper and lower bounds for the regret that only differ by constants. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically increase in the presence of distribution shift. Finally, we provide experiments for high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret.
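As a toy illustration of the qualitative finding that optimal learning rates grow under distribution shift, the sketch below runs SGD for online linear regression and boosts the learning rate when a crude drift score (a fast versus slow moving average of the loss) indicates a shift. The drift heuristic and constants are invented for illustration; they are not the schedules derived in the paper.

```python
import numpy as np

def sgd_with_shift_aware_lr(stream, w0, base_lr=0.05, shift_boost=5.0):
    """SGD for online linear regression with a learning rate that increases when
    the data distribution appears to shift (tracked via a drift score on recent
    losses). Illustrates the qualitative finding that good schedules grow under
    distribution shift; not the paper's derived optimal schedule.

    stream: iterable of (x, y) pairs with x a numpy feature vector."""
    w = np.array(w0, dtype=float)
    slow, fast = 0.0, 0.0                 # slow/fast moving averages of the loss
    for x, y in stream:
        err = w @ x - y
        grad = 2 * err * x
        fast = 0.7 * fast + 0.3 * err ** 2
        slow = 0.99 * slow + 0.01 * err ** 2
        drift = max(0.0, fast - slow) / (slow + 1e-8)   # crude shift estimate
        lr = base_lr * (1.0 + shift_boost * min(drift, 1.0))
        w -= lr * grad
    return w
```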

1 citation


Journal ArticleDOI
TL;DR: In this paper, the authors explore the possibility of training machine learning models with aggregated data labels, rather than individual labels, and show that aggregate learning can be an effective method for preserving user privacy while maintaining model accuracy.
Abstract: Protecting user privacy is a major concern for many machine learning systems that are deployed at scale and collect data from a diverse population. One way to address this concern is by collecting and releasing data labels in an aggregated manner so that the information about a single user is potentially combined with others. In this paper, we explore the possibility of training machine learning models with aggregated data labels, rather than individual labels. Specifically, we consider two natural aggregation procedures suggested by practitioners: curated bags where the data points are grouped based on common features and random bags where the data points are grouped randomly into bags of similar sizes. For the curated bag setting and for a broad range of loss functions, we show that we can perform gradient-based learning without any degradation in performance that may result from aggregating data. Our method is based on the observation that the sum of the gradients of the loss function on individual data examples in a curated bag can be computed from the aggregate label without the need for individual labels. For the random bag setting, we provide a generalization risk bound based on the Rademacher complexity of the hypothesis class and show how empirical risk minimization can be regularized to achieve the smallest risk bound. In fact, in the random bag setting, our bound indicates a trade-off between the size of the bags and the achievable error rate. Finally, we conduct a careful empirical study to confirm our theoretical findings. In particular, our results suggest that aggregate learning can be an effective method for preserving user privacy while maintaining model accuracy.
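The curated-bag observation can be made concrete for squared loss on a linear model, under the assumed reading that every example in a curated bag shares the feature vector seen by the model: the bag's summed gradient then depends on the labels only through their sum. The sketch below checks this identity numerically; function names are illustrative and the shared-features assumption is an interpretation, not a statement of the paper's exact setting.

```python
import numpy as np

def bag_gradient_from_aggregate(w, bag_features, aggregate_label_sum):
    """Sum of per-example squared-loss gradients for a curated bag, computed
    WITHOUT individual labels, assuming every example in the bag shares the same
    feature vector x: sum_i 2(w.x - y_i) x = 2(m*(w.x) - sum_i y_i) x."""
    x = np.asarray(bag_features[0], dtype=float)
    m = len(bag_features)
    return 2.0 * (m * (w @ x) - aggregate_label_sum) * x

# Check against the gradient computed with individual labels on a toy bag.
rng = np.random.default_rng(0)
w = rng.normal(size=4)
x = rng.normal(size=4)
ys = rng.normal(size=8)                      # 8 examples sharing feature vector x
per_example = sum(2.0 * (w @ x - y) * x for y in ys)
from_aggregate = bag_gradient_from_aggregate(w, [x] * 8, ys.sum())
print(np.allclose(per_example, from_aggregate))   # True
```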

Journal ArticleDOI
TL;DR: In this paper, the authors study the benefits of coordinating budget and ROS pacing services from an empirical and theoretical perspective, and establish the superiority of joint optimization both theoretically and empirically based on data from a large advertising platform.
Abstract: Budget pacing is a popular service that has been offered by major internet advertising platforms since their inception. Budget pacing systems seek to optimize advertiser returns subject to budget constraints through smooth spending of advertiser budgets. In the past few years, autobidding products that provide real-time bidding as a service to advertisers have seen a prominent rise in adoption. A popular autobidding strategy is value maximization subject to return-on-spend (ROS) constraints. For historical or business reasons, the algorithms that govern these two services, namely budget pacing and ROS pacing, are not necessarily always a single unified and coordinated entity that optimizes a global objective subject to both constraints. The purpose of this work is to study the benefits of coordinating budget and ROS pacing services from an empirical and theoretical perspective. We compare (a) a sequential algorithm that first constructs the advertiser's ROS-pacing bid and then lowers that bid for budget pacing, with (b) the optimal joint algorithm that optimizes advertiser returns subject to both budget and ROS constraints. We establish the superiority of joint optimization both theoretically as well as empirically based on data from a large advertising platform. In the process, we identify a third algorithm with minimal interaction between services that retains the theoretical properties of the joint optimization algorithm and performs almost as well empirically as the joint optimization algorithm. This algorithm eases the transition from a sequential to a fully joint implementation by minimizing the amount of interaction between the two services.
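A common way to phrase joint optimization is with one dual multiplier per constraint, updated online, with the bid derived from the joint Lagrangian instead of sequentially shading a ROS bid for budget. The sketch below is one such dual-based controller for second-price auctions, offered only as an illustration of "joint" versus "sequential"; the bid formula and step size are standard-style choices, not necessarily the paper's algorithm.

```python
def joint_pacing(auctions, budget, eta=0.01):
    """Toy joint controller with one dual multiplier per constraint (budget and
    ROS). Bids come from the joint Lagrangian rather than from sequentially
    composing a ROS bid with a budget cut. Illustration only.

    auctions: list of (value, highest_competing_bid) in a second-price auction."""
    T = len(auctions)
    mu_ros, mu_b = 1.0, 1.0
    spend = value_won = 0.0
    for v, d in auctions:
        bid = (1.0 + mu_ros) * v / (mu_ros + mu_b + 1e-9)
        won = bid >= d and spend + d <= budget
        pay = d if won else 0.0
        gain = v if won else 0.0
        spend += pay
        value_won += gain
        # dual (subgradient) updates: raise a multiplier when its constraint binds
        mu_b = max(0.0, mu_b + eta * (pay - budget / T))        # budget pacing
        mu_ros = max(0.0, mu_ros + eta * (pay - gain))          # ROS (spend <= value)
    return value_won, spend
```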

Proceedings ArticleDOI
17 Jun 2023
TL;DR: The authors propose the first single-pass streaming algorithm for capacitated clustering in Euclidean space, using poly(k · d · ε⁻¹ · log Δ) space, where k is the number of clusters, d is the dimension and Δ is the maximum relative range of a coordinate, improving on the three-pass, insertion-only algorithm of Bateni et al. (NeurIPS'14).
Abstract: Clustering of data points in metric space is among the most fundamental problems in computer science, with plenty of applications in data mining, information retrieval and machine learning. Many of these applications deal with large datasets, and hence researchers have focused on designing algorithms for these problems in large-scale settings such as the streaming setting. A well-motivated variant of clustering is balanced clustering (or more generally capacitated clustering), where we want to avoid a few giant clusters alongside many small ones. Despite the importance of the context, the best known streaming algorithm for capacitated clustering is far from optimal. The state-of-the-art streaming algorithm for capacitated clustering gives an O(1)-approximate solution, requires three passes and only handles insertions (Bateni et al. NeurIPS'14). We develop the first single-pass streaming algorithm for capacitated clustering problems, including capacitated k-median and capacitated k-means, in Euclidean space, using only poly(k · d · ε⁻¹ · log Δ) space, where k is the number of clusters, d is the dimension and Δ is the maximum relative range of a coordinate. Our algorithm gives a (1 + ε)-approximation and only violates the capacity constraint by a (1 + ε) factor. Interestingly, unlike the previous algorithm, our algorithm handles both insertions and deletions of points. To provide this result we introduce a decomposition of the space via certain curved half-spaces, which might be of independent interest.
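To make the capacity constraint itself concrete (separately from the streaming machinery), the sketch below computes a capacity-respecting assignment of points to a fixed set of centers by duplicating each center into capacity-many slots and solving an assignment problem. This subroutine is unrelated to the paper's curved half-space decomposition and is shown only to illustrate the capacitated objective.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def capacitated_assignment(points, centers, capacity):
    """Assign points to centers so that no center exceeds `capacity`, by
    duplicating each center `capacity` times and solving an assignment problem
    (k-median-style cost). Requires len(points) <= len(centers) * capacity.
    Shown only to make the capacity constraint concrete; unrelated to the
    paper's streaming construction."""
    slots = np.repeat(np.arange(len(centers)), capacity)        # center id per slot
    cost = np.linalg.norm(points[:, None, :] - centers[slots][None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)                    # points -> slots
    assignment = np.empty(len(points), dtype=int)
    assignment[rows] = slots[cols]
    return assignment
```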

Journal ArticleDOI
TL;DR: In this article, the authors initiate the study of differentially private approximation algorithms for hierarchical clustering under the rigorous framework introduced by Dasgupta, giving a polynomial-time algorithm with O(|V|^{2.5}/ε) additive error alongside a lower bound of order |V|²/ε on the additive error of any ε-DP algorithm.
Abstract: Hierarchical Clustering is a popular unsupervised machine learning method with decades of history and numerous applications. We initiate the study of differentially private approximation algorithms for hierarchical clustering under the rigorous framework introduced by (Dasgupta, 2016). We show strong lower bounds for the problem: that any $\epsilon$-DP algorithm must exhibit $O(|V|^2/ \epsilon)$-additive error for an input dataset $V$. Then, we exhibit a polynomial-time approximation algorithm with $O(|V|^{2.5}/ \epsilon)$-additive error, and an exponential-time algorithm that meets the lower bound. To overcome the lower bound, we focus on the stochastic block model, a popular model of graphs, and, with a separation assumption on the blocks, propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly. Finally, we perform an empirical study of our algorithms and validate their performance.
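The objective being privatized is Dasgupta's hierarchical clustering cost: every pair of vertices pays its edge weight times the number of leaves under the node at which the pair is first split. The sketch below computes that cost for a hierarchy given as nested tuples of vertex indices; it is a plain (non-private) reference implementation of the objective, not any of the paper's algorithms.

```python
import itertools
import numpy as np

def dasgupta_cost(weights, tree):
    """Dasgupta (2016) cost of a hierarchical clustering: each edge pays its
    weight times the number of leaves under the lowest node separating its
    endpoints. `tree` is a nested structure whose leaves are vertex indices."""
    def walk(node):
        if isinstance(node, int):                 # leaf
            return {node}, 0.0
        child_sets, cost = [], 0.0
        for child in node:
            s, c = walk(child)
            child_sets.append(s)
            cost += c
        leaves = set().union(*child_sets)
        # edges whose endpoints first separate at this node
        for a, b in itertools.combinations(child_sets, 2):
            cost += len(leaves) * sum(weights[i][j] for i in a for j in b)
        return leaves, cost
    return walk(tree)[1]

# Toy example: a 4-vertex weighted graph and the hierarchy ((0, 1), (2, 3)).
W = np.array([[0.0, 1.0, 0.1, 0.1],
              [1.0, 0.0, 0.1, 0.1],
              [0.1, 0.1, 0.0, 1.0],
              [0.1, 0.1, 1.0, 0.0]])
print(dasgupta_cost(W, ((0, 1), (2, 3))))   # 2 + 2 + 4*0.4 = 5.6
```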

Proceedings ArticleDOI
01 Feb 2023
TL;DR: In this article, the authors propose a new variant of VCG with auction-dependent and bidder-dependent cost multipliers that guarantee approximation ratios of 1/2 and 1/4, respectively, in terms of the social welfare.
Abstract: We study autobidding ad auctions with user costs, where each bidder is value-maximizing subject to a return-over-investment (ROI) constraint, and the seller aims to maximize the social welfare taking into consideration the user’s cost of viewing an ad. We show that in the worst case, the approximation ratio of social welfare by running the vanilla VCG auctions with user costs could be as bad as 0. To improve the performance of VCG, we propose a new variant of VCG based on properly chosen cost multipliers, and prove that there exist auction-dependent and bidder-dependent cost multipliers that guarantee approximation ratios of 1/2 and 1/4 respectively in terms of the social welfare.

Journal ArticleDOI
TL;DR: In this article, the authors study the stochastic linear bandit problem under the additional requirements of differential privacy, robustness and batched observations, and present differentially private and robust variants of the arm elimination algorithm using logarithmic batch queries under two privacy models.
Abstract: In this paper, we study the stochastic linear bandit problem under the additional requirements of differential privacy, robustness and batched observations. In particular, we assume an adversary randomly chooses a constant fraction of the observed rewards in each batch, replacing them with arbitrary numbers. We present differentially private and robust variants of the arm elimination algorithm using logarithmic batch queries under two privacy models and provide regret bounds in both settings. In the first model, every reward in each round is reported by a potentially different client, which reduces to standard local differential privacy (LDP). In the second model, every action is "owned" by a different client, who may aggregate the rewards over multiple queries and privatize the aggregate response instead. To the best of our knowledge, our algorithms are the first simultaneously providing differential privacy and adversarial robustness in the stochastic linear bandits problem.
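A heavily simplified, finite-arm illustration of combining the two requirements in batched elimination: estimate each active arm per batch with a trimmed mean (robustness to a corrupted fraction of rewards), add Laplace noise to the estimate (privacy), and eliminate arms that fall below the best estimate by a confidence width. The paper's algorithms handle the linear-bandit structure and the two privacy models with formal regret bounds; everything below (the `pull` oracle, noise scale, and width) is an illustrative assumption.

```python
import numpy as np

def private_robust_elimination(pull, n_arms, batches=8, batch_size=400,
                               epsilon=1.0, trim=0.1, rng=None):
    """Simplified finite-arm sketch of batched arm elimination with a trimmed
    mean (robustness) plus Laplace noise on the batch estimate (privacy).
    `pull(arm, m)` returns m (possibly corrupted) rewards in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    active = list(range(n_arms))
    means = {a: 0.0 for a in active}
    for _ in range(batches):
        per_arm = max(1, batch_size // max(1, len(active)))
        for a in active:
            r = np.sort(pull(a, per_arm))
            k = int(trim * len(r))
            robust = r[k: len(r) - k].mean() if len(r) > 2 * k else r.mean()
            # Laplace noise scaled to the trimmed mean's sensitivity (~1/(m(1-2*trim)))
            noise = rng.laplace(0, 1.0 / (epsilon * per_arm * (1 - 2 * trim)))
            means[a] = robust + noise
        width = np.sqrt(np.log(4 * n_arms * batches) / per_arm) + 1.0 / (epsilon * per_arm)
        best = max(means[a] for a in active)
        active = [a for a in active if means[a] >= best - 2 * width]
    return max(active, key=lambda a: means[a])
```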


Journal ArticleDOI
TL;DR: In this paper, the authors study the setting in which data owners train machine learning models collaboratively under a privacy notion called joint differential privacy, where the model trained for each data owner j uses j's data without privacy consideration and other owners' data with differential privacy guarantees.
Abstract: In this paper, we study the setting in which data owners train machine learning models collaboratively under a privacy notion called joint differential privacy [Kearns et al., 2018]. In this setting, the model trained for each data owner $j$ uses $j$'s data without privacy consideration and other owners' data with differential privacy guarantees. This setting was initiated in [Jain et al., 2021] with a focus on linear regressions. In this paper, we study this setting for stochastic convex optimization (SCO). We present an algorithm that is a variant of DP-SGD [Song et al., 2013; Abadi et al., 2016] and provides theoretical bounds on its population loss. We compare our algorithm to several baselines and discuss for which parameter setups our algorithm is preferable. We also empirically study joint differential privacy in the multi-class classification problem over two public datasets. Our empirical findings are well-connected to the insights from our theoretical results.
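One natural reading of the joint-DP training step, sketched below as a DP-SGD-style update: owner j's own per-example gradients enter the update in the clear, while other owners' gradients are clipped and noised as in DP-SGD (Song et al., 2013; Abadi et al., 2016). This is a sketch of the setup, not necessarily the paper's exact algorithm; the clipping norm and noise multiplier are placeholders.

```python
import numpy as np

def joint_dp_sgd_step(w, own_grads, other_grads, lr=0.1, clip=1.0,
                      noise_multiplier=1.0, rng=None):
    """One SGD step in the spirit of joint DP for owner j's model: j's own
    per-example gradients are used in the clear, while other owners' gradients
    are clipped and noised as in DP-SGD. A sketch of one reading of the setup."""
    rng = np.random.default_rng() if rng is None else rng
    def clip_rows(g):
        norms = np.linalg.norm(g, axis=1, keepdims=True)
        return g * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    private_part = clip_rows(np.asarray(other_grads)).sum(axis=0)
    private_part += rng.normal(0, noise_multiplier * clip, size=w.shape)
    public_part = np.asarray(own_grads).sum(axis=0)       # j's data: no privacy needed
    n = len(own_grads) + len(other_grads)
    return w - lr * (public_part + private_part) / n
```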

Proceedings ArticleDOI
01 Jun 2023
TL;DR: In this paper, the authors introduce a non-parametric causal model of user actions in a personalized recommendation system and derive new experimental designs that intervene in the personalization system to generate the variation necessary to separately identify the causal effects mediated through user learning and through personalization.
Abstract: In online platforms, the impact of a treatment on an observed outcome may change over time as (1) users learn about the intervention and (2) the system's personalization, such as individualized recommendations, evolves. We introduce a non-parametric causal model of user actions in a personalized system. We show that the Cookie-Cookie-Day (CCD) experiment, designed for the measurement of the user learning effect, is biased when there is personalization. We derive new experimental designs that intervene in the personalization system to generate the variation necessary to separately identify the causal effect mediated through user learning and personalization. Making parametric assumptions allows for the estimation of long-term causal effects based on medium-term experiments. In simulations, we show that our new designs successfully recover the dynamic causal effects of interest.