
Showing papers by "Yishay Mansour published in 2013"


Proceedings ArticleDOI
06 Jan 2013
TL;DR: A regret minimization algorithm for setting the reserve price in a sequence of second-price auctions, under the assumption that all bids are independently drawn from the same unknown and arbitrary distribution, achieves a regret of Õ(√T) in a sequence of T auctions.
Abstract: We show a regret minimization algorithm for setting the reserve price in second-price auctions. We make the assumption that all bidders draw their bids from the same unknown and arbitrary distribution. Our algorithm is computationally efficient, and achieves a regret of Õ(√T), even when the number of bidders is stochastic with a known distribution.
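To make the setting concrete, the sketch below runs a plain UCB rule over a discretized grid of reserve prices in simulated second-price auctions. The grid, bid distribution, and constants are illustrative assumptions; this is a naive baseline for intuition, not the paper's computationally efficient algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_bidders = 5000, 5
grid = np.linspace(0, 1, 21)          # candidate reserve prices (hypothetical grid)
counts = np.zeros(len(grid))
means = np.zeros(len(grid))

def revenue(reserve, bids):
    """Seller revenue of a second-price auction with a reserve price."""
    b = np.sort(bids)[::-1]
    if b[0] < reserve:
        return 0.0                    # no sale
    return max(b[1], reserve)         # winner pays max(second bid, reserve)

for t in range(T):
    ucb = means + np.sqrt(2 * np.log(t + 1) / np.maximum(counts, 1))
    ucb[counts == 0] = np.inf         # try each candidate reserve once
    a = int(np.argmax(ucb))
    bids = rng.beta(2, 5, size=n_bidders)   # unknown-to-the-learner distribution
    r = revenue(grid[a], bids)
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]

print("best empirical reserve:", grid[int(np.argmax(means))])
```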

130 citations


Proceedings Article
05 Dec 2013
TL;DR: A characterization of regret in the directed observability model is given in terms of the dominating and independence numbers of the observability graph (which must be accessible before selecting an action); in the undirected case, it is shown that the learner can achieve optimal regret without even accessing the observability graph before selecting an action.
Abstract: We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir [14]. Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph (which must be accessible before selecting an action). In the undirected case, we show that the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner.
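As a rough illustration of the approach, the sketch below runs an Exp3-style update in which the loss of every arm observed through the graph is importance-weighted by its probability of being observed. The random graph, losses, and learning rate are illustrative assumptions; the paper's variants differ in their exact estimators and tuning.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, eta = 5, 3000, 0.05
# obs[i, j] = True means that playing arm i also reveals the loss of arm j.
obs = np.eye(K, dtype=bool)                 # self-loops: own loss is always seen
obs |= rng.random((K, K)) < 0.4             # hypothetical extra feedback edges

w = np.zeros(K)                             # log-weights
for t in range(T):
    p = np.exp(w - w.max()); p /= p.sum()
    i = rng.choice(K, p=p)
    losses = rng.random(K) * np.linspace(0.2, 1.0, K)   # stand-in for an adversary
    q = obs.T.astype(float) @ p             # q[j] = Pr(loss of arm j is observed)
    for j in np.flatnonzero(obs[i]):        # every arm revealed by playing i
        w[j] -= eta * losses[j] / q[j]      # importance-weighted loss estimate
print("final play distribution:", np.round(p, 3))
```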

71 citations


Book ChapterDOI
21 Aug 2013
TL;DR: In this paper, a polylogarithmic local computation matching algorithm is presented which guarantees a (1 - ε)-approximation to the maximum matching in graphs of bounded degree.
Abstract: We present a polylogarithmic local computation matching algorithm which guarantees a (1 - ε)-approximation to the maximum matching in graphs of bounded degree.

57 citations


Journal ArticleDOI
TL;DR: In this article, a theory of how well-motivated multiagent dynamics can make use of global information about the game, which might be common knowledge or injected into the system by a helpful central agency, is initiated.
Abstract: Many natural games have a dramatic difference between the quality of their best and worst Nash equilibria, even in pure strategies. Yet, nearly all results to date on dynamics in games show only convergence to some equilibrium, especially within a polynomial number of steps. In this work we initiate a theory of how well-motivated multiagent dynamics can make use of global information about the game---which might be common knowledge or injected into the system by a helpful central agency---and show that in a wide range of interesting games this can allow the dynamics to quickly reach (within a polynomial number of steps) states of cost comparable to the best Nash equilibrium. We present several natural models for dynamics that can use such additional information and analyze their ability to reach low-cost states for two important and widely studied classes of potential games: network design with fair cost-sharing and party affiliation games (which include consensus and cut games).
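A toy instance of fair cost-sharing on two parallel edges, sketched below, shows the phenomenon: unaided best-response dynamics can be stuck at a high-cost equilibrium, while seeding a few players using global information lets the same dynamics reach the low-cost state. The instance and the seeding rule are illustrative assumptions, not the paper's models.

```python
n = 20
cost = {"cheap": 1.0, "dear": n / 2}   # two parallel edges, costs shared fairly

def cost_if(i, edge, choices):
    """Cost player i would pay for choosing `edge`, given the others' choices."""
    others = sum(1 for j, c in enumerate(choices) if c == edge and j != i)
    return cost[edge] / (others + 1)

def best_response_dynamics(choices):
    changed = True
    while changed:                      # terminates: the potential decreases
        changed = False
        for i in range(len(choices)):
            best = min(cost, key=lambda e: cost_if(i, e, choices))
            if best != choices[i]:
                choices[i] = best
                changed = True
    return choices

bad = best_response_dynamics(["dear"] * n)   # stuck: a high-cost Nash equilibrium
good = best_response_dynamics(["cheap"] * 3 + ["dear"] * (n - 3))  # seeded start
for name, ch in [("unseeded", bad), ("seeded", good)]:
    print(name, "-> total cost:", sum(cost[e] for e in set(ch)))
```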

37 citations


Posted Content
TL;DR: In this article, the authors considered the partial observability model for multi-armed bandits, introduced by Mannor and Shamir, and characterized regret in directed observability in terms of the dominating and independence numbers of the observability graph.
Abstract: We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir. Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph. We also show that in the undirected case, the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner.

29 citations


Proceedings Article
16 Jun 2013
TL;DR: It is shown in this model that an ontology, which specifies the relationships between multiple outputs, in some cases is sufficient to completely learn a classification using a large unlabeled data source.
Abstract: We present and analyze a theoretical model designed to understand and explain the effectiveness of ontologies for learning multiple related tasks from primarily unlabeled data. We present both information-theoretic results as well as efficient algorithms. We show in this model that an ontology, which specifies the relationships between multiple outputs, in some cases is sufficient to completely learn a classification using a large unlabeled data source.

28 citations


Posted Content
TL;DR: A polylogarithmic local computation matching algorithm which guarantees a (1 - ε)-approximation to the maximum matching in graphs of bounded degree is presented.
Abstract: We present a polylogarithmic local computation matching algorithm which guarantees a $(1-\epsilon)$-approximation to the maximum matching in graphs of bounded degree.

22 citations


Proceedings Article
13 Jun 2013
TL;DR: This work studies regret minimization bounds in which the dependence on the number of experts is replaced by measures of the realized complexity of the expert class, defined in retrospect given the realized losses.
Abstract: We study regret minimization bounds in which the dependence on the number of experts is replaced by measures of the realized complexity of the expert class. The measures we consider are defined in retrospect given the realized losses. We concentrate on two interesting cases. In the first, our measure of complexity is the number of different “leading experts”, namely, experts that were best at some point in time. We derive regret bounds that depend only on this measure, independent of the total number of experts. We also consider a case where all experts remain grouped in just a few clusters in terms of their realized cumulative losses. Here too, our regret bounds depend only on the number of clusters determined in retrospect, which serves as a measure of complexity. Our results are obtained as special cases of a more general analysis for a setting of branching experts, where the set of experts may grow over time according to a tree-like structure, determined by an adversary. For this setting of branching experts, we give algorithms and analysis that cover both the full information and the bandit scenarios.
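For reference, the sketch below runs the standard exponential-weights (Hedge) baseline, whose regret scales with the log of the number of experts, and counts the distinct "leading experts" in hindsight, the retrospective quantity that replaces that number in the paper's bounds. Losses and constants are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, eta = 50, 2000, 0.1
losses = rng.random((T, N))
losses[:, 0] -= 0.2                    # make one expert eventually dominant

w = np.zeros(N)                        # log-weights for Hedge
cum = np.zeros(N)                      # cumulative losses per expert
learner_loss = 0.0
leaders = set()
for t in range(T):
    p = np.exp(w - w.max()); p /= p.sum()
    l = np.clip(losses[t], 0, 1)
    learner_loss += p @ l              # expected loss of the learner this round
    w -= eta * l
    cum += l
    leaders.add(int(np.argmin(cum)))   # the expert leading after round t

print("learner loss:", round(learner_loss, 1),
      "best expert loss:", round(cum.min(), 1),
      "distinct leaders:", len(leaders))
```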

20 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider a non-myopic version of Cournot competition, where each firm selects either profit maximization (as in the classical model) or revenue maximization by masquerading as a firm with zero production costs.

19 citations


Posted Content
TL;DR: In this article, a simple decomposition of the expected distortion is presented, showing that K-means and EM must implicitly manage a trade-off between how similar the data assigned to each cluster are, and how the data are balanced among the clusters.
Abstract: Assignment methods are at the heart of many algorithms for unsupervised learning and clustering - in particular, the well-known K-means and Expectation-Maximization (EM) algorithms. In this work, we study several different methods of assignment, including the "hard" assignments used by K-means and the "soft" assignments used by EM. While it is known that K-means minimizes the distortion on the data and EM maximizes the likelihood, little is known about the systematic differences of behavior between the two algorithms. Here we shed light on these differences via an information-theoretic analysis. The cornerstone of our results is a simple decomposition of the expected distortion, showing that K-means (and its extension for inferring general parametric densities from unlabeled sample data) must implicitly manage a trade-off between how similar the data assigned to each cluster are, and how the data are balanced among the clusters. How well the data are balanced is measured by the entropy of the partition defined by the hard assignments. In addition to letting us predict and verify systematic differences between K-means and EM on specific examples, the decomposition allows us to give a rather general argument showing that K-means will consistently find densities with less "overlap" than EM. We also study a third natural assignment method that we call posterior assignment, that is close in spirit to the soft assignments of EM, but leads to a surprisingly different algorithm.
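The sketch below contrasts the two assignment rules on an unbalanced one-dimensional mixture, assuming unit variances and equal priors for brevity, and prints the entropy of the induced partition, the balance term in the decomposition (for EM, its soft analogue).

```python
import numpy as np

rng = np.random.default_rng(3)
# Unbalanced 1-D mixture of two unit-variance Gaussians (30% / 70%):
x = np.concatenate([rng.normal(-1.0, 1.0, 300), rng.normal(1.0, 1.0, 700)])

def fit(x, soft_assign, iters=100):
    mu = np.array([-3.0, 3.0])                    # deliberately poor init
    for _ in range(iters):
        d2 = (x[:, None] - mu[None, :]) ** 2
        if soft_assign:                           # EM-style responsibilities
            r = np.exp(-0.5 * d2)
            r /= r.sum(axis=1, keepdims=True)
        else:                                     # K-means hard assignments
            r = (d2 == d2.min(axis=1, keepdims=True)).astype(float)
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    # Entropy of the induced partition (the "balance" term in the decomposition):
    p = r.sum(axis=0) / len(x)
    return mu, -(p * np.log(p)).sum()

for name, soft in [("K-means", False), ("EM", True)]:
    mu, h = fit(x, soft)
    print(f"{name}: means={np.round(mu, 2)}, partition entropy={h:.3f}")
```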

17 citations


Proceedings ArticleDOI
16 Jun 2013
TL;DR: The algorithmic problem of maximizing revenue in a network using differential pricing, where the prices offered to neighboring vertices cannot be substantially different, is introduced and it is shown that the optimal pricing can be computed efficiently, even for arbitrary revenue functions.
Abstract: We introduce and study the algorithmic problem of maximizing revenue in a network using differential pricing, where the prices offered to neighboring vertices cannot be substantially different. Our most surprising result is that the optimal pricing can be computed efficiently, even for arbitrary revenue functions. In contrast, we show that if one is allowed to introduce discontinuities (by deleting vertices) the optimization problem becomes computationally hard, and we exhibit algorithms for special classes of graphs. We also study a stochastic model, and show that a similar contrast exists there: For pricing without discontinuities the benefit of differential pricing over a single price is negligible, while for differential pricing with discontinuities the difference is substantial.

Posted Content
TL;DR: A frequentist regret bound is proved for Thompson sampling in a very general setting involving parameter, action and observation spaces and a likelihood function over them, and the first nontrivial regret bounds for nonlinear MAX reward feedback from subsets are derived.
Abstract: We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round. The reward of the complex action is some function of the basic arms' rewards, and the feedback observed may not necessarily be the reward per-arm. For instance, when the complex actions are subsets of the arms, we may only observe the maximum reward over the chosen subset. Thus, feedback across complex actions may be coupled due to the nature of the reward function. We prove a frequentist regret bound for Thompson sampling in a very general setting involving parameter, action and observation spaces and a likelihood function over them. The bound holds for discretely-supported priors over the parameter space and without additional structural properties such as closed-form posteriors, conjugate prior structure or independence across arms. The regret bound scales logarithmically with time but, more importantly, with an improved constant that non-trivially captures the coupling across complex actions due to the structure of the rewards. As applications, we derive improved regret bounds for classes of complex bandit problems involving selecting subsets of arms, including the first nontrivial regret bounds for nonlinear MAX reward feedback from subsets.
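The sketch below instantiates the setting in miniature: a discretely-supported prior over candidate parameter vectors, subsets as complex actions, and only the MAX reward observed. The exact Bayes update over the discrete support is what makes the coupled feedback tractable here; the candidate set, subset size, and horizon are illustrative assumptions.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
K, m, T = 4, 2, 3000
subsets = list(itertools.combinations(range(K), m))

# Discretely-supported prior over the parameter space: each candidate is a
# full vector of Bernoulli means for the K basic arms.
candidates = [rng.random(K) for _ in range(40)]
true_theta = candidates[7]                       # realizable case for simplicity
log_post = np.zeros(len(candidates))             # uniform prior

def p_max_one(theta, S):
    """P(max of the chosen arms' Bernoulli rewards equals 1)."""
    return 1.0 - np.prod([1.0 - theta[i] for i in S])

for t in range(T):
    post = np.exp(log_post - log_post.max()); post /= post.sum()
    theta = candidates[rng.choice(len(candidates), p=post)]   # posterior sample
    S = max(subsets, key=lambda s: p_max_one(theta, s))       # greedy on sample
    x = int(rng.random() < p_max_one(true_theta, S))          # observe MAX only
    for c, cand in enumerate(candidates):                     # exact Bayes update
        q = p_max_one(cand, S)
        log_post[c] += np.log(q if x == 1 else 1.0 - q)

best = max(subsets, key=lambda s: p_max_one(true_theta, s))
print("chosen subset:", S, "optimal subset:", best)
```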

Posted Content
TL;DR: Local Computation Mechanism Design, as introduced by the authors, is a technique for designing game-theoretic mechanisms which run in polylogarithmic time and space, where the mechanism replies to each query in polylogarithmic time and the replies to different queries are consistent with the same global feasible solution.
Abstract: We introduce the notion of Local Computation Mechanism Design - designing game theoretic mechanisms which run in polylogarithmic time and space. Local computation mechanisms reply to each query in polylogarithmic time and space, and the replies to different queries are consistent with the same global feasible solution. In addition, the computation of the payments is also done in polylogarithmic time and space. Furthermore, the mechanisms need to maintain incentive compatibility with respect to the allocation and payments. We present local computation mechanisms for a variety of classical game-theoretical problems: 1. stable matching, 2. job scheduling, 3. combinatorial auctions for unit-demand and k-minded bidders, and 4. the housing allocation problem. For stable matching, some of our techniques may have general implications. Specifically, we show that when the men's preference lists are bounded, we can achieve an arbitrarily good approximation to the stable matching within a fixed number of iterations of the Gale-Shapley algorithm.
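For the stable matching result, the sketch below is a standard men-proposing Gale-Shapley routine with an optional cap on the number of proposal steps, the truncation knob the approximation statement refers to. The toy preference lists are illustrative.

```python
from collections import deque

def gale_shapley(men_prefs, women_prefs, max_steps=None):
    """Men-proposing deferred acceptance, optionally truncated after a fixed
    number of proposal steps."""
    rank = {w: {m: r for r, m in enumerate(ps)} for w, ps in women_prefs.items()}
    next_idx = {m: 0 for m in men_prefs}
    engaged = {}                              # woman -> man
    free, steps = deque(men_prefs), 0
    while free and (max_steps is None or steps < max_steps):
        steps += 1
        m = free.popleft()
        if next_idx[m] >= len(men_prefs[m]):
            continue                          # bounded list exhausted; m unmatched
        w = men_prefs[m][next_idx[m]]
        next_idx[m] += 1
        if w not in engaged:
            engaged[w] = m
        elif rank[w][m] < rank[w][engaged[w]]:
            free.append(engaged[w])           # w trades up; old partner re-enters
            engaged[w] = m
        else:
            free.append(m)                    # rejected; m tries his next choice
    return {m: w for w, m in engaged.items()}

men = {"a": ["X", "Y"], "b": ["X", "Y"], "c": ["Y"]}     # bounded preference lists
women = {"X": ["a", "b", "c"], "Y": ["c", "a", "b"]}
print(gale_shapley(men, women, max_steps=10))
```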

Proceedings ArticleDOI
16 Jun 2013
TL;DR: In this article, the authors study a mechanism design model in which agents arrive sequentially and each in turn chooses one action from a set of actions with unknown rewards, and characterize the optimal disclosure policy of a planner whose goal is to maximize social welfare.
Abstract: We study a novel mechanism design model in which agents arrive sequentially and each in turn chooses one action from a set of actions with unknown rewards. The information revealed by the principal affects the incentives of an agent to explore and generate new information. We characterize the optimal disclosure policy of a planner whose goal is to maximize social welfare. One interpretation for our result is the implementation of what is known as the 'wisdom of the crowd'. This topic has become more relevant with the rapid adoption of the Internet over the past decade.

Posted Content
TL;DR: A simple generalization of finite-horizon value iteration that computes a Nash strategy for each player in general-sum stochastic games and an algorithm for computing near-Nash equilibria in large or infinite state spaces.
Abstract: Stochastic games generalize Markov decision processes (MDPs) to a multiagent setting by allowing the state transitions to depend jointly on all player actions, and having rewards determined by multiplayer matrix games at each state. We consider the problem of computing Nash equilibria in stochastic games, the analogue of planning in MDPs. We begin by providing a generalization of finite-horizon value iteration that computes a Nash strategy for each player in general-sum stochastic games. The algorithm takes an arbitrary Nash selection function as input, which allows the translation of local choices between multiple Nash equilibria into the selection of a single global Nash equilibrium. Our main technical result is an algorithm for computing near-Nash equilibria in large or infinite state spaces. This algorithm builds on our finite-horizon value iteration algorithm, and adapts the sparse sampling methods of Kearns, Mansour and Ng (1999) to stochastic games. We conclude by describing a counterexample showing that infinite-horizon discounted value iteration, which was shown by Shapley to converge in the zero-sum case (a result we extend slightly here), does not converge in the general-sum case.
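The sketch below implements finite-horizon backward induction for the zero-sum special case, where each state's stage game is solved by a minimax linear program; the paper's general-sum algorithm plugs an arbitrary Nash selection function into the same recursion. The random game instance is synthetic.

```python
import numpy as np
from scipy.optimize import linprog

def minimax(A):
    """Value and maximizer strategy of the zero-sum matrix game A (row max)."""
    n, m = A.shape
    # Variables: x (row strategy) and v. Maximize v s.t. (A^T x)_j >= v, x simplex.
    c = np.zeros(n + 1); c[-1] = -1.0                 # minimize -v
    A_ub = np.hstack([-A.T, np.ones((m, 1))])         # v - (A^T x)_j <= 0
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    res = linprog(c, A_ub, b_ub, A_eq, [1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return -res.fun, res.x[:n]

rng = np.random.default_rng(5)
S, nA, nB, H = 3, 2, 2, 10
R = rng.random((S, nA, nB))                           # player-1 stage rewards
P = rng.dirichlet(np.ones(S), size=(S, nA, nB))       # transition kernel

V = np.zeros(S)
for h in range(H):                                    # backward induction
    V_new = np.zeros(S)
    for s in range(S):
        Q = R[s] + P[s] @ V                           # stage game at state s
        V_new[s], _ = minimax(Q)                      # "Nash selection" = minimax
    V = V_new
print("horizon-%d values:" % H, np.round(V, 3))
```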

Posted Content
TL;DR: This work examines some restricted settings in which the hidden structure can be perfectly reconstructed solely on the basis of observed sample data.
Abstract: In the literature on graphical models, there has been increased attention paid to the problems of learning hidden structure (see Heckerman [H96] for survey) and causal mechanisms from sample data [H96, P88, S93, P95, F98]. In most settings we should expect the former to be difficult, and the latter potentially impossible without experimental intervention. In this work, we examine some restricted settings in which we can perfectly reconstruct the hidden structure solely on the basis of observed sample data.

Posted Content
TL;DR: In this article, the authors prove the first non-trivial, worst-case, upper bound on the number of iterations required by policy iteration to converge to the optimal policy.
Abstract: Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first such non-trivial, worst-case, upper bounds on the number of iterations required by PI to converge to the optimal policy. Our analysis also sheds new light on the manner in which PI progresses through the space of policies.
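A minimal policy iteration loop on a random MDP, sketched below, shows the objects the bound concerns: exact policy evaluation by a linear solve, greedy improvement, and the iteration count until the policy is stable. The instance and discount factor are synthetic.

```python
import numpy as np

rng = np.random.default_rng(6)
S, A, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a] = next-state distribution
R = rng.random((S, A))

pi = np.zeros(S, dtype=int)
for it in range(1, 1000):
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = P[np.arange(S), pi]
    R_pi = R[np.arange(S), pi]
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
    # Policy improvement: greedy one-step lookahead.
    Q = R + gamma * (P @ V)
    new_pi = Q.argmax(axis=1)
    if np.array_equal(new_pi, pi):
        break
    pi = new_pi
print(f"converged after {it} iterations; policy = {pi}")
```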

Posted Content
TL;DR: In this paper, the authors formulate a general model which unifies the treatment of probe scheduling mechanisms, stochastic or deterministic, and different cost objectives - minimizing average detection time (SUM) or worst-case detection times (MAX).
Abstract: Most discovery systems for silent failures work in two phases: a continuous monitoring phase that detects presence of failures through probe packets and a localization phase that pinpoints the faulty element(s). This separation is important because localization requires significantly more resources than detection and should be initiated only when a fault is present. We focus on improving the efficiency of the detection phase, where the goal is to balance the overhead with the cost associated with longer failure detection times. We formulate a general model which unifies the treatment of probe scheduling mechanisms, stochastic or deterministic, and different cost objectives - minimizing average detection time (SUM) or worst-case detection time (MAX). We then focus on two classes of schedules. Memoryless schedules -- a subclass of stochastic schedules which is simple and suitable for distributed deployment. We show that the optimal memoryless schedulers can be efficiently computed by convex programs (for SUM objectives) or linear programs (for MAX objectives), and surprisingly perhaps, are guaranteed to have expected detection times that are not too far off the (NP-hard) stochastic optima. Deterministic schedules allow us to bound the maximum (rather than expected) cost of undetected faults, but like stochastic schedules, are NP-hard to optimize. We develop novel efficient deterministic schedulers with provable approximation ratios. An extensive simulation study on real networks demonstrates significant performance gains of our memoryless and deterministic schedulers over previous approaches. Our unified treatment also facilitates a clear comparison between different objectives and scheduling mechanisms.
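The SUM-objective convex program for memoryless schedules admits a direct numerical sketch: with probe rates p, an element covered at total rate q is detected in expected time 1/q, so one minimizes the priority-weighted sum of these times over the probability simplex. The toy instance and the use of a generic SLSQP solver are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# cover[t, e] = 1 if probe/test t detects a fault on element e (toy instance).
cover = np.array([[1, 1, 0, 0],
                  [0, 1, 1, 0],
                  [0, 0, 1, 1]], dtype=float)
w = np.array([1.0, 2.0, 1.0, 0.5])            # element priorities

def expected_sum_detection(p):
    q = cover.T @ p                           # q[e] = per-step detection prob. of e
    return float(np.sum(w / q))               # E[detection time of e] = 1 / q[e]

n = cover.shape[0]                            # number of tests
res = minimize(expected_sum_detection, np.full(n, 1.0 / n),
               method="SLSQP",
               bounds=[(1e-6, 1.0)] * n,
               constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0}])
print("optimal memoryless probe rates:", np.round(res.x, 3),
      "expected cost:", round(res.fun, 3))
```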

Posted Content
TL;DR: This work analyzes the behavior of agents that incrementally adapt their strategy through gradient ascent on expected payoff, in the simple setting of two-player, two-action, iterated general-sum games, and shows that either the agents will converge to Nash equilibrium, or if the strategies themselves do not converge, then their average payoffs will nevertheless converge to the payoffs of a Nash equilibrium.
Abstract: Multi-agent games are becoming an increasing prevalent formalism for the study of electronic commerce and auctions. The speed at which transactions can take place and the growing complexity of electronic marketplaces makes the study of computationally simple agents an appealing direction. In this work, we analyze the behavior of agents that incrementally adapt their strategy through gradient ascent on expected payoff, in the simple setting of two-player, two-action, iterated general-sum games, and present a surprising result. We show that either the agents will converge to Nash equilibrium, or if the strategies themselves do not converge, then their average payoffs will nevertheless converge to the payoffs of a Nash equilibrium.
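The sketch below simulates projected gradient ascent with a small finite step in matching pennies, taken here as the classic 2x2 example where strategies cycle; the paper's analysis is for infinitesimal step sizes. The average payoff is seen to approach the equilibrium payoff (0 for player 1) even though the strategies themselves do not converge.

```python
import numpy as np

# General-sum 2x2 game; payoffs indexed by (row action i, col action j).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # player 1 (matching pennies here)
B = -A                                      # player 2

def grad_x(x, y):   # d u1 / dx at mixed strategies (x, 1-x) and (y, 1-y)
    return y * (A[0, 0] - A[1, 0]) + (1 - y) * (A[0, 1] - A[1, 1])

def grad_y(x, y):   # d u2 / dy
    return x * (B[0, 0] - B[0, 1]) + (1 - x) * (B[1, 0] - B[1, 1])

x, y, eta = 0.9, 0.2, 0.01
avg_payoff, T = 0.0, 20000
for t in range(1, T + 1):
    u1 = (x * y * A[0, 0] + x * (1 - y) * A[0, 1]
          + (1 - x) * y * A[1, 0] + (1 - x) * (1 - y) * A[1, 1])
    avg_payoff += (u1 - avg_payoff) / t     # running average of player 1's payoff
    x_new = np.clip(x + eta * grad_x(x, y), 0.0, 1.0)   # projected ascent step
    y_new = np.clip(y + eta * grad_y(x, y), 0.0, 1.0)
    x, y = x_new, y_new

print("final (x, y):", (round(x, 3), round(y, 3)),
      "average payoff:", round(avg_payoff, 4))
```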

Book ChapterDOI
21 Aug 2013
TL;DR: The modeling considered both SUM and MAX objectives, which correspond to average or worst-case cover times over elements (weighted by priority), and both one-time testing, where the goal is to detect if a fault is currently present, and continuous testing, performed in the background in order to detect presence of failures soon after they occur.
Abstract: A test scheduling instance is specified by a set of elements, a set of tests, which are subsets of elements, and numeric priorities assigned to elements. The schedule is a sequence of test invocations with the goal of covering all elements. This formulation has been used to model problems in multiple application domains, from network failure detection to broadcast scheduling. The modeling considered both SUM and MAX objectives, which correspond to average or worst-case cover times over elements (weighted by priority), and both one-time testing, where the goal is to detect if a fault is currently present, and continuous testing, performed in the background in order to detect presence of failures soon after they occur. Since all variants are NP-hard, the focus is on approximations.

Proceedings ArticleDOI
06 May 2013
TL;DR: The robustness of trading agents to deviations from the game's specified environment is investigated and it is indicated that most agents, especially the top-scoring ones, are surprisingly robust.
Abstract: We study the empirical behavior of trading agents participating in the Ad-Auction game of the Trading Agent Competition (TAC-AA). Aiming to understand the applicability of optimal trading strategies in synthesized environments to real-life settings, we investigate the robustness of the agents to deviations from the game's specified environment. Our results indicate that most agents, especially the top-scoring ones, are surprisingly robust. In addition, using the game logs, we derive for each agent a strategic fingerprint and show that it almost uniquely identifies it. Finally, we show that although the Machine Learning modeling in TAC-AA is inherently inaccurate, further improvement in modeling accuracy is likely to have only a limited contribution to the overall performance of TAC-AA agents.