
Showing papers by "Yishay Mansour published in 2005"


Proceedings ArticleDOI
23 Oct 2005
TL;DR: These reductions imply that for a wide variety of revenue-maximizing pricing problems, given an optimal algorithm for the standard algorithmic problem, it can be converted into a (1 + ε)-approximation for the incentive-compatible mechanism design problem, so long as the number of bidders is sufficiently large.
Abstract: We use techniques from sample complexity in machine learning to reduce problems of incentive-compatible mechanism design to standard algorithmic questions, for a wide variety of revenue-maximizing pricing problems. Our reductions imply that for these problems, given an optimal (or β-approximation) algorithm for the standard algorithmic problem, we can convert it into a (1 + ε)-approximation (or β(1 + ε)-approximation) for the incentive-compatible mechanism design problem, so long as the number of bidders is sufficiently large as a function of an appropriate measure of complexity of the comparison class of solutions. We apply these results to the problem of auctioning a digital good, the attribute auction problem, and the problem of item-pricing in unlimited-supply combinatorial auctions. From a learning perspective, these settings present several challenges: in particular, the loss function is discontinuous and asymmetric, and the range of bidders' valuations may be large.
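For the digital-good case, the flavor of "learn a price from a sample of bidders, charge it to the rest" can be illustrated with the classical random-sampling auction. This is a hedged sketch, not the paper's general reduction; all function names and the bid distribution are made up for the example.

```python
import random

def revenue(bids, price):
    """Revenue from posting `price` for a digital good (unlimited supply)."""
    return price * sum(1 for b in bids if b >= price)

def opt_price(bids):
    """Best single posted price in hindsight (always some bid value)."""
    return max(bids, key=lambda p: revenue(bids, p))

def random_sampling_auction(bids, rng):
    """Illustrative random-sampling auction: split bidders into two
    random halves, learn the optimal fixed price on each half, and
    offer it to the other half. Truthful, since no bidder's own bid
    affects the price she faces."""
    s1, s2 = [], []
    for b in bids:
        (s1 if rng.random() < 0.5 else s2).append(b)
    p1, p2 = opt_price(s1), opt_price(s2)
    return sum(p1 for b in s2 if b >= p1) + sum(p2 for b in s1 if b >= p2)

rng = random.Random(0)
bids = [rng.uniform(1.0, 10.0) for _ in range(600)]
opt = revenue(bids, opt_price(bids))
rev = random_sampling_auction(bids, rng)
print(rev / opt)   # approaches 1 as the number of bidders grows
```

The ratio improving with the number of bidders is exactly the sample-complexity phenomenon the paper quantifies: the price learned on one half generalizes to the other.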

141 citations


Proceedings ArticleDOI
23 Oct 2005
TL;DR: This work gives the first algorithm that (under distributional assumptions) efficiently learns halfspaces in the notoriously difficult agnostic framework of Kearns, Schapire, & Sellie, where a learner is given access to labeled examples drawn from a distribution, without restriction on the labels.
Abstract: We give the first algorithm that (under distributional assumptions) efficiently learns halfspaces in the notoriously difficult agnostic framework of Kearns, Schapire, & Sellie, where a learner is given access to labeled examples drawn from a distribution, without restriction on the labels (e.g. adversarial noise). The algorithm constructs a hypothesis whose error rate on future examples is within an additive ε of the optimal halfspace, in time poly(n) for any constant ε > 0, under the uniform distribution over {-1, 1}^n or the unit sphere in ℝ^n, as well as under any log-concave distribution over ℝ^n. It also agnostically learns Boolean disjunctions in time 2^Õ(√n) with respect to any distribution. The new algorithm, essentially L1 polynomial regression, is a noise-tolerant arbitrary-distribution generalization of the "low degree" Fourier algorithm of Linial, Mansour, & Nisan. We also give a new algorithm for PAC learning halfspaces under the uniform distribution on the unit sphere with the current best bounds on the tolerable rate of "malicious noise".
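The "regress on low-degree monomials, then threshold" idea can be sketched as follows. Note the hedges: the paper uses L1 regression, while this sketch uses ordinary least squares (the L2 variant, as in the Linial–Mansour–Nisan "low degree" algorithm) for brevity, and the noise model and parameters here are invented for illustration.

```python
import itertools
import numpy as np

def poly_features(X, degree):
    """Multilinear monomials prod_{i in S} x_i for |S| <= degree
    (the Fourier basis over {-1,1}^n)."""
    cols = [np.ones(len(X))]
    for d in range(1, degree + 1):
        for S in itertools.combinations(range(X.shape[1]), d):
            cols.append(X[:, S].prod(axis=1))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n, m = 8, 3000
X = rng.choice([-1.0, 1.0], size=(m, n))
w_true = rng.normal(size=n)                # hidden halfspace
y_clean = np.sign(X @ w_true)
flip = rng.random(m) < 0.1                 # 10% arbitrary label noise
y = np.where(flip, -y_clean, y_clean)

# Regress the noisy labels on low-degree monomials, then take the sign.
Phi = poly_features(X, degree=2)
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
pred = np.sign(Phi @ coef)
err = float(np.mean(pred != y_clean))      # error vs. the true halfspace
print(err)
```

Despite 10% adversarially flippable labels, the thresholded polynomial tracks the underlying halfspace closely, which is the agnostic guarantee in miniature.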

122 citations


Proceedings ArticleDOI
23 Jan 2005
TL;DR: This work considers n anonymous selfish users that route their communication through m parallel links, and shows that if the users have different weights then there exists a set of weights such that every Nash rerouting terminates in Ω(√n) stages with high probability.
Abstract: We consider n anonymous selfish users that route their communication through m parallel links. The users are allowed to reroute, concurrently, from overloaded links to underloaded links. The rerouting decisions are concurrent, randomized and independent. The rerouting process terminates when the system reaches a Nash equilibrium, in which no user can improve its state. We study the convergence rate of several migration policies. The first is a very natural policy that balances the expected load on the links; for the case that all users are identical and apply it, we show that the rerouting terminates in expected O(log log n + log m) stages. We then consider the class of Nash rerouting policies, in which every rerouting stage is a Nash equilibrium and the users are greedy with respect to the next load they observe, and show similar termination bounds for this class. We study the structural properties of Nash rerouting policies, and derive both an existence result and an efficient algorithm for the case that the number of links is small. We also show that if the users have different weights, then there exists a set of weights such that every Nash rerouting terminates in Ω(√n) stages with high probability.
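A small simulation shows the flavor of the expected-load-balancing policy for identical users. This is an illustrative reading of the policy, not the paper's exact protocol; the migration probability and parameters are assumptions.

```python
import random

def is_nash(loads):
    """Unit-weight users: Nash iff no user can strictly lower its
    load by migrating, i.e. max load <= min load + 1."""
    return max(loads) <= min(loads) + 1

def balance_expected_load(n, m, rng, max_stages=10000):
    """Illustrative concurrent migration policy for identical users:
    each user on an overloaded link moves, with probability equal to
    the link's relative excess load, to a random underloaded link.
    All decisions in a stage use the loads observed at its start."""
    link_of = [rng.randrange(m) for _ in range(n)]
    loads = [0] * m
    for l in link_of:
        loads[l] += 1
    avg = n / m
    stages = 0
    while not is_nash(loads) and stages < max_stages:
        stages += 1
        under = [l for l in range(m) if loads[l] < avg]
        for u in range(n):
            l = link_of[u]
            excess = loads[l] - avg
            if excess > 0 and rng.random() < excess / loads[l]:
                link_of[u] = rng.choice(under)
        loads = [0] * m
        for l in link_of:
            loads[l] += 1
    return stages, loads

rng = random.Random(1)
stages, loads = balance_expected_load(1000, 10, rng)
print(stages, loads)
```

The point of the O(log log n + log m) bound is that the number of stages printed here stays tiny even as n grows.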

103 citations


Journal ArticleDOI
TL;DR: This work considers the setting of a network providing differentiated services, and analyzes and compares different queue policies for this problem using the competitive analysis approach, where the benefit of the online policy is compared to the benefit of an optimal offline policy.

95 citations


Journal ArticleDOI
TL;DR: In this article, the authors examine noisy radio (broadcast) networks in which every bit transmitted has a certain probability of being flipped, and show a protocol to compute any threshold function using only a linear number of transmissions.
Abstract: In this paper, we examine noisy radio (broadcast) networks in which every bit transmitted has a certain probability of being flipped. Each processor has some initial input bit, and the goal is to compute a function of these input bits. In this model, we show a protocol to compute any threshold function using only a linear number of transmissions.

81 citations


Proceedings ArticleDOI
22 May 2005
TL;DR: This work studies an extension of the "standard" learning models to settings where observing the value of an attribute has an associated cost (which might differ between attributes).
Abstract: We study an extension of the "standard" learning models to settings where observing the value of an attribute has an associated cost (which might be different for different attributes). Our model assumes that the correct classification is given by some target function f from a class of functions F; most of our results discuss the ability to learn a clause (an OR function of a subset of the variables) in various settings.

Offline: We are given both the function f and the distribution D that is used to generate an input x. The goal is to design a strategy to decide which attribute of x to observe next so as to minimize the expected evaluation cost of f(x). (In this setting there is no "learning" to be done, only an optimization problem to be solved; this problem turns out to be NP-hard, and hence approximation algorithms are presented.)

Distributional online: We study two types of "learning" problems: one where the target function f is known to the learner but the distribution D is unknown (and the goal is to minimize the expected cost, including the cost that stems from "learning" D), and the other where f is unknown (except that f ∈ F) but D is known (and the goal is to minimize the expected cost while limiting the prediction error involved in "learning" f).

Adversarial online: We are given f, but the inputs are selected adversarially. The goal is to compare the learner's cost to that of the best fixed evaluation order (i.e., we analyze the learner's performance by a competitive analysis).
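The offline question for a clause has a classical clean case: when attributes are independent, reading them in nondecreasing order of cost_i / Pr[x_i = 1] minimizes the expected evaluation cost of the OR. The instance below is hypothetical, made up to verify that rule against brute force.

```python
from itertools import permutations

def expected_cost(order, cost, p):
    """Expected cost of evaluating an OR clause attribute by attribute:
    c_i is paid only if every previously read attribute was 0."""
    total, prob_all_zero = 0.0, 1.0
    for i in order:
        total += cost[i] * prob_all_zero
        prob_all_zero *= 1.0 - p[i]
    return total

# hypothetical instance: independent attributes, Pr[x_i = 1] = p[i]
cost = [4.0, 1.0, 2.0]
p = [0.9, 0.2, 0.5]

# classical rule for independent attributes: read attributes in
# nondecreasing order of cost_i / p_i ("cheapest stopping power first")
ratio_order = sorted(range(3), key=lambda i: cost[i] / p[i])
best = min(permutations(range(3)), key=lambda o: expected_cost(o, cost, p))
print(ratio_order, expected_cost(ratio_order, cost, p))
print(list(best), expected_cost(best, cost, p))
```

The paper's offline hardness kicks in once correlations between attributes break this independent-case ordering argument.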

71 citations


Proceedings Article
30 Jul 2005
TL;DR: This work considers the most realistic reinforcement learning setting in which an agent starts in an unknown environment and must follow one continuous and uninterrupted chain of experience with no access to "resets" or "offline" simulation and provides algorithms for general connected POMDPs that obtain near optimal average reward.
Abstract: We consider the most realistic reinforcement learning setting in which an agent starts in an unknown environment (the POMDP) and must follow one continuous and uninterrupted chain of experience with no access to "resets" or "offline" simulation. We provide algorithms for general connected POMDPs that obtain near optimal average reward. One algorithm we present has a convergence rate which depends exponentially on a certain horizon time of an optimal policy, but has no dependence on the number of (unobservable) states. The main building block of our algorithms is an implementation of an approximate reset strategy, which we show always exists in every POMDP. An interesting aspect of our algorithms is how they use this strategy when balancing exploration and exploitation.

62 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider a network providing differentiated services (DiffServ), which allow Internet Service Providers (ISPs) to offer different levels of quality of service (QoS) to different traffic streams.
Abstract: We consider a network providing Differentiated Services (DiffServ), which allow Internet Service Providers (ISPs) to offer different levels of Quality of Service (QoS) to different traffic streams. We study two types of buffering policies that are used in network switches supporting QoS. In the FIFO type, packets must be transmitted in the order they arrive. In the uniform bounded delay type, there is a maximal delay time associated with the switch and each packet must be transmitted within this time, or otherwise it is dropped. In both models the buffer space is limited, and packets are lost when the buffer overflows. Each packet has an intrinsic value, and the goal is to maximize the total value of transmitted packets. Our main contribution is an algorithm for the FIFO model with arbitrary packet values that for the first time achieves a competitive ratio better than 2, namely 2 − ε for a constant ε > 0. We also describe an algorithm for the uniform bounded delay model which simulates our algorithm for the FIFO model, and show that it achieves the same competitive ratio.
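For context, the baseline the 2 − ε result improves on is the preemptive greedy FIFO policy, which is 2-competitive. A minimal sketch (the per-step arrival lists and values are invented for the example):

```python
def preemptive_greedy_fifo(events, buffer_size):
    """Classical preemptive greedy FIFO policy (the 2-competitive
    baseline this paper improves on): on overflow, drop the
    lowest-value packet in the buffer, or reject the arrival if it is
    itself the cheapest. One packet is transmitted per time step.
    `events` lists, per time step, the values of arriving packets."""
    buf, sent_value = [], 0.0
    for arrivals in events:
        for v in arrivals:
            if len(buf) < buffer_size:
                buf.append(v)
            else:
                i = min(range(len(buf)), key=lambda j: buf[j])
                if buf[i] < v:
                    buf.pop(i)        # preempt the cheapest packet
                    buf.append(v)
        if buf:
            sent_value += buf.pop(0)  # FIFO: transmit the head of the queue
    return sent_value

events = [[1.0, 5.0, 3.0], [], [10.0, 2.0], []]
total = preemptive_greedy_fifo(events, buffer_size=2)
print(total)
```

In the first step the value-1 packet is preempted to make room for the value-3 packet, which is exactly the kind of decision competitive analysis charges against the offline optimum.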

58 citations


Book ChapterDOI
27 Jun 2005
TL;DR: This work derives a simple and new forecasting strategy with regret at most order of Q*, the largest absolute value of any payoff, and devise a refined analysis of the weighted majority forecaster, which yields bounds of the same flavour.
Abstract: This work studies external regret in sequential prediction games with arbitrary payoffs (nonnegative or non-positive). External regret measures the difference between the payoff obtained by the forecasting strategy and the payoff of the best action. We focus on two important parameters: M, the largest absolute value of any payoff, and Q*, the sum of squared payoffs of the best action. Given these parameters we derive first a simple and new forecasting strategy with regret at most of order √(Q* ln N) + M ln N, where N is the number of actions. We extend the results to the case where the parameters are unknown and derive similar bounds. We then devise a refined analysis of the weighted majority forecaster, which yields bounds of the same flavour. The proof techniques we develop are finally applied to the adversarial multi-armed bandit setting, and we prove bounds on the performance of an online algorithm in the case where there is no lower bound on the probability of each action.
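An exponentially weighted forecaster with its learning rate tuned from Q* illustrates the √(Q* ln N) + M ln N regime. This is a hedged sketch: the payoff streams are synthetic, Q* is computed here from the action we know to be best, and the tuning is merely in the spirit of the paper's bound.

```python
import math, random

def exp_weights(payoffs, eta):
    """Exponentially weighted forecaster (gains version of weighted
    majority): at each step play p_i proportional to
    exp(eta * cumulative payoff_i)."""
    N, T = len(payoffs), len(payoffs[0])
    cum = [0.0] * N
    earned = 0.0
    for t in range(T):
        mx = max(cum)
        w = [math.exp(eta * (c - mx)) for c in cum]   # numerically stable
        Z = sum(w)
        earned += sum(w[i] / Z * payoffs[i][t] for i in range(N))
        for i in range(N):
            cum[i] += payoffs[i][t]
    return earned, max(cum)

rng = random.Random(0)
N, T, M = 5, 2000, 1.0   # M bounds |payoff|
payoffs = [[rng.uniform(-0.4, 1.0) if i == 0 else rng.uniform(-1.0, 1.0)
            for _ in range(T)] for i in range(N)]
# tune eta from Q*, the sum of squared payoffs of the best action
Q_star = sum(x * x for x in payoffs[0])
eta = min(1.0 / M, math.sqrt(math.log(N) / Q_star))
earned, best = exp_weights(payoffs, eta)
regret = best - earned
print(regret)
```

The observed regret is far below M·T; the point of the Q*-based bound is that it scales with the best action's payoff "energy" rather than with the horizon.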

50 citations


Book ChapterDOI
27 Jun 2005
TL;DR: In this paper, the authors give a simple generic reduction that, given an algorithm for the external regret problem, converts it to an efficient online algorithm for internal regret, and derive a quantitative regret bound for a very general setting of regret, which includes an arbitrary set of modification rules (that possibly modify the online algorithm).
Abstract: External regret compares the performance of an online algorithm, selecting among N actions, to the performance of the best of those actions in hindsight. Internal regret compares the loss of an online algorithm to the loss of a modified online algorithm, which consistently replaces one action by another. In this paper, we give a simple generic reduction that, given an algorithm for the external regret problem, converts it to an efficient online algorithm for the internal regret problem. We provide methods that work both in the full information model, in which the loss of every action is observed at each time step, and the partial information (bandit) model, where at each time step only the loss of the selected action is observed. The importance of internal regret in game theory is due to the fact that in a general game, if each player has sublinear internal regret, then the empirical frequencies converge to a correlated equilibrium. For external regret we also derive a quantitative regret bound for a very general setting of regret, which includes an arbitrary set of modification rules (that possibly modify the online algorithm) and an arbitrary set of time selection functions (each giving different weight to each time step). The regret for a given time selection and modification rule is the difference between the cost of the online algorithm and the cost of the modified online algorithm, where the costs are weighted by the time selection function. This can be viewed as a generalization of the previously-studied sleeping experts setting.
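The reduction can be sketched compactly: run one external-regret learner per action, combine their advice into a stochastic matrix, and play its stationary distribution. This is a simplified reading of the construction with exponential weights as the base learner and a toy loss sequence; parameters are assumptions.

```python
import math

def exp_weights_update(cum_losses, eta):
    """Distribution of an exponential-weights external-regret learner."""
    mn = min(cum_losses)
    w = [math.exp(-eta * (c - mn)) for c in cum_losses]
    Z = sum(w)
    return [x / Z for x in w]

def stationary(Q, iters=100):
    """Stationary distribution p = pQ of a row-stochastic matrix,
    by power iteration (fine for this small illustrative N)."""
    N = len(Q)
    p = [1.0 / N] * N
    for _ in range(iters):
        p = [sum(p[i] * Q[i][j] for i in range(N)) for j in range(N)]
    return p

def internal_from_external(losses, eta=0.5):
    """Sketch of the reduction: one external-regret learner A_i per
    action i; row i of Q is A_i's advice on what to play instead of i;
    play the stationary distribution p = pQ and feed A_i the loss
    vector scaled by p_i."""
    T, N = len(losses), len(losses[0])
    cum = [[0.0] * N for _ in range(N)]
    plays = []
    for t in range(T):
        Q = [exp_weights_update(cum[i], eta) for i in range(N)]
        p = stationary(Q)
        plays.append(p)
        for i in range(N):
            for j in range(N):
                cum[i][j] += p[i] * losses[t][j]
    return plays

def swap_regret(plays, losses):
    """Max over pairs (i, j) of the gain from replacing action i by j."""
    N = len(losses[0])
    return max(sum(p[i] * (l[i] - l[j]) for p, l in zip(plays, losses))
               for i in range(N) for j in range(N))

losses = [[0.1, 0.5, 0.9]] * 500    # action 0 consistently best
plays = internal_from_external(losses)
sr = swap_regret(plays, losses)
print(sr)
```

The fixed-point step p = pQ is what converts the N external-regret guarantees into one internal-regret guarantee: the loss each A_i is charged matches the loss the combined player actually incurs on action i.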

47 citations


Proceedings ArticleDOI
30 Nov 2005
TL;DR: A new method is proposed, based on machine learning techniques, to improve generation success by learning the relationship between the initial state vector and generation success; the method significantly reduced generation failures and enabled faster coverage.
Abstract: The initial state of a design under verification has a major impact on the ability of stimuli generators to successfully generate the requested stimuli. For complexity reasons, most stimuli generators use sequential solutions without planning ahead. Therefore, in many cases, they fail to produce consistent stimuli due to an inadequate selection of the initial state. We propose a new method, based on machine learning techniques, to improve generation success by learning the relationship between the initial state vector and generation success. We applied the proposed method in two different settings, with the objective of improving generation success and coverage in processor and system level generation. In both settings, the proposed method significantly reduced generation failures and enabled faster coverage.

Book ChapterDOI
17 Apr 2005
TL;DR: It is shown that the optimal RTO that maximizes TCP throughput needs to depend also on the TCP window size: the larger the TCP window size, the longer the optimal RTO.
Abstract: Delay spikes on Internet paths can cause spurious TCP timeouts leading to significant throughput degradation. However, if TCP is too slow to detect that a retransmission is necessary, it can stay idle for a long time instead of transmitting. The goal is to find a Retransmission Timeout (RTO) value that balances the throughput degradation between both of these cases. In the current TCP implementations, RTO is a function of the Round Trip Time (RTT) alone. We show that the optimal RTO that maximizes the TCP throughput needs to depend also on the TCP window size. Intuitively, the larger the TCP window size, the longer the optimal RTO. We derive the optimal RTO for several RTT distributions. An important advantage of our algorithm is that it can be easily implemented based on the existing TCP timeout mechanism.
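The window-size dependence can be seen numerically with a deliberately crude cost model. This is a toy, not the paper's throughput analysis: the cost terms, the loss rate, and the RTT distribution are all assumptions made for the example.

```python
import random

def expected_cost(rto, rtt_samples, window, loss_rate=0.02):
    """Toy cost model (illustrative; not the paper's exact analysis):
    - if an ACK's RTT exceeds the RTO, a spurious timeout needlessly
      retransmits a whole window (cost ~ `window` packet-times);
    - if the packet was genuinely lost (prob. `loss_rate`), the sender
      idles for the full RTO before recovering (cost ~ `rto`)."""
    spurious = sum(1 for r in rtt_samples if r > rto) / len(rtt_samples)
    return (1 - loss_rate) * spurious * window + loss_rate * rto

def best_rto(rtt_samples, window):
    grid = [0.1 * k for k in range(1, 101)]   # candidate RTOs in seconds
    return min(grid, key=lambda rto: expected_cost(rto, rtt_samples, window))

rng = random.Random(0)
rtts = [0.1 + rng.expovariate(2.0) for _ in range(10000)]  # RTTs with spikes
results = {w: best_rto(rtts, w) for w in (4, 16, 64)}
print(results)   # larger window -> longer optimal RTO
```

Because the spurious-timeout penalty scales with the window while the idle penalty does not, the cost trade-off pushes the minimizing RTO upward as the window grows, matching the abstract's intuition.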

Book ChapterDOI
01 Dec 2005
TL;DR: Several high-probability concentration bounds for learning unigram language models are shown, including a combined estimator with an error of approximately O(m^{-2/5}) for any k; the log-loss is also bounded a priori as a function of various parameters of the distribution.
Abstract: We show several high-probability concentration bounds for learning unigram language models. One interesting quantity is the probability of all words appearing exactly k times in a sample of size m. A standard estimator for this quantity is the Good-Turing estimator. The existing analysis of its error shows a high-probability bound of approximately O(k/√m). We improve its dependency on k to O(k^{1/4}/√m + k/m). We also analyze the empirical frequencies estimator, showing that with high probability its error is bounded by approximately O(1/k + √k/m). We derive a combined estimator, which has an error of approximately O(m^{-2/5}), for any k. A standard measure for the quality of a learning algorithm is its expected per-word log-loss. The leave-one-out method can be used for estimating the log-loss of the unigram model. We show that its error has a high-probability bound of approximately O(1/√m), for any underlying distribution. We also bound the log-loss a priori, as a function of various parameters of the distribution.
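The Good-Turing estimator analyzed here is simple to state: the total probability mass of words that appear exactly k times in a sample of size m is estimated as G_k = (k+1)·n_{k+1}/m, where n_j counts distinct words seen j times. A minimal sketch on a toy sample (the sample string is arbitrary):

```python
from collections import Counter

def good_turing_masses(sample):
    """Good-Turing estimates G_k = (k+1) * n_{k+1} / m of the total
    probability mass of words appearing exactly k times, where n_j is
    the number of distinct words with count j and m the sample size."""
    m = len(sample)
    counts = Counter(sample)
    n = Counter(counts.values())      # n[j] = number of words seen j times
    max_k = max(counts.values())
    return {k: (k + 1) * n.get(k + 1, 0) / m for k in range(max_k + 1)}

sample = list("abracadabra alakazam".replace(" ", ""))
G = good_turing_masses(sample)
print(G[0])            # estimated missing mass (unseen letters)
print(sum(G.values()))
```

A useful sanity check, used in the test below: summing G_k over all k gives Σ_j j·n_j / m = m/m = 1 exactly, so the estimates form a distribution over the count classes.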

Proceedings Article
26 Jul 2005
TL;DR: A planning algorithm is provided which is exponential only in the multiplicity automata rank rather than the number of states of the POMDP, and which is efficient whenever the predictive state representation is logarithmic in the standard POM DP representation.
Abstract: Planning and learning in Partially Observable MDPs (POMDPs) are among the most challenging tasks in both the AI and Operation Research communities. Although solutions to these problems are intractable in general, there might be special cases, such as structured POMDPs, which can be solved efficiently. A natural and possibly efficient way to represent a POMDP is through the predictive state representation (PSR) — a representation which recently has been receiving increasing attention. In this work, we relate POMDPs to multiplicity automata — showing that POMDPs can be represented by multiplicity automata with no increase in the representation size. Furthermore, we show that the size of the multiplicity automaton is equal to the rank of the predictive state representation. Therefore, we relate both the predictive state representation and POMDPs to the well-founded multiplicity automata literature. Based on the multiplicity automata representation, we provide a planning algorithm which is exponential only in the multiplicity automata rank rather than the number of states of the POMDP. As a result, whenever the predictive state representation is logarithmic in the standard POMDP representation, our planning algorithm is efficient.

Journal ArticleDOI
TL;DR: A novel congestion control algorithm is proposed that achieves high bandwidth utilization, provides fairness among competing connections, and remains sufficiently responsive to changes in available bandwidth by adapting the AIMD parameters dynamically to current network conditions.
Abstract: The main objectives of a congestion control algorithm are high bandwidth utilization, fairness and responsiveness in a changing environment. However, these objectives are contradicting in particular situations since the algorithm constantly has to probe available bandwidth, which may affect its stability. This paper proposes a novel congestion control algorithm that achieves high bandwidth utilization providing fairness among competing connections and, on the other hand, is sufficiently responsive to changes of available bandwidth. The main idea of the algorithm is to use adaptive setting for the additive increase/multiplicative decrease (AIMD) congestion control scheme, where parameters may change dynamically, with respect to the current network conditions.
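The AIMD scheme underlying the proposal is easy to sketch. The "adaptive" rules below are a crude stand-in invented for illustration (faster probing while stable, harder back-off on repeated losses), not the paper's actual parameter-setting policy.

```python
def aimd_trace(loss_steps, T, alpha=1.0, beta=0.5, adaptive=False):
    """Plain vs. toy-adaptive AIMD. The adaptive variant grows the
    increase step `alpha` while no loss is seen and enlarges the
    decrease factor `beta` after a loss (hypothetical rules, a crude
    stand-in for the paper's dynamic parameter setting)."""
    w, a, b, trace = 1.0, alpha, beta, []
    for t in range(T):
        if t in loss_steps:
            w = max(1.0, w * (1.0 - b))       # multiplicative decrease
            if adaptive:
                a, b = alpha, min(0.8, b * 1.2)
        else:
            w += a                            # additive increase
            if adaptive:
                a = min(4.0, a * 1.05)
        trace.append(w)
    return trace

losses = {30, 60, 61, 62}
plain = aimd_trace(losses, 100)
adapt = aimd_trace(losses, 100, adaptive=True)
print(plain[-1], adapt[-1])
```

The tension the abstract describes is visible in the trace: a larger `alpha` probes available bandwidth faster but makes the sawtooth, and hence the stability cost of each loss, larger.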

Journal ArticleDOI
TL;DR: This work considers the problem of combining algorithms designed for each of these objectives in a way that is good under both measures simultaneously, and shows how to derive a combined algorithm with competitive ratio O(c_R c_A) for rejection and O(c_A) for acceptance.
Abstract: Resource allocation and admission control are critical tasks in a communication network that often must be performed online. Algorithms for these types of problems have been considered both under benefit models (e.g., with a goal of approximately maximizing the number of requests accepted) and under cost models (e.g., with a goal of approximately minimizing the number of requests rejected). Unfortunately, algorithms designed for these two measures can often be quite different, even polar opposites. In this work we consider the problem of combining algorithms designed for each of these objectives in a way that is good under both measures simultaneously. More formally, we are given an algorithm A that is c_A-competitive with respect to the number of accepted requests and an algorithm R that is c_R-competitive with respect to the number of rejected requests. We show how to derive a combined algorithm with competitive ratio O(c_R c_A) for rejection and O(c_A) for acceptance. We also build on known techniques to show that given a collection of k algorithms, we can