
Showing papers by "Yishay Mansour published in 2002"


Journal ArticleDOI
TL;DR: This paper presents a new algorithm that, given only a generative model (a natural and common type of simulator) for an arbitrary MDP, performs on-line, near-optimal planning with a per-state running time that has no dependence on the number of states.
Abstract: A critical issue for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or infinite state spaces, traditional planning and reinforcement learning algorithms may be inapplicable, since their running time typically grows linearly with the state space size in the worst case. In this paper we present a new algorithm that, given only a generative model (a natural and common type of simulator) for an arbitrary MDP, performs on-line, near-optimal planning with a per-state running time that has no dependence on the number of states. The running time is exponential in the horizon time (which depends only on the discount factor γ and the desired degree of approximation to the optimal policy). Our algorithm thus provides a different complexity trade-off than classical algorithms such as value iteration: rather than scaling linearly in both horizon time and state space size, our running time trades an exponential dependence on the former in exchange for no dependence on the latter. Our algorithm is based on the idea of sparse sampling. We prove that a randomly sampled look-ahead tree that covers only a vanishing fraction of the full look-ahead tree nevertheless suffices to compute near-optimal actions from any state of an MDP. Practical implementations of the algorithm are discussed, and we draw ties to our related recent results on finding a near-best strategy from a given class of strategies in very large partially observable MDPs (Kearns, Mansour, & Ng, Neural Information Processing Systems 13, to appear).

416 citations
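The sparse-sampling planner described in the abstract can be sketched as a short recursion. This is a minimal illustration, not the paper's implementation; the function names and the `model(s, a) -> (next_state, reward)` interface are assumptions:

```python
def sparse_sample_q(model, state, actions, depth, width, gamma):
    """Estimate Q-values at `state` by recursively drawing `width`
    samples per action from a generative model, to look-ahead `depth`."""
    if depth == 0:
        return {a: 0.0 for a in actions}
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(width):
            s2, r = model(state, a)  # one generative-model sample
            q2 = sparse_sample_q(model, s2, actions, depth - 1, width, gamma)
            total += r + gamma * max(q2.values())  # backed-up value
        q[a] = total / width
    return q

def sparse_sample_action(model, state, actions, depth=3, width=4, gamma=0.9):
    """Pick a near-optimal action at `state`. Running time is on the
    order of (len(actions) * width) ** depth, with no dependence on the
    number of states, matching the trade-off described in the abstract."""
    q = sparse_sample_q(model, state, actions, depth, width, gamma)
    return max(q, key=q.get)
```

The sampled tree covers only a tiny fraction of the full look-ahead tree, which is exactly the point made in the abstract.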


Book ChapterDOI
08 Jul 2002
TL;DR: The bandit problem is revisited and considered under the PAC model, and it is shown that given n arms, it suffices to pull the arms O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1 − δ.
Abstract: The bandit problem is revisited and considered under the PAC model. Our main contribution in this part is to show that given n arms, it suffices to pull the arms O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1 − δ. This is in contrast to the naive bound of O((n/ε²) log(n/δ)). We derive another algorithm whose complexity depends on the specific setting of the rewards, rather than the worst case setting. We also provide a matching lower bound. We show how, given an algorithm for the PAC model multi-armed bandit problem, one can derive a batch learning algorithm for Markov Decision Processes. This is done essentially by simulating Value Iteration, and in each iteration invoking the multi-armed bandit algorithm. Using our PAC algorithm for the multi-armed bandit problem we improve the dependence on the number of actions.

392 citations
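A halving-based PAC bandit sketch in the spirit of the paper's improved algorithm: sample every surviving arm, keep the better half, and tighten (ε, δ) each round, so that log(1/δ) rather than log(n/δ) governs the per-arm sample size. The constants and schedule here are illustrative, not the paper's:

```python
import math

def pac_best_arm(pull, n, eps, delta):
    """Return an (approximately) eps-optimal arm with probability at
    least 1 - delta. `pull(i)` returns a reward in [0, 1] for arm i."""
    arms = list(range(n))
    eps_l, delta_l = eps / 4.0, delta / 2.0
    while len(arms) > 1:
        # per-arm sample size for this round (illustrative constants)
        m = math.ceil((2.0 / eps_l ** 2) * math.log(3.0 / delta_l))
        means = {a: sum(pull(a) for _ in range(m)) / m for a in arms}
        arms.sort(key=lambda a: means[a], reverse=True)
        arms = arms[: max(1, (len(arms) + 1) // 2)]  # keep the better half
        eps_l *= 0.75   # tighten accuracy for later rounds
        delta_l /= 2.0  # tighten confidence for later rounds
    return arms[0]
```

Because the number of surviving arms halves each round while the per-arm budget grows geometrically, the total number of pulls stays O((n/ε²) log(1/δ)).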


Journal Article
TL;DR: It is shown that there exist (ε, k)-wise independent distributions whose statistical distance is at least n^{O(k)} · ε from any k-wise independent distribution.
Abstract: We say that a distribution over {0,1}^n is (ε, k)-wise independent if its restriction to every k coordinates results in a distribution that is ε-close to the uniform distribution. A natural question regarding (ε, k)-wise independent distributions is how close they are to some k-wise independent distribution. We show that there exist (ε, k)-wise independent distributions whose statistical distance is at least n^{O(k)} · ε from any k-wise independent distribution. In addition, we show that for any (ε, k)-wise independent distribution there exists some k-wise independent distribution whose statistical distance from it is at most n^{O(k)} · ε.

52 citations
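The definition in the abstract can be checked directly for a small explicit distribution. A brute-force sketch, assuming the distribution is given as a dict from n-bit tuples to probabilities (both function names are illustrative):

```python
from itertools import combinations, product

def restriction_distance(dist, coords):
    """Statistical distance between the restriction of `dist` to the
    coordinates in `coords` and the uniform distribution on {0,1}^k."""
    k = len(coords)
    marginal = {}
    for x, pr in dist.items():
        key = tuple(x[c] for c in coords)
        marginal[key] = marginal.get(key, 0.0) + pr
    uniform = 1.0 / 2 ** k
    # statistical distance = half the L1 distance
    return 0.5 * sum(abs(marginal.get(p, 0.0) - uniform)
                     for p in product((0, 1), repeat=k))

def is_eps_k_wise_independent(dist, n, k, eps):
    """True iff every k-coordinate restriction is eps-close to uniform."""
    return all(restriction_distance(dist, c) <= eps + 1e-12
               for c in combinations(range(n), k))
```

For example, the distribution putting mass 1/2 on each of 00 and 11 is (0, 1)-wise independent (each bit alone is uniform) but far from 2-wise independent, since the two bits are perfectly correlated.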


Proceedings Article
01 Aug 2002
TL;DR: This work introduces a general representation of large-population games in which each player's influence on the others is centralized and limited, but may otherwise be arbitrary.
Abstract: We introduce a general representation of large-population games in which each player's influence on the others is centralized and limited, but may otherwise be arbitrary. This representation significantly generalizes the class known as congestion games in a natural way. Our main results are provably correct and efficient algorithms for computing and learning approximate Nash equilibria in this general framework.

45 citations
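The representation in the abstract generalizes congestion games. As a concrete baseline for the special case it subsumes, best-response dynamics in a plain congestion game converges to a pure Nash equilibrium (Rosenthal's theorem). This sketch illustrates that baseline only, not the paper's algorithms; all names are illustrative:

```python
def best_response_dynamics(costs, n_players, n_resources, max_rounds=100):
    """Each player picks one resource; the cost of a resource depends
    only on how many players chose it. `costs[r][load]` is the cost of
    resource r when `load` players use it. Players take turns moving to
    a cheapest resource until no one wants to deviate."""
    choice = [0] * n_players  # everyone starts on resource 0
    for _ in range(max_rounds):
        stable = True
        for p in range(n_players):
            load = [choice.count(r) for r in range(n_resources)]
            def cost_if(r):
                # cost player p would pay on resource r if it moved there
                l = load[r] + (0 if choice[p] == r else 1)
                return costs[r][l]
            best = min(range(n_resources), key=cost_if)
            if cost_if(best) < cost_if(choice[p]) - 1e-12:
                choice[p] = best
                stable = False
        if stable:
            return choice  # pure Nash equilibrium reached
    return choice
```

With two players and two identical resources that cost 1 when used alone and 3 when shared, the dynamics settle with one player on each resource.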


Journal ArticleDOI
TL;DR: It is shown that under the same weak learning assumption used for decision tree learning there exists a greedy BP-growth algorithm whose training error is guaranteed to decline as 2^{−b√|T|}, where |T| is the size of the branching program and b is a constant determined by the weak learning hypothesis.

40 citations


Proceedings ArticleDOI
07 Nov 2002
TL;DR: The harmonic policy is proposed, a new scheduling policy based on a system of inequalities and thresholds that achieves high throughput and easily adapts to changing load conditions and its throughput competitive ratio is almost optimal.
Abstract: We introduce a new general scheme for shared memory nonpreemptive scheduling policies. Our scheme utilizes a system of inequalities and thresholds and accepts a packet if it does not violate any of the inequalities. We demonstrate that many of the existing policies can be described using our scheme, thus validating its generality. We propose a new scheduling policy, based on our general scheme, which we call the harmonic policy. Our simulations show that the harmonic policy both achieves high throughput and easily adapts to changing load conditions. We also perform a theoretical analysis of the harmonic policy and demonstrate that its throughput competitive ratio is almost optimal.

32 citations
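The general scheme in the abstract (accept a packet iff no inequality is violated) can be sketched with two illustrative inequalities: a shared-buffer bound and a per-queue threshold. The specific inequalities and names here are examples of the template, not the harmonic policy itself:

```python
def make_threshold_policy(buffer_size, thresholds):
    """Shared-memory, nonpreemptive admission sketch: a packet destined
    for queue q is accepted iff (1) the shared buffer is not full and
    (2) queue q is below its threshold. Returns the accept function and
    the live occupancy list."""
    occupancy = [0] * len(thresholds)
    def accept(q):
        if sum(occupancy) >= buffer_size:   # shared-memory inequality
            return False
        if occupancy[q] >= thresholds[q]:   # per-queue inequality
            return False
        occupancy[q] += 1                   # admit the packet
        return True
    return accept, occupancy
```

Many classic policies fit this template by choosing different systems of inequalities; the harmonic policy corresponds to one particular choice of thresholds.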


Journal ArticleDOI
TL;DR: This new approach yields simple learning algorithms for multivariate polynomials and decision trees over finite fields under any constant bounded product distribution and gives a learning algorithm for an O(log n)-depth decision tree from membership queries only and a new learning algorithm for any multivariate polynomial over sufficiently large fields from membership queries only.
Abstract: In this paper we develop a new approach for learning decision trees and multivariate polynomials via interpolation of multivariate polynomials. This new approach yields simple learning algorithms for multivariate polynomials and decision trees over finite fields under any constant bounded product distribution. The output hypothesis is a (single) multivariate polynomial that is an ε-approximation of the target under any constant bounded product distribution. The new approach demonstrates the learnability of many classes under any constant bounded product distribution and using membership queries, such as j-disjoint disjunctive normal forms (DNFs) and multivariate polynomials with bounded degree over any field. The technique shows how to interpolate multivariate polynomials with bounded term size from membership queries only. This, in particular, gives a learning algorithm for an O(log n)-depth decision tree from membership queries only and a new learning algorithm for any multivariate polynomial over sufficiently large fields from membership queries only. We show that our results for learning from membership queries only are the best possible.

29 citations
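As a toy illustration of interpolation from membership queries alone, the GF(2) multilinear representation (algebraic normal form) of a Boolean function can be recovered by brute force via the Möbius transform. This exhaustive version queries all 2^n points; the paper's algorithms handle large fields and bounded term size far more efficiently:

```python
from itertools import product

def interpolate_multilinear_gf2(f, n):
    """Recover the ANF coefficients of f: {0,1}^n -> {0,1} from
    membership queries. The coefficient of the monomial
    prod_{i: s_i = 1} x_i is the XOR of f over all inputs x <= s
    (coordinatewise). Returns only the nonzero coefficients."""
    coeffs = {}
    for s in product((0, 1), repeat=n):
        c = 0
        for x in product((0, 1), repeat=n):
            if all(x[i] <= s[i] for i in range(n)):
                c ^= f(x)  # one membership query per point
        if c:
            coeffs[s] = 1
    return coeffs
```

For instance, f(x) = x0·x1 ⊕ x2 interpolates to exactly the two monomials x0·x1 and x2.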


Proceedings ArticleDOI
06 Nov 2002
TL;DR: It is shown that gateway selection using predictors can reduce the degradations to half of those obtained by routing all the connections through the best gateway.
Abstract: We study the patterns and predictability of Internet end-to-end service degradations, where a degradation is a significant deviation of the round trip time between a client and a server. We use simultaneous RTT measurements collected from several locations to a large representative set of Web sites and study the duration and extent of degradations. We combine these measurements with BGP cluster information to learn the location of the cause. We evaluate a number of predictors based upon Hidden Markov Models and Markov Models. Predictors typically exhibit a tradeoff between two types of errors: false positives (incorrect degradation prediction) and false negatives (a degradation is not predicted). The costs of these error types are application-dependent, but we capture the entire spectrum using a precision versus recall tradeoff. Using this methodology, we learn what information is most valuable for prediction (recency versus quantity of past measurements). Surprisingly, we also conclude that predictors that utilize history in a very simple way perform as well as more sophisticated ones. One important application of prediction is gateway selection, which is applicable when a LAN is connected through multiple gateways to one or several ISPs. Gateway selection can boost reliability and survivability by selecting for each connection the (hopefully) best gateway. We show that gateway selection using our predictors can reduce the degradations to half of those obtained by routing all the connections through the best gateway.

22 citations
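The abstract's finding that very simple history-based predictors are competitive can be made concrete with a last-value predictor evaluated by precision and recall. This is an illustrative sketch, not the paper's predictors or data:

```python
def last_value_predictor(observed):
    """Predict a degradation at time t iff one was observed at t-1.
    `observed` is a boolean sequence of degradation indicators;
    the first step defaults to predicting no degradation."""
    return [False] + observed[:-1]

def precision_recall(pred, actual):
    """Precision (fraction of predicted degradations that occurred)
    and recall (fraction of degradations that were predicted)."""
    tp = sum(p and a for p, a in zip(pred, actual))
    fp = sum(p and not a for p, a in zip(pred, actual))
    fn = sum(a and not p for p, a in zip(pred, actual))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall
```

Sweeping over richer predictors (e.g. Markov models with longer histories) and plotting precision against recall reproduces the kind of tradeoff analysis the abstract describes.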


Journal Article
TL;DR: This work presents competitive policies for a wide range of cost functions, describing the QoS of a video stream, and considers online policies for selective frame discard and analyzes their performance by means of competitive analysis.
Abstract: Many multimedia applications require transmission of streaming video from a server to a client across an internetwork. In many cases loss may be unavoidable due to congestion or the heterogeneous nature of the network. We explore how discard policies can be used to maximize the quality of service (QoS) perceived by the client. In our model the QoS of a video stream is measured in terms of a cost function, which takes into account the discarded frames. In this paper we consider online policies for selective frame discard and analyze their performance by means of competitive analysis. In competitive analysis the performance of a given online policy is compared with that of an optimal offline policy. In this work we present competitive policies for a wide range of cost functions describing the QoS of a video stream.

3 citations