
Showing papers by "Yishay Mansour published in 2002"


Journal ArticleDOI
TL;DR: This paper presents a new algorithm that, given only a generative model (a natural and common type of simulator) for an arbitrary MDP, performs on-line, near-optimal planning with a per-state running time that has no dependence on the number of states.
Abstract: A critical issue for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or infinite state spaces, traditional planning and reinforcement learning algorithms may be inapplicable, since their running time typically grows linearly with the state space size in the worst case. In this paper we present a new algorithm that, given only a generative model (a natural and common type of simulator) for an arbitrary MDP, performs on-line, near-optimal planning with a per-state running time that has no dependence on the number of states. The running time is exponential in the horizon time (which depends only on the discount factor γ and the desired degree of approximation to the optimal policy). Our algorithm thus provides a different complexity trade-off than classical algorithms such as value iteration: rather than scaling linearly in both horizon time and state space size, our running time trades an exponential dependence on the former in exchange for no dependence on the latter. Our algorithm is based on the idea of sparse sampling. We prove that a randomly sampled look-ahead tree that covers only a vanishing fraction of the full look-ahead tree nevertheless suffices to compute near-optimal actions from any state of an MDP. Practical implementations of the algorithm are discussed, and we draw ties to our related recent results on finding a near-best strategy from a given class of strategies in very large partially observable MDPs (Kearns, Mansour, & Ng, Neural Information Processing Systems 13, to appear).

416 citations
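The sparse-sampling planner described in the abstract can be sketched as a short recursion. This is a minimal illustration, not the paper's implementation; the function names and the `model(s, a) -> (next_state, reward)` interface are assumptions:

```python
def sparse_sample_q(model, state, actions, depth, width, gamma):
    """Estimate Q-values at `state` by recursively drawing `width`
    samples per action from a generative model, to look-ahead `depth`."""
    if depth == 0:
        return {a: 0.0 for a in actions}
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(width):
            s2, r = model(state, a)  # one generative-model sample
            q2 = sparse_sample_q(model, s2, actions, depth - 1, width, gamma)
            total += r + gamma * max(q2.values())  # backed-up value
        q[a] = total / width
    return q

def sparse_sample_action(model, state, actions, depth=3, width=4, gamma=0.9):
    """Pick a near-optimal action at `state`. Running time is on the
    order of (len(actions) * width) ** depth, with no dependence on the
    number of states, matching the trade-off described in the abstract."""
    q = sparse_sample_q(model, state, actions, depth, width, gamma)
    return max(q, key=q.get)
```

The sampled tree covers only a tiny fraction of the full look-ahead tree, which is exactly the point made in the abstract.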


Book ChapterDOI
08 Jul 2002
TL;DR: The bandit problem is revisited and considered under the PAC model, and it is shown that given n arms, it suffices to pull the arms O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1 − δ.
Abstract: The bandit problem is revisited and considered under the PAC model. Our main contribution in this part is to show that given n arms, it suffices to pull the arms O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1 − δ. This is in contrast to the naive bound of O((n/ε²) log(n/δ)). We derive another algorithm whose complexity depends on the specific setting of the rewards, rather than the worst case setting. We also provide a matching lower bound. We show how, given an algorithm for the PAC model multi-armed bandit problem, one can derive a batch learning algorithm for Markov Decision Processes. This is done essentially by simulating Value Iteration, and in each iteration invoking the multi-armed bandit algorithm. Using our PAC algorithm for the multi-armed bandit problem we improve the dependence on the number of actions.

392 citations
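A halving-based PAC bandit sketch in the spirit of the paper's improved algorithm: sample every surviving arm, keep the better half, and tighten (ε, δ) each round, so that log(1/δ) rather than log(n/δ) governs the per-arm sample size. The constants and schedule here are illustrative, not the paper's:

```python
import math

def pac_best_arm(pull, n, eps, delta):
    """Return an (approximately) eps-optimal arm with probability at
    least 1 - delta. `pull(i)` returns a reward in [0, 1] for arm i."""
    arms = list(range(n))
    eps_l, delta_l = eps / 4.0, delta / 2.0
    while len(arms) > 1:
        # per-arm sample size for this round (illustrative constants)
        m = math.ceil((2.0 / eps_l ** 2) * math.log(3.0 / delta_l))
        means = {a: sum(pull(a) for _ in range(m)) / m for a in arms}
        arms.sort(key=lambda a: means[a], reverse=True)
        arms = arms[: max(1, (len(arms) + 1) // 2)]  # keep the better half
        eps_l *= 0.75   # tighten accuracy for later rounds
        delta_l /= 2.0  # tighten confidence for later rounds
    return arms[0]
```

Because the number of surviving arms halves each round while the per-arm budget grows geometrically, the total number of pulls stays O((n/ε²) log(1/δ)).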


Journal Article
TL;DR: It is shown that there exist (ε, k)-wise independent distributions whose statistical distance is at least n^{O(k)} · ε from any k-wise independent distribution.
Abstract: We say that a distribution over {0,1}^n is (ε, k)-wise independent if its restriction to every k coordinates results in a distribution that is ε-close to the uniform distribution. A natural question regarding (ε, k)-wise independent distributions is how close they are to some k-wise independent distribution. We show that there exist (ε, k)-wise independent distributions whose statistical distance is at least n^{O(k)} · ε from any k-wise independent distribution. In addition, we show that for any (ε, k)-wise independent distribution there exists some k-wise independent distribution whose statistical distance from it is at most n^{O(k)} · ε.

52 citations
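The definition in the abstract can be checked directly for a small explicit distribution. A brute-force sketch, assuming the distribution is given as a dict from n-bit tuples to probabilities (both function names are illustrative):

```python
from itertools import combinations, product

def restriction_distance(dist, coords):
    """Statistical distance between the restriction of `dist` to the
    coordinates in `coords` and the uniform distribution on {0,1}^k."""
    k = len(coords)
    marginal = {}
    for x, pr in dist.items():
        key = tuple(x[c] for c in coords)
        marginal[key] = marginal.get(key, 0.0) + pr
    uniform = 1.0 / 2 ** k
    # statistical distance = half the L1 distance
    return 0.5 * sum(abs(marginal.get(p, 0.0) - uniform)
                     for p in product((0, 1), repeat=k))

def is_eps_k_wise_independent(dist, n, k, eps):
    """True iff every k-coordinate restriction is eps-close to uniform."""
    return all(restriction_distance(dist, c) <= eps + 1e-12
               for c in combinations(range(n), k))
```

For example, the distribution putting mass 1/2 on each of 00 and 11 is (0, 1)-wise independent (each bit alone is uniform) but far from 2-wise independent, since the two bits are perfectly correlated.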


Proceedings Article
01 Aug 2002
TL;DR: This work introduces a general representation of large-population games in which each player's influence on the others is centralized and limited, but may otherwise be arbitrary.
Abstract: We introduce a general representation of large-population games in which each player's influence on the others is centralized and limited, but may otherwise be arbitrary. This representation significantly generalizes the class known as congestion games in a natural way. Our main results are provably correct and efficient algorithms for computing and learning approximate Nash equilibria in this general framework.

45 citations
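The representation in the abstract generalizes congestion games. As a concrete baseline for the special case it subsumes, best-response dynamics in a plain congestion game converges to a pure Nash equilibrium (Rosenthal's theorem). This sketch illustrates that baseline only, not the paper's algorithms; all names are illustrative:

```python
def best_response_dynamics(costs, n_players, n_resources, max_rounds=100):
    """Each player picks one resource; the cost of a resource depends
    only on how many players chose it. `costs[r][load]` is the cost of
    resource r when `load` players use it. Players take turns moving to
    a cheapest resource until no one wants to deviate."""
    choice = [0] * n_players  # everyone starts on resource 0
    for _ in range(max_rounds):
        stable = True
        for p in range(n_players):
            load = [choice.count(r) for r in range(n_resources)]
            def cost_if(r):
                # cost player p would pay on resource r if it moved there
                l = load[r] + (0 if choice[p] == r else 1)
                return costs[r][l]
            best = min(range(n_resources), key=cost_if)
            if cost_if(best) < cost_if(choice[p]) - 1e-12:
                choice[p] = best
                stable = False
        if stable:
            return choice  # pure Nash equilibrium reached
    return choice
```

With two players and two identical resources that cost 1 when used alone and 3 when shared, the dynamics settle with one player on each resource.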


Journal ArticleDOI
TL;DR: It is shown that under the same weak learning assumption used for decision tree learning there exists a greedy BP-growth algorithm whose training error is guaranteed to decline as 2^{−b√|T|}, where |T| is the size of the branching program and b is a constant determined by the weak learning hypothesis.

40 citations


Proceedings ArticleDOI
07 Nov 2002
TL;DR: The harmonic policy is proposed, a new scheduling policy based on a system of inequalities and thresholds that achieves high throughput and easily adapts to changing load conditions and its throughput competitive ratio is almost optimal.
Abstract: We introduce a new general scheme for shared memory nonpreemptive scheduling policies. Our scheme utilizes a system of inequalities and thresholds and accepts a packet if it does not violate any of the inequalities. We demonstrate that many of the existing policies can be described using our scheme, thus validating its generality. We propose a new scheduling policy, based on our general scheme, which we call the harmonic policy. Our simulations show that the harmonic policy both achieves high throughput and easily adapts to changing load conditions. We also perform a theoretical analysis of the harmonic policy and demonstrate that its throughput competitive ratio is almost optimal.

32 citations
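The general scheme in the abstract (accept a packet iff no inequality is violated) can be sketched with two illustrative inequalities: a shared-buffer bound and a per-queue threshold. The specific inequalities and names here are examples of the template, not the harmonic policy itself:

```python
def make_threshold_policy(buffer_size, thresholds):
    """Shared-memory, nonpreemptive admission sketch: a packet destined
    for queue q is accepted iff (1) the shared buffer is not full and
    (2) queue q is below its threshold. Returns the accept function and
    the live occupancy list."""
    occupancy = [0] * len(thresholds)
    def accept(q):
        if sum(occupancy) >= buffer_size:   # shared-memory inequality
            return False
        if occupancy[q] >= thresholds[q]:   # per-queue inequality
            return False
        occupancy[q] += 1                   # admit the packet
        return True
    return accept, occupancy
```

Many classic policies fit this template by choosing different systems of inequalities; the harmonic policy corresponds to one particular choice of thresholds.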


Journal ArticleDOI
TL;DR: This new approach yields simple learning algorithms for multivariate polynomials and decision trees over finite fields under any constant bounded product distribution and gives a learning algorithm for an O(log n)-depth decision tree from membership queries only and a new learning algorithm for any multivariate polynomial over sufficiently large fields from membership queries only.
Abstract: In this paper we develop a new approach for learning decision trees and multivariate polynomials via interpolation of multivariate polynomials. This new approach yields simple learning algorithms for multivariate polynomials and decision trees over finite fields under any constant bounded product distribution. The output hypothesis is a (single) multivariate polynomial that is an ε-approximation of the target under any constant bounded product distribution. The new approach demonstrates the learnability of many classes under any constant bounded product distribution and using membership queries, such as j-disjoint disjunctive normal forms (DNFs) and multivariate polynomials with bounded degree over any field. The technique shows how to interpolate multivariate polynomials with bounded term size from membership queries only. This, in particular, gives a learning algorithm for an O(log n)-depth decision tree from membership queries only and a new learning algorithm for any multivariate polynomial over sufficiently large fields from membership queries only. We show that our results for learning from membership queries only are the best possible.

29 citations
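As a toy illustration of interpolation from membership queries alone, the GF(2) multilinear representation (algebraic normal form) of a Boolean function can be recovered by brute force via the Möbius transform. This exhaustive version queries all 2^n points; the paper's algorithms handle large fields and bounded term size far more efficiently:

```python
from itertools import product

def interpolate_multilinear_gf2(f, n):
    """Recover the ANF coefficients of f: {0,1}^n -> {0,1} from
    membership queries. The coefficient of the monomial
    prod_{i: s_i = 1} x_i is the XOR of f over all inputs x <= s
    (coordinatewise). Returns only the nonzero coefficients."""
    coeffs = {}
    for s in product((0, 1), repeat=n):
        c = 0
        for x in product((0, 1), repeat=n):
            if all(x[i] <= s[i] for i in range(n)):
                c ^= f(x)  # one membership query per point
        if c:
            coeffs[s] = 1
    return coeffs
```

For instance, f(x) = x0·x1 ⊕ x2 interpolates to exactly the two monomials x0·x1 and x2.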


Proceedings ArticleDOI
06 Nov 2002
TL;DR: It is shown that gateway selection using predictors can reduce the degradations to half of those obtained by routing all the connections through the best gateway.
Abstract: We study the patterns and predictability of Internet end-to-end service degradations, where a degradation is a significant deviation of the round trip time between a client and a server. We use simultaneous RTT measurements collected from several locations to a large representative set of Web sites and study the duration and extent of degradations. We combine these measurements with BGP cluster information to learn the location of the cause. We evaluate a number of predictors based upon Hidden Markov Models and Markov Models. Predictors typically exhibit a tradeoff between two types of errors: false positives (incorrect degradation prediction) and false negatives (a degradation is not predicted). The costs of these error types are application-dependent, but we capture the entire spectrum using a precision versus recall tradeoff. Using this methodology, we learn what information is most valuable for prediction (recency versus quantity of past measurements). Surprisingly, we also conclude that predictors that utilize history in a very simple way perform as well as more sophisticated ones. One important application of prediction is gateway selection, which is applicable when a LAN is connected through multiple gateways to one or several ISPs. Gateway selection can boost reliability and survivability by selecting for each connection the (hopefully) best gateway. We show that gateway selection using our predictors can reduce the degradations to half of those obtained by routing all the connections through the best gateway.

22 citations
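The abstract's finding that very simple history-based predictors are competitive can be made concrete with a last-value predictor evaluated by precision and recall. This is an illustrative sketch, not the paper's predictors or data:

```python
def last_value_predictor(observed):
    """Predict a degradation at time t iff one was observed at t-1.
    `observed` is a boolean sequence of degradation indicators;
    the first step defaults to predicting no degradation."""
    return [False] + observed[:-1]

def precision_recall(pred, actual):
    """Precision (fraction of predicted degradations that occurred)
    and recall (fraction of degradations that were predicted)."""
    tp = sum(p and a for p, a in zip(pred, actual))
    fp = sum(p and not a for p, a in zip(pred, actual))
    fn = sum(a and not p for p, a in zip(pred, actual))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall
```

Sweeping over richer predictors (e.g. Markov models with longer histories) and plotting precision against recall reproduces the kind of tradeoff analysis the abstract describes.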


Journal Article
TL;DR: This work presents competitive policies for a wide range of cost functions, describing the QoS of a video stream, and considers online policies for selective frame discard and analyzes their performance by means of competitive analysis.
Abstract: Many multimedia applications require transmission of streaming video from a server to a client across an internetwork. In many cases loss may be unavoidable due to congestion or the heterogeneous nature of the network. We explore how discard policies can be used to maximize the quality of service (QoS) perceived by the client. In our model the QoS of a video stream is measured in terms of a cost function, which takes into account the discarded frames. In this paper we consider online policies for selective frame discard and analyze their performance by means of competitive analysis. In competitive analysis the performance of a given online policy is compared with that of an optimal offline policy. In this work we present competitive policies for a wide range of cost functions describing the QoS of a video stream.

3 citations