Author

Paul Fischer

Bio: Paul Fischer is an academic researcher from the Technical University of Denmark. The author has contributed to research in topics: Computational learning theory & Time complexity. The author has an h-index of 14 and has co-authored 61 publications receiving 5,836 citations. Previous affiliations of Paul Fischer include the Technical University of Dortmund.


Papers
Journal ArticleDOI
TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
Abstract: Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions and taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is, the loss incurred because the globally optimal policy is not followed all the time. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies which asymptotically achieve this regret have been devised by Lai and Robbins and many others. In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
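The policies in question are index policies that add a confidence bonus to each arm's empirical mean reward. A minimal sketch of such an upper-confidence-bound rule follows; the exact index constants and variants are given in the paper, so the constant c, the function names, and the Bernoulli test arms below are illustrative assumptions only.

    import math
    import random

    def ucb_play(n_arms, pull, horizon, c=2.0):
        """Play a bandit for `horizon` rounds with a UCB-style index policy.

        `pull(arm)` is assumed to return a reward in [0, 1] (bounded support).
        The exploration constant `c` and the exact index form vary across the
        policies analyzed in the paper; this is only an illustrative sketch.
        """
        counts = [0] * n_arms        # number of times each arm was played
        means = [0.0] * n_arms       # empirical mean reward of each arm
        total = 0.0
        for t in range(1, horizon + 1):
            if t <= n_arms:
                arm = t - 1          # play every arm once to initialize
            else:
                # index = empirical mean + confidence radius (exploration bonus)
                arm = max(range(n_arms),
                          key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]))
            r = pull(arm)
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]   # incremental mean update
            total += r
        return total, counts

    # Example: three Bernoulli arms with unknown success probabilities.
    probs = [0.2, 0.5, 0.7]
    reward, plays = ucb_play(3, lambda a: float(random.random() < probs[a]), 10_000)

Over many plays the counts concentrate on the best arm while the shrinking confidence radius keeps occasional exploration alive, which is the mechanism behind the logarithmic regret guarantee.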

6,361 citations

Book
01 Jan 1868
TL;DR: In this article, the authors review the published results of a Scientific Mission originally sent out under the shadow of the French Army during the Second Empire's ill-fated attempt to establish imperialism in Mexico.
Abstract: The ill-fated attempt of the Second Empire to establish Imperialism in Mexico has had at least one good result in the work now before us, in which the labours of a Scientific Mission originally sent out under the shadow of the French Army are given to the world. The materials accumulated by M. Bocourt and his Fellow-Naturalists were deposited in the National Museum of the Jardin des Plantes, and the elaboration of them entrusted to special workers in the different branches of science. In 1870 three livraisons were issued, each forming the commencement of a separate section of the work, as planned out under the direction of M. Milne-Edwards. These relate to the terrestrial and fluviatile Molluscs, by MM. Fischer and Crosse; to the Orthopterous Insects and Myriapods, by M. Henri de Saussure; and to the Reptiles and Batrachians, by MM. Auguste Duméril and Bocourt. The fall of the Empire and German occupation stopped the immediate progress of the work, but we are glad to see it has now been resumed. A second livraison of the section devoted to the Myriapods, prepared by MM. H. de Saussure and Humbert, has been lately issued, and we believe it is fully intended to bring the work to a conclusion. It will be observed that the authors engaged on the various sections are all well-known authorities on the subjects of which they treat, and that the figures and illustrations are of an elaborate character. We are the more glad to call the attention of our readers to the revival of this work, because it does not appear to be very generally known to naturalists, and because it has lately been the subject of a most unjustifiable attack in an English scientific periodical.* After a general condemnation of the work we are there informed that it is “a lamentable exhibition of the very backward state of zoological science in the French capital.” As to the justice of this remark we need only appeal to the recent numbers of the “Annales des Sciences Naturelles” and the “Nouvelles Annales du Musée,” which are replete with zoological memoirs of the highest interest, and to the great work on fossil birds, by Alphonse Milne-Edwards, recently completed, which is alone sufficient to refute such a sweeping accusation. That the spirit of scientific enterprise is still alive in France is, moreover, sufficiently manifest by the grand researches of Père David in Chinese Tibet, and of Grandidier in Madagascar, while there is certainly no lack of scientific experts to bring their discoveries before the public. A more baseless and unjust attack was certainly never penned against the savants of a sister nation. Mission Scientifique au Mexique et dans l'Amérique Centrale. Recherches Zoologiques publiées sous la direction de M. Milne-Edwards. Livraisons 4. (Paris: 1870–72.)

56 citations

Journal ArticleDOI
TL;DR: It is proved that 2-term-RSE is learnable by a conjunction of a 2-CNF and a 1-DNF, and that k-RSE, the class of ring-sum-expansions containing only monomials of length at most k, can be learned from positive (negative) examples alone when the output hypothesis is not required to be a k-RSE.
Abstract: The problem of learning ring-sum-expansions from examples is studied. Ring-sum-expansions (RSE) are representations of Boolean functions over the base $\{ \wedge , \oplus ,1 \}$, which reflect arithmetic operations in $GF(2)$. k-RSE is the class of ring-sum-expansions containing only monomials of length at most k. k-term-RSE is the class of ring-sum-expansions having at most k monomials. It is shown that k-RSE, $k \geq 1$, is learnable while k-term-RSE, $k \geq 2$, is not learnable if $RP \neq NP$. Without using a complexity-theoretical hypothesis, it is proven that k-RSE, $k \geq 1$, and k-term-RSE, $k \geq 2$, cannot be learned from positive (negative) examples alone. However, if the restriction that the hypothesis which is output by the learning algorithm is also a k-RSE is suspended, then k-RSE is learnable from positive (negative) examples only. Moreover, it is proved that 2-term-RSE is learnable by a conjunction of a 2-CNF and a 1-DNF. Finally the paper presents learning (on-line prediction) algorithms for k-...
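To make the representation concrete: a ring-sum-expansion is an XOR (sum in GF(2)) of monomials, where each monomial is an AND of variables and the empty monomial stands for the constant 1. A small evaluation sketch follows; the encoding of monomials as index tuples is an illustrative assumption, not the paper's notation.

    from itertools import product

    def eval_rse(monomials, x):
        """Evaluate a ring-sum-expansion over GF(2).

        `monomials` is a list of tuples of variable indices; the empty tuple ()
        stands for the constant 1. Each monomial is the AND of its variables,
        and the whole expression is their XOR (sum modulo 2).
        """
        value = 0
        for mono in monomials:
            value ^= all(x[i] for i in mono)   # AND the variables, XOR into the sum
        return int(value)

    # x1*x2 XOR x3 XOR 1 -- a 2-RSE with three monomials over three variables
    rse = [(0, 1), (2,), ()]
    for x in product([0, 1], repeat=3):
        print(x, eval_rse(rse, x))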

47 citations

Journal ArticleDOI
TL;DR: This paper presents a generic algorithm using randomized hypotheses that can tolerate noise rates slightly larger than ε/(1 + ε) while using samples of size d/ε, as in the noise-free case.
Abstract: In this paper, we prove various results about PAC learning in the presence of malicious noise. Our main interest is the sample size behavior of learning algorithms. We prove the first nontrivial sample complexity lower bound in this model by showing that on the order of ε/Δ² + d/Δ (up to logarithmic factors) examples are necessary for PAC learning any target class of {0,1}-valued functions of VC dimension d, where ε is the desired accuracy and η = ε/(1 + ε) − Δ the malicious noise rate (it is well known that any nontrivial target class cannot be PAC learned with accuracy ε and malicious noise rate η ≥ ε/(1 + ε), irrespective of sample complexity). We also show that this result cannot be significantly improved in general by presenting efficient learning algorithms for the class of all subsets of d elements and the class of unions of at most d intervals on the real line. This is especially interesting as we can also show that the popular minimum disagreement strategy needs samples of size dε/Δ², hence is not optimal with respect to sample size. We then discuss the use of randomized hypotheses. For these the bound ε/(1 + ε) on the noise rate is no longer true and is replaced by 2ε/(1 + 2ε). In fact, we present a generic algorithm using randomized hypotheses that can tolerate noise rates slightly larger than ε/(1 + ε) while using samples of size d/ε as in the noise-free case. Again one observes a quadratic power law (in this case dε/Δ², with Δ = 2ε/(1 + 2ε) − η) as Δ goes to zero. We show upper and lower bounds of this order.
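Collected in one place, the bounds referred to above can be written as follows (a sketch in reconstructed notation, with m the sample size, ε the accuracy, d the VC dimension, η the malicious noise rate, and Δ the gap to the noise barrier):

    % Deterministic hypotheses: noise barrier eps/(1+eps), lower bound on sample size
    \eta = \frac{\varepsilon}{1+\varepsilon} - \Delta,
    \qquad
    m = \Omega\!\left(\frac{\varepsilon}{\Delta^{2}} + \frac{d}{\Delta}\right)
    \quad \text{(up to logarithmic factors).}

    % Randomized hypotheses: the barrier moves to 2eps/(1+2eps) and the same
    % quadratic power law in the gap appears, with matching upper and lower bounds
    \eta = \frac{2\varepsilon}{1+2\varepsilon} - \Delta,
    \qquad
    m = \Theta\!\left(\frac{d\,\varepsilon}{\Delta^{2}}\right)
    \quad \text{as } \Delta \to 0.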

40 citations


Cited by
Book
01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.
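As a concrete instance of the temporal-difference methods listed among the basic solution methods, here is a minimal tabular TD(0) policy-evaluation sketch; the random-walk environment, parameter values, and function name are illustrative assumptions rather than the book's code.

    import random

    def td0_random_walk(n_states=5, episodes=2000, alpha=0.1, gamma=1.0):
        """Tabular TD(0) policy evaluation on a small random walk.

        States 0..n_states-1 sit between two terminal exits; the policy moves
        left or right uniformly at random and reward 1 is given only for
        exiting on the right. The environment is an illustrative assumption.
        """
        V = [0.0] * n_states
        for _ in range(episodes):
            s = n_states // 2                        # start in the middle state
            while True:
                s2 = s + random.choice((-1, 1))      # uniformly random policy
                if s2 < 0:
                    r, done = 0.0, True              # exit left: reward 0
                elif s2 >= n_states:
                    r, done = 1.0, True              # exit right: reward 1
                else:
                    r, done = 0.0, False
                target = r if done else r + gamma * V[s2]
                V[s] += alpha * (target - V[s])      # TD(0): move toward bootstrapped target
                if done:
                    break
                s = s2
        return V

    print(td0_random_walk())   # estimated values increase from left to right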

37,989 citations

01 Jan 2015
TL;DR: In this article, the authors show that the DQN algorithm suffers from substantial overestimations in some games in the Atari 2600 domain; they propose a specific adaptation of the algorithm and show that the resulting algorithm not only reduces the observed overestimations but also leads to much better performance on several games.
Abstract: The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-learning with a deep neural network, suffers from substantial overestimations in some games in the Atari 2600 domain. We then show that the idea behind the Double Q-learning algorithm, which was introduced in a tabular setting, can be generalized to work with large-scale function approximation. We propose a specific adaptation to the DQN algorithm and show that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
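The tabular idea being generalized keeps two value tables and decouples action selection from action evaluation, which removes the upward bias caused by using a single max for both roles. A minimal sketch of the tabular update follows; the table representation, parameters, and function name are illustrative assumptions.

    import random
    from collections import defaultdict

    def double_q_update(QA, QB, s, a, r, s2, actions, alpha=0.1, gamma=0.99, done=False):
        """One tabular Double Q-learning update.

        QA and QB are defaultdict(float) tables keyed by (state, action).
        One table picks the greedy next action, the other evaluates it; using a
        single max for both roles is what makes plain Q-learning overestimate.
        """
        select, evaluate = (QA, QB) if random.random() < 0.5 else (QB, QA)
        if done:
            target = r
        else:
            best = max(actions, key=lambda a2: select[(s2, a2)])   # chosen by one table
            target = r + gamma * evaluate[(s2, best)]              # valued by the other
        select[(s, a)] += alpha * (target - select[(s, a)])

    QA, QB = defaultdict(float), defaultdict(float)
    double_q_update(QA, QB, s=0, a=1, r=1.0, s2=2, actions=[0, 1])

In the Double DQN adaptation described here, the online and target networks of DQN play roughly the roles of these two tables.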

4,301 citations

Book
01 Jan 2006
TL;DR: In this book, the authors provide a comprehensive treatment of the problem of predicting individual sequences using expert advice, a general framework within which many related problems can be cast and discussed, such as repeated game playing, adaptive data compression, sequential investment in the stock market, sequential pattern analysis, and several other problems.
Abstract: This important text and reference for researchers and students in machine learning, game theory, statistics and information theory offers a comprehensive treatment of the problem of predicting individual sequences. Unlike standard statistical approaches to forecasting, prediction of individual sequences does not impose any probabilistic assumption on the data-generating mechanism. Yet, prediction algorithms can be constructed that work well for all possible sequences, in the sense that their performance is always nearly as good as the best forecasting strategy in a given reference class. The central theme is the model of prediction using expert advice, a general framework within which many related problems can be cast and discussed. Repeated game playing, adaptive data compression, sequential investment in the stock market, sequential pattern analysis, and several other problems are viewed as instances of the experts' framework and analyzed from a common nonstochastic standpoint that often reveals new and intriguing connections.
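The expert-advice model at the center of the book can be illustrated with the exponentially weighted average forecaster, one of the basic algorithms in this framework; the squared-error loss and learning rate below are illustrative assumptions, since the framework allows general bounded losses.

    import math

    def exponential_weights(expert_predictions, outcomes, eta=0.5):
        """Exponentially weighted average forecaster over a finite expert class.

        `expert_predictions[t][i]` is expert i's prediction in [0, 1] at round t
        and `outcomes[t]` is the revealed outcome; squared error is used as the
        loss here, an assumption made for the sake of a runnable example.
        """
        n_experts = len(expert_predictions[0])
        weights = [1.0] * n_experts
        total_loss = 0.0
        for preds, y in zip(expert_predictions, outcomes):
            total = sum(weights)
            forecast = sum(w * p for w, p in zip(weights, preds)) / total   # weighted average
            total_loss += (forecast - y) ** 2
            # downweight each expert multiplicatively according to its own loss
            weights = [w * math.exp(-eta * (p - y) ** 2) for w, p in zip(weights, preds)]
        return total_loss

    # Two experts: one always predicts 0.9, one always 0.1; outcomes favor the first.
    preds = [[0.9, 0.1]] * 100
    outs = [1.0] * 100
    print(exponential_weights(preds, outs))

The regret guarantees studied in the book say that, for a suitable learning rate, such a forecaster's cumulative loss is nearly as small as that of the best expert in hindsight, with no probabilistic assumption on how the outcomes are generated.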

3,615 citations

Book ChapterDOI
18 Sep 2006
TL;DR: In this article, a new bandit-based algorithm, UCT, is proposed to guide Monte-Carlo planning in large state-space Markovian decision problems (MDPs), for which Monte-Carlo planning is one of the few viable approaches to finding near-optimal solutions.
Abstract: For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to finding near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent, and finite-sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains UCT is significantly more efficient than its alternatives.
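The bandit idea in UCT is to treat each internal node of the search tree as a bandit and descend by choosing the child that maximizes an upper-confidence index. A minimal sketch of that selection step follows; the node fields, the exploration constant, and the example nodes are illustrative assumptions.

    import math
    from collections import namedtuple

    def uct_select(children, c=1.4):
        """Pick a child node by a UCB-style index, as in UCT's tree descent.

        Each child is assumed to expose `visits` and `total_value`; children
        that have never been visited are tried first (infinite index).
        """
        parent_visits = sum(ch.visits for ch in children)

        def index(ch):
            if ch.visits == 0:
                return float("inf")                   # force exploration of new children
            mean = ch.total_value / ch.visits         # empirical value of the child
            return mean + c * math.sqrt(math.log(parent_visits) / ch.visits)

        return max(children, key=index)

    Node = namedtuple("Node", "visits total_value")
    print(uct_select([Node(10, 7.0), Node(3, 2.5), Node(0, 0.0)]))   # unvisited node wins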

2,695 citations

Journal ArticleDOI
TL;DR: This paper surveys the Monte Carlo tree search literature to date, providing a snapshot of the state of the art after the first five years of MCTS research; it outlines the core algorithm's derivation, imparts some structure on the many variations and enhancements that have been proposed, and summarizes the results from the key game and nongame domains.
Abstract: Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
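The core algorithm the survey builds on repeats four phases per simulation; a compact skeleton of that loop is sketched below, where the four phase functions and the `children`/`visits` fields are assumed interfaces supplied by the caller, not the survey's code.

    def mcts(root, n_simulations, select, expand, rollout, backup):
        """Generic Monte Carlo tree search loop.

        Each iteration performs selection (descend the tree, e.g. with a UCT
        index), expansion (add one child), simulation (random rollout from the
        new state), and backpropagation (push the result back to the root).
        """
        for _ in range(n_simulations):
            leaf = select(root)        # selection
            child = expand(leaf)       # expansion
            reward = rollout(child)    # simulation by random sampling
            backup(child, reward)      # backpropagation
        # after the simulation budget is spent, recommend the most visited move
        return max(root.children, key=lambda ch: ch.visits)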

2,682 citations