Author

Paul Fischer

Bio: Paul Fischer is an academic researcher from the Technical University of Denmark. The author has contributed to research in topics: Computational learning theory & Time complexity. The author has an h-index of 14 and has co-authored 61 publications receiving 5,836 citations. Previous affiliations of Paul Fischer include the Technical University of Dortmund.


Papers
Journal ArticleDOI
TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
Abstract: Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions and taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is, the loss incurred because the globally optimal policy is not followed all the time. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies which asymptotically achieve this regret have been devised by Lai and Robbins and many others. In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
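The policies in question are index policies that add a confidence bonus to each arm's empirical mean reward. A minimal sketch of such an upper-confidence-bound rule follows; the exact index constants and variants are given in the paper, so the constant c, the function names, and the Bernoulli test arms below are illustrative assumptions only.

    import math
    import random

    def ucb_play(n_arms, pull, horizon, c=2.0):
        """Play a bandit for `horizon` rounds with a UCB-style index policy.

        `pull(arm)` is assumed to return a reward in [0, 1] (bounded support).
        The exploration constant `c` and the exact index form vary across the
        policies analyzed in the paper; this is only an illustrative sketch.
        """
        counts = [0] * n_arms        # number of times each arm was played
        means = [0.0] * n_arms       # empirical mean reward of each arm
        total = 0.0
        for t in range(1, horizon + 1):
            if t <= n_arms:
                arm = t - 1          # play every arm once to initialize
            else:
                # index = empirical mean + confidence radius (exploration bonus)
                arm = max(range(n_arms),
                          key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]))
            r = pull(arm)
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]   # incremental mean update
            total += r
        return total, counts

    # Example: three Bernoulli arms with unknown success probabilities.
    probs = [0.2, 0.5, 0.7]
    reward, plays = ucb_play(3, lambda a: float(random.random() < probs[a]), 10_000)

Over many plays the counts concentrate on the best arm while the shrinking confidence radius keeps occasional exploration alive, which is the mechanism behind the logarithmic regret guarantee.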

6,361 citations

Book
01 Jan 1868
TL;DR: In this article, the authors review the published results of a Scientific Mission originally sent out under the shadow of the French Army during the Second Empire's ill-fated attempt to establish imperialism in Mexico.
Abstract: The ill-fated attempt of the Second Empire to establish Imperialism in Mexico has had at least one good result in the work now before us, in which the labours of a Scientific Mission originally sent out under the shadow of the French Army are given to the world. The materials accumulated by M. Bocourt and his Fellow-Naturalists were deposited in the National Museum of the Jardin des Plantes, and the elaboration of them entrusted to special workers in the different branches of science. In 1870 three livraisons were issued, each forming the commencement of a separate section of the work, as planned out under the direction of M. Milne-Edwards. These relate to the terrestrial and fluviatile Molluscs, by MM. Fischer and Crosse; to the Orthopterous Insects and Myriapods, by M. Henri de Saussure; and to the Reptiles and Batrachians, by MM. Auguste Duméril and Bocourt. The fall of the Empire and German occupation stopped the immediate progress of the work, but we are glad to see it has now been resumed. A second livraison of the section devoted to the Myriapods, prepared by MM. H. de Saussure and Humbert, has been lately issued, and we believe it is fully intended to bring the work to a conclusion. It will be observed that the authors engaged on the various sections are all well-known authorities on the subjects of which they treat, and that the figures and illustrations are of an elaborate character. We are the more glad to call the attention of our readers to the revival of this work, because it does not appear to be very generally known to naturalists, and because it has lately been the subject of a most unjustifiable attack in an English scientific periodical.* After a general condemnation of the work we are there informed that it is “a lamentable exhibition of the very backward state of zoological science in the French capital.” As to the justice of this remark we need only appeal to the recent numbers of the “Annales des Sciences Naturelles” and the “Nouvelles Annales du Musée,” which are replete with zoological memoirs of the highest interest, and to the great work on fossil birds, by Alphonse Milne-Edwards, recently completed, which is alone sufficient to refute such a sweeping accusation. That the spirit of scientific enterprise is still alive in France is, moreover, sufficiently manifest by the grand researches of Père David in Chinese Tibet, and of Grandidier in Madagascar, while there is certainly no lack of scientific experts to bring their discoveries before the public. A more baseless and unjust attack was certainly never penned against the savants of a sister nation. Mission Scientifique au Mexique et dans l'Amérique Centrale. Recherches Zoologiques publiées sous la direction de M. Milne-Edwards. Livraisons 4. (Paris: 1870–72.)

56 citations

Journal ArticleDOI
TL;DR: It is proved that 2-term-RSE is learnable by a conjunction of a 2-CNF and a 1-DNF, and that k-RSE, the class of ring-sum-expansions containing only monomials of length at most k, can be learned from positive (negative) examples alone when the output hypothesis is not required to be a k-RSE.
Abstract: The problem of learning ring-sum-expansions from examples is studied. Ring-sum-expansions (RSE) are representations of Boolean functions over the base $\{ \wedge , \oplus ,1 \}$, which reflect arithmetic operations in $GF(2)$. k-RSE is the class of ring-sum-expansions containing only monomials of length at most k. k-term-RSE is the class of ring-sum-expansions having at most k monomials. It is shown that k-RSE, $k \geq 1$, is learnable while k-term-RSE, $k \geq 2$, is not learnable if $RP \neq NP$. Without using a complexity-theoretical hypothesis, it is proven that k-RSE, $k \geq 1$, and k-term-RSE, $k \geq 2$, cannot be learned from positive (negative) examples alone. However, if the restriction that the hypothesis which is output by the learning algorithm is also a k-RSE is suspended, then k-RSE is learnable from positive (negative) examples only. Moreover, it is proved that 2-term-RSE is learnable by a conjunction of a 2-CNF and a 1-DNF. Finally the paper presents learning (on-line prediction) algorithms for k-...
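To make the representation concrete: a ring-sum-expansion is an XOR (sum in GF(2)) of monomials, where each monomial is an AND of variables and the empty monomial stands for the constant 1. A small evaluation sketch follows; the encoding of monomials as index tuples is an illustrative assumption, not the paper's notation.

    from itertools import product

    def eval_rse(monomials, x):
        """Evaluate a ring-sum-expansion over GF(2).

        `monomials` is a list of tuples of variable indices; the empty tuple ()
        stands for the constant 1. Each monomial is the AND of its variables,
        and the whole expression is their XOR (sum modulo 2).
        """
        value = 0
        for mono in monomials:
            value ^= all(x[i] for i in mono)   # AND the variables, XOR into the sum
        return int(value)

    # x1*x2 XOR x3 XOR 1 -- a 2-RSE with three monomials over three variables
    rse = [(0, 1), (2,), ()]
    for x in product([0, 1], repeat=3):
        print(x, eval_rse(rse, x))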

47 citations

Journal ArticleDOI
TL;DR: This paper presents a generic algorithm using randomized hypotheses that can tolerate noise rates slightly larger than ε/(1 + ε) while using samples of size d/ε, as in the noise-free case.
Abstract: In this paper, we prove various results about PAC learning in the presence of malicious noise. Our main interest is the sample size behavior of learning algorithms. We prove the first nontrivial sample complexity lower bound in this model by showing that on the order of ε/Δ² + d/Δ (up to logarithmic factors) examples are necessary for PAC learning any target class of {0,1}-valued functions of VC dimension d, where ε is the desired accuracy and η = ε/(1 + ε) − Δ the malicious noise rate (it is well known that any nontrivial target class cannot be PAC learned with accuracy ε and malicious noise rate η ≥ ε/(1 + ε), irrespective of sample complexity). We also show that this result cannot be significantly improved in general by presenting efficient learning algorithms for the class of all subsets of d elements and the class of unions of at most d intervals on the real line. This is especially interesting as we can also show that the popular minimum disagreement strategy needs samples of size dε/Δ², hence is not optimal with respect to sample size. We then discuss the use of randomized hypotheses. For these the bound ε/(1 + ε) on the noise rate is no longer true and is replaced by 2ε/(1 + 2ε). In fact, we present a generic algorithm using randomized hypotheses that can tolerate noise rates slightly larger than ε/(1 + ε) while using samples of size d/ε as in the noise-free case. Again one observes a quadratic power law (in this case dε/Δ², with Δ = 2ε/(1 + 2ε) − η) as Δ goes to zero. We show upper and lower bounds of this order.
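Collected in one place, the bounds referred to above can be written as follows (a sketch in reconstructed notation, with m the sample size, ε the accuracy, d the VC dimension, η the malicious noise rate, and Δ the gap to the noise barrier):

    % Deterministic hypotheses: noise barrier eps/(1+eps), lower bound on sample size
    \eta = \frac{\varepsilon}{1+\varepsilon} - \Delta,
    \qquad
    m = \Omega\!\left(\frac{\varepsilon}{\Delta^{2}} + \frac{d}{\Delta}\right)
    \quad \text{(up to logarithmic factors).}

    % Randomized hypotheses: the barrier moves to 2eps/(1+2eps) and the same
    % quadratic power law in the gap appears, with matching upper and lower bounds
    \eta = \frac{2\varepsilon}{1+2\varepsilon} - \Delta,
    \qquad
    m = \Theta\!\left(\frac{d\,\varepsilon}{\Delta^{2}}\right)
    \quad \text{as } \Delta \to 0.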

40 citations


Cited by
Book
01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.
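As a concrete instance of the temporal-difference methods listed among the basic solution methods, here is a minimal tabular TD(0) policy-evaluation sketch; the random-walk environment, parameter values, and function name are illustrative assumptions rather than the book's code.

    import random

    def td0_random_walk(n_states=5, episodes=2000, alpha=0.1, gamma=1.0):
        """Tabular TD(0) policy evaluation on a small random walk.

        States 0..n_states-1 sit between two terminal exits; the policy moves
        left or right uniformly at random and reward 1 is given only for
        exiting on the right. The environment is an illustrative assumption.
        """
        V = [0.0] * n_states
        for _ in range(episodes):
            s = n_states // 2                        # start in the middle state
            while True:
                s2 = s + random.choice((-1, 1))      # uniformly random policy
                if s2 < 0:
                    r, done = 0.0, True              # exit left: reward 0
                elif s2 >= n_states:
                    r, done = 1.0, True              # exit right: reward 1
                else:
                    r, done = 0.0, False
                target = r if done else r + gamma * V[s2]
                V[s] += alpha * (target - V[s])      # TD(0): move toward bootstrapped target
                if done:
                    break
                s = s2
        return V

    print(td0_random_walk())   # estimated values increase from left to right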

37,989 citations

01 Jan 2015
TL;DR: In this article, the authors show that the DQN algorithm suffers from substantial overestimations in some games in the Atari 2600 domain; they propose a specific adaptation of the algorithm and show that the resulting algorithm not only reduces the observed overestimations but also leads to much better performance on several games.
Abstract: The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-learning with a deep neural network, suffers from substantial overestimations in some games in the Atari 2600 domain. We then show that the idea behind the Double Q-learning algorithm, which was introduced in a tabular setting, can be generalized to work with large-scale function approximation. We propose a specific adaptation to the DQN algorithm and show that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
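The tabular idea being generalized keeps two value tables and decouples action selection from action evaluation, which removes the upward bias caused by using a single max for both roles. A minimal sketch of the tabular update follows; the table representation, parameters, and function name are illustrative assumptions.

    import random
    from collections import defaultdict

    def double_q_update(QA, QB, s, a, r, s2, actions, alpha=0.1, gamma=0.99, done=False):
        """One tabular Double Q-learning update.

        QA and QB are defaultdict(float) tables keyed by (state, action).
        One table picks the greedy next action, the other evaluates it; using a
        single max for both roles is what makes plain Q-learning overestimate.
        """
        select, evaluate = (QA, QB) if random.random() < 0.5 else (QB, QA)
        if done:
            target = r
        else:
            best = max(actions, key=lambda a2: select[(s2, a2)])   # chosen by one table
            target = r + gamma * evaluate[(s2, best)]              # valued by the other
        select[(s, a)] += alpha * (target - select[(s, a)])

    QA, QB = defaultdict(float), defaultdict(float)
    double_q_update(QA, QB, s=0, a=1, r=1.0, s2=2, actions=[0, 1])

In the Double DQN adaptation described here, the online and target networks of DQN play roughly the roles of these two tables.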

4,301 citations

Book
01 Jan 2006
TL;DR: In this book, the authors provide a comprehensive treatment of the problem of predicting individual sequences using expert advice, a general framework within which many related problems can be cast and discussed, such as repeated game playing, adaptive data compression, sequential investment in the stock market, sequential pattern analysis, and several other problems.
Abstract: This important text and reference for researchers and students in machine learning, game theory, statistics and information theory offers a comprehensive treatment of the problem of predicting individual sequences. Unlike standard statistical approaches to forecasting, prediction of individual sequences does not impose any probabilistic assumption on the data-generating mechanism. Yet, prediction algorithms can be constructed that work well for all possible sequences, in the sense that their performance is always nearly as good as the best forecasting strategy in a given reference class. The central theme is the model of prediction using expert advice, a general framework within which many related problems can be cast and discussed. Repeated game playing, adaptive data compression, sequential investment in the stock market, sequential pattern analysis, and several other problems are viewed as instances of the experts' framework and analyzed from a common nonstochastic standpoint that often reveals new and intriguing connections.
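The expert-advice model at the center of the book can be illustrated with the exponentially weighted average forecaster, one of the basic algorithms in this framework; the squared-error loss and learning rate below are illustrative assumptions, since the framework allows general bounded losses.

    import math

    def exponential_weights(expert_predictions, outcomes, eta=0.5):
        """Exponentially weighted average forecaster over a finite expert class.

        `expert_predictions[t][i]` is expert i's prediction in [0, 1] at round t
        and `outcomes[t]` is the revealed outcome; squared error is used as the
        loss here, an assumption made for the sake of a runnable example.
        """
        n_experts = len(expert_predictions[0])
        weights = [1.0] * n_experts
        total_loss = 0.0
        for preds, y in zip(expert_predictions, outcomes):
            total = sum(weights)
            forecast = sum(w * p for w, p in zip(weights, preds)) / total   # weighted average
            total_loss += (forecast - y) ** 2
            # downweight each expert multiplicatively according to its own loss
            weights = [w * math.exp(-eta * (p - y) ** 2) for w, p in zip(weights, preds)]
        return total_loss

    # Two experts: one always predicts 0.9, one always 0.1; outcomes favor the first.
    preds = [[0.9, 0.1]] * 100
    outs = [1.0] * 100
    print(exponential_weights(preds, outs))

The regret guarantees studied in the book say that, for a suitable learning rate, such a forecaster's cumulative loss is nearly as small as that of the best expert in hindsight, with no probabilistic assumption on how the outcomes are generated.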

3,615 citations

Book ChapterDOI
18 Sep 2006
TL;DR: In this article, a new bandit-based algorithm, UCT, is proposed to guide Monte-Carlo planning in large state-space Markovian decision problems (MDPs), for which Monte-Carlo planning is one of the few viable approaches to finding near-optimal solutions.
Abstract: For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to finding near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent, and finite-sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains UCT is significantly more efficient than its alternatives.
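The bandit idea in UCT is to treat each internal node of the search tree as a bandit and descend by choosing the child that maximizes an upper-confidence index. A minimal sketch of that selection step follows; the node fields, the exploration constant, and the example nodes are illustrative assumptions.

    import math
    from collections import namedtuple

    def uct_select(children, c=1.4):
        """Pick a child node by a UCB-style index, as in UCT's tree descent.

        Each child is assumed to expose `visits` and `total_value`; children
        that have never been visited are tried first (infinite index).
        """
        parent_visits = sum(ch.visits for ch in children)

        def index(ch):
            if ch.visits == 0:
                return float("inf")                   # force exploration of new children
            mean = ch.total_value / ch.visits         # empirical value of the child
            return mean + c * math.sqrt(math.log(parent_visits) / ch.visits)

        return max(children, key=index)

    Node = namedtuple("Node", "visits total_value")
    print(uct_select([Node(10, 7.0), Node(3, 2.5), Node(0, 0.0)]))   # unvisited node wins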

2,695 citations

Journal ArticleDOI
TL;DR: This paper surveys the Monte Carlo tree search literature to date, providing a snapshot of the state of the art after the first five years of MCTS research; it outlines the core algorithm's derivation, imparts some structure on the many variations and enhancements that have been proposed, and summarizes the results from the key game and nongame domains.
Abstract: Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
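The core algorithm the survey builds on repeats four phases per simulation; a compact skeleton of that loop is sketched below, where the four phase functions and the `children`/`visits` fields are assumed interfaces supplied by the caller, not the survey's code.

    def mcts(root, n_simulations, select, expand, rollout, backup):
        """Generic Monte Carlo tree search loop.

        Each iteration performs selection (descend the tree, e.g. with a UCT
        index), expansion (add one child), simulation (random rollout from the
        new state), and backpropagation (push the result back to the root).
        """
        for _ in range(n_simulations):
            leaf = select(root)        # selection
            child = expand(leaf)       # expansion
            reward = rollout(child)    # simulation by random sampling
            backup(child, reward)      # backpropagation
        # after the simulation budget is spent, recommend the most visited move
        return max(root.children, key=lambda ch: ch.visits)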

2,682 citations