Author

Sutton Monro

University of North Carolina at Chapel Hill

Bio: Sutton Monro is an academic researcher from the University of North Carolina at Chapel Hill. The author has contributed to research on the topics Minimax approximation algorithm and Approximation error. The author has an h-index of 1 and has co-authored 1 publication, which has received 7,621 citations.

Papers

Journal Article•DOI•

A Stochastic Approximation Method

Herbert Robbins¹, Sutton Monro¹•Institutions (1)

University of North Carolina at Chapel Hill¹

01 Sep 1951-Annals of Mathematical Statistics

TL;DR: In this article, a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability is presented.

Abstract: Let M(x) denote the expected value at level x of the response to a certain experiment. M(x) is assumed to be a monotone function of x but is unknown to the experimenter, and it is desired to find the solution x = θ of the equation M(x) = α, where α is a given constant. We give a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability.
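The abstract above does not spell out the update rule, so the following is only a minimal sketch of the iteration commonly associated with this paper, x_{n+1} = x_n + a_n (α − y_n), where y_n is a noisy response observed at level x_n. The step sizes a_n = c/n, the function name robbins_monro, and the linear test response M(x) = 2x + 1 are assumptions made for illustration and are not taken from this page.

```python
import random


def robbins_monro(noisy_response, alpha, x0, n_steps=5000, c=1.0):
    """Sketch of a stochastic approximation iteration.

    noisy_response(x) is assumed to return an observation whose
    expected value is M(x), with M increasing in x.
    """
    x = x0
    for n in range(1, n_steps + 1):
        y = noisy_response(x)          # noisy observation at level x
        x += (c / n) * (alpha - y)     # shrinking step toward M(x) = alpha
    return x


# Hypothetical experiment: M(x) = 2x + 1 plus Gaussian noise, so the
# root of M(x) = 5 is theta = 2. The seed only makes the run repeatable.
rng = random.Random(0)
estimate = robbins_monro(
    lambda x: 2.0 * x + 1.0 + rng.gauss(0.0, 1.0),
    alpha=5.0,
    x0=0.0,
)
```

With these assumed settings the returned estimate should lie close to θ = 2. The shrinking step sizes are what average out the observation noise over successive experiments.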

9,312 citations

Cited by

Pattern Recognition and Machine Learning

Christopher M. Bishop¹•Institutions (1)

Microsoft¹

01 Jan 2006

TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, and combining models.

Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal Article•DOI•

The self-organizing map

Teuvo Kohonen¹•Institutions (1)

Helsinki University of Technology¹

01 Sep 1990

TL;DR: The self-organizing map, an architecture suggested for artificial neural networks, is explained by presenting simulation experiments and practical applications, and an algorithm which orders responses spatially is reviewed, focusing on best matching cell selection and adaptation of the weight vectors.

Abstract: The self-organized map, an architecture suggested for artificial neural networks, is explained by presenting simulation experiments and practical applications. The self-organizing map has the property of effectively creating spatially organized internal representations of various features of input signals and their abstractions. One result of this is that the self-organization process can discover semantic relationships in sentences. Brain maps, semantic maps, and early work on competitive learning are reviewed. The self-organizing map algorithm (an algorithm which orders responses spatially) is reviewed, focusing on best matching cell selection and adaptation of the weight vectors. Suggestions for applying the self-organizing map algorithm, demonstrations of the ordering process, and an example of hierarchical clustering of data are presented. Fine tuning the map by learning vector quantization is addressed. The use of self-organized maps in practical speech recognition and a simulation experiment on semantic mapping are discussed.

7,883 citations

Posted Content•

ADADELTA: An Adaptive Learning Rate Method

Matthew D. Zeiler

22 Dec 2012-arXiv: Learning

TL;DR: A novel per-dimension learning rate method for gradient descent called ADADELTA that dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent is presented.

Abstract: We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, various data modalities and selection of hyperparameters. We show promising results compared to other methods on the MNIST digit classification task using a single machine and on a large scale voice dataset in a distributed cluster environment.
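The abstract above describes a per-dimension method that uses only first-order information but does not give the update itself. The following is a minimal sketch of the ADADELTA update as it is commonly stated: running averages of squared gradients and squared updates set the step for each dimension, with no global learning rate. The names adadelta_step and minimize_quadratic, the values rho = 0.95 and eps = 1e-6, and the quadratic test objective are assumptions for illustration and do not come from this page.

```python
import math


def adadelta_step(x, grad, state, rho=0.95, eps=1e-6):
    """Apply one ADADELTA-style update in place, one dimension at a time."""
    for i, g in enumerate(grad):
        # Running average of squared gradients.
        state["Eg2"][i] = rho * state["Eg2"][i] + (1 - rho) * g * g
        # Step scaled by RMS of past updates over RMS of gradients.
        dx = -math.sqrt(state["Edx2"][i] + eps) / math.sqrt(state["Eg2"][i] + eps) * g
        # Running average of squared updates.
        state["Edx2"][i] = rho * state["Edx2"][i] + (1 - rho) * dx * dx
        x[i] += dx
    return x


def minimize_quadratic(x0, n_steps=50):
    """Hypothetical driver: run the update on f(x) = sum(x_i ** 2)."""
    x = list(x0)
    state = {"Eg2": [0.0] * len(x), "Edx2": [0.0] * len(x)}
    for _ in range(n_steps):
        grad = [2.0 * xi for xi in x]  # gradient of the assumed objective
        adadelta_step(x, grad, state)
    return x


start = [3.0, -4.0]
end = minimize_quadratic(start)
```

Because both running averages start at zero, the first steps are small; in this sketch each coordinate moves toward the minimum at a pace set by its own gradient history rather than by a hand-tuned rate.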

6,189 citations

Journal Article•DOI•

Transformed Up‐Down Methods in Psychoacoustics

H. Levitt

01 Feb 1971-Journal of the Acoustical Society of America

TL;DR: A broad class of up‐down methods used in psychoacoustics with due emphasis on the related problems of parameter estimation and the efficient placing of observations is described, including examples where conventional techniques are inapplicable.

Abstract: During the past decade a number of variations in the simple up‐down procedure have been used in psychoacoustic testing. A broad class of these methods is described with due emphasis on the related problems of parameter estimation and the efficient placing of observations. The advantages of up‐down methods are many, including simplicity, high efficiency, robustness, small‐sample reliability, and relative freedom from restrictive assumptions. Several applications of these procedures in psychoacoustics are described, including examples where conventional techniques are inapplicable.

5,306 citations

Posted Content•

An overview of gradient descent optimization algorithms

Sebastian Ruder

15 Sep 2016-arXiv: Learning

TL;DR: This article looks at different variants of gradient descent, summarizes challenges, introduces the most common optimization algorithms, reviews architectures in a parallel and distributed setting, and investigates additional strategies for optimizing gradient descent.

Abstract: Gradient descent optimization algorithms, while increasingly popular, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by. This article aims to provide the reader with intuitions with regard to the behaviour of different algorithms that will allow her to put them to use. In the course of this overview, we look at different variants of gradient descent, summarize challenges, introduce the most common optimization algorithms, review architectures in a parallel and distributed setting, and investigate additional strategies for optimizing gradient descent.

4,157 citations
