Open Access · Posted Content
Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
TLDR
Algorithms for learning GLMs and SIMs that are both computationally and statistically efficient: the isotonic regression step in Isotron is modified to fit a Lipschitz monotonic function, together with an efficient O(n log(n)) algorithm for this step, improving upon the previous O(n²) algorithm.
Abstract
Generalized Linear Models (GLMs) and Single Index Models (SIMs) provide powerful generalizations of linear regression, where the target variable is assumed to be a (possibly unknown) 1-dimensional function of a linear predictor. In general, these problems entail non-convex estimation procedures, and, in practice, iterative local search heuristics are often used. Kalai and Sastry (2009) recently provided the first provably efficient method for learning SIMs and GLMs, under the assumptions that the data are in fact generated under a GLM and under certain monotonicity and Lipschitz constraints. However, to obtain provable performance, the method requires a fresh sample every iteration. In this paper, we provide algorithms for learning GLMs and SIMs, which are both computationally and statistically efficient. We also provide an empirical study, demonstrating their feasibility in practice.
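The Isotron-style alternation described in the abstract (fit the link function by isotonic regression, then take a perceptron-like step on the linear predictor) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses the standard Pool Adjacent Violators (PAV) routine for plain isotonic regression, whereas the paper's contribution replaces that step with an O(n log(n)) fit over Lipschitz monotonic functions. The function names `pav` and `isotron` and the fixed iteration count are illustrative choices.

```python
import numpy as np

def pav(y):
    """Pool Adjacent Violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block is [sum, count]; block mean = sum / count
    for v in y:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the current one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            t, c = blocks.pop()
            blocks[-1][0] += t
            blocks[-1][1] += c
    out = []
    for t, c in blocks:
        out.extend([t / c] * c)
    return np.array(out)

def isotron(X, y, n_iters=50):
    """Sketch of the Isotron loop: alternate an isotonic fit of the
    link function with a perceptron-like update of the weights."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        z = X @ w
        order = np.argsort(z)
        u = np.empty(n)
        u[order] = pav(y[order])     # isotonic regression of y on w·x
        w = w + (y - u) @ X / n      # gradient-like step on the residuals
    return w, u
```

The key point the abstract makes is that swapping plain isotonic regression for a Lipschitz-constrained monotonic fit is what removes the need for a fresh sample at every iteration while keeping each fit computable in O(n log(n)) time.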
Citations
Journal Article
Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks
TL;DR: In this paper, the problem of learning a shallow neural network that best fits a training data set was studied in the over-parameterized regime, where the number of observations is smaller than the number of parameters in the model.
Proceedings Article
Globally optimal gradient descent for a ConvNet with Gaussian inputs
Alon Brutzkus, Amir Globerson +1 more
TL;DR: This work provides the first global optimality guarantee of gradient descent on a convolutional neural network with ReLU activations, and shows that learning is NP-complete in the general case, but that when the input distribution is Gaussian, gradient descent converges to the global optimum in polynomial time.
Posted Content
Failures of Gradient-Based Deep Learning
TL;DR: This work describes four types of simple problems, for which the gradient-based algorithms commonly used in deep learning either fail or suffer from significant difficulties.
Proceedings Article
Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
TL;DR: This work characterizes the minimax rates for contextual bandits with general, potentially nonparametric function classes, and provides the first universal and optimal reduction from contextual bandits to online regression, requiring no distributional assumptions beyond realizability.
Proceedings Article
Learning One-hidden-layer ReLU Networks via Gradient Descent
TL;DR: It is proved that tensor initialization followed by gradient descent can converge to the ground-truth parameters at a linear rate up to some statistical error.
References
Journal Article
Generalized Linear Models
TL;DR: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences, and it is thoroughly enjoyable to read.
Book
Kernel Methods for Pattern Analysis
TL;DR: This book provides an easy introduction for students and researchers to the growing field of kernel-based pattern analysis, demonstrating with examples how to handcraft an algorithm or a kernel for a new specific application, and covering all the necessary conceptual and mathematical tools to do so.
Journal Article
Generalized linear models. 2nd ed.
Peter McCullagh, John A. Nelder +1 more
TL;DR: A class of statistical models that generalizes classical linear models, extending them to include many other models useful in statistical analysis; of particular interest for statisticians in medicine, biology, agriculture, social science, and engineering.
Journal Article
Generalized Linear Models
TL;DR: Generalized Linear Models, 2nd edn, by P. McCullagh and J. A. Nelder. New York: Chapman and Hall, 1989. xx + 512 pp. £30.
Book Chapter
Rademacher and Gaussian Complexities: Risk Bounds and Structural Results
TL;DR: In this paper, the authors investigate the use of data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities, in a decision theoretic setting and prove general risk bounds in terms of these complexities.