
Nicolas Loizou

Researcher at University of Edinburgh

Publications - 46
Citations - 1587

Nicolas Loizou is an academic researcher from the University of Edinburgh. The author has contributed to research in the topics of computer science and stochastic gradient descent. The author has an h-index of 15 and has co-authored 40 publications receiving 986 citations. Previous affiliations of Nicolas Loizou include Université de Montréal.

Papers
Proceedings Article

Stochastic Gradient Push for Distributed Deep Learning

TL;DR: Stochastic Gradient Push (SGP) is studied, and it is proved that SGP converges to a stationary point of smooth, non-convex objectives at the same sub-linear rate as SGD and that all nodes achieve consensus.
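To make the idea concrete, here is a minimal single-process sketch of the push-sum plus local-SGD structure behind SGP. It is not the authors' implementation: the callable interface for local gradients, the fixed column-stochastic mixing matrix, and the step sizes are all assumptions (SGP itself supports time-varying directed graphs).

```python
import numpy as np

def stochastic_gradient_push(grads, x0, mixing, lr=0.1, steps=100):
    """Sketch of Stochastic Gradient Push (SGP), simulated in one process.

    Each node i keeps a parameter row x[i] and a push-sum weight w[i].
    Every step: take a local stochastic gradient step, then push (x, w)
    to out-neighbours via a column-stochastic mixing matrix; the
    de-biased iterate is z[i] = x[i] / w[i].

    grads:  list of callables, grads[i](z) -> stochastic gradient of
            node i's local objective at z (assumed interface).
    mixing: (n, n) column-stochastic matrix (assumed static here).
    """
    n = len(grads)
    x = np.tile(x0.astype(float), (n, 1))   # one parameter row per node
    w = np.ones(n)                          # push-sum weights

    for _ in range(steps):
        z = x / w[:, None]                  # de-biased estimates
        for i in range(n):                  # local SGD step on x
            x[i] -= lr * grads[i](z[i])
        x = mixing @ x                      # push-sum gossip on parameters
        w = mixing @ w                      # ... and on the weights

    return x / w[:, None]                   # consensus estimates per node
```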
Proceedings Article

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

TL;DR: In this article, a unified convergence analysis of decentralized SGD methods is presented for smooth stochastic optimization problems; the convergence rates interpolate between the heterogeneous (non-identically distributed data) and iid-data settings, recovering linear convergence rates in many special cases, for instance for over-parametrized models.
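The sketch below illustrates the two ingredients the analysis unifies: several local SGD steps per node followed by a gossip step over a topology that may change each round. The gradient interface and the `mixing_at(t)` callable are assumptions for illustration, not the paper's notation.

```python
import numpy as np

def decentralized_local_sgd(grads, x0, mixing_at, lr=0.05,
                            local_steps=4, rounds=50):
    """Sketch of decentralized SGD with local updates and a changing topology.

    grads:     list of callables, grads[i](x) -> stochastic gradient of
               node i's local loss (assumed interface).
    mixing_at: callable, mixing_at(t) -> doubly-stochastic (n, n) gossip
               matrix for round t (assumed interface; encodes the
               time-varying communication graph).
    """
    n = len(grads)
    x = np.tile(x0.astype(float), (n, 1))   # one parameter row per node

    for t in range(rounds):
        for _ in range(local_steps):        # local steps, no communication
            for i in range(n):
                x[i] -= lr * grads[i](x[i])
        x = mixing_at(t) @ x                # one gossip/averaging step
    return x
```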
Proceedings Article

SGD: General Analysis and Improved Rates

TL;DR: A single theorem describes the convergence of an infinite array of variants of SGD, each associated with a specific probability law governing the data-selection rule used to form mini-batches, and it can be used to determine the mini-batch size that optimizes the total complexity.
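A minimal sketch of the "arbitrary sampling" viewpoint follows: mini-batches are drawn from a user-chosen probability law over the data points, and gradients are importance-reweighted so the update stays unbiased. The function names and the with-replacement sampling scheme are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def sgd_arbitrary_sampling(grad_i, n, x0, probs=None, batch_size=8,
                           lr=0.1, steps=1000, rng=None):
    """Sketch of SGD under an arbitrary sampling law over n data points.

    grad_i: callable, grad_i(i, x) -> gradient of the i-th loss at x
            (assumed interface).
    probs:  per-example sampling probabilities; uniform if None.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.full(n, 1.0 / n) if probs is None else np.asarray(probs, float)
    x = x0.astype(float).copy()

    for _ in range(steps):
        batch = rng.choice(n, size=batch_size, replace=True, p=p)
        g = np.zeros_like(x)
        for i in batch:
            g += grad_i(i, x) / (n * p[i])   # reweight so E[g] is the full gradient
        x -= lr * g / batch_size
    return x
```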
Posted Content

SGD: General Analysis and Improved Rates

TL;DR: In this paper, the convergence of SGD under the arbitrary sampling paradigm is analyzed, and it is shown that the optimal mini-batch size is a function of the expected smoothness.
Journal ArticleDOI

Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods

TL;DR: A novel concept called stochastic momentum, aimed at decreasing the cost of performing the momentum step, is proposed, and it is proved that in some sparse data regimes, and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum.
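The sketch below contrasts the standard heavy-ball momentum term with a cheap, randomized stand-in: rather than adding beta * (x_k - x_{k-1}) in every coordinate, it adds a rescaled momentum contribution in one uniformly sampled coordinate, which is inexpensive when iterates are sparse. This is a minimal illustration of the idea; the coordinate-sampling scheme and interfaces here are assumptions and the paper's exact update may differ.

```python
import numpy as np

def sgd_stochastic_momentum(grad, x0, lr=0.1, beta=0.5, steps=1000, rng=None):
    """Sketch of SGD with a randomized (single-coordinate) momentum term.

    grad: callable, grad(x) -> stochastic gradient at x (assumed interface).
    """
    rng = np.random.default_rng() if rng is None else rng
    x_prev = x0.astype(float).copy()
    x = x_prev - lr * grad(x_prev)            # plain first step, no momentum yet
    d = x.size

    for _ in range(steps - 1):
        momentum = x - x_prev                 # heavy-ball direction x_k - x_{k-1}
        j = rng.integers(d)                   # sample one coordinate uniformly
        step = np.zeros(d)
        step[j] = d * beta * momentum[j]      # rescaled so E[step] = beta * momentum
        x_prev, x = x, x - lr * grad(x) + step
    return x
```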