Carlo Luschi
Publications - 8
Citations - 546
Carlo Luschi is an academic researcher. The author has contributed to research on topics including artificial neural networks and the curse of dimensionality. The author has an h-index of 3 and has co-authored 8 publications receiving 395 citations.
Papers
Posted Content
Revisiting Small Batch Training for Deep Neural Networks
Dominic Masters, Carlo Luschi
TL;DR: The collected experimental results show that increasing the mini-batch size progressively reduces the range of learning rates that provide stable convergence and acceptable test performance, which contrasts with recent work advocating the use of mini-batch sizes in the thousands.
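A minimal sketch of how one might probe the interaction between mini-batch size and stable learning rates; the toy model, synthetic data and the cut-offs for "stable" and "acceptable" below are illustrative assumptions, not the authors' experimental protocol.

# Sketch: scan mini-batch sizes and learning rates to map the stable region.
import torch
import torch.nn as nn

def runs_stably(batch_size, lr, steps=200):
    torch.manual_seed(0)
    X, y = torch.randn(4096, 32), torch.randint(0, 10, (4096,))
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        idx = torch.randint(0, len(X), (batch_size,))
        loss = nn.functional.cross_entropy(model(X[idx]), y[idx])
        opt.zero_grad(); loss.backward(); opt.step()
        if not torch.isfinite(loss):          # diverged: unstable setting
            return False
    return loss.item() < 1.5                  # crude "acceptable performance" cut-off

for m in (2, 8, 32, 128, 512):
    stable_lrs = [lr for lr in (0.01, 0.03, 0.1, 0.3, 1.0) if runs_stably(m, lr)]
    print(f"batch size {m:4d}: stable learning rates {stable_lrs}")

The paper's experiments run the analogous scan on standard image-classification benchmarks rather than a toy problem.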
Posted Content
Improving Neural Network Training in Low Dimensional Random Bases
TL;DR: This work improves on recent random subspace approaches by showing that keeping the random projection fixed throughout training is detrimental to optimization, and by proposing to re-draw the random subspace at each step, which yields significantly better performance.
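A minimal NumPy sketch of the re-drawn random subspace idea on a toy quadratic objective; the dimensions, step size and objective are illustrative assumptions.

# Sketch: gradient descent restricted to a low-dimensional random subspace,
# with the random basis re-drawn at every step (toy quadratic objective).
import numpy as np

rng = np.random.default_rng(0)
D, d, lr = 1000, 20, 0.1                         # full dim, subspace dim, step size
A = rng.standard_normal((D, D)) / np.sqrt(D)
H = A.T @ A + np.eye(D)                          # positive-definite toy Hessian
theta = rng.standard_normal(D)

for step in range(500):
    g = H @ theta                                # full gradient of 0.5 * theta^T H theta
    P = rng.standard_normal((D, d)) / np.sqrt(D) # fresh random basis each step
    theta -= lr * P @ (P.T @ g)                  # update only within the current subspace

print("final loss:", 0.5 * theta @ H @ theta)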
Journal Article
Parallel Training of Deep Networks with Local Updates
Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel
TL;DR: This paper investigates how to continue scaling compute efficiently beyond the point of diminishing returns for large batches through local parallelism, a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation.
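A minimal PyTorch sketch of the gradient isolation at the heart of local parallelism: each block is trained by a local auxiliary loss, and detach() stops gradients from crossing block boundaries. The block sizes, linear auxiliary heads and toy data are illustrative assumptions, and the sketch runs blocks sequentially, whereas the paper's framework executes them in parallel across devices.

import torch
import torch.nn as nn

blocks = nn.ModuleList([nn.Sequential(nn.Linear(32, 64), nn.ReLU()),
                        nn.Sequential(nn.Linear(64, 64), nn.ReLU())])
heads  = nn.ModuleList([nn.Linear(64, 10), nn.Linear(64, 10)])   # local classifiers
opt = torch.optim.SGD(list(blocks.parameters()) + list(heads.parameters()), lr=0.1)

x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))
for _ in range(100):
    h, total = x, 0.0
    for block, head in zip(blocks, heads):
        h = block(h.detach())                                    # no gradient flows into earlier blocks
        total = total + nn.functional.cross_entropy(head(h), y)  # local loss for this block
    opt.zero_grad(); total.backward(); opt.step()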
Posted Content
Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence
TL;DR: Proxy Normalization normalizes post-activations using a proxy distribution; it can be combined with layer normalization or group normalization and consistently matches or exceeds batch normalization's performance.
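A simplified PyTorch sketch of the proxy-normalization idea: the activation following an affine transform is normalized per channel using statistics computed on a Gaussian proxy rather than on the batch. The fixed standard-Gaussian proxy, ReLU activation and shapes below are illustrative assumptions; the paper's proxy distribution and its placement within the network are more refined.

import torch
import torch.nn as nn

class ProxyNormActivation(nn.Module):
    # Normalize relu(gamma * x + beta) per channel, using statistics of
    # relu(gamma * z + beta) for a Gaussian proxy z instead of batch statistics.
    def __init__(self, channels, n_proxy=256, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(channels))
        self.beta = nn.Parameter(torch.zeros(channels))
        self.register_buffer("proxy", torch.randn(n_proxy, channels))
        self.eps = eps

    def forward(self, x):                                 # x: (batch, channels), already normalized
        act = torch.relu(self.gamma * x + self.beta)
        proxy_act = torch.relu(self.gamma * self.proxy + self.beta)
        mean = proxy_act.mean(dim=0)
        std = proxy_act.std(dim=0) + self.eps
        return (act - mean) / std                         # batch-independent normalization

x = torch.randn(8, 64)
out = ProxyNormActivation(64)(nn.LayerNorm(64)(x))        # e.g. combined with layer normalization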
Posted Content
Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training
TL;DR: In this article, the authors focus on improving the practical efficiency of the EfficientNet models on a new class of accelerator, the Graphcore IPU, by extending this family of models in the following ways: (i) generalising depthwise convolutions to group convolutions; (ii) adding proxy-normalized activations to match batch normalization performance with batch-independent statistics; (iii) reducing compute by lowering the training resolution and inexpensively fine-tuning at higher resolution.
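A minimal PyTorch sketch of point (i), generalising a depthwise convolution (one group per channel) to a group convolution with several channels per group; the channel count and group size are illustrative assumptions.

import torch.nn as nn

channels, group_size = 96, 16
depthwise = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                      groups=channels)                  # depthwise: one filter per channel
grouped   = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                      groups=channels // group_size)    # group conv: 16 channels per group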