
Carlo Luschi

Publications -  8
Citations -  546

Carlo Luschi is an academic researcher. The author has contributed to research in topics: Artificial neural network & Curse of dimensionality. The author has an h-index of 3 and has co-authored 8 publications receiving 395 citations.

Papers
Posted Content

Revisiting Small Batch Training for Deep Neural Networks

Dominic Masters, Carlo Luschi
20 Apr 2018
TL;DR: The collected experimental results show that increasing the mini-batch size progressively reduces the range of learning rates that provide stable convergence and acceptable test performance, which contrasts with recent work advocating the use of mini-batch sizes in the thousands.
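A minimal sketch of the kind of experiment the summary describes, on a hypothetical toy regression problem (none of the names, sizes or numbers below come from the paper): the same plain SGD loop is run with different mini-batch sizes, and the learning rate is the knob whose usable range the paper reports shrinks as the batch grows.

```python
# Hypothetical illustration (not from the paper): plain mini-batch SGD on a
# toy linear-regression problem, with batch size and learning rate exposed so
# the two can be varied against each other.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
true_w = rng.normal(size=20)
y = X @ true_w + 0.1 * rng.normal(size=1000)

def train(batch_size, lr, steps=500):
    w = np.zeros(20)
    for _ in range(steps):
        idx = rng.integers(0, len(X), size=batch_size)
        xb, yb = X[idx], y[idx]
        grad = 2.0 * xb.T @ (xb @ w - yb) / batch_size   # mini-batch gradient
        w -= lr * grad
    return np.mean((X @ w - y) ** 2)                      # final training MSE

for m in (4, 32, 256):
    print(m, train(batch_size=m, lr=0.05))
```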
Posted Content

Improving Neural Network Training in Low Dimensional Random Bases

TL;DR: This work improves on recent random subspace approaches: it shows that keeping the random projection fixed throughout training is detrimental to optimization, and proposes re-drawing the random subspace at each step, which yields significantly better performance.
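A hypothetical sketch of the mechanism described above, on a toy quadratic loss (not the authors' code or benchmarks): a low-dimensional random basis P is re-drawn at every optimization step instead of being fixed once, and the parameter update is restricted to its span. For simplicity the full gradient is computed and then projected.

```python
# Hypothetical sketch: optimize a D-dimensional vector through a fresh
# d-dimensional random basis at every step.
import numpy as np

rng = np.random.default_rng(0)
D, d = 1000, 32                          # full and reduced dimensionality
A = rng.normal(size=(D, D)) / np.sqrt(D)
H = A.T @ A + np.eye(D)                  # toy convex quadratic: loss = 0.5 * w^T H w
w = rng.normal(size=D)
lr = 0.1

for step in range(200):
    # re-draw an orthonormal random basis at every step (vs. fixing it once)
    P, _ = np.linalg.qr(rng.normal(size=(D, d)))
    g = H @ w                            # full gradient, projected for simplicity
    g_low = P.T @ g                      # gradient in the d subspace coordinates
    w -= lr * (P @ g_low)                # update restricted to span(P)

print("final loss:", 0.5 * w @ H @ w)
```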
Journal Article

Parallel Training of Deep Networks with Local Updates

TL;DR: This paper investigates how to continue scaling compute efficiently beyond the point of diminishing returns for large batches through local parallelism, a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation.
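A hypothetical, self-contained PyTorch sketch of the local-update idea (architecture, sizes and names are illustrative, not the paper's): each block is trained against its own auxiliary loss, and its input is detached so no gradient flows between blocks, which is what makes the blocks trainable in parallel.

```python
# Hypothetical sketch of "local updates": every block gets a local auxiliary
# classifier and the input it receives is detached, so there is no global
# backpropagation across blocks.
import torch
import torch.nn as nn

blocks = nn.ModuleList([nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(3)])
heads = nn.ModuleList([nn.Linear(32, 10) for _ in range(3)])   # local auxiliary classifiers
opts = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()), lr=0.1)
        for b, h in zip(blocks, heads)]
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 32)
y = torch.randint(0, 10, (64,))

h = x
for block, head, opt in zip(blocks, heads, opts):
    h = block(h.detach())          # detach: no gradient into earlier blocks
    loss = loss_fn(head(h), y)     # purely local objective for this block
    opt.zero_grad()
    loss.backward()
    opt.step()
```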
Posted Content

Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence

TL;DR: Proxy Normalization normalizes post-activations using a proxy distribution; it can be combined with layer normalization or group normalization and consistently matches or exceeds batch normalization's performance.
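A rough, hypothetical sketch of the idea as stated in the summary (not the authors' implementation): after a batch-independent normalization, an affine transform and a nonlinearity, the output is re-standardized using the statistics a Gaussian "proxy" variable would have after the same affine transform and nonlinearity. All module names and sizes below are illustrative.

```python
# Hypothetical sketch of proxy-normalizing activations on top of GroupNorm.
import torch
import torch.nn as nn

class ProxyNormalizedBlock(nn.Module):
    def __init__(self, channels, groups=8, proxy_samples=4096):
        super().__init__()
        self.norm = nn.GroupNorm(groups, channels)        # batch-independent norm
        self.gamma = nn.Parameter(torch.ones(channels))
        self.beta = nn.Parameter(torch.zeros(channels))
        self.act = nn.ReLU()
        # Gaussian proxy draws, shared across the batch
        self.register_buffer("proxy", torch.randn(proxy_samples, channels))

    def forward(self, x):                                  # x: (N, C, H, W)
        y = self.act(self.norm(x) * self.gamma[None, :, None, None]
                     + self.beta[None, :, None, None])
        # statistics of the same affine + activation applied to the proxy
        p = self.act(self.proxy * self.gamma + self.beta)
        mean = p.mean(dim=0)[None, :, None, None]
        std = p.std(dim=0)[None, :, None, None] + 1e-5
        return (y - mean) / std                            # proxy-normalized output

out = ProxyNormalizedBlock(16)(torch.randn(2, 16, 8, 8))
```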
Posted Content

Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training.

TL;DR: In this article, the authors focus on improving the practical efficiency of the EfficientNet models on a new class of accelerator, the Graphcore IPU, by extending this family of models in the following ways: (i) generalising depthwise convolutions to group convolutions; (ii) adding proxy-normalized activations to match batch normalization performance with batch-independent statistics; (iii) reducing compute by lowering the training resolution and inexpensively fine-tuning at higher resolution.
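A small illustration of point (i), assuming a PyTorch-style API (the layer sizes are arbitrary): a depthwise convolution is the special case where the number of groups equals the channel count, and the group convolutions referred to above simply use a smaller divisor of the channel count.

```python
# Hypothetical illustration: depthwise convolution as the extreme case of a
# group convolution, and a coarser grouping of the same layer.
import torch
import torch.nn as nn

channels = 64
depthwise = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels)
grouped = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=16)  # 16 groups of 4 channels

x = torch.randn(1, channels, 32, 32)
print(depthwise(x).shape, grouped(x).shape)   # both (1, 64, 32, 32)
```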