
Showing papers by "Thomas Unterthiner" published in 2022


Proceedings Article
13 Jan 2022
TL;DR: This work presents a method that adds new neurons during training without impacting what is already learned, while improving the training dynamics, and calls it Gradient Maximizing Growth (GradMax).
Abstract: The architecture and the parameters of neural networks are often optimized independently, which requires costly retraining of the parameters whenever the architecture is modified. In this work we instead focus on growing the architecture without requiring costly retraining. We present a method that adds new neurons during training without impacting what is already learned, while improving the training dynamics. We achieve the latter by maximizing the gradients of the new weights, finding the optimal initialization efficiently by means of the singular value decomposition (SVD). We call this technique Gradient Maximizing Growth (GradMax) and demonstrate its effectiveness in a variety of vision tasks and architectures.
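To make the growth step concrete, here is a minimal PyTorch sketch of the SVD-based initialization described above, assuming new neurons are inserted between two dense layers with a linearized activation; the function name, scaling factor, and batch-gradient inputs are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a GradMax-style growth step (assumes dense layers and a
# linearized activation; not the authors' code).
import torch

def grow_neurons_gradmax(x, grad_out, k, scale=1e-3):
    """Initialize k new hidden neurons inserted between two layers.

    x:        (n, d_in)  activations feeding into the grown layer
    grad_out: (n, d_out) loss gradients at the layer the new neurons feed into
    """
    # Fan-out weights start at zero, so the network's function is unchanged.
    w_out_new = torch.zeros(grad_out.shape[1], k)
    # Cross term between outgoing gradients and incoming activations; the
    # gradient of the zero fan-out weights is (approximately) M @ w_in per neuron.
    M = grad_out.T @ x / x.shape[0]                      # (d_out, d_in)
    # Top-k right singular vectors maximize that gradient's norm under a norm budget.
    _, _, vh = torch.linalg.svd(M, full_matrices=False)
    w_in_new = scale * vh[:k]                            # (k, d_in) fan-in weights
    return w_in_new, w_out_new
```

Because the fan-out weights start at zero, what the network has already learned is untouched at the moment of growth, while the SVD-based fan-in choice maximizes the gradient norm of the new weights so that they begin learning as quickly as possible.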

22 citations


Journal ArticleDOI
TL;DR: In this article, a general approach to constructing accurate MLFFs for large-scale molecular simulations (GEMS) is proposed, based on training on "bottom-up" and "top-down" molecular fragments of varying size.
Abstract: Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes. Accurate MD simulations require computationally demanding quantum-mechanical calculations and are therefore practically limited to short timescales and few atoms. For larger systems, efficient but much less reliable empirical force fields are used. Recently, machine-learned force fields (MLFFs) have emerged as an alternative means to run MD simulations, offering accuracy similar to ab initio methods at orders-of-magnitude speedups. Until now, MLFFs have mainly captured short-range interactions in small molecules or periodic materials, owing to the increased complexity of constructing models and obtaining reliable reference data for large molecules, where long-range many-body effects become important. This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations (GEMS) by training on “bottom-up” and “top-down” molecular fragments of varying size, from which the relevant physicochemical interactions can be learned. GEMS is applied to study the dynamics of alanine-based peptides and the 46-residue protein crambin in aqueous solution, allowing nanosecond-scale MD simulations of more than 25k atoms at essentially ab initio quality. Our findings suggest that structural motifs in peptides and proteins are more flexible than previously thought, indicating that simulations at ab initio accuracy may be necessary to understand dynamic biomolecular processes such as protein (mis)folding, drug–protein binding, or allosteric regulation.

9 citations


TL;DR: This project examines a similar idea on the ImageNet dataset: gradually shrinking the patch-sequence length of a Vision Transformer as the layers go deeper, in order to save computational resources (Funnel-ViT).
Abstract: The Vision Transformer (ViT) [6] adopts the Transformer architecture for image classification tasks and outperforms state-of-the-art convolutional networks with substantially fewer computational resources. However, training a Transformer is still expensive, whether on a very large pre-training dataset or with a large model size, so model efficiency remains an important area to explore. Spatial compression is a common technique in convolutional networks for image classification, which indicates that spatial information is redundant for classification tasks. In addition, inspired by the success of the Funnel-Transformer [4] in NLP, this project examines a similar idea on the ImageNet dataset: gradually shrinking the patch-sequence length of a Vision Transformer as the layers go deeper, in order to save computational resources (Funnel-ViT). The results show that with a small pre-training accuracy compromise (< 1%), we can save 40% of memory and obtain a 37.5% speedup with three funnel blocks, along with a 0.6% fine-tuning accuracy improvement. The saved resources can even be re-invested in a wider and deeper Funnel-ViT model, further reducing the pre-training accuracy loss to 0.1%.
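To illustrate the funnel idea, below is a minimal PyTorch sketch (not the authors' model) in which the patch-token sequence is mean-pooled between transformer stages, so deeper layers attend over fewer tokens; the class name, pooling window, and stage sizes are illustrative assumptions, and the class token is omitted for simplicity.

```python
# Hypothetical sketch of stage-level sequence pooling in a ViT, following the
# abstract's description of Funnel-ViT (illustrative, not the project's code).
import torch
import torch.nn as nn

class FunnelStage(nn.Module):
    def __init__(self, dim, depth, heads, pool=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.pool = nn.AvgPool1d(kernel_size=pool, stride=pool)

    def forward(self, x):                                   # x: (batch, seq_len, dim)
        x = self.blocks(x)
        x = self.pool(x.transpose(1, 2)).transpose(1, 2)    # halve the token count
        return x

# Toy usage: 196 patch tokens shrink to 98, then 49, across two funnel stages.
tokens = torch.randn(8, 196, 384)
model = nn.Sequential(FunnelStage(384, depth=4, heads=6),
                      FunnelStage(384, depth=4, heads=6))
print(model(tokens).shape)                                  # torch.Size([8, 49, 384])
```

Each pooling step halves the number of tokens that the subsequent attention layers process, which is where the memory and speed savings quoted in the abstract would come from; the original Funnel-Transformer pools only the queries inside its attention blocks, so this stage-level pooling is a simplification of that design.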