
Showing papers by "Thomas Unterthiner" published in 2022


Proceedings Article
13 Jan 2022
TL;DR: This work presents a method that adds new neurons during training without impacting what is already learned, while improving the training dynamics, and calls it Gradient Maximizing Growth (GradMax).
Abstract: The architecture and the parameters of neural networks are often optimized independently, which requires costly retraining of the parameters whenever the architecture is modified. In this work we instead focus on growing the architecture without requiring costly retraining. We present a method that adds new neurons during training without impacting what is already learned, while improving the training dynamics. We achieve the latter by maximizing the gradients of the new weights, finding the optimal initialization efficiently by means of the singular value decomposition (SVD). We call this technique Gradient Maximizing Growth (GradMax) and demonstrate its effectiveness in a variety of vision tasks and architectures.
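To make the growth step concrete, here is a minimal PyTorch sketch of the SVD-based initialization described above, assuming new neurons are inserted between two dense layers with a linearized activation; the function name, scaling factor, and batch-gradient inputs are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a GradMax-style growth step (assumes dense layers and a
# linearized activation; not the authors' code).
import torch

def grow_neurons_gradmax(x, grad_out, k, scale=1e-3):
    """Initialize k new hidden neurons inserted between two layers.

    x:        (n, d_in)  activations feeding into the grown layer
    grad_out: (n, d_out) loss gradients at the layer the new neurons feed into
    """
    # Fan-out weights start at zero, so the network's function is unchanged.
    w_out_new = torch.zeros(grad_out.shape[1], k)
    # Cross term between outgoing gradients and incoming activations; the
    # gradient of the zero fan-out weights is (approximately) M @ w_in per neuron.
    M = grad_out.T @ x / x.shape[0]                      # (d_out, d_in)
    # Top-k right singular vectors maximize that gradient's norm under a norm budget.
    _, _, vh = torch.linalg.svd(M, full_matrices=False)
    w_in_new = scale * vh[:k]                            # (k, d_in) fan-in weights
    return w_in_new, w_out_new
```

Because the fan-out weights start at zero, what the network has already learned is untouched at the moment of growth, while the SVD-based fan-in choice maximizes the gradient norm of the new weights so that they begin learning as quickly as possible.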

22 citations


Journal ArticleDOI
TL;DR: In this article, a general approach to constructing accurate MLFFs for large-scale molecular simulations (GEMS) is proposed, based on training on "bottom-up" and "top-down" molecular fragments of varying size.
Abstract: Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes. Accurate MD simulations require computationally demanding quantum-mechanical calculations and are therefore practically limited to short timescales and few atoms. For larger systems, efficient but much less reliable empirical force fields are used. Recently, machine-learned force fields (MLFFs) have emerged as an alternative means to run MD simulations, offering accuracy similar to ab initio methods at orders-of-magnitude speedups. Until now, MLFFs have mainly captured short-range interactions in small molecules or periodic materials, owing to the increased complexity of constructing models and obtaining reliable reference data for large molecules, where long-range many-body effects become important. This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations (GEMS) by training on “bottom-up” and “top-down” molecular fragments of varying size, from which the relevant physicochemical interactions can be learned. GEMS is applied to study the dynamics of alanine-based peptides and the 46-residue protein crambin in aqueous solution, allowing nanosecond-scale MD simulations of more than 25k atoms at essentially ab initio quality. Our findings suggest that structural motifs in peptides and proteins are more flexible than previously thought, indicating that simulations at ab initio accuracy may be necessary to understand dynamic biomolecular processes such as protein (mis)folding, drug–protein binding, or allosteric regulation.

9 citations


TL;DR: This project examines a similar idea on the ImageNet dataset: gradually shrinking the patch-sequence length of a Vision Transformer as the layers go deeper, in order to save computational resources (Funnel-ViT).
Abstract: The Vision Transformer (ViT) [6] adopts the Transformer architecture for image classification tasks and outperforms state-of-the-art convolutional networks with substantially fewer computational resources. However, training a Transformer is still expensive, whether on a very large pre-training dataset or with a large model size, so model efficiency remains an important area to explore. Spatial compression is a common technique in convolutional networks for image classification, which indicates that spatial information is redundant for classification tasks. In addition, inspired by the success of the Funnel-Transformer [4] in NLP, this project examines a similar idea on the ImageNet dataset: gradually shrinking the patch-sequence length of a Vision Transformer as the layers go deeper, in order to save computational resources (Funnel-ViT). The results show that with a small pre-training accuracy compromise (< 1%), we can save 40% of memory and obtain a 37.5% speedup with three funnel blocks, along with a 0.6% fine-tuning accuracy improvement. The saved resources can even be re-invested in a wider and deeper Funnel-ViT model, further reducing the pre-training accuracy loss to 0.1%.
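To illustrate the funnel idea, below is a minimal PyTorch sketch (not the authors' model) in which the patch-token sequence is mean-pooled between transformer stages, so deeper layers attend over fewer tokens; the class name, pooling window, and stage sizes are illustrative assumptions, and the class token is omitted for simplicity.

```python
# Hypothetical sketch of stage-level sequence pooling in a ViT, following the
# abstract's description of Funnel-ViT (illustrative, not the project's code).
import torch
import torch.nn as nn

class FunnelStage(nn.Module):
    def __init__(self, dim, depth, heads, pool=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.pool = nn.AvgPool1d(kernel_size=pool, stride=pool)

    def forward(self, x):                                   # x: (batch, seq_len, dim)
        x = self.blocks(x)
        x = self.pool(x.transpose(1, 2)).transpose(1, 2)    # halve the token count
        return x

# Toy usage: 196 patch tokens shrink to 98, then 49, across two funnel stages.
tokens = torch.randn(8, 196, 384)
model = nn.Sequential(FunnelStage(384, depth=4, heads=6),
                      FunnelStage(384, depth=4, heads=6))
print(model(tokens).shape)                                  # torch.Size([8, 49, 384])
```

Each pooling step halves the number of tokens that the subsequent attention layers process, which is where the memory and speed savings quoted in the abstract would come from; the original Funnel-Transformer pools only the queries inside its attention blocks, so this stage-level pooling is a simplification of that design.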