
Valerii Likhosherstov

Researcher at University of Cambridge

Publications -  30
Citations -  884

Valerii Likhosherstov is an academic researcher from the University of Cambridge. The author has contributed to research in the topics of computer science and the Ising model. The author has an h-index of 5 and has co-authored 22 publications receiving 309 citations. Previous affiliations of Valerii Likhosherstov include Google.

Papers
Posted Content

Rethinking Attention with Performers

TL;DR: This paper introduces Performers, Transformer architectures that can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, using only linear space and time complexity and without relying on any priors such as sparsity or low-rankness.
Proceedings Article

Rethinking Attention with Performers

TL;DR: Performers, as presented in this paper, use Fast Attention Via positive Orthogonal Random features (FAVOR+) to approximate softmax attention kernels; they can estimate regular (softmax) full-rank attention Transformers with provable accuracy, using only linear (as opposed to quadratic) space and time complexity.
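
To make the FAVOR+ idea concrete, here is a minimal Python/NumPy sketch of approximating the softmax kernel with positive random features and computing attention in time linear in the sequence length. This is an illustration under simplifying assumptions (plain Gaussian projections without the orthogonalization step, no causal masking), not the authors' reference implementation.

```python
import numpy as np

def positive_random_features(x, proj):
    # phi(x)_j = exp(w_j . x - ||x||^2 / 2) / sqrt(m); all entries are positive
    m = proj.shape[0]
    return np.exp(x @ proj.T - 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)) / np.sqrt(m)

def favor_attention(Q, K, V, num_features=256, seed=0):
    # Q, K, V: (L, d) arrays; scaling by d**0.25 reproduces softmax's q.k / sqrt(d)
    d = Q.shape[-1]
    Q, K = Q / d ** 0.25, K / d ** 0.25
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(num_features, d))    # Gaussian projections (orthogonalization omitted)
    q_prime = positive_random_features(Q, proj)  # (L, m)
    k_prime = positive_random_features(K, proj)  # (L, m)
    kv = k_prime.T @ V                           # (m, d): computed in O(L m d), never forming an (L, L) matrix
    normalizer = q_prime @ k_prime.sum(axis=0)   # (L,) row-sum normalization
    return (q_prime @ kv) / normalizer[:, None]
```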
Posted Content

Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers

TL;DR: This paper introduces Performer, a new Transformer architecture based on Fast Attention Via Orthogonal Random features (FAVOR), demonstrates its effectiveness on the challenging task of protein sequence modeling, and provides strong theoretical guarantees: unbiased estimation of the attention matrix and uniform convergence.
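
The unbiased-estimation claim can be checked numerically: with w drawn from a standard Gaussian, the expectation of exp(w.q - ||q||^2/2) * exp(w.k - ||k||^2/2) equals exp(q.k). The tiny self-contained check below is illustrative only, not part of the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)
q, k = rng.normal(size=4), rng.normal(size=4)
m = 200_000
proj = rng.normal(size=(m, 4))
phi_q = np.exp(proj @ q - 0.5 * q @ q) / np.sqrt(m)
phi_k = np.exp(proj @ k - 0.5 * k @ k) / np.sqrt(m)
print(phi_q @ phi_k)   # Monte Carlo estimate of the softmax kernel
print(np.exp(q @ k))   # exact value exp(q . k); the two should be close for large m
```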
Posted Content

Sub-Linear Memory: How to Make Performers SLiM.

TL;DR: A thorough analysis of recent Transformer mechanisms with linear self-attention, Performers, reveals a remarkable computational flexibility: forward and backward propagation can be performed with no approximations using memory sublinear in the sequence length $L$ (in addition to negligible storage for the input sequence), at a cost of greater time complexity in the parallel setting.
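
As a rough illustration of why the memory footprint can be made small in $L$: for causal linear attention, the forward pass can stream over positions while keeping only a fixed-size running state. The sketch below (feature maps assumed precomputed, names hypothetical) shows that streaming computation; it is a simplification, not the paper's full forward-backward algorithm.

```python
import numpy as np

def causal_linear_attention_streaming(q_prime, k_prime, V):
    # q_prime, k_prime: (L, m) non-negative feature maps; V: (L, d) values.
    L, m = q_prime.shape
    d = V.shape[-1]
    out = np.empty((L, d))
    state = np.zeros((m, d))   # running sum of outer(k'_i, v_i) over the prefix
    norm = np.zeros(m)         # running sum of k'_i over the prefix
    for i in range(L):
        state += np.outer(k_prime[i], V[i])
        norm += k_prime[i]
        # each output needs only the fixed-size state, never the full (L, L) attention matrix
        out[i] = (q_prime[i] @ state) / (q_prime[i] @ norm)
    return out
```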
Posted Content

An Ode to an ODE

TL;DR: This paper proposes a new paradigm for Neural ODE algorithms, called ODEtoODE, in which the time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d); this provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
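
A hypothetical sketch of the ODEtoODE structure: the main flow's weight matrix itself evolves on the orthogonal group, here kept exactly orthogonal with Cayley-transform steps of a skew-symmetric generator. All names, the tanh nonlinearity, and the simple Euler discretization are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def cayley_step(W, A, dt):
    # W <- W @ (I - dt/2 A)^{-1} (I + dt/2 A); stays orthogonal when A is skew-symmetric
    I = np.eye(W.shape[0])
    return W @ np.linalg.solve(I - 0.5 * dt * A, I + 0.5 * dt * A)

def ode_to_ode_forward(x, B, steps=20, dt=0.05):
    # x: (d,) hidden state; B: (d, d) learnable matrix parameterizing the skew generator
    A = B - B.T                      # skew-symmetric, so the weight flow remains on O(d)
    W = np.eye(x.shape[0])
    for _ in range(steps):
        W = cayley_step(W, A, dt)    # evolve the main flow's parameters on the orthogonal group
        x = x + dt * np.tanh(W @ x)  # Euler step of the main flow with orthogonal weights
    return x
```

Keeping W on O(d) means its spectrum stays on the unit circle, which is the intuition behind the stability and gradient non-vanishing/non-explosion claims.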