Valerii Likhosherstov
Researcher at University of Cambridge
Publications - 30
Citations - 884
Valerii Likhosherstov is an academic researcher from the University of Cambridge. The author has contributed to research on topics including computer science and the Ising model. The author has an h-index of 5 and has co-authored 22 publications receiving 309 citations. Previous affiliations of Valerii Likhosherstov include Google.
Papers
Posted Content
Rethinking Attention with Performers
Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller +12 more
TL;DR: This paper introduces Performers, Transformer architectures that can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear space and time complexity, without relying on any priors such as sparsity or low-rankness.
Proceedings Article
Rethinking Attention with Performers
Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller +12 more
TL;DR: Performers, as described in this paper, use Fast Attention Via positive Orthogonal Random features (FAVOR+) to approximate softmax attention kernels, and can estimate regular (softmax) full-rank attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity.
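The core of FAVOR+ is a positive random-feature map that gives an unbiased estimate of the softmax kernel, so attention can be computed by multiplying an (L x m) feature matrix against the values instead of materialising the (L x L) attention matrix. Below is a minimal NumPy sketch under simplifying assumptions: plain i.i.d. Gaussian projections stand in for the paper's orthogonal ones, and all function names are illustrative rather than taken from any released code.

```python
# Minimal FAVOR+-style linear attention sketch (illustrative only).
import numpy as np

def positive_random_features(x, omega):
    # phi(x) = exp(omega @ x - |x|^2 / 2) / sqrt(m): non-negative features
    # whose inner products unbiasedly estimate the softmax kernel exp(q . k).
    m = omega.shape[0]
    proj = x @ omega.T                                   # (L, m)
    norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)  # (L, 1)
    return np.exp(proj - norm) / np.sqrt(m)

def favor_attention(Q, K, V, num_features=256, seed=0):
    L, d = Q.shape
    rng = np.random.default_rng(seed)
    omega = rng.normal(size=(num_features, d))  # i.i.d. Gaussian (orthogonal in the paper)
    scale = d ** -0.25                          # splits the 1/sqrt(d) softmax scaling
    q_prime = positive_random_features(Q * scale, omega)  # (L, m)
    k_prime = positive_random_features(K * scale, omega)  # (L, m)
    # Linear-complexity order of operations: (L x m)(m x d) instead of (L x L)(L x d).
    kv = k_prime.T @ V                                     # (m, d)
    normalizer = q_prime @ k_prime.sum(axis=0)             # (L,)
    return (q_prime @ kv) / normalizer[:, None]

# Quick check against exact softmax attention on toy data.
L, d = 64, 16
rng = np.random.default_rng(1)
Q, K, V = rng.normal(size=(3, L, d), scale=0.5)
exact = np.exp(Q @ K.T / np.sqrt(d))
exact = (exact / exact.sum(axis=-1, keepdims=True)) @ V
approx = favor_attention(Q, K, V, num_features=1024)
print(np.max(np.abs(exact - approx)))   # small approximation error
```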
Posted Content
Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers
Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Jared Davis, Tamas Sarlos, David Belanger, Lucy J. Colwell, Adrian Weller +8 more
TL;DR: A new Transformer architecture, Performer, based on Fast Attention Via Orthogonal Random features (FAVOR), demonstrates its effectiveness on the challenging task of protein sequence modeling and provides strong theoretical guarantees: unbiased estimation of the attention matrix and uniform convergence.
Posted Content
Sub-Linear Memory: How to Make Performers SLiM.
TL;DR: A thorough analysis of Performers, recent Transformer mechanisms with linear self-attention, reveals a remarkable computational flexibility: forward and backward propagation can be performed with no approximations using memory sublinear in the sequence length $L$ (in addition to negligible storage for the input sequence), at the cost of greater time complexity in the parallel setting.
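The observation behind the sub-linear memory result is that causal Performer attention can be written as a running prefix sum whose state size depends on the number of random features, not on $L$. The NumPy sketch below illustrates that recursion for the forward pass only; the paper's full scheme also covers backward propagation via recomputation, and all names here are illustrative.

```python
# Streaming causal linear attention with O(m*d) state, independent of L (illustrative).
import numpy as np

def causal_linear_attention_streaming(q_prime, k_prime, V):
    # q_prime, k_prime: non-negative feature maps of queries/keys, shape (L, m).
    L, m = q_prime.shape
    d = V.shape[-1]
    kv_state = np.zeros((m, d))   # running sum_{j<=t} phi(k_j) v_j^T
    norm_state = np.zeros(m)      # running sum_{j<=t} phi(k_j)
    out = np.empty((L, d))
    for t in range(L):            # sequential in time; memory does not grow with L
        kv_state += np.outer(k_prime[t], V[t])
        norm_state += k_prime[t]
        out[t] = (q_prime[t] @ kv_state) / (q_prime[t] @ norm_state + 1e-9)
    return out

# Tiny demo with random non-negative features standing in for phi(Q), phi(K).
L, m, d = 32, 16, 8
rng = np.random.default_rng(0)
q_prime = rng.random((L, m))
k_prime = rng.random((L, m))
V = rng.normal(size=(L, d))
print(causal_linear_attention_streaming(q_prime, k_prime, V).shape)   # (32, 8)
```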
Posted Content
An Ode to an ODE
Krzysztof Choromanski, Jared Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques E. Slotine, Jake Varley, Honglak Lee, Adrian Weller, Vikas Sindhwani +8 more
TL;DR: This paper proposes a new paradigm for Neural ODE algorithms, called ODEtoODE, in which the time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d), which provides stability and effectiveness of training and provably solves the vanishing/exploding gradient problem.
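One way to picture ODEtoODE is a discretised main flow whose weight matrix is multiplied at every step by an orthogonal factor, so the weights never leave O(d). The NumPy sketch below uses a Cayley-transform step for that orthogonal factor; it is an illustrative discretisation with assumed names, not the paper's exact parameterisation of the matrix flow.

```python
# ODEtoODE-flavoured update: hidden-state flow plus a weight flow on O(d) (illustrative).
import numpy as np

def cayley(A):
    # Maps a skew-symmetric matrix A to an orthogonal matrix (I - A)^{-1}(I + A).
    I = np.eye(A.shape[0])
    return np.linalg.solve(I - A, I + A)

def ode_to_ode_step(x, W, B, dt=0.1):
    # Main flow: dx/dt = tanh(W(t) x).  Weight flow: W stays orthogonal because
    # it is multiplied by a Cayley factor built from the skew-symmetric part of B.
    x_new = x + dt * np.tanh(W @ x)
    W_new = W @ cayley(dt * (B - B.T) / 2.0)
    return x_new, W_new

# Quick demo: orthogonality of W is preserved along the trajectory.
d = 6
rng = np.random.default_rng(0)
x = rng.normal(size=d)
W = np.linalg.qr(rng.normal(size=(d, d)))[0]   # start from an orthogonal matrix
B = rng.normal(size=(d, d))                    # learnable in the real model
for _ in range(50):
    x, W = ode_to_ode_step(x, W, B)
print(np.max(np.abs(W.T @ W - np.eye(d))))     # stays at numerical zero
```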